How I Built the Kualia AI Assistant
A technical look at the MCP server, in-app chat, and chat-driven CSV import that lets users talk to their budget.
The Kualia assistant can do three things that look simple from the outside.
It can answer questions about how your budget works. It can read your actual finances and explain what is going on this month. And if you drop a CSV from your bank into the chat, it will import the transactions for you, asking only which account they belong to.
Internally, those three features share far more than it appears. They all run through the same tool registry. The in-app chat and the public MCP server dispatch through the same code path, and the same scope checks gate both. This post is the tour I wish I had when I was figuring out how to wire it all up.
What I wanted
Three constraints drove almost every decision.
One source of truth for tools. I did not want a public MCP server that drifted from what the in-app assistant could do. Whatever Claude can do over MCP, the assistant in the app can do too. New tool, both surfaces, same arguments.
No second auth system for the in-app chat. Users are already signed in with Clerk. Making them OAuth into their own app to use the chat would be absurd. But the public MCP endpoint does need a real authorization story for third-party clients.
Cheap enough to leave on for everyone. I run a per-user monthly token cap, a global circuit breaker, and a router that sends boring messages to a cheaper model. Without that, one motivated user can spend more on chat than they pay in a year.
The shape of the system
Three pieces.
The MCP server is a JSON-RPC endpoint at /mcp on the Convex backend, authenticated with OAuth 2.1 + PKCE. The in-app chat is a Convex action that streams responses via the Convex Agent component and calls the same internal Convex functions the MCP server calls. The docs MCP is a third-party server at docs.kualia.com/mcp that the chat connects to as a client, so the assistant can answer “how do I do X” without me hand-rolling a docs RAG.
                  ┌────────────────────┐
Claude.ai ──────▶ │    /mcp (OAuth)    │ ──────────────┐
                  └────────────────────┘               │
                                                       ▼
                  ┌────────────────────┐    ┌──────────────────────┐
Kualia web/iOS ─▶ │ aiChat.sendMessage │ ─▶ │    Tool registry     │
                  │   (Convex action)  │    │   (single source)    │
                  └─────────┬──────────┘    └──────────┬───────────┘
                            │                          │
                            ▼                          ▼
                 docs.kualia.com/mcp               Convex DB
                    (docs tools)                (workspace data)
The trick that makes the whole thing manageable is that the tool registry is the contract. Everything else is plumbing around it.
The tool registry
Every tool the assistant can call lives in one file as a plain TypeScript array. Each entry has four things that matter: a name (kualia_get_envelope_month_status, kualia_set_category_assigned_amount, etc.), a description the model reads to decide when to use it, a JSON schema for inputs, and a single executor field that says 'query', 'mutation', or 'action'.
That last field is the whole game. The same registry feeds both surfaces, and both surfaces need to know how to call each tool: queries are cheap reads, mutations are writes, actions can call other actions (which matters for transaction import, because it falls back to an LLM categorization step for rows that no rule matches).
There are about a dozen tools in there today. Half are read-only reports (spending, income, monthly summary, account balances). The rest are surgical writes: assign a category, create transactions, delete transactions, restore transactions.
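As a sketch, a registry entry looks something like this. The field names match the description above, but the concrete entries and schemas here are abbreviated illustrations, not the real Kualia source:

```typescript
// Illustrative sketch of the registry shape — not the actual Kualia file.
type Executor = "query" | "mutation" | "action";

interface ToolDef {
  name: string;
  description: string; // read by the model to decide when to call the tool
  inputSchema: object; // JSON Schema for the tool's arguments
  executor: Executor;  // tells both surfaces how to dispatch the call
}

const TOOL_REGISTRY: ToolDef[] = [
  {
    name: "kualia_get_envelope_month_status",
    description: "Read-only monthly status for the budget envelopes",
    inputSchema: { type: "object", properties: { month: { type: "string" } } },
    executor: "query", // cheap read
  },
  {
    name: "kualia_create_transactions",
    description: "Import a batch of transactions into a bank account",
    inputSchema: { type: "object", properties: { rows: { type: "array" } } },
    executor: "action", // actions can call other actions (LLM categorization)
  },
];

function listMcpTools(): ToolDef[] {
  return TOOL_REGISTRY;
}
```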
Two surfaces, one dispatcher
The public MCP endpoint is a JSON-RPC switch statement. When a tools/call comes in, the handler authenticates the bearer token, looks up the tool’s executor in the registry, and routes to one of three internal dispatchers (executeMcpTool for queries, executeMcpMutationTool for mutations, executeMcpActionTool for actions). Each dispatcher takes the same arguments: toolName, clerkId, activeWorkspaceId, scopes, and a rawArguments JSON string.
The in-app chat does the exact same dispatch, but skips HTTP entirely. It loops over the registry and hands the AI SDK a tool definition whose execute function calls one of those same three internal dispatchers directly.
const tools: Record<string, Tool> = {};

for (const def of listMcpTools()) {
  tools[def.name] = tool({
    description: def.description,
    inputSchema: jsonSchema(def.inputSchema),
    execute: async (input) => runDispatch(def.executor, def.name, input),
  });
}
That symmetry is what lets the same tool work from Claude on the web and from the chat panel on iOS without a second implementation. When I add a new tool, I add it to the registry, write one executor function, and both clients pick it up.
Auth: scope strings, two front doors
The public MCP needs OAuth. Third-party clients (Claude, ChatGPT, scripts) cannot present a Clerk session, so I implemented an OAuth 2.1 server with PKCE on Convex. Discovery, dynamic client registration, authorization, token exchange, refresh. The dance is long but standard.
The interesting part is what happens after a token is minted. Every token carries a set of scopes (app:read, budgets:write, transactions:write), and those scopes flow into the executor as a plain string array. The in-app chat is authenticated with a Clerk JWT, but the dispatcher does not care. It only cares about scopes. So the chat hands it the same scope strings the OAuth flow would have granted, and the read-only checks and write checks run identically. One code path, two front doors.
This is also where multi-tenancy lands. Every executor calls a resolveWorkspace helper that verifies the user has access to the requested workspace (or falls back to their default) and returns the workspace doc. From that point on, every database read is filtered by the returned workspace ID. There is no path through the tool layer that does not go through that resolution. If I forget it, the tool returns nothing useful, which is the right failure mode.
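The scope gate itself is small. Here is a rough sketch — the scope strings are the real ones from above, but the mapping table and helper name are my illustration:

```typescript
// Illustrative sketch of the scope check that gates both front doors.
// Which write scope each tool needs is an assumption for this example.
const WRITE_SCOPES: Record<string, string> = {
  kualia_set_category_assigned_amount: "budgets:write",
  kualia_create_transactions: "transactions:write",
};

function assertScopes(toolName: string, scopes: string[]): void {
  // Every tool call needs at least read access.
  if (!scopes.includes("app:read")) {
    throw new Error("missing scope: app:read");
  }
  // Writes additionally need their specific write scope.
  const required = WRITE_SCOPES[toolName];
  if (required && !scopes.includes(required)) {
    throw new Error(`missing scope: ${required}`);
  }
}
```

The OAuth flow mints tokens carrying these strings; the in-app chat simply passes the equivalent strings for a fully signed-in user. The check never knows which door the call came through.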
The chat agent
The chat itself is a Convex action backed by the Convex Agent component, which handles thread persistence, message streaming, and delta sync to the client. I run two agent instances pointed at OpenRouter, one cheaper than the other, so I can swap models without redeploying.
Picking which one to use is a tiny router. First a cheap synchronous heuristic: greetings go simple, anything that mentions money words (“spent”, “category”, “this month”, a $ sign) goes complex. If the heuristic is unsure, I ask the cheap model to classify in one word. In practice 95% of messages never need the LLM classifier; the heuristic eats the obvious cases and the classifier handles the long tail.
The send loop itself is small. Pull a fresh timezone from the workspace so “today” is correct in the prompt, build the tool set, pick the agent, stream. The two details worth calling out:
stepCountIs caps tool-call rounds at 3 for simple and 5 for complex. Each round’s tool result accumulates in the context. Without a cap, one chatty thread that keeps asking for “more detail” can blow past the model’s context window and waste tokens on retries. The number is small on purpose.
The abort signal is wired to a poller that watches the agent component’s stream table. When the user hits stop, the controller fires and streamText unwinds mid-token, even between tool calls. Without that, hitting stop during a slow tool call would leave the stream hanging until the call resolved.
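The abort wiring reduces to a small pattern. A self-contained sketch, where `pollStopped` stands in for reading the agent component's stream table:

```typescript
// Sketch: wire an AbortController to a stop-flag poller. When the flag
// flips, the controller fires and anything holding the signal unwinds.
function wireAbort(
  pollStopped: () => boolean,
  intervalMs = 50,
): { signal: AbortSignal; dispose: () => void } {
  const controller = new AbortController();
  const timer = setInterval(() => {
    if (pollStopped()) {
      controller.abort(); // unwinds the stream mid-token
      clearInterval(timer);
    }
  }, intervalMs);
  return { signal: controller.signal, dispose: () => clearInterval(timer) };
}
```

The returned `signal` is what gets handed to the streaming call, so a stop pressed in the UI propagates even while a slow tool call is in flight.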
The docs MCP, as a client
The chat also connects out to a second MCP server, the one running at docs.kualia.com/mcp. That server hosts the docs site’s search and page-fetch tools. The chat treats it like any other tool source: it opens the MCP client once at boot, caches the tool schema at module scope, merges those tools into the tool set, and forwards calls.
Each Convex container pays the connection setup cost once and then reuses it. If the docs MCP is down, the chat keeps working; it just cannot answer “how do I” questions from the docs. Graceful degradation is cheap when you write it on day one.
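The connect-once, degrade-gracefully pattern can be sketched like this, with `fetchDocsTools` standing in for the actual MCP client handshake:

```typescript
// Sketch: cache the remote tool set at module scope so each container
// connects once, and fall back to an empty tool set when the docs
// server is unreachable.
let docsToolsPromise: Promise<Record<string, unknown>> | null = null;

async function getDocsTools(
  fetchDocsTools: () => Promise<Record<string, unknown>>,
): Promise<Record<string, unknown>> {
  if (!docsToolsPromise) {
    docsToolsPromise = fetchDocsTools().catch(() => {
      docsToolsPromise = null; // allow a retry on the next message
      return {};               // chat keeps working without docs tools
    });
  }
  return docsToolsPromise;
}
```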
File import, the part that surprised me
This is the feature that turned out to be the most fun to build, and the one that pushed me hardest to keep the tool layer clean.
A user drops a CSV from their bank. Or a PDF statement. Or a photo of a receipt. The chat takes the file and figures out what to do.
The frontend uploads the file to Convex storage first and gets back a storageId, then calls sendMessage with the prompt and an attachments array. From there the action does two different things depending on the file type.
Images and PDFs go to the model as URL references, not inlined bytes. The AI SDK fetches the URL at request time and shows the model the binary. I tried base64-inlining at first, but the agent component persists the user message verbatim and base64 of a few-megabyte image overflows Convex’s 1 MiB per-document limit. A URL is about a hundred bytes and lives for 24 hours, after which a scheduled mutation cleans it up.
CSV and plain text get UTF-8-decoded and wrapped in a tag inside the prompt:
<kualia-attachment-body filename="chase-march.csv" mediaType="text/csv">
Date,Description,Amount
2026-03-01,STARBUCKS #4412,-5.85
2026-03-02,WHOLE FOODS,-87.42
...
</kualia-attachment-body>
The chat panel strips that tag before rendering, so the user only sees their typed prompt and a chip for the file. The model sees the full text. There is a 256 KB cap per text file so a 10 MB CSV cannot blow up the context.
Then the system prompt takes over. The relevant section says, in plain English: when a file looks like transaction data, extract the rows yourself instead of asking the user to retype them. Skip headers and non-transaction lines (running balances, totals). Call kualia_list_bank_accounts and ask which account these belong to. Then call kualia_create_transactions with the rows and the chosen bankAccountId.
The model reads the CSV, extracts the rows, asks which account, calls the tool. The tool runs the import, dedups against existing transactions on that account, and returns { created, skipped, failed }. The model summarizes the result in plain English:
You:    [drops chase-march.csv]
Kualia: I found 47 transactions in that file. Which account
        should I import them into?
You:    Chase Sapphire
Kualia: Imported 41 transactions to Chase Sapphire. 6 were
        skipped as likely duplicates. Want me to import the
        skipped ones anyway?
You:    yes
Kualia: Imported the remaining 6.
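The dedup step inside the import action might look roughly like this — the match key (date + amount + description) is my assumption for the sketch, not Kualia's actual rule:

```typescript
// Sketch: import rows, skipping likely duplicates against existing
// transactions and collecting rows that fail basic validation.
interface Row {
  date: string;
  description: string;
  amount: string; // signed decimal string, e.g. "-5.85"
}

function importWithDedup(
  rows: Row[],
  existing: Row[],
): { created: Row[]; skipped: Row[]; failed: Row[] } {
  const seen = new Set(
    existing.map((r) => `${r.date}|${r.amount}|${r.description}`),
  );
  const created: Row[] = [];
  const skipped: Row[] = [];
  const failed: Row[] = [];
  for (const row of rows) {
    if (!row.date || !row.amount) {
      failed.push(row); // not a transaction row
      continue;
    }
    const key = `${row.date}|${row.amount}|${row.description}`;
    if (seen.has(key)) {
      skipped.push(row); // likely duplicate
      continue;
    }
    seen.add(key);
    created.push(row);
  }
  return { created, skipped, failed };
}
```

The counts of those three buckets are exactly the `{ created, skipped, failed }` payload the model turns into the plain-English summary.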
If the user follows up with “actually undo that”, the model has the new transaction IDs in its context and calls kualia_delete_transactions. If they say “restore those”, kualia_restore_transactions. The undo flow falls out of the tool design, not from a separate undo system.
Two things matter here that took me a while to get right.
Bank account ownership is verified inside the import action, not in the model. The system prompt asks the user to pick an account, but the executor checks that the chosen account belongs to the same workspace as the authenticated user. The model is a UX layer. It is not the security boundary.
Money is a decimal string, not a float. Every monetary value across the system is a signed decimal string (“-51.08”) whose precision is set by the workspace currency. Tool responses carry the minorUnit and currencyCode on every payload, and the system prompt makes the model read the value as written, no scaling. I learned the hard way that as soon as you let a model see 5108 for $51.08, it will eventually divide by 100 in the wrong place.
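As a sketch of the convention, with illustrative field names:

```typescript
// Sketch: every monetary payload carries the decimal string plus
// currency metadata, and the value is read exactly as written.
interface MoneyPayload {
  amount: string;       // e.g. "-51.08" — never a float, never minor units
  currencyCode: string; // e.g. "USD"
  minorUnit: number;    // decimal places for this currency, e.g. 2
}

function formatMoney(p: MoneyPayload): string {
  // No scaling, no division by 100 — the string is the value.
  return `${p.amount} ${p.currencyCode}`;
}
```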
What I would do differently
A few things I would change if I were starting over.
I would build the OAuth server later. For the first six months, the only consumer of the public MCP was me on my laptop, and a personal access token would have been fine. The full OAuth flow is the right answer eventually, but I wrote it before I needed it.
I would write the tool registry first, even before the chat. Once the registry existed, both surfaces basically wrote themselves. Before it existed, I had a tangle of one-off chat-only handlers that I threw away when I realized MCP was the right protocol.
I would put more in the system prompt and less in code. The “skip header rows” rule for CSV import? That is a prompt instruction, not a parser. Every time I tried to be clever in code about “is this a transaction row”, the model was already cleverer. The right division of labor is the model parses, the code enforces.
What is next
I have not exposed everything yet. The chat can read account balances, list transactions, fund categories, and import files, but it cannot yet reconcile, edit existing transactions, or link a new bank. Those are tools that need careful undo stories before I add them. Soon.
The MCP server is open to anyone with a Kualia account. If you want to point Claude at your budget, you can: configure kualia.com/mcp as an MCP server in your client, authorize the OAuth flow, and you are in. Everything the in-app assistant can do is available there too, by design.
If you want to try the assistant inside the app, Kualia is free to sign up. Drop a CSV into the chat. Ask it what you spent on coffee last month. See what it gets right and what it gets wrong. I would love the feedback.