fix: enforce monthly message-credit limit before chat LLM calls #157

Open
Shawnaldinho wants to merge 1 commit into willchen96:main from Shawnaldinho:fix/credit-budget-pre-enforce

@Shawnaldinho

Summary

`user_profiles.message_credits_used` is exposed on `/user/profile` as `creditsRemaining`, but on `main` today (a) no code increments it after an LLM call, so it's always 0, and (b) no code checks it before an LLM call. The "credits remaining" the UI shows is therefore a no-op gauge.

Changes

  • `backend/src/lib/credits.ts` (new):
    • `monthlyCreditLimit()` reads `MONTHLY_MESSAGE_CREDIT_LIMIT` from the env, default `999999` (the constant previously inline in `routes/user.ts`). Behaviour-neutral unless an operator opts in.
    • `getCreditState(userId, db)` returns `{ used, limit, remaining }` for the pre-call check. Read-only, doesn't fetch the full profile.
    • `incrementMessageCredits(userId, db, n=1)` bumps the counter once per successful user-initiated message — not per tool turn — so the gauge reflects user-visible message volume.
  • `POST /chat` and `POST /projects/:projectId/chat`
    • Reject with `402` + `{ creditsUsed, creditsLimit }` when `remaining <= 0`, before flushing response headers, so the client sees a clean JSON error instead of a half-streamed response.
    • Increment on the success path after the assistant-message insert. Failed streams don't count against the user.
  • `routes/user.ts` now imports `monthlyCreditLimit()` instead of holding its own copy of the constant, so the env-driven limit is the single source of truth.
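
For reviewers who want a picture of the three helpers without opening the diff, here is a minimal sketch of how they might fit together. It is not the code in this PR: the `CreditStore` interface and `inMemoryStore()` are hypothetical stand-ins for the real `db` client, the limit is read from an explicit `env` argument (the backend would presumably pass `process.env`) so the sketch stays testable, and falling back to the default on a malformed value is an assumed policy.

```typescript
// Hypothetical storage interface standing in for the `db` argument.
interface CreditStore {
  getUsed(userId: string): Promise<number>;
  setUsed(userId: string, used: number): Promise<void>;
}

interface CreditState {
  used: number;
  limit: number;
  remaining: number;
}

const DEFAULT_MONTHLY_LIMIT = 999999;

// Reads MONTHLY_MESSAGE_CREDIT_LIMIT from an env-like map. Anything that is
// not a plain non-negative integer falls back to the default (assumption).
function monthlyCreditLimit(env: Record<string, string | undefined>): number {
  const raw = (env["MONTHLY_MESSAGE_CREDIT_LIMIT"] ?? "").trim();
  return /^\d+$/.test(raw) ? Number(raw) : DEFAULT_MONTHLY_LIMIT;
}

// Read-only snapshot for the pre-call check. `remaining` is not clamped here,
// so callers compare it with `<= 0`.
async function getCreditState(
  userId: string,
  db: CreditStore,
  env: Record<string, string | undefined>,
): Promise<CreditState> {
  const used = await db.getUsed(userId);
  const limit = monthlyCreditLimit(env);
  return { used, limit, remaining: limit - used };
}

// Read-then-write bump, called once per successful user-initiated message.
async function incrementMessageCredits(
  userId: string,
  db: CreditStore,
  n = 1,
): Promise<void> {
  const used = await db.getUsed(userId);
  await db.setUsed(userId, used + n);
}

// In-memory store used only to exercise the sketch.
function inMemoryStore(): CreditStore {
  const rows = new Map<string, number>();
  return {
    async getUsed(userId) {
      return rows.get(userId) ?? 0;
    },
    async setUsed(userId, used) {
      rows.set(userId, used);
    },
  };
}
```

With an empty env map the limit resolves to `999999`, which matches the behaviour-neutral default described above.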

Why

This closes the specific "credit counter tracked but not enforced" gap from https://insights.flank.ai/where-mikeoss-falls-short.html (gap 6). Three design choices worth flagging:

  • One increment per user message, not per tool turn. A single chat turn can fire many tool calls; counting each would make the user-visible balance unintelligible. The user pressed Enter once, the gauge ticks once.
  • Default limit unchanged. `999999` was already the historical cap; operators who want enforcement set `MONTHLY_MESSAGE_CREDIT_LIMIT` to a smaller integer. Tier-aware limits are intentionally not in scope — once a single env-driven cap exists, tier overrides are a small follow-up.
  • Read-then-write, not atomic. Two near-simultaneous requests can under-count by one; acceptable for a soft monthly budget. If hard accounting is needed, swap for an `rpc('inc_credits', ...)` stored procedure — the call sites won't change.
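
To make the read-then-write trade-off concrete, the sketch below reproduces the lost update with an in-memory counter and contrasts it with a single-step increment. Everything here is illustrative: `Counter`, `readThenWriteIncrement`, and `atomicAdd` are hypothetical names, and `atomicAdd` only stands in for whatever a server-side `inc_credits` procedure would do; no such procedure is defined in this PR.

```typescript
// In-memory counter whose reads and writes are async, like a database round trip.
class Counter {
  private used = 0;

  async read(): Promise<number> {
    await Promise.resolve(); // yield, as a network call would
    return this.used;
  }

  async write(value: number): Promise<void> {
    await Promise.resolve();
    this.used = value;
  }

  // Stand-in for a server-side atomic increment: the read-modify-write
  // happens in one uninterrupted step.
  async atomicAdd(n: number): Promise<void> {
    await Promise.resolve();
    this.used += n;
  }

  peek(): number {
    return this.used;
  }
}

// The pattern this PR uses: read the value, then write value + 1.
async function readThenWriteIncrement(counter: Counter): Promise<void> {
  const current = await counter.read();
  await counter.write(current + 1);
}

async function compareStrategies(): Promise<{ naive: number; atomic: number }> {
  const naiveCounter = new Counter();
  // Both requests read 0 before either writes, so both write 1.
  await Promise.all([
    readThenWriteIncrement(naiveCounter),
    readThenWriteIncrement(naiveCounter),
  ]);

  const atomicCounter = new Counter();
  await Promise.all([atomicCounter.atomicAdd(1), atomicCounter.atomicAdd(1)]);

  return { naive: naiveCounter.peek(), atomic: atomicCounter.peek() };
}
```

Here `compareStrategies()` resolves to `{ naive: 1, atomic: 2 }`: the naive path loses one increment, which is the under-count described in the bullet above.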

Tabular and workflow LLM call sites are not touched here — the two streaming chat routes are the most user-visible entry points and wiring the rest is a wider change worth its own PR.

Testing

  • `npm run build --prefix backend` passes.
  • Walked the four paths manually: (a) limit unset → default 999999, no behaviour change, increment still bumps the counter; (b) limit set to a small number, user at limit → 402 with detail before any LLM call; (c) successful stream → counter +1; (d) LLM error inside the try block → counter untouched.
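
The four manual paths could later be pinned down with a small framework-free test. The sketch below models only the control flow of the chat routes: `handleChat`, `Credits`, and `ChatResult` are hypothetical names, streaming and the assistant-message insert are omitted, and the `500` on LLM failure is an assumption rather than what the routes necessarily return.

```typescript
type ChatResult =
  | { status: 200; reply: string }
  | { status: 402; body: { creditsUsed: number; creditsLimit: number } }
  | { status: 500; error: string };

// In-memory stand-in for the user's credit counter.
interface Credits {
  used: number;
}

async function handleChat(
  credits: Credits,
  limit: number,
  runLLM: () => Promise<string>,
): Promise<ChatResult> {
  // Pre-call check: reject before any LLM work or response headers.
  if (limit - credits.used <= 0) {
    return {
      status: 402,
      body: { creditsUsed: credits.used, creditsLimit: limit },
    };
  }
  try {
    const reply = await runLLM();
    // Success path only: one increment per user message.
    credits.used += 1;
    return { status: 200, reply };
  } catch (err) {
    // Failed calls leave the counter untouched.
    const message = err instanceof Error ? err.message : String(err);
    return { status: 500, error: message };
  }
}
```

Calling it with `used` equal to `limit` returns the `402` body without invoking `runLLM`; a rejected `runLLM` promise leaves `credits.used` unchanged.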

user_profiles.message_credits_used is surfaced on /user/profile as
`creditsRemaining`, but on main today (a) no code increments it after
an LLM call, so it's always 0, and (b) no code checks it before an
LLM call, so the value is informational only. The "credits remaining"
shown in the UI is therefore a no-op gauge.

Wire the field up:

- New backend/src/lib/credits.ts with:
  * monthlyCreditLimit() — reads MONTHLY_MESSAGE_CREDIT_LIMIT from the
    env, defaulting to 999999 (the constant previously hard-coded in
    routes/user.ts). Behaviour-neutral unless an operator opts in.
  * getCreditState(userId, db) — { used, limit, remaining } for the
    pre-call check; read-only, doesn't fetch the full profile.
  * incrementMessageCredits(userId, db, n=1) — bumps the counter; one
    call per user-initiated message, not per tool turn, so the gauge
    reflects user-visible message volume.

- POST /chat and POST /projects/:projectId/chat now:
  * Reject with 402 + { creditsUsed, creditsLimit } if remaining <= 0,
    before flushing response headers (so the client sees a clean error
    instead of a half-streamed response).
  * Increment after a successful runLLMStream + assistant-message
    insert. Failures don't count against the user.

- routes/user.ts now imports monthlyCreditLimit() instead of holding
  its own copy of the constant, so the env-driven limit is the single
  source of truth.

Tabular and workflow LLM call sites are left for a follow-up — the
two streaming chat routes are the most user-visible entry points and
adding the rest is a wider, more invasive change.