Token counting is performed before API calls to prevent hard limit errors
Why it matters
Sending an oversized payload to the AI API is never free, even when the provider rejects it. The request consumes a full network round-trip, counts toward rate limit quotas on some configurations, and surfaces to the user as a generic 500 error rather than an actionable message. CWE-770 and CWE-400 both apply: the application allocates resources with no limit check of its own. NIST AI RMF MANAGE 1.3 calls for developed responses to identified AI risks; an unhandled crash on context overflow is the opposite. A pre-flight guard converts this crash into a recoverable, user-friendly state.
Severity rationale
High because, without a pre-flight guard, context overflow errors surface as generic failures that burn rate limit quota and give users no actionable recovery information.
Remediation
Before every AI call, add a synchronous token check that returns a structured 422 response when the payload would exceed the model's limit. Reserve headroom for the expected response length.
// src/app/api/chat/route.ts
const RESPONSE_RESERVE = 1000; // tokens reserved for the response

const totalTokens = countMessageTokens([
  { role: "system", content: systemPrompt },
  ...boundedMessages,
], model);

if (totalTokens > MODEL_CONFIG[model].contextWindow - RESPONSE_RESERVE) {
  return Response.json(
    { error: "Conversation is too long. Please start a new chat to continue." },
    { status: 422 }
  );
}
Verify by constructing a payload that exceeds the limit and confirming the UI displays the specific error rather than a generic failure screen.
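The snippet above calls countMessageTokens and reads MODEL_CONFIG without showing either. The following is a minimal sketch of what they might look like, not the project's actual implementation. It assumes a rough estimate of about four characters per token plus a small fixed overhead per message; both numbers are assumptions rather than provider figures, and the model name and context window are placeholders. The helper exceedsContextLimit is hypothetical and simply wraps the comparison from the guard.

```typescript
// Hypothetical helpers: names mirror the snippet above, bodies are illustrative only.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Placeholder limits: substitute the documented context window of each model in use.
const MODEL_CONFIG: Record<string, { contextWindow: number }> = {
  "example-model": { contextWindow: 8000 },
};

const CHARS_PER_TOKEN = 4; // assumption: coarse average for English text
const PER_MESSAGE_OVERHEAD = 4; // assumption: allowance for role and formatting tokens

// Estimates the token count of a message list. Rounds up per message so the
// guard errs toward rejecting rather than letting an oversized payload through.
function countMessageTokens(messages: ChatMessage[], _model: string): number {
  return messages.reduce(
    (total, m) =>
      total + Math.ceil(m.content.length / CHARS_PER_TOKEN) + PER_MESSAGE_OVERHEAD,
    0,
  );
}

// Returns true when the estimated payload, plus the headroom reserved for the
// response, would not fit in the model's context window.
function exceedsContextLimit(
  messages: ChatMessage[],
  model: string,
  responseReserve = 1000,
): boolean {
  const config = MODEL_CONFIG[model];
  if (!config) throw new Error(`Unknown model: ${model}`);
  return countMessageTokens(messages, model) > config.contextWindow - responseReserve;
}
```

A character-based estimate can be off by a wide margin for code, non-English text, or long numbers, so where accuracy matters a tokenizer matched to the target model is the safer choice. For the verification step, a single message containing "x".repeat(40000) is enough to trip the guard under these placeholder limits.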
Detection
- ID: token-counting-before-call
- Severity: high
- What to look for: Look specifically for a pre-flight guard before the AI API call. This is distinct from post-call usage logging. Patterns to find: a conditional that throws or returns an error before calling the provider if token count exceeds a threshold; or auto-truncation logic that runs synchronously before the call. Also check error handling: is a context_length_exceeded or 400 error caught and surfaced as a user-friendly message, or does it propagate as a generic 500? Count all instances found and enumerate each.
- Pass criteria: The code prevents the API call if estimated tokens exceed the model's limit and returns a user-friendly error (e.g., "Your conversation is too long. Please start a new chat"), OR the auto-truncation from the history check (previous check) makes this guard redundant by construction. At least 1 implementation must be confirmed.
- Fail criteria: The code makes the API call unconditionally and allows the provider to throw a 400 context_length_exceeded error, which either crashes the route handler or surfaces as a generic error in the UI.
- Skip (N/A) when: No AI API integration is detected. Signal: No AI SDK dependencies in package.json.
- Detail on fail: "No pre-flight token guard — context_length_exceeded errors will surface as generic 500s"
- Remediation: API calls that fail due to length still consume network round-trip time and count toward rate limit quotas in some configurations. More importantly, the user gets a useless error.

  Add a guard that fires before the API call:

  // src/app/api/chat/route.ts
  const RESPONSE_RESERVE = 1000; // tokens reserved for the response

  const totalTokens = countMessageTokens([
    { role: "system", content: systemPrompt },
    ...boundedMessages,
  ], model);

  if (totalTokens > MODEL_CONTEXT_LIMITS[model] - RESPONSE_RESERVE) {
    return Response.json(
      { error: "Conversation is too long. Please start a new chat to continue." },
      { status: 422 }
    );
  }

  // safe to proceed
  const result = await streamText({ model: openai(model), messages: boundedMessages });

  Verify by constructing a payload that exceeds the limit and confirming the UI displays the specific error message rather than a generic failure.
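The "What to look for" item also asks whether a context_length_exceeded or 400 error from the provider is caught and turned into a user-friendly message. As a complement to the pre-flight guard, the following is a minimal sketch of such a fallback mapping. The error shape (status, code, message fields), the message pattern, and the 502 used for other upstream failures are all assumptions for illustration; real SDK error classes and field names vary by provider, and both function names are hypothetical.

```typescript
// Assumed error shape: check the SDK in use for its actual error classes and fields.
type ProviderErrorLike = { status?: number; code?: string; message?: string };

type ErrorPayload = { status: number; body: { error: string } };

// Heuristically decides whether an unknown thrown value is a context overflow.
function isContextOverflow(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as ProviderErrorLike;
  if (e.code === "context_length_exceeded") return true;
  // Assumption: some providers only signal overflow through a 400 and its message text.
  return (
    e.status === 400 &&
    typeof e.message === "string" &&
    /context length|maximum context/i.test(e.message)
  );
}

// Maps a caught error to a status and JSON body the route handler can return.
function toErrorPayload(err: unknown): ErrorPayload {
  if (isContextOverflow(err)) {
    return {
      status: 422,
      body: { error: "Conversation is too long. Please start a new chat to continue." },
    };
  }
  // Anything else is reported as a generic upstream failure (status choice is an assumption).
  return {
    status: 502,
    body: { error: "The AI service is unavailable. Please try again." },
  };
}
```

In a route handler, this would sit in the catch block around the provider call, for example by destructuring status and body from toErrorPayload(err) and returning Response.json(body, { status }). It does not replace the pre-flight guard, since the failed request has already been sent by the time the error is caught.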
External references
- owasp-llm:2025 · LLM10 — Unbounded Consumption
- cwe · CWE-770 — Allocation of Resources Without Limits or Throttling
- cwe · CWE-400 — Uncontrolled Resource Consumption
- nist-ai-rmf:1.0 · MANAGE 1.3 — Responses to identified AI risks are developed
- iso-25010:2011 · reliability.fault-tolerance — Fault Tolerance — pre-flight guard prevents context-overflow crashes
Taxons
History
- 2026-04-18·v1.0.0·Initial import from ai-token-optimization·automated