Conversation history is truncated or summarized for long sessions
Why it matters
Passing an unbounded messages array to the AI provider is a ticking time bomb. Every conversation turn adds tokens to the payload; once the cumulative history exceeds the model's context window, the API returns a hard 400 error mid-session. OWASP LLM10 and CWE-770 classify this as an uncontrolled resource consumption failure. The user's active session terminates without warning. On multi-user platforms this failure recurs predictably, silently degrading retention for your most engaged users — the ones with the longest conversation histories.
Severity rationale
Critical because unbounded history growth guarantees eventual API failure for active users, crashing live sessions with no recovery path.
Remediation
Implement a token-budget sliding window that fills from the newest message backward, discarding old messages when the budget is exceeded. Do not use a fixed message count — use token count, because message length varies.
// src/lib/ai/context-manager.ts
import { countMessageTokens } from "./token-counter";

const CONTEXT_BUDGET = 6000; // reserve headroom for system prompt + response

export function buildContextWindow(
  messages: Array<{ role: string; content: string }>,
  model: string
): Array<{ role: string; content: string }> {
  const result: typeof messages = [];
  // Iterate from newest to oldest, keeping messages while the budget holds
  for (let i = messages.length - 1; i >= 0; i--) {
    const candidate = [messages[i]!, ...result];
    // Stop at the first message that would push the window over budget
    if (countMessageTokens(candidate, model) > CONTEXT_BUDGET) break;
    result.unshift(messages[i]!);
  }
  return result;
}
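The window builder above imports countMessageTokens from ./token-counter, which is not shown. The following is a minimal sketch of what such a helper might look like, not the actual implementation. It assumes a rough heuristic of about 4 characters per token plus a small fixed overhead per message; both constants are illustrative assumptions, not values taken from any tokenizer, and the model argument is accepted but ignored.

```typescript
type ChatMessage = { role: string; content: string };

// Assumption: roughly 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;
// Assumption: a small fixed cost per message for role and framing tokens.
const PER_MESSAGE_OVERHEAD = 4;

// Hypothetical stand-in for ./token-counter. The model parameter is unused
// here; a tokenizer-backed version would use it to pick the right encoding.
function countMessageTokens(messages: ChatMessage[], _model: string): number {
  let total = 0;
  for (const message of messages) {
    total += PER_MESSAGE_OVERHEAD + Math.ceil(message.content.length / CHARS_PER_TOKEN);
  }
  return total;
}
```

To use it as the imported module, the function would need to be exported from ./token-counter. A heuristic like this can undercount for code or non-English text, so a model-specific tokenizer is the safer choice when the budget sits close to the real context limit.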
In your route handler, call buildContextWindow(allMessages, model) before passing messages to streamText. Verify by logging payload token counts during a long conversation — the count should plateau, not grow unboundedly.
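One way to check the plateau without calling a live provider is to simulate a long session and record the size of each bounded payload. The sketch below repeats the window builder so that it runs on its own, and swaps in a hypothetical character-based counter (about 4 characters per token, an assumption) in place of the real token counter; the model name is a placeholder.

```typescript
type ChatMessage = { role: string; content: string };

const CONTEXT_BUDGET = 6000;

// Hypothetical stand-in for the real counter: ~4 characters per token.
function countMessageTokens(messages: ChatMessage[], _model: string): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

function buildContextWindow(messages: ChatMessage[], model: string): ChatMessage[] {
  const result: ChatMessage[] = [];
  // Walk from newest to oldest, keeping messages while the budget holds.
  for (let i = messages.length - 1; i >= 0; i--) {
    const candidate = [messages[i]!, ...result];
    if (countMessageTokens(candidate, model) > CONTEXT_BUDGET) break;
    result.unshift(messages[i]!);
  }
  return result;
}

// Simulate 40 turns; each message is 2,000 characters (~500 tokens here).
const history: ChatMessage[] = [];
const payloadTokenCounts: number[] = [];
for (let turn = 0; turn < 40; turn++) {
  history.push({
    role: turn % 2 === 0 ? "user" : "assistant",
    content: "x".repeat(2000),
  });
  const bounded = buildContextWindow(history, "example-model");
  payloadTokenCounts.push(countMessageTokens(bounded, "example-model"));
}

// The series climbs in steps of 500, then stays flat once the budget is reached.
console.log(payloadTokenCounts.join(" "));
```

Under these assumptions the logged counts rise for the first twelve turns and then hold at the budget. A count that keeps climbing would indicate the full history is still reaching the provider.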
Detection
- ID: conversation-history-truncation
- Severity: critical
- What to look for: In the chat API route (wherever openai.chat.completions.create, streamText, or generateText is called with a messages array), look for logic that limits what is sent. Common patterns include messages.slice(-N), a loop that removes the oldest messages when a token budget is exceeded, or a helper function named something like trimHistory, truncateMessages, or buildContextWindow. Also check whether the full messages array from a database or client state is passed directly with no size guard. Count all instances found and enumerate each.
- Pass criteria: There is explicit logic to truncate, window, or filter older messages from the conversation history before the AI call. The messages array sent to the provider is bounded. At least 1 implementation must be confirmed.
- Fail criteria: The entire messages array from the client or database is passed directly to the AI provider without any length check or truncation. No guard exists between accumulated history and the outbound API call.
- Skip (N/A) when: The application is demonstrably single-turn (stateless) and does not persist or pass prior conversation turns. Signal: API calls use a single prompt string rather than a messages array, or there is no mechanism in the UI to continue a conversation.
- Detail on fail: "Full conversation history passed to API with no truncation — sessions will break as history grows"
- Remediation: Passing unbounded conversation history guarantees eventual API failure once the context limit is reached. The failure happens mid-session with no graceful fallback.
Implement a sliding window approach, bounded by token count rather than message count:

// src/lib/ai/context-manager.ts
import { countMessageTokens } from "./token-counter";

const CONTEXT_BUDGET = 6000; // reserve headroom for system prompt + response

export function buildContextWindow(
  messages: Array<{ role: string; content: string }>,
  model: string
): Array<{ role: string; content: string }> {
  const result: typeof messages = [];
  // iterate from newest to oldest, filling budget
  for (let i = messages.length - 1; i >= 0; i--) {
    const candidate = [messages[i]!, ...result];
    if (countMessageTokens(candidate, model) > CONTEXT_BUDGET) break;
    result.unshift(messages[i]!);
  }
  return result;
}

Then in the route handler:

const boundedMessages = buildContextWindow(allMessages, model);
const result = await streamText({ model: openai(model), messages: boundedMessages });

Verify by sending more than 20 back-and-forth turns; the API payload (visible in server logs) should never grow beyond the budget.
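The detection notes list messages.slice(-N) as a common pattern, while the remediation asks for a token bound rather than a message count. The sketch below illustrates the difference with ten long messages. The estimateTokens helper is a hypothetical character-based heuristic (about 4 characters per token, an assumption), and the token-based loop keeps a running total rather than recounting the whole candidate window, which is a small variation on the builder shown above.

```typescript
type ChatMessage = { role: string; content: string };

// Hypothetical counter for illustration: ~4 characters per token.
function estimateTokens(messages: ChatMessage[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

const CONTEXT_BUDGET = 6000;

// Ten messages, each a 4,000-character paste (~1,000 tokens under the heuristic).
const longMessages: ChatMessage[] = Array.from({ length: 10 }, (_, i) => ({
  role: i % 2 === 0 ? "user" : "assistant",
  content: "x".repeat(4000),
}));

// Count-based window: keeps the last 10 messages regardless of their size.
const byCount = longMessages.slice(-10);

// Token-based window: keeps the newest messages while the running total fits.
const byTokens: ChatMessage[] = [];
let used = 0;
for (let i = longMessages.length - 1; i >= 0; i--) {
  const cost = estimateTokens([longMessages[i]!]);
  if (used + cost > CONTEXT_BUDGET) break;
  byTokens.unshift(longMessages[i]!);
  used += cost;
}

console.log(estimateTokens(byCount) > CONTEXT_BUDGET); // true: 10,000 estimated tokens
console.log(estimateTokens(byTokens) <= CONTEXT_BUDGET); // true: 6 messages, 6,000 estimated tokens
```

A count-based slice still limits how many messages are sent, but as the example suggests, it only bounds the payload size when message lengths are themselves capped.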
External references
- owasp-llm:2025 · LLM10 — Unbounded Consumption
- cwe · CWE-770 — Allocation of Resources Without Limits or Throttling
- nist-ai-rmf:1.0 · MANAGE 2.2 — Mechanisms to sustain AI risk management are in place
- iso-25010:2011 · performance-efficiency.capacity — Capacity — bounded conversation history within context window
History
- 2026-04-18 · v1.0.0 · Initial import from ai-token-optimization · automated