Passing an unbounded messages array to the AI provider is a ticking time bomb. Every conversation turn adds tokens to the payload; once the cumulative history exceeds the model's context window, the API returns a hard 400 error mid-session. OWASP LLM10 (Unbounded Consumption) and CWE-770 (Allocation of Resources Without Limits or Throttling) both classify this as an uncontrolled resource consumption failure. The user's active session terminates without warning. On multi-user platforms this failure recurs predictably, silently degrading retention for your most engaged users: the ones with the longest conversation histories.
Critical because unbounded history growth guarantees eventual API failure for active users, crashing live sessions with no recovery path.
Implement a token-budget sliding window that fills from the newest message backward, discarding old messages when the budget is exceeded. Do not use a fixed message count; use token count, because message length varies.
// src/lib/ai/context-manager.ts
import { countMessageTokens } from "./token-counter";

// Reserve headroom below the model's context limit for the system prompt and response.
const CONTEXT_BUDGET = 6000;

export function buildContextWindow(
  messages: Array<{ role: string; content: string }>,
  model: string
): Array<{ role: string; content: string }> {
  const result: typeof messages = [];
  // Walk from newest to oldest, keeping messages until the budget would be exceeded.
  for (let i = messages.length - 1; i >= 0; i--) {
    const candidate = [messages[i]!, ...result];
    if (countMessageTokens(candidate, model) > CONTEXT_BUDGET) break;
    result.unshift(messages[i]!);
  }
  return result;
}
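The countMessageTokens helper is imported but not shown above. A minimal sketch, assuming a rough 4-characters-per-token heuristic plus a small per-message overhead (swap in a real tokenizer such as js-tiktoken when exact counts matter):

```typescript
// src/lib/ai/token-counter.ts (hypothetical sketch, not the real helper)
type ChatMessage = { role: string; content: string };

// Rough estimate: ~4 characters per token for English text, plus a fixed
// overhead per message for role/formatting tokens. This overestimates or
// underestimates for some inputs; it is only meant to bound the payload.
export function countMessageTokens(
  messages: ChatMessage[],
  _model: string
): number {
  const PER_MESSAGE_OVERHEAD = 4;
  return messages.reduce(
    (sum, m) =>
      sum +
      PER_MESSAGE_OVERHEAD +
      Math.ceil((m.role.length + m.content.length) / 4),
    0
  );
}
```

Because the heuristic only needs to be conservative, a slight overestimate is preferable to an undercount that lets the payload breach the real context limit.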
In your route handler, call buildContextWindow(allMessages, model) before passing messages to streamText. Verify by logging payload token counts during a long conversation: the count should plateau, not grow unboundedly.
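The plateau behavior can be checked offline with a quick simulation. This sketch inlines a copy of buildContextWindow together with a stub token counter (the stub and the small budget are assumptions for the demo, not the real helper):

```typescript
type ChatMessage = { role: string; content: string };

// Stub counter for the demo: ~4 chars per token (assumption, not the real helper).
const countMessageTokens = (msgs: ChatMessage[], _model: string) =>
  msgs.reduce((s, m) => s + Math.ceil(m.content.length / 4), 0);

const CONTEXT_BUDGET = 100; // tiny budget so the plateau appears quickly

function buildContextWindow(messages: ChatMessage[], model: string): ChatMessage[] {
  const result: ChatMessage[] = [];
  for (let i = messages.length - 1; i >= 0; i--) {
    if (countMessageTokens([messages[i]!, ...result], model) > CONTEXT_BUDGET) break;
    result.unshift(messages[i]!);
  }
  return result;
}

// Simulate a growing conversation: the bounded payload should plateau at the budget.
const history: ChatMessage[] = [];
const counts: number[] = [];
for (let turn = 0; turn < 50; turn++) {
  history.push({ role: "user", content: "x".repeat(40) }); // ~10 tokens each
  counts.push(countMessageTokens(buildContextWindow(history, "gpt-4o"), "gpt-4o"));
}
```

With 10-token messages and a 100-token budget, counts climbs for the first ten turns and then holds at 100 no matter how long the history grows, which is exactly the shape to look for in production logs.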
ID: ai-token-optimization.context-management.conversation-history-truncation
Severity: critical
What to look for: In the chat API route (wherever openai.chat.completions.create, streamText, or generateText is called with a messages array), look for logic that limits what is sent. Common patterns include messages.slice(-N), a loop that removes the oldest messages when a token budget is exceeded, or a helper function named something like trimHistory, truncateMessages, or buildContextWindow. Also check whether the full messages array from a database or client state is passed directly with no size guard. Count all instances found and enumerate each.
Pass criteria: There is explicit logic to truncate, window, or filter older messages from the conversation history before the AI call. The messages array sent to the provider is bounded. At least 1 implementation must be confirmed.
Fail criteria: The entire messages array from the client or database is passed directly to the AI provider without any length check or truncation. No guard exists between accumulated history and the outbound API call.
Skip (N/A) when: The application is demonstrably single-turn (stateless) and does not persist or pass prior conversation turns.
Signal: API calls use a single prompt string rather than a messages array, or there is no mechanism in the UI to continue a conversation.
Detail on fail: "Full conversation history passed to API with no truncation — sessions will break as history grows"
Remediation: Passing unbounded conversation history guarantees eventual API failure once the context limit is reached. The failure happens mid-session with no graceful fallback.
Implement a sliding window approach, bounded by token count rather than message count:
// src/lib/ai/context-manager.ts
import { countMessageTokens } from "./token-counter";

const CONTEXT_BUDGET = 6000; // reserve headroom for system prompt + response

export function buildContextWindow(
  messages: Array<{ role: string; content: string }>,
  model: string
): Array<{ role: string; content: string }> {
  const result: typeof messages = [];
  // iterate from newest to oldest, filling budget
  for (let i = messages.length - 1; i >= 0; i--) {
    const candidate = [messages[i]!, ...result];
    if (countMessageTokens(candidate, model) > CONTEXT_BUDGET) break;
    result.unshift(messages[i]!);
  }
  return result;
}
Then in the route handler:
const boundedMessages = buildContextWindow(allMessages, model);
const result = await streamText({ model: openai(model), messages: boundedMessages });
Verify by sending more than 20 back-and-forth turns; the API payload (visible in server logs) should never grow beyond the budget.