Simple FIFO truncation discards messages with no regard for their semantic value. The user's name stated at message 1, a constraint given at message 5, or a decision made at message 10 vanishes permanently when it slides out of the window. In multi-step workflows — job application assistants, planning tools, tutoring systems — this causes visible regression: the AI contradicts itself or asks questions the user already answered. NIST AI RMF MANAGE 2.2 requires managing model limitations that affect outcomes; history amnesia is a direct consequence of ignoring this.
Medium because hard truncation degrades conversation quality and user trust in extended sessions, without causing an outright crash.
Implement a summarization step that fires when history exceeds a turn threshold. Use a cheap, fast model for the summarization call — not the frontier model you use for the primary task.
// src/lib/ai/compaction.ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

export async function compactHistory(
  messages: Array<{ role: string; content: string }>,
  threshold: number = 20
): Promise<Array<{ role: string; content: string }>> {
  if (messages.length <= threshold) return messages;

  // Summarize everything except the 10 most recent messages.
  const toSummarize = messages.slice(0, messages.length - 10);
  const recentMessages = messages.slice(messages.length - 10);

  const { text: summary } = await generateText({
    model: openai("gpt-4o-mini"), // cheap model: summarization does not need the frontier model
    prompt: `Summarize this conversation, preserving key facts, names, goals, and decisions:\n\n${toSummarize
      .map((m) => `${m.role}: ${m.content}`)
      .join("\n")}`,
    maxTokens: 300,
  });

  // Replace the dropped turns with a single summary message; keep recent turns verbatim.
  return [
    { role: "system", content: `[Earlier conversation summary]: ${summary}` },
    ...recentMessages,
  ];
}
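The split step inside compactHistory is deterministic even though the summary itself comes from a model call. A minimal sketch of that invariant, using a hypothetical helper name (splitForCompaction is not part of the rule, only an illustration):

```typescript
// Pure split step: everything except the last `keep` messages goes to the
// summarizer; the most recent `keep` messages are retained verbatim.
function splitForCompaction<T>(
  messages: T[],
  keep: number = 10
): { toSummarize: T[]; recent: T[] } {
  return {
    toSummarize: messages.slice(0, Math.max(0, messages.length - keep)),
    recent: messages.slice(-keep),
  };
}

// With 25 turns and keep = 10: turns 1..15 are summarized, 16..25 kept as-is.
const { toSummarize, recent } = splitForCompaction(
  Array.from({ length: 25 }, (_, i) => i + 1)
);
```

Whatever threshold you pick, the compacted history is bounded: one summary message plus the last `keep` turns.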
Verify by stating a specific fact at message 1, continuing for 25+ turns, then asking the AI to recall that fact — it should still know it.
ID: ai-token-optimization.context-management.context-compaction-strategy
Severity: medium
What to look for: Look for advanced history management beyond simple truncation. Search for "summarization" steps where an LLM call summarizes older messages before they are dropped. Look for utility functions named something like summarizeHistory, compactContext, generateMemory, or references to "rolling summary" patterns. Also look for integration with external memory services or vector-based memory retrieval. Count all instances found and enumerate each.
Pass criteria: For applications supporting extended conversations (more than ~20 turns), there is either a summarization step that preserves semantic content from dropped messages or a vector memory system that recalls key facts from earlier in the session. At least 1 implementation must be confirmed.
Fail criteria: Old messages are removed via simple FIFO truncation with no attempt to preserve key context — names, goals, constraints, or facts stated early in the session are permanently lost when truncated.
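The failing pattern is easy to recognize in code review. A minimal sketch of the anti-pattern this rule flags (truncateHistory is a hypothetical name; look for any equivalent bare slice):

```typescript
// Anti-pattern: hard FIFO truncation. Everything before the last `limit`
// messages is discarded with no summary and no recall path.
function truncateHistory<T>(messages: T[], limit: number = 10): T[] {
  return messages.slice(-limit);
}

const history = Array.from({ length: 25 }, (_, i) => `msg ${i + 1}`);
const kept = truncateHistory(history);
// "msg 1" (the user's name, an early constraint) is gone for good
```

Any code path that drops messages this way, with no summarization call or memory lookup nearby, fails the rule.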
Skip (N/A) when: The application is designed only for short, transactional interactions (e.g., single-question lookups, one-shot summarization tools) where long multi-turn context is not a use case. Signal: No conversation history persistence, no database table storing message turns, UI resets on each page load.
Detail on fail: "History is hard-truncated with no summarization — users lose early context in long sessions"
Remediation: Hard truncation makes the AI "forget" the user's name, goals, or constraints from the start of the session. In a multi-step workflow, this causes regression and confusion.
Implement a background summarization step that runs when history exceeds a threshold:
// src/lib/ai/compaction.ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

export async function compactHistory(
  messages: Array<{ role: string; content: string }>,
  threshold: number = 20
): Promise<Array<{ role: string; content: string }>> {
  if (messages.length <= threshold) return messages;

  const toSummarize = messages.slice(0, messages.length - 10);
  const recentMessages = messages.slice(messages.length - 10);

  const { text: summary } = await generateText({
    model: openai("gpt-4o-mini"), // use a cheap model for summarization
    prompt: `Summarize the following conversation, preserving key facts, names, goals, and decisions:\n\n${toSummarize
      .map((m) => `${m.role}: ${m.content}`)
      .join("\n")}`,
    maxTokens: 300,
  });

  return [
    { role: "system", content: `[Earlier conversation summary]: ${summary}` },
    ...recentMessages,
  ];
}
Verify by having a conversation where you state a fact at message 1, then chat for 25+ turns and ask for that fact — the AI should still know it.