Truncated responses are detected and handled gracefully

ab-000201 · ai-response-quality.response-formatting.truncation-handling
Severity: low · Status: active

Why it matters

When finish_reason: "length" fires and the application ignores it, users receive half-written code, truncated JSON that fails to parse, or advice that ends mid-clause — and they have no signal that anything was cut. They act on incomplete information, file bug reports against the wrong component, or lose trust in the assistant entirely. The error-resilience taxon requires surfacing partial-output states; silent truncation is the worst possible UX for a recoverable failure because it looks like a complete answer.

Severity rationale

Low because truncation is infrequent in practice and users can retry, but the failure mode is silent and misleading.

Remediation

After every non-streaming API call, inspect finish_reason (OpenAI) or stop_reason (Anthropic) and propagate a truncation flag to the UI so the user sees a "Response may be incomplete" banner with a continue button. Wire this through api/chat/route.ts.

// api/chat/route.ts — after awaiting the completion:
const choice = response.choices[0]
return { content: choice.message.content, truncated: choice.finish_reason === 'length' }
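The same check applies on the Anthropic side, where the Messages API reports `stop_reason: "max_tokens"` when a reply is cut off by the token limit. A minimal sketch of that branch, assuming a hypothetical `AnthropicMessageLike` shape and `toChatResult` helper (the real types live in `@anthropic-ai/sdk`):

```typescript
// Hypothetical shape of the fields we read from an Anthropic Messages API
// response; the real SDK types come from @anthropic-ai/sdk.
interface AnthropicMessageLike {
  stop_reason: string | null; // 'end_turn' | 'max_tokens' | 'stop_sequence' | ...
  content: { type: string; text?: string }[];
}

// Collapse the text blocks into one string and flag truncation:
// Anthropic sets stop_reason to 'max_tokens' when the response was cut short.
export function toChatResult(
  msg: AnthropicMessageLike,
): { content: string; truncated: boolean } {
  const content = msg.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text ?? '')
    .join('');
  return { content, truncated: msg.stop_reason === 'max_tokens' };
}
```

Keeping the detection in a pure helper like this makes the truncation logic unit-testable without a live API call.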

Detection

  • ID: ai-response-quality.response-formatting.truncation-handling

  • Severity: low

  • What to look for: Enumerate all relevant files and check API call sites for handling of the finish reason (the OpenAI API returns finish_reason: "length" and the Anthropic API returns stop_reason: "max_tokens" when a response is cut short by max_tokens). Look for code that reads response.choices[0].finish_reason or response.stop_reason and branches on it. Check whether the UI surfaces any indication to the user when a response was truncated.

  • Pass criteria: At least one implementation must be present. The application checks finish_reason/stop_reason after receiving a response. When the value is "length" (or equivalent), the application either prompts for continuation or displays a notice to the user indicating the response was cut short.

  • Fail criteria: The application never checks finish_reason — truncated responses are silently delivered to users as if complete.

  • Skip (N/A) when: Application uses streaming with no max_tokens limit, making truncation impossible.

  • Detail on fail: "finish_reason not checked after API call in api/chat/route.ts — truncated responses delivered silently" (max 500 chars)

  • Remediation: Check the finish reason after every non-streaming API call:

    const response = await openai.chat.completions.create({ ... })
    const choice = response.choices[0]
    
    if (choice.finish_reason === 'length') {
      return { content: choice.message.content, truncated: true }
    }
    
    return { content: choice.message.content, truncated: false }
    

    Surface truncated: true in the UI with a "Response may be incomplete" notice.
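The pass criteria above also allow prompting for continuation instead of (or alongside) the banner. A sketch of that flow, assuming the OpenAI chat message format; the helper name `buildContinuationMessages` is ours, not part of any SDK:

```typescript
// Message shape matching the OpenAI chat format.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// When the user clicks the continue button, resend the conversation with the
// partial assistant reply appended, followed by an explicit request to resume,
// so the model picks up where it was cut off.
export function buildContinuationMessages(
  history: ChatMessage[],
  partial: string,
): ChatMessage[] {
  return [
    ...history,
    { role: 'assistant', content: partial },
    { role: 'user', content: 'Continue exactly where you left off.' },
  ];
}
```

The handler would pass the result back to the completions endpoint and concatenate the new content onto the partial reply before rendering.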
