Truncated responses are detected and handled gracefully

ab-000201 · ai-response-quality.response-formatting.truncation-handling
Severity: low · Status: active

Why it matters

When finish_reason: "length" fires and the application ignores it, users receive half-written code, truncated JSON that fails to parse, or advice that ends mid-clause — and they have no signal that anything was cut. They act on incomplete information, file bug reports against the wrong component, or lose trust in the assistant entirely. The error-resilience taxon requires surfacing partial-output states; silent truncation is the worst possible UX for a recoverable failure because it looks like a complete answer.

Severity rationale

Low because truncation is infrequent in practice and users can retry, but the failure mode is silent and misleading.

Remediation

After every non-streaming API call, inspect finish_reason (OpenAI) or stop_reason (Anthropic) and propagate a truncation flag to the UI so the user sees a "Response may be incomplete" banner with a continue button. Wire this through api/chat/route.ts.

// api/chat/route.ts — after awaiting the completion:
const choice = response.choices[0]
return { content: choice.message.content, truncated: choice.finish_reason === 'length' }
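The same check applies on the Anthropic side, where the Messages API reports `stop_reason: "max_tokens"` when a reply is cut off by the token limit. A minimal sketch of that branch, assuming a hypothetical `AnthropicMessageLike` shape and `toChatResult` helper (the real types live in `@anthropic-ai/sdk`):

```typescript
// Hypothetical shape of the fields we read from an Anthropic Messages API
// response; the real SDK types come from @anthropic-ai/sdk.
interface AnthropicMessageLike {
  stop_reason: string | null; // 'end_turn' | 'max_tokens' | 'stop_sequence' | ...
  content: { type: string; text?: string }[];
}

// Collapse the text blocks into one string and flag truncation:
// Anthropic sets stop_reason to 'max_tokens' when the response was cut short.
export function toChatResult(
  msg: AnthropicMessageLike,
): { content: string; truncated: boolean } {
  const content = msg.content
    .filter((block) => block.type === 'text')
    .map((block) => block.text ?? '')
    .join('');
  return { content, truncated: msg.stop_reason === 'max_tokens' };
}
```

Keeping the detection in a pure helper like this makes the truncation logic unit-testable without a live API call.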

Detection

  • ID: ai-response-quality.response-formatting.truncation-handling

  • Severity: low

  • What to look for: Enumerate all relevant files and check API call sites for handling of the finish reason (the OpenAI API returns finish_reason: "length" and the Anthropic API returns stop_reason: "max_tokens" when a response is cut short by max_tokens). Look for code that reads response.choices[0].finish_reason or response.stop_reason and branches on it. Check whether the UI surfaces any indication to the user when a response was truncated.

  • Pass criteria: At least one implementation must be present. The application checks finish_reason/stop_reason after receiving a response. When the value is "length" (or equivalent), the application either prompts for continuation or displays a notice to the user indicating the response was cut short.

  • Fail criteria: The application never checks finish_reason — truncated responses are silently delivered to users as if complete.

  • Skip (N/A) when: Application uses streaming with no max_tokens limit, making truncation impossible.

  • Detail on fail: "finish_reason not checked after API call in api/chat/route.ts — truncated responses delivered silently" (max 500 chars)

  • Remediation: Check the finish reason after every non-streaming API call:

    const response = await openai.chat.completions.create({ ... })
    const choice = response.choices[0]
    
    if (choice.finish_reason === 'length') {
      return { content: choice.message.content, truncated: true }
    }
    
    return { content: choice.message.content, truncated: false }
    

    Surface truncated: true in the UI with a "Response may be incomplete" notice.
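The pass criteria above also allow prompting for continuation instead of (or alongside) the banner. A sketch of that flow, assuming the OpenAI chat message format; the helper name `buildContinuationMessages` is ours, not part of any SDK:

```typescript
// Message shape matching the OpenAI chat format.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// When the user clicks the continue button, resend the conversation with the
// partial assistant reply appended, followed by an explicit request to resume,
// so the model picks up where it was cut off.
export function buildContinuationMessages(
  history: ChatMessage[],
  partial: string,
): ChatMessage[] {
  return [
    ...history,
    { role: 'assistant', content: partial },
    { role: 'user', content: 'Continue exactly where you left off.' },
  ];
}
```

The handler would pass the result back to the completions endpoint and concatenate the new content onto the partial reply before rendering.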
