Response length is proportional to query complexity

ab-000200 · ai-response-quality.response-formatting.response-length-proportionality
Severity: medium · Status: active

Why it matters

Unbounded max_tokens with no length guidance produces two failure modes at once: the model either pads simple answers into three-paragraph essays (burning tokens against the cost-efficiency taxon) or gets cut off mid-sentence on complex queries because the default context is exhausted. Users wait longer, pay more per request, and read padding that obscures the answer. At scale, a single missing max_tokens parameter can triple your OpenAI bill without anyone noticing until the invoice arrives.

Severity rationale

Medium because the impact is financial waste and UX degradation rather than security or data loss, but the cost recurs on every request.

Remediation

Set a context-appropriate max_tokens ceiling and add length calibration to the system prompt so the model adapts output length to query complexity. Apply this at every call site in lib/ai/chat.ts.

await openai.chat.completions.create({
  model: 'gpt-4o',
  max_tokens: 1024,
  messages: [
    {
      role: 'system',
      content: 'Answer concisely. Simple questions: 1-2 sentences. Complex topics: structured paragraphs. Never pad.'
    },
    ...
  ]
})
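
Where a single static ceiling does not fit every route, the budget can be chosen per request from a rough complexity heuristic. A minimal sketch; the `maxTokensForQuery` helper, its tier boundaries, and its keyword list are illustrative assumptions, not part of any SDK:

```typescript
// Illustrative heuristic: pick a max_tokens budget from rough query complexity.
// The word-count thresholds and keyword list are assumptions — tune them
// against your own traffic before relying on them.
function maxTokensForQuery(query: string): number {
  const words = query.trim().split(/\s+/).length;
  const asksForDepth =
    /\b(explain|compare|summarize|walk me through|step[- ]by[- ]step)\b/i.test(query);
  if (words <= 12 && !asksForDepth) return 256;   // simple lookup-style question
  if (words <= 60 && !asksForDepth) return 1024;  // typical single-topic question
  return 2048;                                    // long or explicitly in-depth request
}
```

Pass the result as max_tokens at the call site; the system-prompt calibration above still applies on top of the ceiling.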

Detection

  • ID: ai-response-quality.response-formatting.response-length-proportionality

  • Severity: medium

  • What to look for: Enumerate all relevant files and check whether the system prompt or API parameters include length guidance — max_tokens limits, explicit length instructions ("Be concise", "Answer in one sentence for simple questions", "Use a paragraph for complex topics"), or dynamic length configuration based on query type. Also check for hardcoded max_tokens values that are either very low (truncating legitimate responses) or absent (allowing unbounded verbosity).

  • Pass criteria: At least 1 conforming pattern must exist. The application either (a) sets context-appropriate max_tokens limits, (b) includes length calibration instructions in the system prompt, or (c) uses a model with inherent length calibration and no conflicting instructions that force verbosity. Report the count of conforming instances found even on pass.

  • Fail criteria: No max_tokens limit is set AND no length instructions exist in the system prompt, leaving response length entirely unconstrained. Or max_tokens is set so low (e.g., 50 tokens for a general assistant) that legitimate responses are routinely truncated.

  • Skip (N/A) when: Application uses streaming with user-initiated stop controls, where users themselves control response length.
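
The streaming exemption assumes the user can actually stop generation mid-response. A minimal sketch of that pattern, assuming an AbortSignal wired to a stop button; `fakeTokenStream` stands in for a real SDK stream (e.g. a chat completion created with stream: true) so the consumer loop is self-contained:

```typescript
// Sketch of user-controlled response length: consume a token stream and stop
// as soon as the user aborts. `fakeTokenStream` is a stand-in for a real
// streaming response — the per-chunk signal check is the pattern being shown.
async function* fakeTokenStream(tokens: string[]): AsyncGenerator<string> {
  for (const t of tokens) yield t;
}

async function readUntilStopped(
  stream: AsyncGenerator<string>,
  signal: AbortSignal,
  onChunk?: (chunk: string) => void
): Promise<string> {
  let text = '';
  for await (const chunk of stream) {
    if (signal.aborted) break; // user pressed "stop" — end generation early
    text += chunk;
    onChunk?.(chunk);
  }
  return text;
}
```

In a UI, the stop button calls controller.abort() on the AbortController whose signal is passed in; response length is then bounded by the user rather than by max_tokens.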

  • Detail on fail: "No max_tokens set and no length instruction in system prompt in lib/ai/chat.ts — responses may be arbitrarily long" (max 500 chars)

  • Remediation: Add length guidance appropriate to your use case:

    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      max_tokens: 1024,
      messages: [
        {
          role: 'system',
          // Built as one string: a multi-line template literal here would embed
          // the source indentation in the prompt text.
          content:
            'Answer concisely. For simple questions, use 1-2 sentences. ' +
            'For complex topics, use structured paragraphs. Never pad responses.'
        },
        { role: 'user', content: userMessage }
      ]
    })
    
