Temperature above 0.8 on a consistency-sensitive endpoint means the same billing question gets three different answers on three retries — one correct, one subtly wrong, one contradicting the docs. For customer support, legal, or documentation Q&A, this breaks the inference contract users implicitly assume: the same question should yield the same answer. Variance here manifests as product bugs, escalated tickets, and compliance incidents. Aggressive sampling is a feature for creative writing and a defect everywhere else.
Severity is low because variance is tolerable in many contexts, but it becomes serious for regulated or support-critical surfaces.
Lower temperature to 0.3 or below for consistency-sensitive routes and set a deterministic seed where the provider supports it. Add consistency directives to the system prompt. Update the config in api/support/route.ts.
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  temperature: 0.3, // low variance for repeated queries
  seed: 42,         // best-effort determinism where supported
  messages: [...]
})
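Since the remediation also calls for consistency directives in the system prompt, the three levers can be combined in one place. A minimal sketch, assuming the official openai Node SDK request shape; `buildSupportRequest` is a hypothetical helper, not part of the SDK:

```typescript
// Sketch: build request options combining all three consistency levers
// (low temperature, fixed seed, system-prompt directives).
type ChatMessage = { role: 'system' | 'user'; content: string };

function buildSupportRequest(userQuestion: string) {
  const system: ChatMessage = {
    role: 'system',
    content:
      'Respond consistently. Use the same terminology across responses. ' +
      'Prefer wording from the official documentation.',
  };
  return {
    model: 'gpt-4o',
    temperature: 0.3, // low variance for repeated queries
    seed: 42,         // best-effort determinism where supported
    messages: [system, { role: 'user' as const, content: userQuestion }],
  };
}
```

The returned object can be passed directly to `openai.chat.completions.create(...)`; keeping it in a helper also makes the settings easy to assert on in tests.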
ID: ai-response-quality.response-management.response-consistency
Severity: low
What to look for: Enumerate all relevant files and check the temperature setting on AI API calls. Temperature above 0.7 produces highly variable responses for the same query. For applications where consistency matters (legal, medical, customer support, documentation), check whether the system prompt includes consistency instructions ("Respond consistently", "Use the same terminology across responses"). Check whether a deterministic seed is set (OpenAI seed parameter, where available).
Pass criteria: No violations are acceptable. For consistency-sensitive applications, temperature is at or below 0.5, or a seed is configured, or the system prompt includes consistency instructions. For creative applications, high temperature is acceptable.
Fail criteria: Consistency-sensitive application (customer support, documentation Q&A, legal assistant) uses temperature above 0.8 with no consistency instructions.
Skip (N/A) when: Application is a creative assistant, story generator, or any tool where response variety is a feature.
Detail on fail: "Customer support bot in api/support/route.ts uses temperature=1.0 — responses to the same question will vary significantly" (max 500 chars)
Remediation: Lower temperature and/or add a seed for consistency-sensitive use cases:
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  temperature: 0.3, // low variance for repeated queries
  seed: 42,         // best-effort determinism where supported
  messages: [...]
})
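The "What to look for" step above can be sketched as a naive heuristic scan over source text. `findHighTemperatures` is an illustrative helper, not a real linter, and the regex will miss settings held in variables or config files:

```typescript
// Heuristic check: flag temperature values above a threshold in source text.
// Naive regex sketch only; it matches literal `temperature: <number>` settings.
function findHighTemperatures(source: string, threshold = 0.7): number[] {
  const matches = source.matchAll(/temperature\s*:\s*([0-9]*\.?[0-9]+)/g);
  const flagged: number[] = [];
  for (const m of matches) {
    const value = parseFloat(m[1]);
    if (value > threshold) flagged.push(value);
  }
  return flagged;
}
```

Running this over the files enumerated in the first step gives a candidate list to review by hand against the pass/fail criteria.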