High temperature settings (≥0.8) combined with no factuality constraints in the system prompt create a confabulation-maximizing configuration. For research, knowledge, or information retrieval applications, this means the model will confidently generate plausible-but-invented statistics, misattributed quotes, and false technical specifications. OWASP LLM09 directly covers this failure mode. Temperature controls randomness in token sampling; at high values, the model increasingly selects lower-probability tokens — raising the rate of semantically coherent but factually incorrect output in a way that is invisible to users.
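The temperature mechanic described above can be sketched in a few lines. This is an illustration of temperature-scaled softmax sampling in general, not the provider's actual implementation, and the logit values are invented for demonstration:

```typescript
// Illustrative only: how temperature reshapes a token distribution.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature)
  const max = Math.max(...scaled) // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  return exps.map((e) => e / sum)
}

const logits = [5.0, 3.0, 1.0] // hypothetical scores for three candidate tokens

const cold = softmaxWithTemperature(logits, 0.2)
const hot = softmaxWithTemperature(logits, 1.0)

// At low temperature the top token dominates; at high temperature
// lower-ranked tokens receive a much larger share of the probability mass.
console.log(cold[0].toFixed(3)) // ≈ 1.000
console.log(hot[0].toFixed(3))  // ≈ 0.867
```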
Severity is medium because high temperature alone does not guarantee confabulation, but combined with a system prompt that carries no factuality instruction it materially and measurably increases the hallucination rate in production queries.
Lower the temperature to 0.2 or below for factual-domain applications and add an explicit factuality constraint:

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  temperature: 0.2, // Was: 1.0
  messages: [
    {
      role: 'system',
      content: 'Answer factually and precisely. Do not speculate or extrapolate beyond what you know with confidence.'
    },
    { role: 'user', content: userMessage }
  ]
})
For code generation or creative tasks, higher temperatures are appropriate — apply the constraint selectively to routes that answer factual queries.
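One way to apply the constraint selectively is a per-route sampling profile. This is a sketch; the route names, prompt strings, and helper are hypothetical, not taken from the application under review:

```typescript
// Hypothetical per-route sampling profiles: factual routes get low
// temperature plus a factuality system prompt; creative routes do not.
type SamplingProfile = { temperature: number; systemPrompt: string }

const FACTUAL: SamplingProfile = {
  temperature: 0.2,
  systemPrompt:
    'Answer factually and precisely. Do not speculate or extrapolate beyond what you know with confidence.'
}

const PROFILES: Record<string, SamplingProfile> = {
  '/api/research': FACTUAL,
  '/api/brainstorm': { temperature: 0.9, systemPrompt: 'Generate varied, imaginative ideas.' }
}

function profileFor(route: string): SamplingProfile {
  // Unknown routes fall back to the conservative factual profile.
  return PROFILES[route] ?? FACTUAL
}
```

The fallback choice matters: defaulting to the factual profile means a newly added route fails safe rather than inheriting a confabulation-prone configuration.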
ID: ai-response-quality.hallucination-prevention.factual-claim-boundedness
Severity: medium
What to look for: Enumerate all relevant files and check the temperature and sampling parameters used in API calls. High temperature (>0.7) in factual applications increases hallucination risk. Check whether the system prompt restricts the AI to factual, verifiable claims or allows speculation. Look for instructions like "Do not speculate beyond what is known", "Answer factually", or "Stick to what is in the provided context". For numeric or statistical claims, check whether the application applies any post-processing to validate ranges or plausibility.
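The last check above, range validation for numeric claims, can be sketched as a post-processing pass. The claim kinds and bounds here are hypothetical; a real application would derive them from domain knowledge or a trusted source:

```typescript
// Hypothetical plausibility bounds for numeric claims in model output.
const BOUNDS: Record<string, { min: number; max: number }> = {
  percentage: { min: 0, max: 100 },
  year: { min: 1000, max: 2100 }
}

function validateNumericClaim(kind: string, value: number): boolean {
  const bound = BOUNDS[kind]
  if (!bound) return false // unknown claim kinds fail closed
  return value >= bound.min && value <= bound.max
}
```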
Pass criteria: Zero violations are acceptable. API calls in factual domains use temperature at or below 0.5. The system prompt contains explicit factuality constraints. No instructions invite unbounded speculation.
Fail criteria: Factual application uses temperature of 0.8 or above with no factuality constraints in the system prompt, significantly increasing confabulation risk.
Skip (N/A) when: Application is a creative writing tool, story generator, or brainstorming assistant where imagination is the intended behavior.
Detail on fail: "Temperature set to 1.0 with no factuality constraint in api/research/route.ts — high confabulation risk for factual queries" (max 500 chars)
Remediation: Lower temperature and add factuality constraints for knowledge-domain applications:
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  temperature: 0.2,
  messages: [
    {
      role: 'system',
      content: 'Answer factually and precisely. Do not speculate or extrapolate beyond what you know with confidence.'
    },
    { role: 'user', content: userMessage }
  ]
})