RAG pipelines that retrieve document chunks but strip source metadata before passing them to the model make it impossible for the AI to attribute its answers correctly. Users reading the response have no way to verify claims, locate the original documents, or tell when the model has drawn on training data instead of retrieved content. For regulated industries — legal, healthcare, financial — unattributed claims create liability and compliance risk. OWASP LLM09 classifies unattributed AI-generated content as a misinformation risk. Without attribution cues, the implicit contract between the retrieval system and the user is broken: answers can no longer be traced back to their sources.
High, because unattributed AI responses in knowledge-domain applications prevent users from verifying claims and expose the application to liability when users act on unverifiable information.
Pass source metadata alongside retrieved chunk content and instruct the model to reference it:
const contextBlock = chunks.map(c =>
`[Source: ${c.documentTitle}, p.${c.pageNumber}]\n${c.content}`
).join('\n\n')
const systemPrompt = `
Answer using only the provided context. When citing information,
reference the source document by name: "According to [Document Name]..."
`
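The snippet above assumes each retrieved chunk carries title and page metadata. A minimal sketch of that shape and the context-building step, with field and function names that are illustrative rather than from any specific retrieval library:

```typescript
// Hypothetical chunk shape assumed by the context-building snippet above.
interface RetrievedChunk {
  documentTitle: string;
  pageNumber: number;
  content: string;
}

// Prefix each chunk with its source so the model can cite it verbatim.
function buildContextBlock(chunks: RetrievedChunk[]): string {
  return chunks
    .map((c) => `[Source: ${c.documentTitle}, p.${c.pageNumber}]\n${c.content}`)
    .join('\n\n');
}
```

The key property is that the attribution header and the chunk text travel together, so the model never sees content without its origin.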
For non-RAG applications, instruct the model to qualify training-data claims with "Based on my training data" so users can distinguish model knowledge from retrieved evidence.
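For the non-RAG case, one possible system prompt follows; the exact wording is illustrative, not a required phrasing:

```typescript
// Illustrative system prompt for a non-RAG app: instructs the model to hedge
// claims drawn from training data rather than user-provided context.
const nonRagSystemPrompt = `
When you state a fact that does not come from the user's message, qualify it:
- "Based on my training data, ..." for general knowledge claims.
- "As of my knowledge cutoff, ..." for time-sensitive claims.
Never present training-data knowledge as if it came from a document the user provided.
`.trim();
```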
ID: ai-response-quality.source-attribution.external-claim-citations
Severity: high
What to look for: Enumerate all relevant files. In RAG-enabled applications, check whether the retrieval pipeline passes source metadata (document title, page, date, URL) alongside the retrieved chunks to the AI model. Examine the system prompt for instructions directing the AI to reference its source context when making factual claims (e.g., "When answering from provided documents, cite the document name"). In non-RAG applications, check whether the system prompt instructs the AI to qualify claims that originate from its training data rather than user-provided context.
Pass criteria: At least 1 conforming pattern must exist. Either (a) RAG pipeline passes source metadata and the system prompt instructs the AI to cite it, or (b) the system prompt instructs the AI to qualify training-data claims with appropriate hedging ("Based on my training data...", "As of my knowledge cutoff...").
Fail criteria: RAG application retrieves context but does not pass source metadata to the model, or the system prompt contains no attribution instructions for any knowledge source.
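A sketch of the non-conforming pattern the fail criteria describe, where context is assembled from chunk text alone and metadata is discarded (names are hypothetical):

```typescript
// ANTI-PATTERN (fails this rule): metadata is stripped before the model sees it.
interface Chunk {
  documentTitle: string;
  pageNumber: number;
  content: string;
}

function buildContextWithoutAttribution(chunks: Chunk[]): string {
  // Only chunk.content survives; documentTitle and pageNumber never reach
  // the model, so it cannot cite document origins.
  return chunks.map((c) => c.content).join('\n\n');
}
```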
Skip (N/A) when: Application is a personal assistant or code editor with no factual knowledge domain, where source attribution would be meaningless.
Detail on fail: "RAG pipeline in lib/retrieval.ts passes chunk text but not source metadata — model cannot cite document origins" (max 500 chars)
Remediation: Include source metadata in the context passed to the model:
const contextBlock = retrievedChunks.map(chunk =>
`[Source: ${chunk.documentTitle}, p.${chunk.pageNumber}]\n${chunk.content}`
).join('\n\n')
const systemPrompt = `
Answer questions using the provided context. When you cite information from the
context, reference the source document by name (e.g., "According to [Document Name]...").
`