A RAG pipeline that retrieves documents but fails to inject them into the model context — or injects them as an unstructured blob of text — is just an expensive vector search the LLM cannot use. The model falls back on parametric knowledge, hallucinates confidently on domain-specific questions, and cites sources it never saw. This violates the inference-contract taxon: you promised grounded answers and delivered confabulation. Users trust the retrieved citations exactly when they are least reliable.
Severity is medium because RAG hallucinations bypass the safety layer users believe they are paying for.
Wrap each retrieved chunk in labeled delimiters before injection, and instruct the model to answer only from the provided context and acknowledge insufficient context explicitly. Apply this at the retrieval boundary in lib/rag.ts.
const contextBlock = docs.map((d, i) => `<source id="${i+1}" title="${d.metadata.title}">\n${d.pageContent}\n</source>`).join('\n')
const userMessage = `Answer from the sources below. If insufficient, say so.\n\n${contextBlock}\n\nQuestion: ${q}`
ID: ai-response-quality.source-attribution.rag-source-passthrough
Severity: medium
What to look for: Enumerate all relevant files. If a RAG pipeline is detected (vector store dependencies: pgvector, pinecone, weaviate, chroma, qdrant, supabase with pgvector, a langchain retriever, llamaindex, etc.), examine whether retrieved documents are injected into the model's context (user message, system message, or context window). Check whether the retrieved content is structured with clear delimiters (e.g., wrapped in XML-like tags, numbered, or labeled) that help the model attribute answers to specific sources. Check whether there is any logic to verify that the model's answer actually references the retrieved context.
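The last check above (verifying that the answer references the retrieved context) can be sketched as a post-generation guard. This is an illustrative sketch, not part of the rule: it assumes sources were injected with `<source id="N">` labels and the model was told to cite them as `[source N]`; the function names are hypothetical.

```typescript
// Loose heuristic: collect the source ids the answer cites, assuming the
// convention that claims are cited as [source N].
function extractCitedIds(answer: string): number[] {
  const ids = new Set<number>();
  for (const match of answer.matchAll(/\[source (\d+)\]/g)) {
    ids.add(Number(match[1]));
  }
  return [...ids].sort((a, b) => a - b);
}

// An answer counts as grounded only if it cites at least one source and
// every cited id corresponds to a chunk that was actually injected.
function isGrounded(answer: string, sourceCount: number): boolean {
  const cited = extractCitedIds(answer);
  return cited.length > 0 && cited.every((id) => id >= 1 && id <= sourceCount);
}
```

A phantom citation (an id that was never injected) fails this check just like a missing citation, which matches the taxon's concern about the model citing sources it never saw.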
Pass criteria: Zero violations. The RAG pipeline injects retrieved chunks into the model context with clear source labeling, and the system prompt instructs the model to answer from the provided context and to acknowledge when context is insufficient.
Fail criteria: RAG pipeline retrieves documents but either (a) does not inject them into the model's context window, or (b) injects raw text with no structure or labeling, making it impossible for the model to identify which source a fact comes from.
Skip (N/A) when: No vector store or retrieval library detected in dependencies.
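The skip condition amounts to a dependency scan. A minimal sketch, assuming a Node project with a package.json; the package names below are illustrative stand-ins for the list in "What to look for", and the substring match is a deliberately loose heuristic:

```typescript
// Illustrative retrieval-library names; real detection should also cover
// Python manifests and SQL migrations enabling pgvector.
const RETRIEVAL_DEPS = [
  '@pinecone-database/pinecone',
  'weaviate',
  'chromadb',
  '@qdrant/js-client-rest',
  'langchain',
  'llamaindex',
];

// Returns true if any dependency name overlaps a known retrieval library.
function hasRetrievalDependency(pkg: { dependencies?: Record<string, string> }): boolean {
  const deps = Object.keys(pkg.dependencies ?? {});
  return deps.some((d) => RETRIEVAL_DEPS.some((r) => d.includes(r) || r.includes(d)));
}
```

When this returns false the rule is skipped rather than passed, since there is no retrieval pipeline to evaluate.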
Detail on fail: "Retrieved chunks in lib/rag.ts injected as plain concatenated text with no source labels or delimiters" (max 500 chars)
Remediation: Structure retrieved context clearly and tell the model how to handle gaps:
const contextBlock = retrievedDocs.map((doc, i) =>
  `<source id="${i + 1}" title="${doc.metadata.title}">\n${doc.pageContent}\n</source>`
).join('\n')
const userMessageWithContext = `
Use the following sources to answer the question. If they are insufficient, say so explicitly.

${contextBlock}

Question: ${userQuestion}
`
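The pass criteria also require a system-level instruction to answer only from the provided context and to acknowledge insufficiency. A minimal sketch of that pairing; the prompt wording and the `buildMessages` helper are assumptions, and the message shape depends on your chat API:

```typescript
// Hypothetical system prompt to pair with the labeled context block above.
const systemPrompt = [
  'Answer ONLY from the provided <source> blocks.',
  'Cite claims as [source N], matching the source id attribute.',
  'If the sources are insufficient to answer, say so instead of guessing.',
].join('\n');

// Assembles the chat messages; the argument is the user message string
// built in the remediation snippet (contextBlock plus question).
function buildMessages(userMessageWithContext: string) {
  return [
    { role: 'system' as const, content: systemPrompt },
    { role: 'user' as const, content: userMessageWithContext },
  ];
}
```

Keeping the grounding instruction in the system message rather than the user message makes it harder for retrieved content to override it.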