Returning the raw AI provider response object to the client exposes your system prompt—the complete set of instructions governing your AI's behavior—to anyone who inspects the network response. OWASP LLM01:2025 lists system prompt disclosure as a direct attack enabler: once an attacker knows your exact instructions, they can craft targeted overrides, identify specific restrictions to circumvent, and reverse-engineer your application's business logic. MITRE ATLAS AML.T0051 classifies this as reconnaissance that precedes targeted injection. CWE-200 covers the broader information exposure class. In some AI SDK versions, the response object echoes the full messages array including the system role—returning completion directly rather than completion.choices[0].message.content leaks this data to every user.
Critical because system prompt disclosure gives attackers the exact blueprint needed to craft targeted injection attacks, collapsing the attacker's effort from exploratory probing to direct exploitation.
Extract only the assistant's reply content from the completion object. Return nothing else to the client.
// src/app/api/chat/route.ts
import OpenAI from 'openai'
const openai = new OpenAI()

export async function POST(req: Request) {
  const { prompt } = await req.json()
  const messages = [
    { role: 'system' as const, content: SYSTEM_PROMPT }, // defined server-side; never shipped to the browser
    { role: 'user' as const, content: prompt },
  ]
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages, // system prompt is in here — never send this array to the client
  })
  // Return only the assistant message text
  const assistantReply = completion.choices[0]?.message?.content ?? ''
  return Response.json({ message: assistantReply })
  // Do NOT return: completion, messages, or any field that includes the system prompt
}
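The same whitelist extraction can be factored into a small pure function, which makes the no-leak guarantee easy to unit test. A sketch, assuming a minimal stand-in for the SDK's completion shape (the Completion type and function name here are illustrative, not SDK names):

```typescript
// Illustrative stand-in for the SDK's completion shape
type Completion = { choices: Array<{ message?: { content?: string | null } }> }

// Whitelist extractor: copy out only the assistant text; never spread
// or forward the full completion object
function toClientPayload(completion: Completion): { message: string } {
  return { message: completion.choices[0]?.message?.content ?? '' }
}
```

Because the function builds a fresh object from a single field, nothing else in the completion (including any echoed request messages) can reach the client payload.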
Also audit your error handling (see the no-prompt-in-errors check)—a catch block that returns error.response.data from the OpenAI SDK can echo the request payload including the system message.
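A sketch of a catch path that follows this advice: full detail is logged server-side, and the client receives only a generic message (the helper name and error wording are illustrative):

```typescript
// Client-safe error mapping: never serialize the error object itself,
// because SDK errors can embed the original request payload, including
// the system message
function toClientError(err: unknown): { error: string } {
  console.error('chat route failure:', err) // full detail stays server-side
  return { error: 'The assistant is unavailable right now.' }
}

// In the route handler:
// } catch (err) { return Response.json(toClientError(err), { status: 500 }) }
```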
ID: ai-prompt-injection.system-prompt-protection.no-system-prompt-in-response
Severity: critical
What to look for: Enumerate every API route that calls an AI provider and every LLM response path returned to users. For each path, check whether the raw AI provider response object (which may contain the full request echo including the system prompt, depending on the provider and SDK version) is returned directly to the client, and whether the system prompt string is included in any response payload for debugging purposes. Look for JSON.stringify(response), return statements that send the full completion object, or code that forwards the messages array to the client.
Pass criteria: The system prompt is never returned in any HTTP response. The client receives only the model's reply content (the assistant message text, or a structured subset of the completion object), not the full API response or the original messages array — 0% of responses should contain system prompt text. Report: "X LLM response paths found, all Y filter system prompt content."
Fail criteria: The full AI provider response object (which echoes the system prompt in some SDKs), or the messages array including the system message, is returned to the client. Or the system prompt string is explicitly included in a response field.
Skip (N/A) when: No AI provider integration detected.
Do NOT pass when: The system prompt is partially exposed through error messages or debug output, even if the main response path is clean.
Detail on fail: "POST /api/chat returns the full OpenAI completion object including choices[0].message — the system prompt may be echoed in the response" or "API route returns the full messages array including the system role message to the client"
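One way to enforce the pass criterion in a unit test is a coarse containment guard over the serialized response payload (a sketch; substring matching is cheap but can miss transformed or truncated prompt text):

```typescript
// Returns true if any part of the serialized response payload contains
// the system prompt text (coarse substring check, illustrative)
function leaksSystemPrompt(payload: unknown, systemPrompt: string): boolean {
  return JSON.stringify(payload).includes(systemPrompt)
}
```

Run this against every response path in tests: a clean payload like { message: reply } should never trip it, while a payload that embeds the messages array will.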
Remediation: Exposing your system prompt tells attackers exactly which instructions to override and how your application works. Return only what the client needs:
const completion = await openai.chat.completions.create({ ... })
const assistantMessage = completion.choices[0]?.message?.content ?? ''
// Return only the assistant's reply — nothing else
return Response.json({ message: assistantMessage })
For a deeper analysis of information exposure patterns, the Security Headers & Basics Audit covers server-side information leakage.