Under OWASP LLM01 (Prompt Injection), attackers often use injected instructions to extract or reproduce PII that has been provided in the conversation context. Adding behavioral guardrails to the system prompt, such as "do not repeat personal information the user shares", reduces the surface area for accidental PII regurgitation. NIST AI RMF MANAGE 2.2 calls for implementing controls to minimize AI-generated harm. While system prompt instructions are not a technical security control and can be overridden by adversarial prompts, they meaningfully reduce accidental PII generation in normal interactions and demonstrate a documented intent to protect user data.
Low because system prompt instructions are a behavioral rather than a technical control, but their absence leaves the AI with no documented guidance on PII handling, increasing the risk of accidental disclosure in normal usage.
Add explicit PII safety instructions to your system prompt constant in lib/ai/ or wherever your system prompt is defined.
// lib/ai/system-prompt.ts
export const SYSTEM_PROMPT = `You are a helpful assistant for [App Name].
Privacy guidelines:
- Do not ask users for personal information (email, phone, address, SSN, etc.)
- If a user shares personal information, do not repeat it back or include it in your response
- If asked to generate personal information about real individuals, decline politely
- Focus responses on the task without referencing identifying details the user may have shared`
Store the system prompt as a versioned constant so changes are tracked in version control. For higher-risk applications, add domain-specific prohibitions (e.g., "Do not provide medical diagnoses" or "Do not discuss specific account balances").
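To make the guardrails hard to bypass by accident, route every request through a small helper that prepends the versioned constant. This is a minimal sketch, assuming the constant lives in lib/ai/system-prompt.ts (inlined here so the example is self-contained); buildMessages is a hypothetical helper name, not an SDK API.

```typescript
// lib/ai/messages.ts (illustrative; SYSTEM_PROMPT inlined for the example)
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

const SYSTEM_PROMPT = `You are a helpful assistant for [App Name].
Privacy guidelines:
- Do not ask users for personal information (email, phone, address, SSN, etc.)
- If a user shares personal information, do not repeat it back or include it in your response`;

export function buildMessages(conversation: ChatMessage[]): ChatMessage[] {
  // Drop any caller-supplied system messages so the versioned prompt is
  // always the single source of behavioral guardrails.
  const rest = conversation.filter((m) => m.role !== "system");
  return [{ role: "system", content: SYSTEM_PROMPT }, ...rest];
}
```

If every route handler calls buildMessages rather than assembling the messages array inline, a forgotten or stale system prompt becomes a single-file fix instead of a per-endpoint audit.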
ID: ai-data-privacy.pii-protection.system-prompt-pii-safeguard
Severity: low
What to look for: Enumerate every relevant item. Locate system prompt definitions in the codebase — constants or configuration objects that define the system role message or instruction string passed to the AI. Look in lib/ai/, constants/, config/, or inline in route handlers. Examine the system prompt text for instructions related to PII handling: directives such as "Do not generate or repeat personal information", "If the user shares sensitive data like email or phone, do not include it in your response", "Do not ask users for personal information", or similar safety boundaries.
Pass criteria: The system prompt includes at least one directive instructing the AI to avoid generating, repeating, or soliciting personally identifiable information.
Fail criteria: A system prompt exists but contains no instructions about PII handling — the AI has no explicit guidance about how to handle personal data it might encounter.
Skip (N/A) when: No system prompt is used (the application sends only user messages without a system instruction). Or the AI is used with a provider that does not support system prompts.
Detail on fail: "System prompt in [file] contains no PII safety instructions — the AI has no explicit guidance to avoid generating or repeating personal data"
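The matching step described in "What to look for" can be sketched as a simple keyword heuristic. The pattern list below is an assumption for illustration, not this audit's actual matcher:

```typescript
// Illustrative heuristic: does the system prompt contain any common
// PII-handling directive? The patterns are assumptions and would need
// tuning against real prompts.
const PII_DIRECTIVE_PATTERNS: RegExp[] = [
  /personal (information|data)/i,
  /\bPII\b/,
  /(repeat|ask for|generate|include).{0,40}(email|phone|address|sensitive)/i,
];

export function hasPiiSafeguard(systemPrompt: string): boolean {
  return PII_DIRECTIVE_PATTERNS.some((p) => p.test(systemPrompt));
}
```

A prompt that matches none of the patterns would trigger the fail detail below; a prompt with any PII directive passes.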
Remediation: System prompts are your opportunity to establish behavioral guardrails. While they are not a technical security control (a determined adversary can prompt-inject around them), they meaningfully reduce accidental PII generation.
Add PII safety instructions to your system prompt:
const SYSTEM_PROMPT = `You are a helpful assistant for [App Name].
Privacy guidelines:
- Do not ask users for personal information (email, phone, address, etc.)
- If a user shares personal information, do not repeat it back or include it in your response
- If asked to generate personal information about real individuals, decline politely
- Focus responses on the task at hand without referencing any identifying details the user may have shared`
For a deeper review of prompt injection and system prompt bypass techniques, the AI Prompt Injection Audit covers adversarial system prompt attacks in detail.