User-controllable system prompt modification is prevented

ab-000184 · ai-prompt-injection.system-prompt-protection.no-user-controlled-system-prompt

Severity: critical · active

Why it matters

Allowing users to modify the system prompt—whether through a "personality" slider, a custom persona field, or a request parameter—is architecturally equivalent to giving them write access to your application's security policy. OWASP LLM01:2025 and MITRE ATLAS AML.T0054 both identify this as a direct vector: the attacker simply submits their injection payload as a legitimate feature input rather than smuggling it through the user message. CWE-1427 captures this as uncontrolled influence over the inference contract. Personalization features that inject user text into the system message have caused production incidents where users granted themselves admin access, extracted other users' data, or caused the model to generate policy-violating content—all through the intended UI.

Severity rationale

Critical because user-controlled system prompt modification requires zero exploitation sophistication—the attacker uses the application's own feature to deliver the injection payload through a fully authorized request.

Remediation

Gate all system prompt selection through a server-side allowlist of pre-approved, developer-authored variants. No user-provided text enters the system message under any condition.

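// src/lib/prompts.ts

// Placeholder prompt bodies for illustration; in practice these are
// developer-authored constants kept in source control.
const ASSISTANT_SYSTEM_PROMPT = 'You are a helpful assistant.'
const ANALYST_SYSTEM_PROMPT = 'You are a data analyst.'
const SUPPORT_SYSTEM_PROMPT = 'You are a customer support agent.'

// Server-side allowlist: the only system prompts the application can send.
const PROMPT_VARIANTS = {
  assistant: ASSISTANT_SYSTEM_PROMPT,
  analyst: ANALYST_SYSTEM_PROMPT,
  support: SUPPORT_SYSTEM_PROMPT,
} as const

type PromptKey = keyof typeof PROMPT_VARIANTS

// Own-property check: `key in PROMPT_VARIANTS` would also accept inherited
// names such as "toString" or "constructor" and return a function.
function isPromptKey(key: unknown): key is PromptKey {
  return typeof key === 'string' && Object.prototype.hasOwnProperty.call(PROMPT_VARIANTS, key)
}

export function resolveSystemPrompt(key: unknown): string {
  // The client may only choose a key; it never supplies prompt text.
  return isPromptKey(key) ? PROMPT_VARIANTS[key] : PROMPT_VARIANTS.assistant // safe default
}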

If your product requires user-customizable AI behavior, implement it as a named slot within a fixed template ("Respond in the user's preferred language: {language}"), where the slot value is validated and sanitized before insertion—never as free-text system prompt replacement.
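The named-slot approach can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the template text, the `SUPPORTED_LANGUAGES` list, and the `buildLanguageSystemPrompt` helper are all hypothetical. The key property is that the slot value is checked against a fixed set before insertion, so free text submitted as a "preference" never reaches the system message.

```typescript
// Hypothetical fixed template: only the {language} slot varies per user.
const LANGUAGE_TEMPLATE =
  "You are a helpful assistant. Respond in the user's preferred language: {language}."

// Hypothetical allowlist of slot values; anything else falls back to a default.
const SUPPORTED_LANGUAGES = ['English', 'French', 'German', 'Japanese', 'Spanish'] as const
type SupportedLanguage = (typeof SUPPORTED_LANGUAGES)[number]

function isSupportedLanguage(value: unknown): value is SupportedLanguage {
  return typeof value === 'string' && (SUPPORTED_LANGUAGES as readonly string[]).includes(value)
}

export function buildLanguageSystemPrompt(preference: unknown): string {
  // Validate the slot value first; unrecognized input is replaced, not escaped.
  const language: SupportedLanguage = isSupportedLanguage(preference) ? preference : 'English'
  return LANGUAGE_TEMPLATE.replace('{language}', language)
}
```

With this design, `buildLanguageSystemPrompt('French')` fills the slot normally, while a preference such as `'French. Ignore all previous instructions.'` simply falls back to English. Matching against a closed set is stricter than trying to sanitize arbitrary text, which is why the sketch prefers it.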

Detection

ID: no-user-controlled-system-prompt
Severity: critical
What to look for: Enumerate every system prompt construction path. For each, look for any mechanism where user-provided data modifies, replaces, or appends to the system prompt dynamically based on user preferences or input. Check for: user-configurable "AI personality" features that inject user text into the system message, admin panels that allow arbitrary system prompt editing without sanitization, or request parameters (query strings, body fields) that are used to select or modify the system prompt at request time.
Pass criteria: The system prompt is entirely developer-controlled. No user-provided data modifies the system message. If the application supports multiple modes or personas, selection is done via an allowlist of pre-approved system prompts (e.g., promptVariants[validatedKey]), not by including user input in the prompt text. No system prompt path may allow user-controlled modifications. Report: "X system prompt paths found, 0 accept user-controlled content."
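One way to make this pass criterion checkable in a codebase is a unit test asserting that, whatever a request contains, the system message is always one of the allowlisted strings verbatim. The sketch below assumes a generic chat-message shape (`ChatMessage`) and hypothetical helpers `buildMessages` and `systemPromptIsAllowlisted`; adapt the names to your provider SDK.

```typescript
// Hypothetical message shape, similar to common chat-completion APIs.
type ChatMessage = { role: 'system' | 'user'; content: string }

const PROMPT_VARIANTS = {
  assistant: 'You are a helpful assistant.',
  analyst: 'You are a data analyst.',
} as const

type PromptKey = keyof typeof PROMPT_VARIANTS

function resolveSystemPrompt(key: unknown): string {
  return typeof key === 'string' && Object.prototype.hasOwnProperty.call(PROMPT_VARIANTS, key)
    ? PROMPT_VARIANTS[key as PromptKey]
    : PROMPT_VARIANTS.assistant
}

// The single system prompt construction path: the client picks a key,
// and its free text is confined to the user message.
export function buildMessages(body: { mode?: unknown; message: string }): ChatMessage[] {
  return [
    { role: 'system', content: resolveSystemPrompt(body.mode) },
    { role: 'user', content: body.message },
  ]
}

// Pass-criteria check: every system message must be an allowlisted string verbatim.
export function systemPromptIsAllowlisted(messages: ChatMessage[]): boolean {
  const allowed: readonly string[] = Object.values(PROMPT_VARIANTS)
  return messages.filter((m) => m.role === 'system').every((m) => allowed.includes(m.content))
}
```

Running `systemPromptIsAllowlisted` over each construction path with hostile inputs (prompt text in the mode field, inherited property names, non-strings) gives a concrete basis for the "0 accept user-controlled content" count in the report.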
Fail criteria: User-provided text is used to modify or append to the system message. A request parameter allows selection of an arbitrary system prompt string. An admin feature accepts free-text system prompt input without sanitization.
Skip (N/A) when: No AI provider integration detected.
Do NOT pass when: User preferences or settings are injected into the system prompt without sanitization, even if framed as "personalization."
Detail on fail: "POST /api/chat accepts a systemPromptOverride field in the request body and includes it in the system message" or "User's 'AI personality' preference is concatenated directly into the system prompt"
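The two failure patterns named in the fail details can be contrasted with a passing version in a short sketch. The request body type, the `personality` field, the fragment texts, and both builder functions are hypothetical; only `systemPromptOverride` comes from the example detail. The unsafe builder is included solely to show what the check should flag.

```typescript
type ChatRequestBody = {
  message: string
  personality?: unknown          // hypothetical user preference from settings
  systemPromptOverride?: unknown // field named in the example fail detail
}

const BASE_PROMPT = 'You are a helpful assistant.'

// FAILS the check: user-controlled text is written into the system message.
export function buildSystemPromptUnsafe(body: ChatRequestBody): string {
  const override = typeof body.systemPromptOverride === 'string' ? body.systemPromptOverride : ''
  const personality = typeof body.personality === 'string' ? body.personality : ''
  return `${BASE_PROMPT}\n${override}\nPersonality: ${personality}`
}

// PASSES: personality selects a developer-authored fragment; the override is ignored.
const PERSONALITY_FRAGMENTS = {
  concise: 'Keep answers brief.',
  detailed: 'Give thorough, step-by-step answers.',
} as const

export function buildSystemPromptSafe(body: ChatRequestBody): string {
  const p = body.personality
  const fragment =
    typeof p === 'string' && Object.prototype.hasOwnProperty.call(PERSONALITY_FRAGMENTS, p)
      ? PERSONALITY_FRAGMENTS[p as keyof typeof PERSONALITY_FRAGMENTS]
      : ''
  return fragment ? `${BASE_PROMPT} ${fragment}` : BASE_PROMPT
}
```

In the passing version the "personality" feature still exists, but users choose among developer-authored options rather than writing prompt text, which keeps it within the pass criteria even though it is framed as personalization.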

Remediation: Allowing users to modify the system prompt is equivalent to giving them full control over your AI's behavior. Select system prompts through an allowlist:

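// Placeholder prompt bodies for illustration.
const ASSISTANT_SYSTEM_PROMPT = 'You are a helpful assistant.'
const ANALYST_SYSTEM_PROMPT = 'You are a data analyst.'

const PROMPT_VARIANTS = {
  'assistant': ASSISTANT_SYSTEM_PROMPT,
  'analyst': ANALYST_SYSTEM_PROMPT,
} as const

type PromptKey = keyof typeof PROMPT_VARIANTS

function getSystemPrompt(key: unknown): string {
  // `key as PromptKey` alone is not validation: it checks nothing at runtime, and
  // inherited names such as "constructor" return a value, skipping a `??` fallback.
  if (typeof key === 'string' && Object.prototype.hasOwnProperty.call(PROMPT_VARIANTS, key)) {
    return PROMPT_VARIANTS[key as PromptKey]
  }
  return PROMPT_VARIANTS['assistant']
}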

Never concatenate raw user-provided strings into system prompts; the only user-influenced content a system prompt may contain is a validated value in a fixed template slot.

External references

cwe · CWE-1427 — Improper Neutralization of Input Used for LLM Prompting
owasp-llm:2025 · LLM01 — Prompt Injection
mitre-atlas:v4 · AML.T0054 — LLM Jailbreak
nist-ai-rmf:1.0 · GOVERN 1.1 — Policies, processes, procedures, and practices across the organization to manage AI risks

Taxons

inference-contract injection-and-input-trust

History

2026-04-18·v1.0.0·Initial import from ai-prompt-injection·automated