GDPR Article 25 mandates data protection by design — minimizing the personal data transmitted to processors. OWASP A02 (Cryptographic Failures) and CWE-359 (Exposure of Private Information) both apply when PII travels unredacted to a third party. Users routinely paste email addresses, phone numbers, credit card numbers, and SSNs into chat interfaces without realizing the consequences. Without a server-side screening pass, every one of those values is sent verbatim to the AI provider, where it persists in their logs and training pipelines according to their retention policies — not yours. The 2023 Samsung incident, where employees pasted internal code into ChatGPT, is the canonical example of this failure mode.
Critical because unredacted PII sent to a third-party AI provider constitutes unauthorized data disclosure under GDPR Art. 25, CWE-359, and OWASP A02, and no technical control can stop the exfiltration once the API call has been made.
Implement a server-side redaction pass in lib/ai/pii-guard.ts and apply it before every AI API call. Server-side is mandatory — client-side filtering is bypassable.
// lib/ai/pii-guard.ts
const EMAIL_RE = /\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b/g
const PHONE_RE = /(\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g
const CC_RE = /\b\d{4}[\s\-]?\d{4}[\s\-]?\d{4}[\s\-]?\d{4}\b/g

export function redactPii(input: string): string {
  return input
    .replace(EMAIL_RE, '[EMAIL]')
    .replace(PHONE_RE, '[PHONE]')
    .replace(CC_RE, '[CARD]')
}
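The CC_RE above redacts any 16-digit run, so it will also catch order numbers and other numeric IDs. If false positives matter, a Luhn checksum pass (the standard card-number check digit algorithm) can tighten the match; the `luhnValid` and `redactCards` names below are illustrative, not part of the guard as specified, and over-redaction is usually the safer failure mode anyway.

```typescript
// Luhn checksum: true when the digit string is a plausible card number.
function luhnValid(digits: string): boolean {
  let sum = 0
  let double = false
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48 // '0' is char code 48
    if (double) {
      d *= 2
      if (d > 9) d -= 9
    }
    sum += d
    double = !double
  }
  return sum % 10 === 0
}

const CC_RE = /\b\d{4}[\s\-]?\d{4}[\s\-]?\d{4}[\s\-]?\d{4}\b/g

// Tighter variant: only replace 16-digit runs that pass the Luhn check.
function redactCards(input: string): string {
  return input.replace(CC_RE, (match) =>
    luhnValid(match.replace(/[\s\-]/g, '')) ? '[CARD]' : match
  )
}
```

With this variant, `4242 4242 4242 4242` (Luhn-valid) is redacted while a random 16-digit order number that fails the checksum passes through untouched.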
Apply before every AI invocation:

const safePrompt = redactPii(userInput)
const response = await openai.chat.completions.create({
  model: 'gpt-4o', // model is a required parameter; substitute your own
  messages: [{ role: 'user', content: safePrompt }]
})
For higher-assurance use cases, AWS Comprehend DetectPiiEntities covers 100+ PII types. Verify with a unit test: pass an email address string and assert the AI mock receives [EMAIL].
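A sketch of that verification follows. To keep the snippet self-contained, redactPii is inlined with the email rule only (in a real suite, import it from lib/ai/pii-guard.ts), and `mockCreate` is a hypothetical stand-in for the AI SDK call, which is async in practice.

```typescript
const EMAIL_RE = /\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b/g

// Email rule only, inlined from lib/ai/pii-guard.ts for this sketch.
function redactPii(input: string): string {
  return input.replace(EMAIL_RE, '[EMAIL]')
}

// Records every prompt the fake provider receives.
const received: string[] = []
function mockCreate(opts: { messages: { role: string; content: string }[] }): void {
  received.push(opts.messages[0].content)
}

// Exercise the guarded path: the mock must never see the raw address.
mockCreate({
  messages: [{ role: 'user', content: redactPii('Reach me at jane.doe@example.com') }],
})
```

The assertion is then simply that `received[0]` contains `[EMAIL]` and not the original address.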
ID: ai-data-privacy.pii-protection.pii-redacted-before-api
Severity: critical
What to look for: Trace the code path from user input to the AI provider API call. Look for any transformation, filtering, or sanitization step between receiving the user's message and passing it to the AI SDK function. Signals: a function named redact, sanitize, cleanPii, scrubPii, or similar called before the AI invocation; regex patterns that replace email addresses, phone numbers, SSNs, or credit card numbers; use of PII detection libraries (presidio, @aws-sdk/client-comprehend, Google DLP client) in the AI pipeline; or a middleware layer that processes messages before they reach the AI call.
Pass criteria: Both of the following conditions are met. At least one PII screening or redaction step exists in the code path between user input and the AI API call. The step runs in server-side code (not only client-side validation).
Fail criteria: User input is passed directly to the AI SDK (openai.chat.completions.create, anthropic.messages.create, generateText, streamText, etc.) without any intermediate filtering or transformation. The raw user string goes straight to the provider.
Do NOT pass when: The item exists only as a placeholder, stub, or TODO comment — partial implementation does not count as passing.
Skip (N/A) when: The application's explicit purpose requires sending PII to the AI (e.g., an AI-powered resume parser, a medical records summarizer) AND a Data Processing Agreement with the provider explicitly covers this use case. The skip signal must be both a clear intentional PII-sending pattern AND a DPA reference in comments or configuration.
Detail on fail: "User input passed directly to AI provider in [file] without PII screening — raw message content reaches the third-party API unchanged"
Remediation: Even when users don't intend to share PII, they often paste email addresses, phone numbers, or other identifiers into chat interfaces. A basic screening pass reduces the risk of inadvertently sending this data to the AI provider.
Implement a server-side redaction utility:
// lib/ai/pii-guard.ts
const EMAIL_RE = /\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b/g
const PHONE_RE = /(\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/g
const CC_RE = /\b\d{4}[\s\-]?\d{4}[\s\-]?\d{4}[\s\-]?\d{4}\b/g

export function redactPii(input: string): string {
  return input
    .replace(EMAIL_RE, '[EMAIL]')
    .replace(PHONE_RE, '[PHONE]')
    .replace(CC_RE, '[CARD]')
}
Then apply it before every AI call:

const safePrompt = redactPii(userInput)
const response = await openai.chat.completions.create({
  model: 'gpt-4o', // model is a required parameter; substitute your own
  messages: [{ role: 'user', content: safePrompt }]
})
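Relying on every call site to remember redactPii is fragile; one option is to route all requests through a single server-side wrapper. The sketch below makes assumptions: `safeChat`, `ChatClient`, and `redactMessages` are illustrative names, not part of any SDK, and redactPii is again inlined with the email rule only so the snippet stands alone.

```typescript
// Minimal shape of the chat-completion method being wrapped; the real
// OpenAI client's chat.completions object satisfies a superset of this.
interface ChatClient {
  create(opts: { messages: { role: string; content: string }[] }): Promise<unknown>
}

const EMAIL_RE = /\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b/g

// Email rule only, inlined from lib/ai/pii-guard.ts for this sketch.
function redactPii(input: string): string {
  return input.replace(EMAIL_RE, '[EMAIL]')
}

// Redact every message in the conversation, not just the latest turn.
export function redactMessages(
  messages: { role: string; content: string }[]
): { role: string; content: string }[] {
  return messages.map((m) => ({ ...m, content: redactPii(m.content) }))
}

// Single choke point: no message reaches the provider unredacted.
export async function safeChat(
  client: ChatClient,
  messages: { role: string; content: string }[]
) {
  return client.create({ messages: redactMessages(messages) })
}
```

Call sites would then invoke safeChat(openai.chat.completions, messages) instead of the raw SDK method (hypothetical usage; the real call also needs a model parameter), so a forgotten redaction at one call site cannot leak PII.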
For higher-assurance use cases, consider AWS Comprehend PII detection or Google Cloud DLP. To verify: write a test that passes an email address and asserts the AI mock receives [EMAIL] instead.
For a deeper analysis of injection attack vectors in AI pipelines, the AI Prompt Injection Audit covers PII leakage through prompt manipulation in detail.