Data minimization — only necessary context sent to AI

ab-000162 · ai-data-privacy.pii-protection.data-minimization-in-prompts

Severity: medium · active

Why it matters

GDPR Article 5(1)(c) codifies the data minimization principle: personal data must be adequate, relevant, and limited to what is necessary for the processing purpose. CCPA §1798.100 establishes parallel minimization expectations. When a developer serializes a full user object into an AI prompt — including email, hashed password, billing address, internal IDs, and account metadata — fields that have no bearing on the AI's task travel to the provider. The AI does not need the user's email to write a support reply. It does not need their created_at timestamp to summarize a document. Every unnecessary field is data shared without purpose, creating regulatory exposure and expanding the blast radius of any future provider breach.

Severity rationale

Medium because full-object serialization into AI prompts sends unnecessary personal data to a third-party processor in violation of GDPR Art. 5(1)(c), but the harm requires a downstream event (breach, subpoena) to materialize.

Remediation

Build an explicit, minimal prompt context object rather than serializing the full user or database record. Document which fields each AI feature actually requires.

```
// Instead of:
const context = JSON.stringify(user) // Sends email, hash, billing, internal IDs

// Use:
const context = JSON.stringify({
  name: user.firstName,
  plan: user.subscriptionTier,
  // Only fields the AI prompt template actually references
})
```

Create a dedicated buildAiContext(user: User): AiUserContext function in lib/ai/context.ts that returns a typed, narrowed object. A single, explicitly typed builder makes it easy to audit what is sent and prevents new fields from leaking into prompts when the User model grows. Review the returned fields whenever new columns are added to the user record.
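A minimal sketch of such a module follows. The `User` shape and its field names (taken from the examples on this page) and the `AiUserContext` interface are assumptions for illustration; substitute your own model. Because the return type is an explicit interface and the object is built field by field rather than spread, a new column on `User` cannot reach a prompt unless someone edits this function.

```typescript
// lib/ai/context.ts (hypothetical layout; User fields are assumed for illustration)

// Assumed shape of the full user record as loaded from the database.
export interface User {
  id: string;
  email: string;
  passwordHash: string;
  firstName: string;
  subscriptionTier: "free" | "pro" | "enterprise";
  billingAddress: string;
  createdAt: Date;
}

// The only user data an AI prompt may see. Adding a field here is a
// deliberate, reviewable change.
export interface AiUserContext {
  name: string;
  plan: User["subscriptionTier"];
}

// Copy only the allowlisted fields; never spread `user` into the result.
export function buildAiContext(user: User): AiUserContext {
  return {
    name: user.firstName,
    plan: user.subscriptionTier,
  };
}
```

Prompt code then calls `JSON.stringify(buildAiContext(user))` instead of `JSON.stringify(user)`, so a reviewer only has to read one function to know what leaves the system.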

Detection

ID: data-minimization-in-prompts
Severity: medium
What to look for: Enumerate every code path that injects user context into an AI prompt and examine how that context is built. Look for patterns where full user objects are serialized into the prompt (e.g., JSON.stringify(user), `User info: ${JSON.stringify(userData)}`, spreading a full database row into prompt context). Also look for patterns that construct a minimal context object with only required fields.
Pass criteria: At least one of the following conditions is met: when user context is included in prompts, only the necessary fields are selected and injected rather than the full user record; or the code constructs an explicit subset object before stringifying it into the prompt.
Fail criteria: Full user objects, full database records, or large context objects are serialized directly into the AI prompt, potentially including fields like email, hashed_password, billing_address, created_at, or internal IDs that the AI does not need.
Skip (N/A) when: No structured user context is injected into prompts — prompts consist only of the user's direct message without additional context objects.
Detail on fail: "Full user/data object serialized into AI prompt in [file] — potentially sending fields like email, internal IDs, or other unnecessary data to the AI provider"
Remediation: The principle of data minimization means sending only what the AI needs to answer the question. A user's name and subscription plan are relevant context for a customer support AI; their email address, password hash, and billing details are not.

Build an explicit prompt context object:
```
// Instead of:
const context = JSON.stringify(user) // Sends everything

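// Use:
const ESTABLISHED_AFTER_MS = 90 * 24 * 60 * 60 * 1000 // assumed threshold: 90 days
const context = JSON.stringify({
  name: user.firstName,
  plan: user.subscriptionTier,
  // A coarse bucket instead of the raw createdAt timestamp
  accountAge:
    Date.now() - new Date(user.createdAt).getTime() > ESTABLISHED_AFTER_MS ? 'established' : 'new',
  // Only fields the AI prompt actually uses
})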
```
Document which fields each AI feature requires in a comment or schema. Review this list when adding new fields to user records.
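One way to keep that documentation next to the code is a per-feature registry that both records and enforces the fields each AI feature receives. The sketch below is hypothetical: the feature names (`supportReply`, `onboardingTips`), the `User` shape, and the `pickFields` helper are illustrative assumptions. Because `pickFields` constrains its keys to `keyof T`, the compiler rejects a registered name that is not a property of `User`, and the result type contains only the listed keys.

```typescript
// Assumed user model for illustration; replace with your own.
interface User {
  id: string;
  email: string;
  passwordHash: string;
  firstName: string;
  subscriptionTier: "free" | "pro" | "enterprise";
  billingAddress: string;
  createdAt: Date;
}

// Hypothetical registry: the single documented list of user fields each AI
// feature may send to the provider. Review it whenever User gains a column.
const AI_FEATURE_FIELDS = {
  supportReply: ["firstName", "subscriptionTier"],
  onboardingTips: ["subscriptionTier"],
} as const;

// Generic pick: copies only `keys` from `obj` into a new object.
// K must be a key of T, so misspelled or nonexistent fields fail to compile.
function pickFields<T extends object, K extends keyof T>(
  obj: T,
  keys: readonly K[],
): Pick<T, K> {
  const out = {} as Pick<T, K>;
  for (const key of keys) {
    out[key] = obj[key];
  }
  return out;
}
```

A call such as `pickFields(user, AI_FEATURE_FIELDS.supportReply)` yields an object with only `firstName` and `subscriptionTier`, so changing what a feature sends means changing one reviewed line in the registry.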

For a broader review of what data your analytics pipeline captures, the Analytics & Tracking Audit covers data minimization in event tracking.
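The detection steps above can be partly scripted. The sketch below pairs two hypothetical helpers: `findSerializationSuspects`, a regular-expression scan of source text for the full-object patterns listed under What to look for, and `findSensitiveKeys`, which walks a context object (for example in a unit test) and reports keys matching a denylist drawn from the fail criteria. The identifier names in the patterns and the denylist entries are assumptions to adapt to your codebase. Both are heuristics: the scan misses objects stringified through an intermediate helper, and a denylist only catches names you anticipated, so neither replaces an explicit allowlist or a manual review.

```typescript
// Hypothetical audit helpers; names, patterns, and the denylist are assumptions.

// 1) Static heuristic: flag source lines that appear to serialize or spread a
//    whole user/record object. A text scan, not a parser: expect misses and
//    false positives, and review every hit by hand.
export interface Suspect {
  line: number; // 1-based line number in the scanned text
  text: string; // trimmed source line
}

const SUSPECT_PATTERNS: RegExp[] = [
  /JSON\.stringify\(\s*(user|userData|profile|row|record)\s*\)/, // whole object stringified
  /\.\.\.\s*(user|userData|profile|row|record)\b/, // whole object spread into a literal
];

export function findSerializationSuspects(source: string): Suspect[] {
  const suspects: Suspect[] = [];
  source.split("\n").forEach((text, i) => {
    if (SUSPECT_PATTERNS.some((re) => re.test(text))) {
      suspects.push({ line: i + 1, text: text.trim() });
    }
  });
  return suspects;
}

// 2) Runtime backstop: report dotted paths of keys in a prompt context object
//    whose normalized name is on the denylist.
function normalizeKey(key: string): string {
  // Lowercase and drop "_" and "-", so hashed_password matches hashedPassword.
  return key.toLowerCase().replace(/[_-]/g, "");
}

const SENSITIVE_KEYS = new Set(
  ["email", "hashed_password", "password_hash", "billing_address", "created_at", "id", "user_id"].map(normalizeKey),
);

export function findSensitiveKeys(value: unknown, path = ""): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => hits.push(...findSensitiveKeys(item, `${path}[${i}]`)));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      const childPath = path ? `${path}.${key}` : key;
      if (SENSITIVE_KEYS.has(normalizeKey(key))) hits.push(childPath);
      hits.push(...findSensitiveKeys(child, childPath)); // recurse into nested objects
    }
  }
  return hits;
}
```

Running the scan over each prompt-building file gives a review list for the Detail on fail message, while asserting that `findSensitiveKeys(context)` returns an empty array in a test catches regressions at the point where the context is built.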

External references

gdpr · Art. 5(1)(c) — Data minimisation
ccpa · §1798.100 — Limitation on collection of personal information
nist-ai-rmf:1.0 · MAP 1.5 — Organisational context for AI system input data scope
cwe · CWE-359 — Exposure of Private Personal Information to an Unauthorized Actor

Taxons

privacy-consent inference-contract

History

2026-04-18·v1.0.0·Initial import from ai-data-privacy·automated
