All 20 checks with why-it-matters prose, severity, and cross-references to related audits.
Sending user data to a third-party AI provider without consent violates GDPR Article 6 (lawful basis for processing) and Article 7 (conditions for consent). Under CCPA §1798.100, consumers have the right to know how their data is used. When an AI API call fires unconditionally on every user message, you are processing personal data without a documented lawful basis — exposing the business to regulatory enforcement, fines up to 4% of global annual revenue under GDPR, and user trust collapse if discovered. OWASP A01 (Broken Access Control) classification applies when the missing gate allows any session to trigger AI processing regardless of consent state.
Why this severity: Critical because absent server-side consent gating means every user message is sent to a third-party AI provider with no legal basis, directly violating GDPR Art. 6 and exposing the operator to maximum-tier regulatory penalties.
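A server-side consent gate of this kind can be sketched as follows. The field names (`aiProcessingConsent`, `consentTimestamp`) and the provider stub are illustrative assumptions, not part of the audit:

```typescript
// Hypothetical consent record; field names are illustrative.
interface ConsentRecord {
  aiProcessingConsent: boolean;
  consentTimestamp: string | null; // ISO timestamp when consent was captured
}

// A documented lawful basis requires explicit consent plus a record of when
// it was given (GDPR Art. 7 puts the burden of proof on the controller).
function canProcessWithAi(consent: ConsentRecord): boolean {
  return consent.aiProcessingConsent && consent.consentTimestamp !== null;
}

async function handleUserMessage(consent: ConsentRecord, message: string): Promise<string> {
  if (!canProcessWithAi(consent)) {
    // Refuse to fire the AI call rather than processing without a basis.
    throw new Error("AI processing blocked: no documented user consent");
  }
  return callAiProvider(message); // stand-in for the real provider call
}

async function callAiProvider(message: string): Promise<string> {
  return `(stub) ${message}`;
}
```

The key property is that the gate runs on the server, before the request is built — a client-side toggle alone can be bypassed or drift out of sync.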
Pattern: ai-data-privacy.data-collection-consent.user-consent-before-ai-processing

GDPR Articles 13 and 14 require data controllers to inform users about third-party processors and the purposes for which their data is shared — at the time of collection. CCPA §1798.100 similarly mandates disclosure of data sharing practices. A privacy policy that makes no mention of AI providers, large language models, or the categories of data sent to them is a material omission that creates regulatory exposure and undermines user trust. When a breach or complaint triggers regulatory review, an undisclosed sub-processor is one of the clearest indicators of non-compliance.
Why this severity: High because a privacy policy that omits AI data processing is a documented regulatory violation under GDPR Art. 13/14 and CCPA §1798.100, creating direct liability without requiring any breach to occur.
Pattern: ai-data-privacy.data-collection-consent.privacy-policy-ai-disclosure

GDPR Article 7 requires that consent be as specific as the processing activity — a blanket "I agree to AI" toggle does not meet the granularity standard when your product uses AI for distinct purposes (writing assistance, history saving, analytics, model training opt-in). CCPA §1798.120 gives consumers the right to opt out of specific categories of data sale or sharing. A single on/off switch forces users into an all-or-nothing choice, reducing trust and limiting your ability to demonstrate specific lawful bases for each processing activity.
Why this severity: Medium because a monolithic AI toggle limits compliance defensibility under GDPR Art. 7 and erodes user trust, but does not by itself enable unauthorized data access.
Pattern: ai-data-privacy.data-collection-consent.granular-ai-opt-in

The EU AI Act requires that users be notified when they are interacting with an AI system in certain contexts, and the principle is spreading across jurisdictions. When AI-generated responses are visually indistinguishable from human-authored content, users cannot make informed decisions about how much to trust or act on the output. OWASP LLM09 (Misinformation) classifies unmarked AI output as a reliability risk — users who mistake AI responses for authoritative human answers are more likely to act on hallucinated information. Disclosure is the minimum accountability signal an AI-powered product can provide.
Why this severity: Low because the missing indicator is a transparency deficiency rather than a direct attack vector, but it enables harm through uninformed over-reliance on AI-generated content.
Pattern: ai-data-privacy.data-collection-consent.ai-processing-indicator

Language models hallucinate — they generate plausible-sounding but factually incorrect output with no inherent signal to the user that something is wrong. NIST AI RMF MAP 5.1 explicitly calls out the risk of users treating AI output as authoritative. OWASP LLM09 categorizes this as a misinformation risk. Without a visible accuracy disclaimer, users who act on incorrect AI-generated advice — whether medical, legal, financial, or procedural — have no prior warning that verification was expected. The developer carries moral and increasingly legal exposure when that warning is absent.
Why this severity: Info because the absence of a disclaimer is a transparency gap rather than a technical vulnerability, but it increases user harm from hallucinated outputs and reduces the developer's liability defense.
Pattern: ai-data-privacy.data-collection-consent.ai-accuracy-disclaimer

GDPR Article 25 mandates data protection by design — minimizing the personal data transmitted to processors. OWASP A02 (Cryptographic Failures) and CWE-359 (Exposure of Private Information) both apply when PII travels unredacted to a third party. Users routinely paste email addresses, phone numbers, credit card numbers, and SSNs into chat interfaces without realizing the consequences. Without a server-side screening pass, every one of those values is sent verbatim to the AI provider, where it persists in their logs and training pipelines according to their retention policies — not yours. The 2023 Samsung incident, where employees pasted internal code into ChatGPT, is the canonical example of this failure mode.
Why this severity: Critical because unredacted PII sent to a third-party AI provider constitutes unauthorized data disclosure under GDPR Art. 25, CWE-359, and OWASP A02, with no technical control stopping the exfiltration once the API call fires.
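A minimal server-side screening pass might look like the sketch below. The regexes are deliberately naive placeholders — a maintained detection library gives far better format coverage, as a later check in this category notes:

```typescript
// Naive illustrative patterns — real deployments should use a maintained
// PII detection library; these miss many formats by design of the example.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/[\w.+-]+@[\w-]+\.[\w.-]+/g, "[EMAIL]"],
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"], // US SSN shape only
  [/\b(?:\d[ -]?){13,16}\b/g, "[CARD]"], // loose card-number shape
];

// Replace detected PII with labels before the prompt leaves the server.
function redactPii(text: string): string {
  return PII_PATTERNS.reduce((out, [pattern, label]) => out.replace(pattern, label), text);
}
```

Because redaction runs before the API call, the provider's logs and retention pipeline never see the raw values, regardless of the provider's own policies.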
Pattern: ai-data-privacy.pii-protection.pii-redacted-before-api

GDPR Article 5(1)(f) requires that personal data be processed with appropriate security, including protection against unauthorized access — which includes log aggregation services. CWE-532 (Insertion of Sensitive Information into Log File) and OWASP A09 (Security Logging and Monitoring Failures) both apply when prompt content ends up in application logs. Application logs typically flow to third-party aggregators like Datadog, Logtail, or Better Stack, each with their own access controls, retention windows, and breach surfaces. When a developer logs `console.log({ messages, model })` during a debugging session and forgets to remove it, every user prompt written since is now stored in a system with weaker controls than the primary database.
Why this severity: High because prompt content in persistent logs constitutes an unauthorized third-party disclosure under GDPR Art. 5(1)(f) and CWE-532, multiplying the number of parties that hold user data without consent.
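One safe pattern is to log request metadata and never the messages themselves. The shapes below are assumptions for illustration:

```typescript
interface AiRequest {
  userId: string;
  model: string;
  messages: Array<{ role: string; content: string }>;
}

// Derive a log entry that carries debugging signal (counts, sizes, model)
// without any prompt content ever reaching the log sink.
function toLogEntry(req: AiRequest): { userId: string; model: string; messageCount: number; totalChars: number } {
  return {
    userId: req.userId,
    model: req.model,
    messageCount: req.messages.length,
    totalChars: req.messages.reduce((n, m) => n + m.content.length, 0),
  };
}
```

Counts and sizes answer most debugging questions ("was the prompt empty?", "did we blow the context window?") without disclosing a single user word to the aggregator.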
Pattern: ai-data-privacy.pii-protection.no-pii-in-logs

GDPR Article 5(1)(c) codifies the data minimization principle: personal data must be adequate, relevant, and limited to what is necessary for the processing purpose. CCPA §1798.100 establishes parallel minimization expectations. When a developer serializes a full user object into an AI prompt — including email, hashed password, billing address, internal IDs, and account metadata — fields that have no bearing on the AI's task travel to the provider. The AI does not need the user's email to write a support reply. It does not need their `created_at` timestamp to summarize a document. Every unnecessary field is data shared without purpose, creating regulatory exposure and expanding the blast radius of any future provider breach.
Why this severity: Medium because full-object serialization into AI prompts sends unnecessary personal data to a third-party processor in violation of GDPR Art. 5(1)(c), but the harm requires a downstream event (breach, subpoena) to materialize.
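In practice this means projecting the user record down to the fields the task actually needs before building the prompt. The field names below are illustrative:

```typescript
// Full record as it might come from the database (illustrative fields).
interface FullUser {
  id: string;
  email: string;
  passwordHash: string;
  billingAddress: string;
  createdAt: string;
  displayName: string;
}

// For, say, drafting a support reply, the display name may be all the
// prompt needs — everything else stays on the server.
function promptContextFor(user: FullUser): { displayName: string } {
  return { displayName: user.displayName };
}
```

The explicit return type doubles as documentation: the compiler rejects any attempt to sneak extra fields into the prompt context.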
Pattern: ai-data-privacy.pii-protection.data-minimization-in-prompts

OWASP LLM01 (Prompt Injection) attackers often use injected instructions to extract or reproduce PII that has been provided in the conversation context. Adding behavioral guardrails to the system prompt — "do not repeat personal information the user shares" — reduces the surface area for accidental PII regurgitation. NIST AI RMF MANAGE 2.2 calls for implementing controls to minimize AI-generated harm. While system prompt instructions are not a technical security control and can be overridden by adversarial prompts, they meaningfully reduce accidental PII generation from normal interactions and demonstrate a documented intent to protect user data.
Why this severity: Low because system prompt instructions are a behavioral rather than technical control, but their absence leaves the AI with no documented guidance on PII handling — increasing accidental disclosure risk in normal usage.
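A guardrail clause of this kind might be appended to the system prompt. The wording below is an example, not vetted policy text, and it remains a behavioral rather than technical control:

```typescript
// Example guardrail clause — adjust the wording to your product's context.
const PII_GUARDRAIL =
  "Do not repeat, summarize, or reference personal information the user " +
  "shares (emails, phone numbers, addresses, account or card numbers). " +
  "If asked to recall such details, decline.";

// Append the guardrail to whatever task instructions the feature uses.
function buildSystemPrompt(taskInstructions: string): string {
  return `${taskInstructions}\n\n${PII_GUARDRAIL}`;
}
```

Centralizing the clause in one constant keeps it consistent across features and makes it auditable as a documented control.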
Pattern: ai-data-privacy.pii-protection.system-prompt-pii-safeguard

GDPR Article 5(1)(c) data minimization applies to what users are encouraged to submit, not just what the application collects programmatically. Developer tools, customer support AI, and general-purpose chat interfaces are contexts where users routinely paste credentials, API keys, database connection strings, and personal account details — often because they are troubleshooting and it is the fastest way to show the problem. Without a visible advisory, users have no prompt to reconsider. A single line of placeholder text or helper text shifts user behavior at essentially zero implementation cost.
Why this severity: Info because no technical control is missing, but the absent advisory measurably increases the frequency of users inadvertently submitting sensitive data to the AI pipeline.
Pattern: ai-data-privacy.pii-protection.sensitive-input-warning

CWE-359 (Exposure of Private Information) and NIST SP 800-53 Rev. 5 SI-12 (Information Management and Retention) both call for systematic controls on PII exposure. Handwritten regex patterns for PII detection have well-documented failure modes: phone number formats vary by country and era, email local parts accept characters most patterns miss, and edge cases in credit card BIN ranges cause false negatives. An established PII detection library or managed service provides format coverage that ad-hoc patterns cannot match. Even `validator` (npm, 10M weekly downloads) covers email, mobile phone across locales, and credit card validation with a single dependency.
Why this severity: Info because this check signals the quality and completeness of the PII filtering approach rather than the presence or absence of any filtering — a passing `pii-redacted-before-api` check may already provide adequate coverage.
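One way to keep the door open for a real library is to hide detection behind a narrow interface, so naive placeholder checks can later be swapped for e.g. `validator`'s `isEmail`/`isCreditCard` without touching call sites. Everything below is an illustrative sketch:

```typescript
type PiiCheck = (token: string) => boolean;

// Placeholder checks — in production, delegate to a maintained library
// (e.g. validator.isEmail, validator.isCreditCard) behind this interface.
const checks: Record<string, PiiCheck> = {
  email: (t) => /^[\w.+-]+@[\w-]+\.[\w.-]+$/.test(t),
  creditCard: (t) => /^\d{13,16}$/.test(t.replace(/[ -]/g, "")),
};

// Scan whitespace-separated tokens and report which check each one tripped.
function findPiiTokens(text: string): Array<{ token: string; kind: string }> {
  const hits: Array<{ token: string; kind: string }> = [];
  for (const token of text.split(/\s+/)) {
    for (const [kind, check] of Object.entries(checks)) {
      if (check(token)) hits.push({ token, kind });
    }
  }
  return hits;
}
```

Upgrading detection quality then becomes a one-file change to the `checks` map rather than a rewrite of every redaction call site.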
Pattern: ai-data-privacy.pii-protection.pii-detection-library

GDPR Article 5(1)(e) requires that personal data not be kept longer than necessary for the processing purpose — the storage limitation principle. CCPA §1798.100 creates parallel expectations. AI conversation history is one of the richest stores of personal data an application can accumulate: it contains exactly what users said, in their own words, about their problems, intentions, and circumstances. Without an automated retention policy, this data accumulates indefinitely, increasing the blast radius of any future breach and the scope of any regulatory discovery request. ISO/IEC 27001:2022 A.8.10 (Information Deletion) codifies this as a formal control requirement.
Why this severity: High because indefinitely retained AI conversation data violates GDPR Art. 5(1)(e)'s storage limitation principle and ISO 27001 A.8.10, creating compounding regulatory and breach exposure with every day the data grows.
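A retention job typically reduces to computing a cutoff and deleting everything older. The window, table, and column names below are assumptions:

```typescript
const RETENTION_DAYS = 90; // illustrative window — set per your documented policy

// Compute the timestamp before which conversation rows should be purged.
function retentionCutoff(now: Date, retentionDays: number = RETENTION_DAYS): Date {
  return new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
}

// A scheduled job might then run something like (driver-specific, parameterized):
//   DELETE FROM ai_conversations WHERE created_at < $1
// with retentionCutoff(new Date()) bound as $1.
```

Running this on a schedule (cron, queue worker) turns storage limitation from a policy statement into an enforced property of the system.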
Pattern: ai-data-privacy.data-retention-deletion.chat-history-retention-policy

GDPR Article 17 (Right to Erasure) and CCPA §1798.105 (Right to Delete) give users the explicit right to request deletion of their personal data, including AI conversation history. When no self-service delete mechanism exists, every deletion request becomes a manual support ticket requiring engineering intervention — a workflow that does not scale and that many teams fail to process within the regulatory 30-day window. Worse, without a delete UI, most users do not know they can ask, meaning their data persists indefinitely by default rather than by informed choice.
Why this severity: Medium because the absence of a user-facing delete mechanism makes GDPR Art. 17 and CCPA §1798.105 compliance operationally impossible at scale and likely results in missed 30-day response deadlines.
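A self-service endpoint can be thin if the data layer exposes a scoped delete. The interface and in-memory store below are illustrative stand-ins for a real handler and database:

```typescript
interface ConversationStore {
  deleteAllForUser(userId: string): number; // rows removed
}

// Handler logic: scope strictly to the authenticated session's own user ID —
// never accept a target user ID from the request body.
function handleDeleteHistory(store: ConversationStore, sessionUserId: string): { deleted: number } {
  return { deleted: store.deleteAllForUser(sessionUserId) };
}

// Minimal in-memory store, for illustration only.
class MemoryStore implements ConversationStore {
  private rows: Array<{ userId: string; text: string }> = [];
  add(userId: string, text: string): void {
    this.rows.push({ userId, text });
  }
  count(): number {
    return this.rows.length;
  }
  deleteAllForUser(userId: string): number {
    const before = this.rows.length;
    this.rows = this.rows.filter((r) => r.userId !== userId);
    return before - this.rows.length;
  }
}
```

Returning the deleted count lets the UI confirm the action to the user and gives the audit log something concrete to record.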
Pattern: ai-data-privacy.data-retention-deletion.user-delete-ai-history

GDPR Article 25 (Data Protection by Design) and ISO/IEC 27001:2022 A.8.10 both require that data lifecycle controls be built into the system architecture, not retrofitted. When AI conversation history is stored as a JSON blob column on the users table — `users.chat_history jsonb` — it becomes structurally impossible to apply independent retention policies, deletion on request, or access controls to conversation data without touching the user record. Every conversation row deletion becomes a user record mutation, and a data retention query must read the entire users table. Proper table separation is a prerequisite for every other data lifecycle control in this category.
Why this severity: Low because embedded-JSON storage is an architectural deficiency rather than an active vulnerability, but it structurally blocks all independent lifecycle controls for AI conversation data.
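A separation migration might look like the sketch below (raw SQL in strings; table and column names are assumptions). Note that the `ON DELETE CASCADE` clause also does the cleanup work the erasure check later in this category asks for:

```typescript
// Illustrative migration: move conversation data out of users.chat_history
// into its own table so retention, deletion, and access controls can act on
// it independently of the user record.
const up: string[] = [
  `CREATE TABLE ai_conversations (
     id uuid PRIMARY KEY,
     user_id uuid NOT NULL REFERENCES users(id) ON DELETE CASCADE,
     messages jsonb NOT NULL,
     created_at timestamptz NOT NULL DEFAULT now()
   )`,
  // A backfill from users.chat_history would go here, before the column drop.
  `ALTER TABLE users DROP COLUMN chat_history`,
];
```

With its own table, conversation data gets its own indexes, its own retention job, and its own row-level access policy, none of which touch the user record.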
Pattern: ai-data-privacy.data-retention-deletion.ai-logs-separation

GDPR Article 17 (Right to Erasure) and CCPA §1798.105 require that deletion of a user's account removes all associated personal data — not just the primary user record. CWE-459 (Incomplete Cleanup) covers exactly this pattern: a delete operation that removes the primary entity but leaves related records orphaned. AI conversation history typically contains some of the most sensitive personal data in the system — verbatim user statements about their problems, circumstances, and intentions. Orphaned conversation records after account deletion mean the user's data persists in the database indefinitely under a now-deleted user ID, violating the erasure right and potentially creating ghost data that can be queried without ownership context.
Why this severity: Info because cascade delete configuration is a correctness issue rather than an exploitable vulnerability, but orphaned AI data after account deletion constitutes a documented GDPR Art. 17 compliance gap.
Pattern: ai-data-privacy.data-retention-deletion.gdpr-erasure-ai-data

NIST AI RMF GOVERN 1.7 requires organizations to maintain documentation of AI system decisions. ISO/IEC 27001:2022 A.8.15 (Logging) and NIST SP 800-53 Rev. 5 AU-3 require sufficient log detail to reconstruct events. When AI responses are stored without model version metadata, it becomes impossible to explain to a user why they received a particular response, identify which responses might be affected by a model deprecation or safety issue, or meet audit requests about AI decision provenance. Storing `model: 'gpt-4o'` alongside each response is a three-field schema change with significant operational value.
Why this severity: Info because missing audit trail metadata does not enable data exposure, but it makes AI governance, debugging, and compliance attestation structurally impossible.
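The metadata in question is small. A response record might carry fields like these — the names are illustrative, and `promptVersion` is an extra assumption worth storing alongside the model:

```typescript
interface AiResponseRecord {
  conversationId: string;
  content: string;
  model: string;         // as reported in the provider's response payload
  promptVersion: string; // version of your own system prompt template
  respondedAt: string;   // ISO timestamp
}

// Capture provenance at write time; it cannot be reconstructed later.
function recordResponse(
  conversationId: string,
  content: string,
  model: string,
  promptVersion: string,
): AiResponseRecord {
  return { conversationId, content, model, promptVersion, respondedAt: new Date().toISOString() };
}
```

Reading the model name from the provider's response rather than from your request config also captures silent server-side model substitutions.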
Pattern: ai-data-privacy.data-retention-deletion.ai-decision-audit-trail

GDPR Article 32 requires that personal data be transmitted with appropriate technical security measures — which universally means TLS encryption. CWE-319 (Cleartext Transmission of Sensitive Information) and OWASP A02 (Cryptographic Failures) both apply when AI prompt data travels over unencrypted HTTP. AI prompts contain user input in its rawest form: questions, documents, instructions, and frequently PII the user embedded without realizing it. Any network observer on the path between your server and the AI provider endpoint — a misconfigured proxy, a shared hosting environment, a compromised network hop — can read every prompt in plaintext. NIST SP 800-53 Rev. 5 SC-8 mandates transmission confidentiality as a baseline control.
Why this severity: Critical because HTTP-transmitted AI prompts expose user content to any network observer between server and provider, constituting cleartext PII transmission in violation of GDPR Art. 32, CWE-319, and OWASP A02.
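A cheap defense-in-depth guard is to assert the endpoint scheme before any request is built, so a misconfigured base URL fails fast instead of transmitting in cleartext. A sketch:

```typescript
// Refuse to construct AI requests against non-HTTPS endpoints.
function assertEncryptedEndpoint(url: string): string {
  if (new URL(url).protocol !== "https:") {
    throw new Error(`AI provider endpoint must use TLS, got: ${url}`);
  }
  return url;
}
```

Running this once at startup, where the provider base URL is read from configuration, catches the typical failure mode: an `http://` value pasted into an environment variable.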
Pattern: ai-data-privacy.third-party-ai-provider.api-transport-encrypted

GDPR Article 28 requires that sub-processors (including error monitoring services) process data only as instructed by the controller and under a Data Processing Agreement. CWE-532 and OWASP A09 both cover sensitive data ending up in log outputs. Error monitoring services like Sentry, LogRocket, and Datadog are third parties with their own retention policies, access controls, and breach surfaces — often less stringent than your primary database. When a developer attaches the full AI messages array to a Sentry error for debugging convenience, they have just disclosed every user message in that request to a third party that was not disclosed in the privacy policy for that purpose.
Why this severity: High because attaching prompt content to error reports discloses user data to an additional, undisclosed third-party sub-processor in violation of GDPR Art. 28, compounding the exposure beyond the intended AI provider.
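A scrubbing hook — in the shape of, for example, Sentry's `beforeSend` callback — can strip known-sensitive keys before the event leaves the process. The event shape and key list below are simplified assumptions:

```typescript
interface ErrorEvent {
  message: string;
  extra?: Record<string, unknown>;
}

// Keys that must never reach the error-monitoring service.
const SENSITIVE_KEYS = new Set(["messages", "prompt", "completion", "input"]);

// Redact sensitive keys while preserving safe debugging metadata.
function scrubEvent(event: ErrorEvent): ErrorEvent {
  if (!event.extra) return event;
  const extra: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(event.extra)) {
    extra[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : value;
  }
  return { ...event, extra };
}
```

Installing the scrubber centrally in the monitoring client's configuration means a single forgotten debug attachment cannot reintroduce the leak.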
Pattern: ai-data-privacy.third-party-ai-provider.no-user-data-error-reporting

GDPR Article 28 requires that when a processor (AI provider) uses sub-processors or uses data for additional purposes (like model training), the controller (you) must be aware of and have agreed to those terms. CCPA §1798.100 and NIST AI RMF GOVERN 6.1 both require transparency about how user data is used by third parties. By default, some consumer-tier AI API plans allow the provider to use API data for model improvement. For B2B products or any application handling sensitive user data, allowing user conversations to feed into a third party's training corpus without user disclosure is a material compliance gap.
Why this severity: Medium because undocumented training data usage by an AI provider constitutes an undisclosed secondary purpose under GDPR Art. 28 and CCPA §1798.100, creating regulatory exposure without a technical breach.
Pattern: ai-data-privacy.third-party-ai-provider.disable-training-usage

CWE-532 (Insertion of Sensitive Information into Log File), OWASP A09 (Security Logging and Monitoring Failures), and GDPR Article 5(1)(f) (integrity and confidentiality) all apply when wildcard HTTP request body logging captures AI prompt content. HTTP interceptors configured to log all outgoing request bodies are a common debugging pattern — axios interceptors, custom fetch wrappers, and Node.js `http` module overrides — that developers add during development and forget to remove or scope. In an AI application, these interceptors will capture every prompt sent to the AI provider and write it to whatever log sink is configured, regardless of sensitivity.
Why this severity: Low because wildcard request logging is typically a forgotten debugging artifact rather than an intentional disclosure, but it creates an ongoing passive exfiltration of all AI prompt content to the log sink.
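If a debug interceptor is genuinely needed, it can be scoped by destination so it never captures bodies bound for the AI provider. The host list below is an illustrative assumption:

```typescript
// Hosts whose request bodies must never be logged.
const SENSITIVE_HOSTS = new Set(["api.openai.com", "api.anthropic.com"]);

// Decide what (if anything) a debug interceptor may log for a request:
// undefined means "log nothing for this request body".
function loggableBody(url: string, body: string): string | undefined {
  return SENSITIVE_HOSTS.has(new URL(url).host) ? undefined : body;
}
```

Wiring this check into the interceptor (an axios request interceptor or a fetch wrapper) keeps the debugging convenience for ordinary traffic while excluding the AI pipeline entirely.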
Pattern: ai-data-privacy.third-party-ai-provider.no-wildcard-logging