AI training data opt-out is configured
Why it matters
GDPR Article 28 requires that when a processor (the AI provider) engages sub-processors or uses data for additional purposes (such as model training), the controller (you) must know about and have agreed to those terms. CCPA §1798.100 requires transparency about how user data is used by third parties, and NIST AI RMF GOVERN 6.1, although a voluntary framework, calls for policies governing third-party AI use. By default, some consumer-tier AI API plans allow the provider to use API data for model improvement. For B2B products, or any application handling sensitive user data, letting user conversations feed a third party's training corpus without disclosing it to users is a material compliance gap.
Severity rationale
Medium because undocumented training data usage by an AI provider constitutes an undisclosed secondary purpose under GDPR Art. 28 and CCPA §1798.100, creating regulatory exposure without a technical breach.
Remediation
Verify your training opt-out status with your AI provider and document it in code. Steps vary by provider:
- OpenAI: API data is not used for training by default for API users. Confirm at platform.openai.com → Settings → Privacy. For contractual guarantees, use Azure OpenAI.
- Anthropic: API data is not used for training by default. Verify at console.anthropic.com → Settings.
- Google: Use Vertex AI rather than AI Studio for enterprise data terms.
- Azure OpenAI: No training use by default under Microsoft's enterprise DPA.
Document the verification in your AI client configuration file:
// lib/ai/client.ts
import { AzureOpenAI } from "openai";

// DATA POLICY: Using Azure OpenAI. Data is not used for model training
// per Microsoft Enterprise DPA. Verified: 2025-01 via Azure portal
// Data & Privacy settings. Re-verify on contract renewal.
const openai = new AzureOpenAI({ ... })
Set a calendar reminder to re-verify at contract renewal — provider terms change.
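A calendar reminder is easy to miss, so one option is to let a CI step flag a stale verification stamp as well. The sketch below assumes the policy comment carries a `Verified: YYYY-MM` stamp in the same form as the example above. The function names `monthsSinceVerified` and `isVerificationStale`, and the 12-month default, are hypothetical choices for illustration and not part of any provider SDK.

```typescript
// Hypothetical helpers: illustrative only, not part of any provider SDK.
// Assumes the data-policy comment contains a "Verified: YYYY-MM" stamp.
const VERIFIED_PATTERN = /Verified:\s*(\d{4})-(\d{2})/;

/**
 * Returns the number of calendar months between the "Verified: YYYY-MM"
 * stamp found in `source` and `now`, or null when no stamp is present.
 */
function monthsSinceVerified(source: string, now: Date): number | null {
  const match = VERIFIED_PATTERN.exec(source);
  if (match === null) return null;
  const year = Number(match[1]);
  const month = Number(match[2]); // 1-12
  // getUTCMonth() is zero-based, so add 1 before comparing.
  return (now.getUTCFullYear() - year) * 12 + (now.getUTCMonth() + 1 - month);
}

/**
 * True when the stamp is missing or older than `maxAgeMonths`.
 * The 12-month default is an assumption; align it with your contract term.
 */
function isVerificationStale(
  source: string,
  now: Date,
  maxAgeMonths: number = 12,
): boolean {
  const age = monthsSinceVerified(source, now);
  return age === null || age > maxAgeMonths;
}
```

A CI script could read `lib/ai/client.ts`, pass its contents to `isVerificationStale` with the current date, and fail the build when the result is `true`. This complements the manual re-verification and does not replace it, since the stamp only records when someone last checked.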
Detection
- ID: disable-training-usage
- Severity: medium
- What to look for: Enumerate every relevant item. Look for signals that the project has opted out of training data usage. Indicators: use of Azure OpenAI endpoints (which have no-training-use guarantees by default), references to "Zero Data Retention" or "ZDR" in comments or configuration, OpenAI organization-level API usage where training opt-out is documented, or any API headers that signal training opt-out (provider-specific). Also check for comments like // ZDR enabled or // Training opt-out via org settings. A secondary signal: the project uses an OpenAI API key under an organization that has opted out (not directly detectable in code, but comments referencing this are sufficient).
- Pass criteria: At least 1 of the following conditions is met. Evidence of training opt-out: Azure OpenAI endpoint in use, explicit comment referencing ZDR or organization-level training opt-out, or use of an enterprise API tier known to exclude training by default.
- Fail criteria: Using a standard consumer-tier API (personal OpenAI key with no enterprise agreement, standard Anthropic API) with no evidence of training data opt-out configuration or organizational agreement.
- Skip (N/A) when: The project uses only open-source self-hosted models (Ollama, vLLM, llama.cpp) where no third-party training can occur by definition.
- Detail on fail: "No evidence of training data opt-out found — standard API tier may use conversation data for model improvement by default"
- Remediation: By default, some AI providers may use API data for model improvement unless you opt out. This varies by provider and API tier.
Steps by provider:
- OpenAI: API usage is not used for training by default for API users, but verify your organization settings at platform.openai.com → Settings → Privacy. For additional guarantees, use Azure OpenAI Service.
- Anthropic: API data is not used for training by default. Verify at console.anthropic.com → Settings.
- Google AI: Gemini API usage through Google AI Studio may be reviewed by humans for safety. Use Vertex AI for Google Cloud's enterprise data processing terms.
- Azure OpenAI: No training use by default — covered by Microsoft's enterprise data processing agreement.
Add a comment in your AI configuration file confirming the opt-out status:
// lib/ai/client.ts
// NOTE: Using Azure OpenAI; data not used for model training per our enterprise agreement.
// Verified: 2024-01 via Azure portal Data & Privacy settings.
const openai = new AzureOpenAI({ ... })
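The signals listed under "What to look for" and the pass, fail, and skip criteria can be approximated with a plain source scan. The following is a minimal sketch, not the check's actual implementation: the `SourceFile` shape, the function names, and every regular expression are assumptions chosen for illustration. It also treats "no hosted-provider import or endpoint found" as the skip case, which only approximates the self-hosted condition.

```typescript
// Hypothetical detector sketch: all names and patterns are illustrative assumptions.
type Verdict = "pass" | "fail" | "skip";

interface SourceFile {
  path: string;
  content: string;
}

// Evidence that training opt-out is in place or documented.
const OPT_OUT_SIGNALS: RegExp[] = [
  /\bAzureOpenAI\b/,              // Azure OpenAI client class in use
  /\.openai\.azure\.com/i,        // Azure OpenAI endpoint host
  /\bZDR\b|zero data retention/i, // Zero Data Retention mentioned
  /training opt-?out/i,           // documented organization-level opt-out
];

// Evidence that a hosted third-party provider is used at all.
const HOSTED_PROVIDER_SIGNALS: RegExp[] = [
  /from\s+["']openai["']/,
  /from\s+["']@anthropic-ai\/sdk["']/,
  /from\s+["']@google\/generative-ai["']/,
  /api\.openai\.com|api\.anthropic\.com/i,
];

// True when at least one pattern matches at least one file.
function matchesAny(files: SourceFile[], patterns: RegExp[]): boolean {
  return files.some((file) => patterns.some((p) => p.test(file.content)));
}

/**
 * pass: at least one opt-out signal is present.
 * fail: a hosted provider is used and no opt-out signal is present.
 * skip: no hosted provider detected (for example, only self-hosted models).
 */
function checkTrainingOptOut(files: SourceFile[]): Verdict {
  if (matchesAny(files, OPT_OUT_SIGNALS)) return "pass";
  return matchesAny(files, HOSTED_PROVIDER_SIGNALS) ? "fail" : "skip";
}
```

Because organization-level settings are not visible in code, a scan like this can only confirm that the opt-out has been documented. A "pass" still depends on the comment being truthful and current.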
External references
- gdpr · Art. 28 — Controller-processor obligations — data processing agreement
- nist-ai-rmf:1.0 · GOVERN 6.1 — Policies governing third-party AI use including data-use restrictions
- ccpa · §1798.100 — Limitation on use of personal information by service providers
Taxons
History
- 2026-04-18·v1.0.0·Initial import from ai-data-privacy·automated