GDPR Article 28 requires a processor (here, the AI provider) to process personal data only on the controller's documented instructions and to obtain the controller's authorization before engaging sub-processors or using the data for additional purposes (such as model training). CCPA §1798.100 and NIST AI RMF GOVERN 6.1 likewise require transparency about how third parties use user data. By default, some consumer-tier AI API plans allow the provider to use API data for model improvement. For B2B products or any application handling sensitive user data, letting user conversations feed a third party's training corpus without user disclosure is a material compliance gap.
Medium because undocumented training data usage by an AI provider constitutes an undisclosed secondary purpose under GDPR Art. 28 and CCPA §1798.100, creating regulatory exposure without a technical breach.
Verify your training opt-out status with your AI provider and document it in code. The exact steps vary by provider, so consult your provider's data-use or enterprise privacy documentation.
Document the verification in your AI client configuration file:
// lib/ai/client.ts
import { AzureOpenAI } from "openai"

// DATA POLICY: Using Azure OpenAI — data not used for model training
// per Microsoft Enterprise DPA. Verified: 2025-01 via Azure portal
// Data & Privacy settings. Re-verify on contract renewal.
const openai = new AzureOpenAI({ ... })
Set a calendar reminder to re-verify at contract renewal — provider terms change.
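Beyond a comment, the verification can be made machine-checkable. The sketch below is one illustrative way to do that, assuming nothing about your provider: the `DataPolicy` type, the `DATA_POLICY` constant, and `assertPolicyCurrent` are all hypothetical names, and the dates and evidence strings are placeholders you would fill in from your own agreement.

```typescript
// Hypothetical helper: encode the verified data policy as a typed constant
// so opt-out status and the re-verification deadline travel with the config.
interface DataPolicy {
  provider: string;
  trainingOptOut: boolean;
  verifiedOn: string;  // ISO month of last manual verification
  reverifyBy: string;  // ISO month deadline, e.g. contract renewal
  evidence: string;    // where the opt-out was confirmed
}

export const DATA_POLICY: DataPolicy = {
  provider: "Azure OpenAI",
  trainingOptOut: true,
  verifiedOn: "2025-01",
  reverifyBy: "2026-01",
  evidence: "Enterprise DPA; Azure portal Data & Privacy settings",
};

// Fail fast at startup if the verification window has lapsed, turning the
// "calendar reminder" into a hard check.
export function assertPolicyCurrent(
  policy: DataPolicy,
  today: Date = new Date(),
): void {
  const deadline = new Date(`${policy.reverifyBy}-01T00:00:00Z`);
  if (today >= deadline) {
    throw new Error(
      `Data policy for ${policy.provider} last verified ${policy.verifiedOn}; ` +
        `re-verify before ${policy.reverifyBy}.`,
    );
  }
}
```

Calling `assertPolicyCurrent(DATA_POLICY)` during client initialization makes a stale verification fail loudly instead of silently drifting past contract renewal.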
ID: ai-data-privacy.third-party-ai-provider.disable-training-usage
Severity: medium
What to look for: Enumerate every relevant signal that the project has opted out of training-data usage. Indicators: use of Azure OpenAI endpoints (which carry no-training-use guarantees by default), references to "Zero Data Retention" or "ZDR" in comments or configuration, OpenAI organization-level API usage where a training opt-out is documented, or any API headers that signal a training opt-out (provider-specific). Also check for comments like // ZDR enabled or // Training opt-out via org settings. A secondary signal: the project uses an OpenAI API key under an organization that has opted out (not directly detectable in code, but comments referencing this are sufficient).
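The signal scan above can be sketched as a small text matcher. This is an illustrative sketch, not a shipped checker: the `findTrainingOptOutSignals` name and the exact patterns are assumptions, and a real implementation would walk configuration files rather than a single string.

```typescript
// Hypothetical signal patterns mirroring the indicators listed above.
const OPT_OUT_SIGNALS: Array<{ name: string; pattern: RegExp }> = [
  // Azure OpenAI endpoints carry no-training-use guarantees by default.
  { name: "azure-openai-endpoint", pattern: /\.openai\.azure\.com/i },
  // Zero Data Retention references in comments or config.
  { name: "zdr-reference", pattern: /zero\s*data\s*retention|\bZDR\b/i },
  // Comments documenting an organization-level training opt-out.
  { name: "opt-out-comment", pattern: /training\s*opt[- ]?out/i },
];

// Return the names of every opt-out signal found in the given source text.
export function findTrainingOptOutSignals(source: string): string[] {
  return OPT_OUT_SIGNALS.filter((s) => s.pattern.test(source)).map(
    (s) => s.name,
  );
}
```

A non-empty result from any scanned configuration file would satisfy the "at least 1 condition" pass criterion below.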
Pass criteria: At least 1 of the following conditions is met. Evidence of training opt-out: Azure OpenAI endpoint in use, explicit comment referencing ZDR or organization-level training opt-out, or use of an enterprise API tier known to exclude training by default.
Fail criteria: Using a standard consumer-tier API (personal OpenAI key with no enterprise agreement, standard Anthropic API) with no evidence of training data opt-out configuration or organizational agreement.
Skip (N/A) when: The project uses only open-source self-hosted models (Ollama, vLLM, llama.cpp) where no third-party training can occur by definition.
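The N/A condition can be approximated the same way. A minimal sketch, assuming the `isSelfHostedModel` name and the host list (Ollama's default port, vLLM, llama.cpp) as illustrative choices:

```typescript
// Hypothetical hints that the project serves models itself, so no
// third-party training-data question arises.
const SELF_HOSTED_HINTS: RegExp[] = [
  /localhost:11434/, // Ollama's default local port
  /\bollama\b/i,
  /\bvllm\b/i,
  /llama[._-]?cpp/i,
];

// True when the configuration text points at a self-hosted serving stack.
export function isSelfHostedModel(configText: string): boolean {
  return SELF_HOSTED_HINTS.some((p) => p.test(configText));
}
```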
Detail on fail: "No evidence of training data opt-out found — standard API tier may use conversation data for model improvement by default"
Remediation: By default, some AI providers may use API data for model improvement unless you opt out. This varies by provider and API tier, so follow your provider's documented opt-out procedure (enterprise agreement, organization-level settings, or a ZDR addendum).
Add a comment in your AI configuration file confirming the opt-out status:
// lib/ai/client.ts
import { AzureOpenAI } from "openai"

// NOTE: Using Azure OpenAI — data not used for model training per our enterprise agreement.
// Verified: 2024-01 via Azure portal Data & Privacy settings.
const openai = new AzureOpenAI({ ... })