Both Apple's Generative AI guidelines (2024) and Google Play's AI-generated content policy require that AI output be moderated before display, and that synthetic media of real people carry explicit consent and disclosure. OWASP LLM02 (Insecure Output Handling) and NIST AI RMF GOVERN-1.7 classify direct pass-through of LLM output to users as a governance failure. Practical consequences: a deepfake face-swap app without age verification will be rejected on first review; an AI chat that produces unmoderated harmful content will be removed after a user report regardless of the developer's intent.
Severity is medium because unmoderated AI output expands the app's attack surface to include any content an LLM can be induced to produce, while deepfake generation without consent mechanisms creates legal liability that persists even after store removal.
Add a content moderation call between AI output and display for all text-generation features:
// src/lib/ai.ts
import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function generateAndScreen(prompt: string): Promise<string> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini', // example model; substitute your configured model
    messages: [{ role: 'user', content: prompt }],
  });
  const content = response.choices[0].message.content ?? '';

  // Screen the generated text before it ever reaches the UI.
  const mod = await openai.moderations.create({ input: content });
  if (mod.results[0].flagged) {
    return "I can't help with that.";
  }
  return content;
}
Label all AI-generated content with a visible badge, especially in health, news, legal, or financial contexts. Lock the system prompt — never expose it to user configuration via a settings screen. For synthetic media features, add an explicit consent flow before enabling face-swap or voice cloning, and watermark all output.
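The consent flow can be sketched as a small gate that refuses synthetic-media generation until explicit, scoped consent has been recorded. A minimal sketch; all type and class names here are illustrative, not from any real SDK:

```typescript
// Hypothetical consent record for a synthetic-media feature.
type ConsentRecord = {
  subjectId: string; // the person whose likeness will be used
  grantedAt: Date;
  scope: 'face-swap' | 'voice-clone';
};

class SyntheticMediaGate {
  private consents = new Map<string, ConsentRecord>();

  recordConsent(record: ConsentRecord): void {
    this.consents.set(`${record.subjectId}:${record.scope}`, record);
  }

  // Generation is only allowed when explicit consent exists for this
  // subject and this specific capability; consent to a face-swap does
  // not imply consent to voice cloning.
  canGenerate(subjectId: string, scope: ConsentRecord['scope']): boolean {
    return this.consents.has(`${subjectId}:${scope}`);
  }
}
```

A gate like this should sit in front of every call to the generation API, and the consent record should persist server-side so it survives reinstalls.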
ID: app-store-policy-compliance.content-restrictions.ai-content-moderated
Severity: medium
What to look for: Count all relevant instances and enumerate each. If AI integration is detected, examine the full pipeline from user input to displayed output. Look for:
(1) Content moderation layer — is there any filter, classifier, or moderation API call between AI output and user display? Look for calls to OpenAI's moderation endpoint (https://api.openai.com/v1/moderations), Google's SafeSearch API, AWS Rekognition, or a custom blocklist check. If none are present, the AI output is shown directly to users without filtering.
(2) Deepfake / synthetic media generation — does the app allow users to generate images, video, or audio using someone else's likeness? Look for face-swap APIs (DeepFaceLab, FaceSwap, roop), voice cloning (ElevenLabs, PlayHT, Coqui), or AI video generation that accepts facial photos as input.
(3) Disinformation risk — does the app generate news, health information, financial guidance, or legal advice via AI without a disclaimer? Search for AI-generated content displayed in contexts that imply authority (news feed, medical results screen, legal advice screen) without a "generated by AI" label.
(4) Jailbreak surface — is the AI system prompt user-configurable? Can users override the system prompt via a settings screen, or inject instructions via crafted input that the app passes directly to the AI? Apple's Generative AI guidelines (added 2024) and Google Play's AI-generated content policy require moderation.
(5) Age-gating — if the AI can generate adult content (even when prompted), is there an age verification gate? NSFW image generation without age verification will cause rejection on both platforms.
Pass criteria: AI output is either passed through a content moderation layer before display, or the AI is constrained to a narrow domain where harmful output is not plausible (e.g., an AI that only generates recipe suggestions). At least 1 implementation must be verified. Deepfake or synthetic likeness generation is either absent, or age-gated and clearly labeled as AI-generated. AI-generated content is labeled as such when displayed in authoritative contexts.
Fail criteria: AI output is displayed directly to users with no moderation layer; deepfake face-swap or voice cloning enabled without age verification and consent mechanisms; AI content in medical/legal/financial/news contexts has no "AI-generated" disclaimer; system prompt is fully user-configurable with no constraints.
Skip (N/A) when: No AI or LLM integration detected in package.json or source files (no AI provider SDKs, no calls to AI APIs).
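The skip decision above can be approximated with a small dependency scan of package.json. A sketch under the assumption that AI integration surfaces as a provider SDK dependency; the hint list is an illustrative sample, not exhaustive:

```typescript
// Illustrative (non-exhaustive) list of AI provider SDK package names.
const AI_SDK_HINTS = [
  'openai',
  '@anthropic-ai/sdk',
  '@google/generative-ai',
  'replicate',
];

// Returns true if any dependency matches a known AI SDK name,
// signalling that this check applies rather than being skipped.
function hasAiIntegration(pkg: { dependencies?: Record<string, string> }): boolean {
  const deps = Object.keys(pkg.dependencies ?? {});
  return deps.some((dep) => AI_SDK_HINTS.includes(dep));
}
```

A real audit would also scan source files for direct HTTP calls to AI API hosts, since an app can call these endpoints with no SDK installed.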
Detail on fail: "AI chat responses in src/screens/ChatScreen.tsx are passed directly to display with no content moderation layer or safety filtering" or "App allows face-swap image generation via Replicate API with no age verification and no 'AI-generated' label on output"
Remediation: Both stores have added explicit AI content policies and are actively enforcing them. Route all AI text output through a moderation check (POST https://api.openai.com/v1/moderations) and block any response where flagged is true. Review the configuration in the src/ or app/ directory for implementation patterns.
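The moderation endpoint can also be called directly over HTTP when the SDK is not in use. A minimal sketch: the helper names are hypothetical, but the request shape and the results[].flagged field follow the documented /v1/moderations schema:

```typescript
// Shape of the relevant part of the /v1/moderations response.
type ModerationResponse = { results: { flagged: boolean }[] };

// Pure decision helper: block display whenever any result is flagged.
function shouldBlock(body: ModerationResponse): boolean {
  return body.results.some((r) => r.flagged);
}

// Calls the moderation endpoint over raw HTTP (Node 18+ global fetch).
async function isFlagged(text: string, apiKey: string): Promise<boolean> {
  const res = await fetch('https://api.openai.com/v1/moderations', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify({ input: text }),
  });
  return shouldBlock((await res.json()) as ModerationResponse);
}
```

Keeping the flagged/not-flagged decision in a pure helper like shouldBlock makes the blocking policy unit-testable without network access.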
Cross-reference: For related patterns and deeper analysis, see the corresponding checks in other AuditBuffet audits covering this domain.