All 21 checks with why-it-matters prose, severity, and cross-references to related audits.
Unfiltered user-submitted content creates a hostile environment that drives away legitimate users and exposes the platform to liability; the missing control maps to CWE-20 (Improper Input Validation). Without server-side profanity and abuse keyword filtering, a single bad actor can flood posts, comments, and user bios with slurs or harassment before any human moderator intervenes. Client-only filtering is bypassed trivially via direct API calls, making it security theater. Platforms without this control also struggle to satisfy app store moderation requirements and advertiser brand-safety standards.
Why this severity: Low because abusive language harms community health and user retention but does not directly expose credentials, enable account takeover, or leak sensitive data.
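As a minimal sketch of the server-side check described above, run before any post, comment, or bio is persisted. The denylist contents and the `contains_blocked_term` helper are hypothetical; a production filter would load a maintained wordlist and also handle obfuscations (leetspeak, inserted spacing):

```python
import re

# Hypothetical denylist stand-ins; a real deployment loads a maintained
# wordlist rather than hard-coding terms.
BLOCKED_TERMS = {"slur1", "slur2"}

def contains_blocked_term(text: str) -> bool:
    """Server-side check applied before content is persisted,
    so direct API calls cannot bypass it."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return any(w in BLOCKED_TERMS for w in words)
```

Because the check runs on the write path, a client that skips the UI and posts straight to the API still hits the same filter.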
community-moderation-safety.content-filtering.profanity-filtering

Without spam detection, a single automated account can flood a community feed with duplicate or rapid-fire posts, drowning legitimate content and degrading the user experience to the point where real users leave. This is a CWE-770 (Allocation of Resources Without Limits or Throttling) failure: uncapped post creation burns database storage and CDN bandwidth at attacker-controlled rates. Beyond quality, spam floods can mask coordinated manipulation campaigns — link farms, referral abuse, or coordinated inauthentic behavior — that are invisible without rate tracking.
Why this severity: Low because spam degrades platform quality and inflates infrastructure costs but does not expose user data or enable direct account compromise.
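A sliding-window check along these lines catches both rapid-fire and duplicate posting. The window size, per-window cap, and helper names are illustrative assumptions, not values from the audit:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60        # assumed window
MAX_POSTS_PER_WINDOW = 5   # assumed per-account cap

_recent = defaultdict(deque)   # account_id -> timestamps of recent posts
_last_body = {}                # account_id -> previous post body

def is_spam(account_id: str, body: str, now: float | None = None) -> bool:
    """Flag a submission as spam if it repeats the previous post verbatim
    or exceeds the per-account rate inside the sliding window."""
    now = time.time() if now is None else now
    q = _recent[account_id]
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()                      # drop timestamps outside the window
    duplicate = _last_body.get(account_id) == body
    q.append(now)
    _last_body[account_id] = body
    return duplicate or len(q) > MAX_POSTS_PER_WINDOW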
community-moderation-safety.content-filtering.spam-detection

Moderation APIs configured with permissive thresholds (toxicity > 0.9 to flag) will miss the majority of harmful content — most real-world toxic posts score between 0.5 and 0.8. A threshold that's too high means your automated filter is effectively disabled, forcing human moderators to catch everything the system lets through. Under CWE-20, this is an input validation failure: the system nominally validates content but its configuration makes the validation meaningless. Platforms that over-rely on high-confidence thresholds face the same regulatory risk as platforms with no filtering at all.
Why this severity: Medium because misconfigured thresholds systematically under-filter harmful content, compounding moderation workload and exposing users to abuse the system was supposed to catch.
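The fix is a safe default rather than new code: flag at the score band where real-world toxic content actually lands. A sketch, with the 0.5 default chosen to match the 0.5–0.8 range noted above (the function and constant names are hypothetical):

```python
# Assumed safe default: catches the 0.5-0.8 band where most real-world
# toxic posts score. A 0.9 threshold would flag only extreme outliers,
# effectively disabling the automated filter.
DEFAULT_FLAG_THRESHOLD = 0.5

def should_flag(toxicity_score: float,
                threshold: float = DEFAULT_FLAG_THRESHOLD) -> bool:
    """Route content to the moderation queue when the ML toxicity
    score meets or exceeds the configured threshold."""
    return toxicity_score >= threshold
```

The exact default should be calibrated per model and per community against labeled samples; the point is that the shipped default must err toward flagging, not toward silence.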
community-moderation-safety.content-filtering.safe-defaults

Stored XSS (OWASP A03:2021, CWE-79) in user-generated content is one of the most severe vulnerabilities a community platform can have. When unsanitized HTML or script tags are persisted to the database and rendered in other users' browsers, an attacker can steal session cookies, exfiltrate auth tokens, hijack accounts, or redirect victims to phishing pages — all without any interaction beyond viewing a post or comment. Every user who views the malicious content becomes a victim. Client-side sanitization alone is bypassed trivially via direct API calls; server-side sanitization before storage is the only reliable control.
Why this severity: Critical because stored XSS lets an attacker execute arbitrary JavaScript in every viewer's browser, enabling session hijacking and account takeover at scale.
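A minimal sketch of the server-side control, using Python's standard-library `html.escape` to neutralize markup before persistence. This assumes plain-text content; platforms that allow limited rich text need an allowlist sanitizer instead of blanket escaping:

```python
import html

def sanitize_for_storage(user_input: str) -> str:
    """Escape HTML metacharacters on the server, before the value is
    written to the database. Runs on the write path, so a client that
    posts directly to the API cannot bypass it."""
    return html.escape(user_input)
```

With this in place, a payload like `<script>alert(1)</script>` is stored as inert text and renders as literal characters in every viewer's browser rather than executing.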
community-moderation-safety.content-filtering.xss-prevention

Keyword-based profanity filters catch known bad words but miss context-dependent toxicity: veiled threats, dog-whistles, coded harassment, and insults that don't trigger keyword lists. Automated toxicity scoring via ML models (Perspective API, OpenAI Moderation, AWS Comprehend) catches the patterns that keyword lists miss. Without it, communities routinely develop persistent harassment cultures because bad actors quickly learn which phrases bypass keyword filters. This is CWE-20 input validation applied at the semantic layer. Platforms with substantial UGC and no toxicity scoring have higher reported abuse rates and higher moderator burnout.
Why this severity: High because without ML toxicity scoring, keyword-bypassing harassment goes undetected, degrading community safety and driving out users who experience sustained abuse.
community-moderation-safety.content-filtering.toxicity-scoring

Platforms that accept image uploads without scanning expose their users to CSAM, non-consensual intimate imagery, graphic violence, and extremist content — all categories with serious legal consequences under CWE-434 (Unrestricted Upload of File with Dangerous Type) and OWASP A04:2021. Beyond legal risk, explicit imagery appearing in community feeds causes immediate user harm and platform reputation damage that is difficult to recover from. Manual review alone cannot scale: a single bad actor can upload hundreds of images in minutes, overwhelming any human queue.
Why this severity: Low because image scanning adds meaningful safety coverage but platforms without it may still have manual review processes that partially compensate, and the attack requires explicit attacker intent.
community-moderation-safety.content-filtering.image-scanning

Without a user-facing report mechanism, victims of harassment, hate speech, and spam have no in-platform recourse. They either suffer the abuse in silence or abandon the platform. Under the EU Digital Services Act (DSA Art. 16), any platform hosting user content must provide accessible notice-and-action flagging mechanisms — failure is a compliance violation carrying substantial fines. Beyond legal exposure, communities without reporting mechanisms develop entrenched bad-actor cultures because there is no friction on harassment. Platforms with DSA compliance obligations that skip this control face enforcement actions, not just poor community health.
Why this severity: Critical because the absence of a report mechanism is both a DSA compliance failure and a fundamental community safety gap — victims have no in-platform recourse against abuse.
community-moderation-safety.report-enforcement.user-reporting

Collecting reports without a tool to review them is operationally equivalent to collecting no reports at all. Under DSA Art. 22, platforms must ensure reported content is reviewed in a timely manner — storing reports in a database that only a developer can query via SQL satisfies neither the letter nor the spirit of that requirement. Without a moderation interface, backlogs grow unbounded, harmful content stays visible indefinitely, and moderators cannot triage by severity or content type. This is the enforcement gap between detection and action.
Why this severity: Critical because without a reviewable queue, collected reports are inert — no moderator can act on them, so the reporting system provides no actual safety improvement.
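The core of a queue, beyond a UI, is a triage ordering so moderators see the worst categories first. A sketch with an assumed severity taxonomy (the category names and ranks are illustrative, not from the audit):

```python
# Assumed severity ranking: lower number = reviewed first.
SEVERITY_RANK = {"csam": 0, "violence": 1, "harassment": 2, "spam": 3}

def triage(reports: list[dict]) -> list[dict]:
    """Order open reports most-severe-first, oldest-first within a
    category, so the queue surfaces what moderators must act on now."""
    return sorted(
        reports,
        key=lambda r: (SEVERITY_RANK.get(r["category"], 99), r["created_at"]),
    )
```

Unknown categories sort last rather than raising, so a new report type never blocks the queue.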
community-moderation-safety.report-enforcement.moderation-queue

Storing a deletion flag that is never checked in the read path is a false sense of control — under DSA Art. 16 and CWE-284, content removal must actually prevent the content from being visible to users. Platforms where `deleted=true` content still appears in public feeds, search results, or API responses give bad actors exactly what they want: their content stays up regardless of moderation actions. Soft-delete patterns that skip the WHERE clause in even one public query defeat the entire enforcement model.
Why this severity: Critical because ineffective content removal means moderation actions have no real effect — abusive, illegal, or harmful content remains publicly visible after moderators act.
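A minimal SQLite sketch of the soft-delete read path. The schema and query are illustrative; the point is that the `WHERE deleted = 0` clause is the enforcement, and it must appear in every public query (feeds, search, API serializers), not just one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT, "
    "deleted INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO posts (body, deleted) VALUES ('visible post', 0)")
conn.execute("INSERT INTO posts (body, deleted) VALUES ('removed post', 1)")

def public_feed(conn: sqlite3.Connection) -> list[str]:
    # The WHERE clause is the enforcement point: omit it in even one
    # public query and moderator removals have no effect there.
    rows = conn.execute("SELECT body FROM posts WHERE deleted = 0").fetchall()
    return [r[0] for r in rows]
```

Centralizing this filter (a database view, an ORM default scope, or a single query helper) is safer than repeating the clause by hand in every endpoint.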
community-moderation-safety.report-enforcement.content-removal

Without an appeals process, every moderation error is permanent — false positives remove legitimate content with no recourse for the affected user. DSA Art. 20 explicitly requires that platforms offer an internal complaint-handling mechanism for moderation decisions. Beyond compliance, platforms without appeals alienate good-faith users who trigger overzealous automation, and create PR liability when high-profile wrongful removals become public. A structured appeal intake also generates data on false positive rates, which is the primary signal for calibrating automated moderation thresholds.
Why this severity: Medium because the absence of appeals causes user harm through wrongful permanent removals and creates DSA compliance exposure, but does not directly enable content attacks or data leakage.
community-moderation-safety.report-enforcement.appeals-process

Without enforceable bans, repeat violators face no real consequence — they continue posting after receiving warnings, and the moderation system becomes a signal they can ignore. OWASP A01:2021 (Broken Access Control, CWE-285) applies when ban records exist in the database but the enforcement check is missing from content-creation and interaction endpoints. A ban table with no WHERE clause enforcement is a paper ban: the record exists, but the behavior continues. Communities with no ban enforcement experience higher recidivism from bad actors who quickly learn there is no real cost to violations.
Why this severity: High because without enforcement checks at content-creation endpoints, banned users can immediately resume posting, rendering all prior moderation actions ineffective.
community-moderation-safety.report-enforcement.user-bans

Without bot prevention, a single automated script can flood a community with thousands of posts per hour — overwhelming moderation queues, degrading feed quality, and inflating engagement metrics used to make product decisions. CWE-799 (Improper Control of Interaction Frequency) applies when there is no rate limiting or CAPTCHA on submission endpoints. OWASP A04:2021 (Insecure Design) frames the architectural gap: a platform designed without a bot prevention layer cannot be made safe by patching individual bugs. Bot floods are also the primary vector for coordinated inauthentic behavior, platform manipulation, and spam-link injection.
Why this severity: High because unprotected submission endpoints let bots flood content at machine speed, overwhelming moderation capacity and degrading community quality faster than humans can respond.
community-moderation-safety.spam-prevention.bot-prevention

Platforms without email verification allow spam accounts to be created and operational in seconds. CWE-306 (Missing Authentication for Critical Function) applies: posting is a critical function that should require verified identity. Without a verification gate, a single attacker can automate account creation and bypass per-account rate limits by rotating through throwaway emails. Email verification also enforces a basic contact point for ban notifications, password resets, and compliance communications — platforms without it lose the ability to reach users for any purpose.
Why this severity: High because unverified accounts eliminate the primary friction against spam account creation, allowing attackers to scale abuse by simply creating new accounts when banned.
community-moderation-safety.spam-prevention.email-verification

When a single IP can create unlimited accounts, bans become meaningless — a banned user creates a fresh account and resumes immediately. CWE-799 applies: without account-creation rate limiting, the enforcement system is trivially circumvented. Beyond ban evasion, unrestricted signup enables large-scale coordinated behavior: sock puppet networks, fake review farms, and artificially inflated user counts that distort analytics and business decisions. Platforms without IP-level account limits also bear the database and email delivery cost of every throwaway account created during attacks.
Why this severity: Medium because duplicate account creation directly enables ban evasion and coordinated inauthentic behavior, but requires attacker effort and does not expose existing user data.
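A per-IP signup cap can be as simple as a daily counter checked in the registration handler. The cap value and helper names below are assumptions; real deployments persist the counter with a TTL (e.g. Redis `INCR` + `EXPIRE`) and account for shared NATs and proxies before hard-blocking:

```python
from collections import defaultdict

MAX_ACCOUNTS_PER_IP_PER_DAY = 3    # assumed policy value
_signups_today = defaultdict(int)  # ip -> count; reset daily out of band

def may_register(ip: str) -> bool:
    """Reject registration once an IP exhausts its daily allowance."""
    if _signups_today[ip] >= MAX_ACCOUNTS_PER_IP_PER_DAY:
        return False
    _signups_today[ip] += 1
    return True
```

Soft responses (extra CAPTCHA, manual review) are often preferable to a hard block at the limit, since corporate and campus networks legitimately share IPs.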
community-moderation-safety.spam-prevention.duplicate-accounts

Submission forms without honeypot fields let cheap bot scripts flood the platform with spam accounts, fake posts, comment payloads, and referral abuse at near-zero cost. That drives moderation backlog, pollutes search and recommendation surfaces, burns database rows and email quota, and erodes trust in user-generated content. Honeypots are a low-friction first filter that stops naive scrapers before they reach rate limiters or human moderators, complementing access-control defenses.
Why this severity: Low because honeypots are a soft defense; missing them increases spam volume but does not directly breach accounts or data.
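The server-side half of a honeypot is a single check: the field is rendered in the form but hidden with CSS, so humans leave it empty while naive bots auto-fill it. The field name `website` here is a hypothetical choice:

```python
def is_bot_submission(form: dict) -> bool:
    """Reject the submission if the hidden honeypot field was filled.

    'website' is a hypothetical honeypot field: present in the HTML,
    hidden via CSS, never filled by real users."""
    return bool(form.get("website", "").strip())
```

Rejections should be silent (e.g. a generic success response) so bot authors get no signal about which field tripped the filter.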
community-moderation-safety.spam-prevention.honeypot

A rate limit that applies equally to all accounts — new and established — provides no incremental protection against spam bots that register and immediately begin flooding. CWE-799 applies: the system fails to differentiate interaction frequency by account trust level. New accounts posting at the same rate as accounts with months of history is an anomaly signal that legitimate platforms exploit to catch automation. Reddit, Hacker News, and most major forums enforce reduced posting rates for new accounts specifically because this is the attack window that automated spam campaigns exploit most aggressively.
Why this severity: Medium because absent new-account rate differentiation, spam bots can reach full platform post velocity immediately after registration, maximizing the window of abuse before detection.
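Age-tiered limits are a small lookup keyed on account age. The tier boundaries and per-hour values below are illustrative assumptions, not the audit's prescribed numbers:

```python
# Assumed tiers: (minimum account age in days, posts allowed per hour).
# Ordered oldest-first so the first matching tier wins.
AGE_TIERS = [
    (30, 20),   # established accounts
    (7, 5),     # accounts past the first week
    (0, 1),     # brand-new accounts: the spam-campaign attack window
]

def posts_per_hour(account_age_days: int) -> int:
    """Return the posting allowance for an account of the given age."""
    for min_age, limit in AGE_TIERS:
        if account_age_days >= min_age:
            return limit
    return 0
```

The returned allowance then feeds whatever rate limiter the platform already uses; the point is that a day-old account should never share the ceiling of a month-old one.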
community-moderation-safety.spam-prevention.account-age-limit

Without a documented content policy, users cannot know what the platform prohibits, moderators have no authoritative standard to apply consistently, and the platform has no defensible basis for enforcement actions. DSA Art. 14 requires platforms to document and publish their terms and conditions, including content restrictions. COPPA 16 CFR 312 further mandates transparency about moderation practices for platforms serving minors. A policy vague enough to read as 'be nice' cannot support consistent enforcement and exposes the platform to claims of arbitrary or biased moderation — a reputational and legal liability.
Why this severity: High because the absence of a documented content policy creates a DSA compliance gap and eliminates the legal basis for consistent moderation enforcement.
community-moderation-safety.policy-transparency.policy-documented

When content disappears silently — no email, no notification, no reason given — users experience it as arbitrary punishment. DSA Art. 17 requires that platforms communicate content removal decisions to affected users with the reasons cited. Beyond compliance, transparent moderation builds user trust: users who understand why content was removed can correct behavior, appeal wrongful decisions, or accept enforcement as legitimate. Platforms that hide moderation reasoning generate more appeals and more user churn than those that explain decisions clearly.
Why this severity: Medium because silent moderation actions create DSA compliance exposure and drive user churn through perceived arbitrariness, but do not directly expose data or enable attacks.
community-moderation-safety.policy-transparency.moderation-transparency

Platforms without block and mute features leave users with no way to protect themselves from targeted harassment. CWE-285 (Improper Authorization) applies when there is no mechanism preventing a specific user from contacting or interacting with another. COPPA 16 CFR 312.4 requires that platforms collecting data from minors provide parental control mechanisms — blocking is the foundational user-level control. Without blocking, a single bad actor can repeatedly mention, message, or comment on a victim's content with no platform-enforced boundary, and moderators cannot act fast enough to prevent sustained harassment campaigns.
Why this severity: High because the absence of blocking forces victims to leave the platform to escape targeted harassment — there is no in-product option for self-protection.
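Blocking is only a control if it is enforced on every interaction path (mentions, DMs, comments), not just hidden in the UI. A sketch with a hypothetical in-memory block store:

```python
from collections import defaultdict

# Hypothetical store: user_id -> set of user_ids they have blocked.
_blocks: dict[str, set[str]] = defaultdict(set)

def block(blocker: str, blocked: str) -> None:
    _blocks[blocker].add(blocked)

def may_interact(actor: str, target: str) -> bool:
    """Called server-side on every mention, DM, and comment path.
    An actor the target has blocked cannot reach them."""
    return actor not in _blocks[target]
```

Because the check runs server-side per interaction, a blocked user cannot route around it by hitting the API directly or switching clients.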
community-moderation-safety.policy-transparency.user-controls

When a user can see who reported them, the report becomes a targeting signal for retaliation. CWE-200 (Exposure of Sensitive Information) applies: the reporter's identity is sensitive information that the system must protect. GDPR Art. 5(1)(f) requires data to be handled with appropriate security, including limiting access to only those who need it — in this case, moderators only. Platforms that expose reporter identity see reduced report rates because users fear retaliation, which directly degrades the effectiveness of the entire content moderation system.
Why this severity: Medium because reporter identity exposure enables targeted retaliation, chills future reporting, and constitutes a GDPR data minimization violation for platforms in EU jurisdictions.
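The practical control is to strip reporter identity server-side from any view of a report that the reported user (or the public) can reach, so the full record exists only for moderators. The field names here are hypothetical:

```python
# Hypothetical sensitive fields; only the moderator-facing view keeps them.
SENSITIVE_REPORTER_FIELDS = {"reporter_id", "reporter_email", "reporter_ip"}

def report_view_for_subject(report: dict) -> dict:
    """Serialize a report for the reported user: reporter identity and
    contact fields are removed before the response leaves the server."""
    return {k: v for k, v in report.items()
            if k not in SENSITIVE_REPORTER_FIELDS}
```

Doing this in the serializer, rather than the client, means no API response ever carries the reporter's identity to the wrong audience.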
community-moderation-safety.policy-transparency.anonymous-reporting

Without a moderation log, enforcement becomes arbitrary and unauditable. ISO 27001:2022 A.5.35 requires that organizations document accountability measures — a moderation log is the primary accountability artifact for content enforcement. Without it, there is no way to detect bias (one moderator removing content the others would approve), no way to identify moderator error rates, and no defensible record if a user challenges a moderation decision in court or through a DSA complaint. Platforms with no moderation audit trail also cannot train new moderators, because there is no historical record of what decisions were made and why.
Why this severity: Low because absent logging degrades moderation accountability and consistency but does not directly expose user data or enable security attacks.
community-moderation-safety.policy-transparency.consistency-fairnessSee full patternRun this audit in your AI coding tool (Claude Code, Cursor, Bolt, etc.) and submit results here for scoring and benchmarks.
Open Content Moderation & Safety Audit