Unfiltered user-submitted content creates a hostile environment that drives away legitimate users and leaves the platform open to the weakness catalogued as CWE-20 (Improper Input Validation). Without server-side profanity and abuse-keyword filtering, a single bad actor can flood posts, comments, and user bios with slurs or harassment before any human moderator intervenes. Client-only filtering is trivially bypassed via direct API calls, making it security theater. Platforms without this control also struggle to satisfy app-store moderation requirements and advertiser brand-safety standards.
Low because abusive language harms community health and user retention but does not directly expose credentials, enable account takeover, or leak sensitive data.
Add server-side profanity filtering in your content-creation middleware before any database write. Install a filtering library (bad-words for Node.js, better-profanity for Python) and apply it at the API layer, not the client:
import Filter from 'bad-words';

const filter = new Filter(); // ships with a default English keyword list

export async function createComment(content, userId) {
  // Reject before anything reaches the database.
  if (filter.isProfane(content)) {
    throw new Error('Comment contains prohibited language');
  }
  await db.comments.create({ content, userId, createdAt: new Date() });
}
For higher accuracy on edge cases, chain this with a toxicity scoring API (Perspective API, OpenAI Moderation) so keyword misses are caught by ML scoring.
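The chaining step can be kept as a small pure decision function, sketched below. `decideModeration` is a hypothetical helper, and the 0.8/0.5 thresholds are illustrative assumptions, not values from any vendor:

```javascript
// Combine a keyword-filter hit with an ML toxicity score into one verdict.
// `keywordHit` comes from the profanity library; `toxicityScore` (0..1)
// comes from a scoring API such as Perspective or OpenAI Moderation.
// Thresholds are illustrative and should be tuned per platform.
function decideModeration(keywordHit, toxicityScore) {
  if (keywordHit) {
    return { action: 'reject', reason: 'keyword match' };
  }
  if (toxicityScore >= 0.8) {
    return { action: 'reject', reason: 'high toxicity score' };
  }
  if (toxicityScore >= 0.5) {
    return { action: 'flag', reason: 'borderline toxicity score' };
  }
  return { action: 'allow', reason: null };
}
```

Keeping the verdict logic separate from the API calls makes it easy to unit-test the reject/flag/allow bands without network access.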
ID: community-moderation-safety.content-filtering.profanity-filtering
Severity: low
What to look for: Check if content submitted by users (posts, comments, messages) goes through any filtering before storage. Look for keyword filtering libraries (better-profanity, bad-words, custom keyword lists), or API calls to content moderation services. Verify that filtering is applied server-side before content is saved to the database.
Pass criteria: Submitted content is checked against a profanity/abuse keyword list or third-party moderation API before storage. Enumerate all content-creation endpoints (posts, comments, messages, bios) and confirm that at least 90% route through the filter. Filtered content is either rejected before save, or flagged for review. Server-side validation is present. On pass, report the count of filtered endpoints vs. total content-creation endpoints.
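The 90% coverage check can be automated. A minimal sketch, assuming you maintain (or generate from route metadata) an inventory of content-creation endpoints with a boolean noting whether each routes through the filter:

```javascript
// Given an endpoint inventory, compute filter coverage and the pass verdict.
// The inventory shape ({ path, filtered }) is an assumption for illustration.
function filterCoverage(endpoints) {
  const filtered = endpoints.filter((e) => e.filtered).length;
  const total = endpoints.length;
  const coverage = total === 0 ? 0 : filtered / total;
  return { filtered, total, coverage, pass: coverage >= 0.9 };
}
```

The returned counts map directly onto the "filtered endpoints vs. total content-creation endpoints" figure the pass report asks for.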
Fail criteria: No profanity or abuse keyword filtering is implemented, or filtering is only client-side (which can be bypassed). A filter that covers fewer than 90% of content-creation endpoints does not count as pass.
Skip (N/A) when: Platform has <500 active users AND uses manual-only moderation with no automated filtering.
Detail on fail: "No profanity filtering detected. User-submitted content is not checked for abusive language before storage."
Cross-reference: Compare with community-moderation-safety.content-filtering.toxicity-scoring — profanity filtering and toxicity scoring should share a common content pipeline.
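One way to share that pipeline is to model each check as a plug-in function, sketched below with hypothetical names (neither comes from a real library):

```javascript
// A shared moderation pipeline: each check returns null (clean) or a
// verdict object. The keyword filter and the toxicity scorer both plug in
// as checks, so every content-creation endpoint runs the same sequence.
async function moderate(content, checks) {
  for (const check of checks) {
    const verdict = await check(content);
    if (verdict) return verdict; // first failing check wins
  }
  return { action: 'allow' };
}
```

Because checks are async, the cheap keyword filter can run first and short-circuit before the slower ML scoring call is made.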
Remediation: Implement keyword-based filtering on the backend using a library such as bad-words (Node.js) or better-profanity (Python). Always filter server-side before saving content to the database.
import Filter from 'bad-words';

const filter = new Filter();

async function submitComment(content, userId) {
  // clean() replaces matched words with '*', so a changed string means a hit
  const cleaned = filter.clean(content);
  if (cleaned !== content) {
    // Option 1: Reject the submission
    throw new Error('Comment contains prohibited language');
    // Option 2: Flag for review instead of rejecting
    // await db.flaggedContent.create({
    //   content,
    //   userId,
    //   reason: 'Profanity detected',
    //   status: 'pending'
    // });
    // return { success: false, message: 'Your comment was flagged for review' };
  }
  // Save the comment only if it passes filtering
  await db.comments.create({ content, userId, createdAt: new Date() });
}
For higher accuracy, integrate with third-party APIs like Perspective API or OpenAI Moderation to score toxicity levels.
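As a sketch of the Perspective API integration, the request body and score extraction can be kept as pure helpers (the body and response shapes follow Perspective's AnalyzeComment API; the API key, transport, and error handling are left to the caller):

```javascript
// Build the request body for Perspective's comments:analyze endpoint.
function buildPerspectiveRequest(text) {
  return {
    comment: { text },
    requestedAttributes: { TOXICITY: {} },
  };
}

// Pull the summary toxicity probability (0..1) out of a Perspective response.
function extractToxicity(response) {
  return response.attributeScores.TOXICITY.summaryScore.value;
}

// Wiring sketch (requires a real API key):
// const res = await fetch(
//   `https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=${API_KEY}`,
//   { method: 'POST', body: JSON.stringify(buildPerspectiveRequest(content)) }
// );
// const score = extractToxicity(await res.json());
```

Keeping these helpers pure means the request shape and score parsing can be tested without network access.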