All 23 checks with why-it-matters prose, severity, and cross-references to related audits.
Regex-only email validation lets addresses with non-existent domains into your database: contacts that will hard-bounce on every send, dragging down your sender reputation and triggering ISP throttling. CWE-20 (Improper Input Validation) applies directly: accepting structurally plausible but undeliverable addresses is a data integrity failure that accumulates silently. A domain that publishes no MX record and has no A/AAAA record to fall back on (RFC 5321 §5.1), or that publishes a "null MX" (RFC 7505), cannot receive mail. Sending to it pushes your bounce rate toward the roughly 2% threshold that many ESPs use to flag accounts for review or suspension. The cost is real: recovering a suspended sender account typically requires list pruning, a warm-up period, and manual outreach to the ESP, which means days of lost campaign throughput.
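A minimal sketch of the domain-level check, assuming the DNS queries are injected as two callables. `lookup_mx` and `lookup_address` are hypothetical hooks you would back with a real resolver (dnspython, for example). The sketch accepts a domain with MX records, falls back to A/AAAA records when no MX exists (RFC 5321 §5.1), and rejects non-existent domains and RFC 7505 null MX records.

```python
from typing import Callable, Optional, Sequence, Tuple

MxRecord = Tuple[int, str]  # (preference, exchange host)

# Hypothetical resolver hooks: lookup_mx returns None for NXDOMAIN and an
# empty sequence when the domain exists but publishes no MX records.
MxLookup = Callable[[str], Optional[Sequence[MxRecord]]]
AddressLookup = Callable[[str], Sequence[str]]


def domain_accepts_mail(domain: str, lookup_mx: MxLookup, lookup_address: AddressLookup) -> bool:
    """Return True if the domain has a usable mail route."""
    records = lookup_mx(domain)
    if records is None:
        return False  # NXDOMAIN: the domain does not exist at all
    if records:
        # RFC 7505 null MX: a single record whose exchange is "." means "accepts no mail".
        if len(records) == 1 and records[0][1].strip(".") == "":
            return False
        return True
    # No MX records: RFC 5321 section 5.1 falls back to the domain's A/AAAA records.
    return len(lookup_address(domain)) > 0


def email_domain_accepts_mail(email: str, lookup_mx: MxLookup, lookup_address: AddressLookup) -> bool:
    if "@" not in email:
        return False
    domain = email.rpartition("@")[2].strip().lower()
    return domain_accepts_mail(domain, lookup_mx, lookup_address)
```

Resolver timeouts are deliberately left to the hooks: treating a timeout as "valid" would be a fail-open path, so a real hook should surface it as an error rather than an empty answer.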
Why this severity: Critical because a single ingest path without MX verification poisons the entire list with undeliverable addresses, compounding with each import until bounce rates trigger ESP account action.
data-quality-list-hygiene.email-validation.mx-record-verification
Disposable email addresses (mailinator.com, 10minutemail.com, guerrillamail.com) are created solely to bypass signup requirements and are abandoned immediately after use. Contacts from these domains never open, never click, and frequently expire; sending to them raises complaint rates and depresses the engagement signals that ISPs use to classify your reputation. CWE-20 applies: accepting input you know to be functionally invalid is an input validation failure. High-volume disposable-address lists also skew A/B test results, distort open-rate reporting, and inflate subscription counts that feed into billing tiers or investor metrics.
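A sketch of blocklist matching, assuming the three domains named above stand in for a maintained blocklist (real lists run to thousands of entries and change often). It also matches subdomains of a listed domain, a cheap guard against subdomain variants.

```python
from typing import FrozenSet

# Illustrative sample only: a real deployment would load and periodically
# refresh a maintained blocklist.
DISPOSABLE_DOMAINS: FrozenSet[str] = frozenset({"mailinator.com", "10minutemail.com", "guerrillamail.com"})


def is_disposable(email: str, blocklist: FrozenSet[str] = DISPOSABLE_DOMAINS) -> bool:
    domain = email.strip().lower().rpartition("@")[2].rstrip(".")
    labels = domain.split(".")
    # Check the domain itself and every parent domain, so "x.mailinator.com" also matches.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))
```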
Why this severity: High because disposable addresses predictably deliver zero engagement and elevated complaints, degrading sender reputation and corrupting list-level analytics used for business decisions.
data-quality-list-hygiene.email-validation.disposable-domain-detection
Role-based addresses (info@, admin@, support@, noreply@) route to shared inboxes staffed by multiple people, none of whom opted in personally. They generate disproportionate complaint rates because recipients feel no ownership of the subscription; they report spam at an estimated 3–5× the rate of personal addresses. Campaigns sent to role addresses also see lower engagement, because shared inboxes are typically managed for task completion rather than for reading newsletters, and that weak engagement trains ISP filters to classify your domain as low-value. Elevated complaint rates can also draw CAN-SPAM enforcement scrutiny, since recipients can report unwanted commercial email directly to the FTC.
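Because the severity is medium, this sketch flags role addresses rather than rejecting them. The local-part list is an illustrative assumption seeded with the four examples above, and a plus-tag is ignored so `info+news@` is still treated as a role inbox.

```python
from typing import FrozenSet

# Illustrative list: the four examples from the text plus a few other common role names.
ROLE_LOCAL_PARTS: FrozenSet[str] = frozenset({
    "info", "admin", "support", "noreply",
    "no-reply", "sales", "contact", "webmaster", "postmaster",
})


def is_role_address(email: str) -> bool:
    local = email.strip().lower().rpartition("@")[0]
    local = local.split("+", 1)[0]  # "info+news@" is still a shared inbox
    return local in ROLE_LOCAL_PARTS
```

The result would typically be stored on the contact (for example in a hypothetical `is_role` column) so send segments can exclude or down-weight flagged contacts.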
Why this severity: Medium because role addresses consistently elevate complaint rates and suppress engagement signals, indirectly degrading sender reputation rather than causing immediate bounces.
data-quality-list-hygiene.email-validation.role-address-flagging
Catch-all domains accept email for any local part, whether or not the mailbox exists: `xyz1234@company.com` and `alice@company.com` both pass MX verification, and the receiving server accepts both at the SMTP `RCPT TO` stage. MX checks alone are therefore insufficient for validating contact quality, and even mailbox-level SMTP probing cannot confirm that a specific address is real. Contacts from catch-all domains appear deliverable but can bounce after acceptance or are read by nobody, because many catch-all configurations route unmatched mail to an unmonitored or trash mailbox. Mixing catch-all addresses with fully verified contacts inflates your apparent list size while depressing engagement rates, producing misleading health metrics and campaign analytics.
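One common detection approach, sketched under assumptions: probe the domain with a random local part that almost certainly does not exist, and if the server accepts it, classify the domain as catch-all. `rcpt_accepted` is a hypothetical hook standing in for an SMTP `RCPT TO` probe or a validation API; many large providers throttle or block such probes, so results are best-effort.

```python
import secrets
from typing import Callable

# Hypothetical hook: True if the receiving server accepts RCPT TO for the address.
RcptProbe = Callable[[str], bool]


def classify_mailbox(email: str, rcpt_accepted: RcptProbe) -> str:
    domain = email.rpartition("@")[2]
    if not rcpt_accepted(email):
        return "undeliverable"
    # A random local part that should not exist on any real mail server.
    canary = f"nonexistent-{secrets.token_hex(8)}@{domain}"
    if rcpt_accepted(canary):
        return "catch_all"  # the server accepts anything; mailbox existence is unknown
    return "deliverable"
```

Storing `catch_all` as its own status keeps these contacts out of "verified" metrics without discarding them.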
Why this severity: Low because catch-all addresses are a list quality issue that degrades metrics and deliverability incrementally rather than causing immediate bounces or compliance exposure.
data-quality-list-hygiene.email-validation.catchall-domain-handling
DNS MX lookups and external validation API calls have measurable latency (typically 50–500ms per call), and most providers impose rate limits (e.g., 100 calls/minute on free tiers). Without caching, every signup and every re-validation check triggers a live network round-trip, even for a domain that was looked up seconds earlier. At 1,000 signups/hour from domains like gmail.com or company.com, redundant lookups waste budget, add latency to the user-facing signup flow, and risk triggering rate-limit errors that silently degrade validation coverage when the lookup fails open.
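A minimal in-process TTL cache keyed by lowercase domain, assuming the wrapped `lookup` is any per-domain function (an MX query or a validation API call). The clock is injectable for testing, and exceptions propagate uncached so a transient failure is never remembered as a result.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


class DomainTTLCache(Generic[T]):
    """Caches per-domain lookup results for ttl_seconds; failed lookups are not cached."""

    def __init__(self, lookup: Callable[[str], T], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, T]] = {}

    def get(self, domain: str) -> T:
        key = domain.strip().lower()
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        result = self._lookup(key)  # an exception here propagates and nothing is stored
        self._entries[key] = (now, result)
        return result
```

A shared cache (Redis or similar) would play the same role across multiple app instances; the one-hour default TTL is an arbitrary placeholder.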
Why this severity: Low because missing caching adds latency and cost rather than corrupting data, but at scale the rate-limit failures can silently disable validation for popular domains.
data-quality-list-hygiene.email-validation.validation-cache
A basic regex like `/^[^@]+@[^@]+$/` accepts addresses that cannot receive email: `a..b@example.com` (consecutive dots) and `.alice@example.com` (leading dot) violate the RFC 5322 dot-atom syntax, and `foo@bar` (no TLD) is syntactically legal but not routable on the public internet. Such addresses will hard-bounce or be rejected at the MTA level, increasing your bounce rate. CWE-20 (Improper Input Validation) is the classification. Beyond deliverability, regex-only validation with no library invites injection of malformed strings (`[^@]` also matches spaces and newlines) that may cause downstream parsing errors in templates, CSV exports, or ESP API calls that expect well-formed RFC addresses.
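A stricter pattern than the bare regex, sketched as an assumption rather than a full RFC 5322 parser. It accepts only the unquoted dot-atom form of the local part, requires a dotted domain made of valid DNS labels, and enforces the RFC 5321 64-octet local-part limit plus the commonly used 254-character overall limit. Quoted local parts, IP-literal domains, and internationalized addresses are deliberately rejected; a maintained validation library is the better choice where those matter.

```python
import re

# RFC 5322 "atext": characters allowed unquoted in a local part.
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom: runs of atext separated by single dots (no leading, trailing, or doubled dots).
_DOT_ATOM = _ATEXT + r"+(?:\." + _ATEXT + r"+)*"
# DNS label: 1-63 characters, alphanumeric at both ends, hyphens allowed inside.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
# Policy choice: require at least one dot, since dotless domains are not publicly routable.
_EMAIL = re.compile(_DOT_ATOM + "@" + _LABEL + r"(?:\." + _LABEL + r")+")


def is_valid_syntax(address: str) -> bool:
    if len(address) > 254:
        return False
    local, sep, _ = address.rpartition("@")
    if not sep or len(local) > 64:
        return False
    return _EMAIL.fullmatch(address) is not None
```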
Why this severity: High because bare-regex validation systematically admits syntactically invalid addresses that will hard-bounce, compounding sender reputation damage with every campaign.
data-quality-list-hygiene.email-validation.syntax-validation
RFC 5321 treats the domain part of an address as case-insensitive and technically allows the local part to be case-sensitive, but in practice virtually every mailbox provider treats the whole address case-insensitively. Many databases, however, compare strings case-sensitively by default (PostgreSQL `text`, for example). Without normalization, `Alice@Example.com` and `alice@example.com` are stored as separate contacts, breaking deduplication, suppression lookups, and unsubscribe processing. The CAN-SPAM Act (§ 5) requires that unsubscribe requests be honored within 10 business days; if the suppression lookup misses an address because of a case mismatch, the sender is in violation. GDPR Art. 17 right-to-erasure requests face the same risk: a deletion keyed on the normalized form misses records stored in a different case. CWE-20 applies to the failure to normalize input before persistence.
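A sketch of normalizing once, before every write and every lookup, so stored rows and suppression keys share one canonical form. Lowercasing the whole address is the pragmatic choice given how providers behave, even though RFC 5321 permits case-sensitive local parts; the NFC step is an added assumption for Unicode input.

```python
import unicodedata
from typing import Set


def normalize_email(raw: str) -> str:
    """Canonical form used for storage, deduplication, and suppression lookups."""
    # NFC first so equivalent Unicode sequences compare equal, then lowercase.
    return unicodedata.normalize("NFC", raw.strip()).lower()


def is_suppressed(raw: str, suppressed: Set[str]) -> bool:
    # `suppressed` must itself hold normalized addresses.
    return normalize_email(raw) in suppressed
```

At the database layer, PostgreSQL can enforce the same rule with a unique index on `lower(email)` or with the `citext` type.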
Why this severity: Critical because case mismatch silently breaks suppression lookups and unsubscribe processing, creating direct CAN-SPAM compliance exposure and duplicate sends to the same person.
data-quality-list-hygiene.dedup-normalization.email-normalization
Gmail and many other providers route `user+tag@gmail.com` to the same mailbox as `user@gmail.com`. Without stripping plus-tags before deduplication, a single subscriber can appear multiple times with different tags (`user+promo@`, `user+news@`, `user+test@`), each receiving its own copy of every send. The duplication inflates list counts, skews per-contact engagement metrics, and, because most ESPs bill per contact, directly increases cost. More critically, a suppression or unsubscribe recorded against the base address won't suppress the tagged variants, creating compliance exposure under CAN-SPAM § 5's unsubscribe-honoring requirement.
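A sketch of a separate dedup key, assuming plus-tags are stripped only for providers known to support subaddressing. The provider list is illustrative and should be checked against each provider's documentation. The original address is kept for sending; the key is used only for deduplication and suppression matching.

```python
from typing import FrozenSet

# Illustrative assumption: providers believed to deliver "user+tag" to "user".
# Verify against each provider's documentation before relying on this list.
PLUS_ADDRESSING_DOMAINS: FrozenSet[str] = frozenset({
    "gmail.com", "googlemail.com", "outlook.com", "icloud.com", "fastmail.com",
})


def dedup_key(email: str) -> str:
    """Key for dedup and suppression matching; it is never used as a send address."""
    local, _, domain = email.strip().lower().rpartition("@")
    if domain in PLUS_ADDRESSING_DOMAINS:
        local = local.split("+", 1)[0]
    return f"{local}@{domain}"
```

Stripping `+` on every domain would be simpler, but on domains where `+` is an ordinary character it could merge two different people into one contact.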
Why this severity: Low because plus-address collisions affect a minority of subscribers, but each duplicated contact creates a gap in unsubscribe coverage that compounds across campaigns.
data-quality-list-hygiene.dedup-normalization.plus-address-normalization
In certain regions, Google operated `googlemail.com` as an alias for `gmail.com`: `alice@googlemail.com` and `alice@gmail.com` are the same mailbox. Without alias resolution, the same person can appear as two separate contacts, receiving duplicate sends and having an unsubscribe applied only to whichever variant it was recorded against. While `googlemail.com` is the most common example, additional alias relationships exist for other providers in specific markets. Duplicate contacts from unresolved aliases inflate list metrics, add send cost, and introduce suppression gaps when the unsubscribe is captured under the aliased domain.
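A sketch of alias resolution as a lookup table applied after lowercasing. Only the `googlemail.com` mapping comes from the text above; other entries would be added as they are confirmed for specific markets.

```python
from typing import Dict

# Alias domain -> canonical domain. Extend only with confirmed aliases.
DOMAIN_ALIASES: Dict[str, str] = {"googlemail.com": "gmail.com"}


def canonical_address(email: str) -> str:
    local, _, domain = email.strip().lower().rpartition("@")
    return f"{local}@{DOMAIN_ALIASES.get(domain, domain)}"
```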
Why this severity: Low because domain alias collisions are infrequent and affect a narrow set of known providers, creating manageable duplication rather than systematic data corruption.
data-quality-list-hygiene.dedup-normalization.domain-alias-resolution
When the same email address is submitted multiple times with name variations (`Alice Smith`, `A. Smith`, `ALICE SMITH`), a naive import either creates duplicate contact records (plain insert) or silently overwrites one name with another (blind upsert), leaving no record of what changed. Multiple records for the same email send duplicate campaigns to the same person, overcount subscribers in reporting dashboards, and create ambiguity when the contact later requests deletion under GDPR Art. 17: which record is erased? Without a defined merge strategy, data quality degrades with every import, and the marketing list progressively diverges from reality.
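One possible merge strategy, sketched with hypothetical names: a single record per normalized email, every distinct name variant recorded in arrival order (compared case- and whitespace-insensitively), and the longest variant kept as the display name. The "longest wins" rule is an illustrative assumption; what matters is that the policy is explicit and nothing is discarded silently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Contact:
    email: str                                           # normalized key, one record per address
    name: str                                            # current display name
    seen_names: List[str] = field(default_factory=list)  # every distinct variant, in arrival order


def _name_key(name: str) -> str:
    # Collapse whitespace and ignore case when comparing variants.
    return " ".join(name.split()).casefold()


def upsert_contact(store: Dict[str, Contact], email: str, name: str) -> Tuple[Contact, bool]:
    """Return (contact, created). Never creates a second record for the same email."""
    clean = " ".join(name.split())
    contact = store.get(email)
    created = contact is None
    if contact is None:
        contact = Contact(email=email, name=clean)
        store[email] = contact
    if all(_name_key(clean) != _name_key(seen) for seen in contact.seen_names):
        contact.seen_names.append(clean)  # keep a record of every variant
    # Hypothetical policy: the most complete (longest) variant becomes the display name.
    if len(clean) > len(contact.name):
        contact.name = clean
    return contact, created
```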
Why this severity: Low because duplicate name records cause reporting inflation and confusion rather than compliance violations or deliverability damage, but unresolved merges accumulate across every import.
data-quality-list-hygiene.dedup-normalization.fuzzy-name-dedup
When two contact records are merged, the losing record's data (consent history, suppression state, engagement timestamps, and source attribution) is permanently destroyed if no audit trail is kept. This is a compliance gap: GDPR Art. 17 right-to-erasure requests require demonstrating that all instances of the data subject's information were found and erased, and without merge history you cannot prove the losing record existed. CAN-SPAM § 5 and GDPR Art. 21 suppression obligations are similarly undermined: if the losing record held the unsubscribe reason and it was destroyed in a merge, you cannot prove the suppression origin when challenged.
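A sketch of merging with an append-only audit log, assuming contacts are plain dictionaries with an `id` key and that the hypothetical `MergeEvent` rows would live in their own table, written in the same transaction as the merge. The losing record is deep-copied before anything is discarded, and suppression is treated as sticky so a merge can never un-suppress a contact.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class MergeEvent:
    survivor_id: str
    merged_id: str
    merged_snapshot: Dict[str, Any]  # full copy of the losing record as it was before the merge
    merged_at: str


def merge_contacts(survivor: Dict[str, Any], loser: Dict[str, Any],
                   audit_log: List[MergeEvent]) -> Dict[str, Any]:
    # 1. Snapshot the losing record before anything is discarded.
    audit_log.append(MergeEvent(
        survivor_id=survivor["id"],
        merged_id=loser["id"],
        merged_snapshot=copy.deepcopy(loser),
        merged_at=datetime.now(timezone.utc).isoformat(),
    ))
    # 2. Hypothetical field policy: survivor values win; gaps are filled from the loser.
    merged = dict(survivor)
    for key, value in loser.items():
        merged.setdefault(key, value)
    # 3. Suppression is sticky: if either record was suppressed, the merged record is too.
    merged["suppressed"] = bool(survivor.get("suppressed") or loser.get("suppressed"))
    return merged
```

Because the snapshots contain personal data, an erasure request has to cover the audit log as well as the live contact table.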
Why this severity: High because merge operations without audit trails destroy consent and suppression evidence that is required for regulatory compliance and cannot be reconstructed after deletion.
data-quality-list-hygiene.dedup-normalization.merge-audit-trail
A contact record with no engagement timestamp gives no way to tell whether it was acquired yesterday or three years ago. Without per-record age tracking, list hygiene jobs cannot identify stale contacts for re-verification, re-engagement campaigns cannot target the right cohort, and you cannot tell whether declining open rates come from an aging list. A generic `updated_at` column conflates admin edits (which don't signal engagement) with recipient activity, making it useless for deliverability decisions. Many ESPs now enforce inactivity-based suppression on their side; without local engagement timestamps, your system cannot anticipate or mirror those suppressions.
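A sketch of the distinction drawn above, with hypothetical field names: `updated_at` records any edit, while `last_engaged_at` is written only by recipient activity. A contact that never engaged is aged from `created_at`, never from `updated_at`, so an admin edit cannot make a dormant contact look fresh.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ContactTimestamps:
    created_at: datetime                 # when the contact was acquired
    updated_at: datetime                 # any edit, including admin changes: not an engagement signal
    last_engaged_at: Optional[datetime]  # last open, click, or reply by the recipient


def days_since_engagement(c: ContactTimestamps, now: Optional[datetime] = None) -> int:
    now = now or datetime.now(timezone.utc)
    # Never-engaged contacts are aged from acquisition; updated_at is deliberately ignored.
    reference = c.last_engaged_at or c.created_at
    return (now - reference).days
```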
Why this severity: High because contacts without engagement timestamps cannot be triaged for stale-list cleanup, causing disengaged addresses to accumulate on the active send list and silently drag down open rates and sender reputation.
data-quality-list-hygiene.data-decay.contact-age-tracked
Email addresses decay at roughly 22–30% per year: people change jobs, abandon mailboxes, and forward to new addresses. Without a periodic re-verification workflow, your list accumulates stale contacts that hard-bounce, soft-bounce repeatedly, or hit spam traps, all of which damage sender reputation. CAN-SPAM § 5 does not prescribe any re-verification cadence; its relevant requirement here is honoring opt-outs, so the pressure to re-verify comes mainly from ESPs and mailbox providers. Sending to addresses that have not engaged in 12+ months is a recognized bad practice that ESPs use as a criterion for account review. A weekly scheduled process that routes inactive contacts through a re-engagement or re-verification pipeline is a common industry baseline.
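A sketch of the selection step a weekly job might run, assuming contacts carry `last_engaged_at` and `last_verified_at` timestamps. The 365-day inactivity window mirrors the 12-month figure above; the 90-day re-verification interval is a hypothetical placeholder. Suppressed contacts are skipped so re-verification can never move them back onto the active list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class Contact:
    email: str
    last_engaged_at: Optional[datetime]
    last_verified_at: Optional[datetime]
    suppressed: bool = False


def due_for_reverification(
    contacts: Iterable[Contact],
    now: datetime,
    inactive_after: timedelta = timedelta(days=365),  # the 12-month figure from the text
    reverify_every: timedelta = timedelta(days=90),   # hypothetical placeholder
) -> List[Contact]:
    """Selection step for a scheduled (e.g. weekly) job."""
    due: List[Contact] = []
    for c in contacts:
        if c.suppressed:
            continue  # suppression is final; re-verification must not reactivate it
        inactive = c.last_engaged_at is None or now - c.last_engaged_at >= inactive_after
        check_stale = c.last_verified_at is None or now - c.last_verified_at >= reverify_every
        if inactive and check_stale:
            due.append(c)
    return due
```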
Why this severity: High because stale contacts accumulate indefinitely without a scheduled cleanup process, eventually producing bounce and complaint rates that trigger ESP account review or suspension.
data-quality-list-hygiene.data-decay.reverification-trigger
A binary active/suppressed flag treats a contact who opened an email yesterday identically to one who last engaged 5 months ago. Without intermediate engagement tiers, you cannot implement frequency capping for at-risk contacts, sunset programs for lapsed contacts, or segment-specific content strategies, all of which are widely used tactics for preserving list health. Flat lists also make it impossible to diagnose why overall open rates are declining, because you cannot isolate the drag from a growing lapsed cohort. Sending at full frequency to all non-suppressed contacts regardless of recency is a recognized deliverability antipattern that contributes to spam trap hits as dormant mailboxes get recycled.
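A minimal tiering sketch; the tier names, day boundaries, and weekly send caps are hypothetical values chosen for illustration. Contacts with no engagement timestamp fall into `lapsed` here; passing `created_at` in their place is an alternative for brand-new contacts.

```python
from datetime import datetime
from typing import Optional

# Hypothetical boundaries: (max days since last engagement, tier), checked in order.
TIERS = ((30, "active"), (90, "warm"), (180, "at_risk"))

# Hypothetical cadence: maximum sends per week (0 = sunset, re-engagement only).
MAX_SENDS_PER_WEEK = {"active": 3, "warm": 2, "at_risk": 1, "lapsed": 0}


def engagement_tier(last_engaged_at: Optional[datetime], now: datetime) -> str:
    if last_engaged_at is None:
        return "lapsed"
    days = (now - last_engaged_at).days
    for limit, tier in TIERS:
        if days <= limit:
            return tier
    return "lapsed"
```

With these placeholder values, the contact "who last engaged 5 months ago" lands in `at_risk` and receives one send per week instead of full frequency.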
Why this severity: Medium because a single active/suppressed binary prevents the frequency management needed to protect sender reputation from lapsed-contact drag, indirectly accelerating deliverability decay.
data-quality-list-hygiene.data-decay.engagement-scoring
Hard-deleting stale contacts destroys their suppression history. If the same email address reappears in a future import, the system treats it as a net-new contact with no prior history and re-adds it to the active list, even if the person previously bounced or unsubscribed. Mailing a re-added address that had unsubscribed violates CAN-SPAM § 5, which requires honoring opt-outs with no expiry; the suppression history is your evidence of compliance. Under GDPR Art. 17, the right to erasure is separate from list removal: even when personal data is erased, a minimal suppression record (an email hash or similar) should be retained to prevent the address from being re-added in future imports.
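A sketch of retiring a contact without losing its suppression evidence, assuming an in-memory contact store and a set of SHA-256 hashes of normalized addresses (both hypothetical stand-ins for database tables). Imports check the hash set first. An unsalted hash of an email address is pseudonymous rather than anonymous, since common addresses can be recovered by hashing guesses, so the hash set should still be handled as personal data.

```python
import hashlib
from typing import Any, Dict, Set


def suppression_hash(email: str) -> str:
    # Hash of the normalized address; no readable email is kept in the suppression set.
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def retire_contact(contacts: Dict[str, Any], suppression_hashes: Set[str], email: str) -> None:
    """Erase the contact's personal data but keep a marker that blocks re-import."""
    key = email.strip().lower()
    contacts.pop(key, None)
    suppression_hashes.add(suppression_hash(key))


def import_contact(contacts: Dict[str, Any], suppression_hashes: Set[str], email: str, data: Any) -> bool:
    key = email.strip().lower()
    if suppression_hash(key) in suppression_hashes:
        return False  # previously suppressed: do not re-add through an import
    contacts[key] = data
    return True
```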
Why this severity: Low because hard-deletion of stale contacts typically removes a small fraction of the list at a time, but each deletion permanently destroys the suppression evidence needed for regulatory compliance.
data-quality-list-hygiene.data-decay.stale-queue-not-delete
Hardcoded decay thresholds (`90`, `180`, `120` as magic numbers scattered across cron jobs, cleanup handlers, and segment queries) create three problems. First, they drift: the cron job uses 180 days while the segment query uses 90, causing inconsistent behavior that is invisible until a contact falls into the gap. Second, they cannot be tuned for list behavior without a code deployment: if you discover your audience has a 60-day engagement cycle, changing the threshold requires finding and updating every hardcoded number across multiple files. Third, they make code review harder: a reviewer seeing `120` in a WHERE clause has no context for whether that number is intentional, inherited, or stale.
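A sketch of a single source of truth for the thresholds, assuming they are supplied through environment variables with hypothetical names (`DECAY_AT_RISK_DAYS` and so on) and that the three example numbers above map to at-risk, re-verify, and stale windows. The ordering check in `__post_init__` rejects configurations where the windows contradict each other.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DecayThresholds:
    """Single source of truth for decay windows, in days."""
    at_risk_days: int = 90
    reverify_days: int = 120
    stale_days: int = 180

    def __post_init__(self) -> None:
        if not 0 < self.at_risk_days <= self.reverify_days <= self.stale_days:
            raise ValueError("decay thresholds must be positive and ordered at_risk <= reverify <= stale")

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "DecayThresholds":
        # Hypothetical variable names; unset variables fall back to the defaults above.
        defaults = cls()
        return cls(
            at_risk_days=int(env.get("DECAY_AT_RISK_DAYS", defaults.at_risk_days)),
            reverify_days=int(env.get("DECAY_REVERIFY_DAYS", defaults.reverify_days)),
            stale_days=int(env.get("DECAY_STALE_DAYS", defaults.stale_days)),
        )
```

Cron jobs, cleanup handlers, and segment queries would all read from one `DecayThresholds` instance, passing the values into SQL as bound parameters rather than literals.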
Why this severity: Low because scattered thresholds cause drift and maintenance friction rather than immediate data corruption, but threshold inconsistency silently produces divergent list-state across different parts of the system.
data-quality-list-hygiene.data-decay.configurable-thresholds
A `last_engaged_at` column that is never updated is worse than no column at all: it creates false confidence that engagement is being tracked while every contact's timestamp stays frozen at its creation date. Stale-contact jobs keyed on `last_engaged_at` will then classify each contact as disengaged as soon as its creation date passes the staleness threshold, regardless of actual activity, eventually suppressing the entire list. The root cause is typically a webhook handler that logs events to an analytics table but does not write back to the contacts table, a common pattern in systems where the analytics pipeline was added after the contact model was built.
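A sketch of a webhook handler that performs both writes, with a hypothetical event shape and event-type names. It keeps the existing analytics append, then updates the contact's `last_engaged_at` for engagement events only. The timestamp only ever moves forward, so late or replayed events cannot rewind it.

```python
from datetime import datetime
from typing import Dict, List, Optional

ENGAGEMENT_EVENTS = {"open", "click", "reply"}  # hypothetical event type names


def handle_webhook_event(event: dict, analytics_log: List[dict], contacts: Dict[str, dict]) -> None:
    # The analytics write that many systems already perform.
    analytics_log.append(event)
    if event.get("type") not in ENGAGEMENT_EVENTS:
        return
    contact = contacts.get(event["email"].strip().lower())
    if contact is None:
        return
    occurred = datetime.fromisoformat(event["timestamp"])
    current: Optional[datetime] = contact.get("last_engaged_at")
    # The write-back that is usually missing; only move the timestamp forward.
    if current is None or occurred > current:
        contact["last_engaged_at"] = occurred
```

The sketch expects ISO 8601 timestamps with an explicit UTC offset such as `+00:00`.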
Why this severity: High because a non-updating engagement timestamp produces incorrect staleness classifications across the entire list, making re-verification and engagement-scoring logic unreliable for all contacts.
data-quality-list-hygiene.data-decay.last-engagement-maintained
Sending to hard-bounced addresses (permanent delivery failures where the domain does not exist or the mailbox has been closed) is one of the fastest ways to destroy sender reputation. ISPs track hard-bounce rates and use them as a primary signal for spam classification. CAN-SPAM § 5 does not itself treat a bounce as an opt-out, but continued sends to bounced addresses usually mean the suppression table is not being enforced, and that same table is where CAN-SPAM opt-outs must be honored. Keeping known-invalid addresses in active use also sits poorly with the GDPR accuracy principle (Art. 5(1)(d)). Deliverability services (Postmark, SendGrid, Mailgun) review or suspend accounts whose hard-bounce rates climb too high; thresholds vary by provider, with around 2% commonly cited, and recovery typically requires list remediation and a manual warm-up period.
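A sketch of hard-bounce suppression, assuming the ESP webhook already classifies each bounce as hard or soft (the event shape is hypothetical). `setdefault` keeps any existing suppression entry, so an earlier unsubscribe is not overwritten by a later bounce.

```python
from datetime import datetime, timezone
from typing import Dict


def handle_bounce(event: dict, suppressions: Dict[str, dict]) -> bool:
    """Suppress on a permanent failure; returns True when the address is suppressed.
    Hypothetical event shape: {"email": ..., "bounce_type": "hard" | "soft", "smtp_code": ...}."""
    if event.get("bounce_type") != "hard":
        return False  # soft bounces go through a counter instead
    key = event["email"].strip().lower()
    # setdefault keeps an existing entry (e.g. an earlier unsubscribe) as the recorded source.
    suppressions.setdefault(key, {
        "source": "hard_bounce",
        "smtp_code": event.get("smtp_code"),
        "suppressed_at": datetime.now(timezone.utc).isoformat(),
    })
    return True
```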
Why this severity: Critical because sending to hard-bounced addresses directly raises bounce rates above ESP suppression thresholds, triggering account suspension and CAN-SPAM compliance exposure with every campaign.
data-quality-list-hygiene.suppression-bounce.hard-bounce-suppression
Soft bounces are temporary delivery failures (mailbox full, server temporarily unavailable) that resolve on their own in most cases. Suppressing on the first soft bounce is over-aggressive and discards valid contacts experiencing transient issues. Ignoring soft bounces entirely is the opposite failure: repeated soft bounces to the same address indicate a persistent problem, such as an abandoned or permanently full mailbox, or a receiving server that is filtering your mail. Without a counter and threshold, soft-bouncing addresses accumulate undetected on the active list, raising your overall bounce rate and generating misleading delivery stats.
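A sketch of a consecutive-soft-bounce counter. The limit of 3 is a hypothetical placeholder, and a successful delivery resets the count; a time-windowed rule (N soft bounces within M days) is another reasonable design.

```python
from dataclasses import dataclass
from typing import Dict, Set

SOFT_BOUNCE_LIMIT = 3  # hypothetical: consecutive soft bounces before suppression


@dataclass
class BounceState:
    consecutive_soft: int = 0


def record_delivery_event(email: str, event: str, states: Dict[str, BounceState], suppressed: Set[str]) -> None:
    key = email.strip().lower()
    state = states.setdefault(key, BounceState())
    if event == "soft_bounce":
        state.consecutive_soft += 1
        if state.consecutive_soft >= SOFT_BOUNCE_LIMIT:
            suppressed.add(key)  # chronic bouncer: stop sending
    elif event == "delivered":
        state.consecutive_soft = 0  # a successful delivery shows the issue was transient
```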
Why this severity: High because absent soft-bounce thresholding, both over-suppression (instant) and under-suppression (never) result in list quality degradation — one destroys valid contacts, the other accumulates chronic bouncers.
data-quality-list-hygiene.suppression-bounce.soft-bounce-threshold
A per-campaign suppression list means an unsubscribe from Campaign A has no effect on Campaign B, C, or triggered automations. This is a direct CAN-SPAM § 5 violation: the law requires that opt-outs apply to all commercial email from the sender within 10 business days, not just the campaign the unsubscribe link appeared in. GDPR Art. 21 right to object similarly applies globally: an objection to marketing email must stop all marketing email, not one drip sequence. Per-campaign suppression is architecturally convenient (each campaign manages its own list) but legally and operationally incorrect.
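A sketch of a sender-wide list with a hypothetical interface. `allows` accepts a campaign identifier only to make the point that it is ignored: an opt-out recorded from one campaign blocks every campaign and automation.

```python
from typing import Optional, Set


class SuppressionList:
    """One list per sender, consulted by every campaign and automation."""

    def __init__(self) -> None:
        self._emails: Set[str] = set()

    def add(self, email: str) -> None:
        self._emails.add(email.strip().lower())

    def allows(self, email: str, campaign_id: Optional[str] = None) -> bool:
        # campaign_id is intentionally unused: an opt-out covers all commercial email.
        return email.strip().lower() not in self._emails
```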
Why this severity: High because per-campaign-only suppression is a structural CAN-SPAM § 5 violation that allows continued sends to unsubscribed contacts across campaigns, exposing the sender to FTC enforcement action.
data-quality-list-hygiene.suppression-bounce.global-suppression-list
A suppression list missing any of its four required sources (unsubscribes, hard bounces, spam complaints, or manual admin adds) has gaps that will cause sends to addresses that should never receive email. CAN-SPAM § 5 mandates honoring opt-outs (unsubscribes); GDPR Art. 21 covers the right to object (maps to unsubscribes and complaints); GDPR Art. 17 covers erasure (maps to manual admin adds); CASL § 6 mirrors CAN-SPAM's unsubscribe requirement. Spam complaints are the most commonly missing source: ESPs surface them via feedback loops that must be explicitly wired to write to your suppression table, and complaints do not land there unless that wiring is configured.
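A sketch that makes the four sources explicit, with hypothetical names and payload shape. `handle_feedback_loop` is the complaint wiring that is most often missing, and `missing_sources` is a rough audit aid: a source that has never written a row is worth investigating, though a small list can legitimately have no complaints yet.

```python
from enum import Enum
from typing import Dict, Set


class SuppressionSource(str, Enum):
    UNSUBSCRIBE = "unsubscribe"
    HARD_BOUNCE = "hard_bounce"
    COMPLAINT = "complaint"
    MANUAL = "manual"  # admin adds, e.g. following an erasure request


def suppress(table: Dict[str, SuppressionSource], email: str, source: SuppressionSource) -> None:
    table.setdefault(email.strip().lower(), source)  # the first recorded source wins


def handle_feedback_loop(report: dict, table: Dict[str, SuppressionSource]) -> None:
    """Hypothetical complaint payload: {"type": "complaint", "email": ...}."""
    if report.get("type") == "complaint":
        suppress(table, report["email"], SuppressionSource.COMPLAINT)


def missing_sources(table: Dict[str, SuppressionSource]) -> Set[SuppressionSource]:
    # Sources that have never written a row: a prompt to check the wiring, not proof of a gap.
    return set(SuppressionSource) - set(table.values())
```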
Why this severity: Medium because each missing suppression source creates a specific class of non-compliance (unsubscribes → CAN-SPAM; complaints → GDPR Art. 21) rather than a blanket deliverability failure, but the legal exposure is real.
data-quality-list-hygiene.suppression-bounce.suppression-sources
Every email dispatch that uses a local suppression table performs a lookup against that table before sending. Without an index on the `email` column, this lookup is a sequential full-table scan. At 100,000 suppressed addresses, a sequential scan takes tens of milliseconds. At 1,000,000 entries, which is common for mature lists, it degrades to hundreds of milliseconds per lookup, making pre-send verification the bottleneck in campaign throughput. Bulk campaigns that send to 50,000 contacts perform 50,000 suppression lookups; at 200ms each, that is 10,000 seconds (about 2.8 hours) of scan time before a single email is dispatched.
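A runnable illustration using SQLite from the Python standard library as a stand-in for the production database; table and function names are hypothetical. Declaring `email` as the primary key gives it a B-tree index, `EXPLAIN QUERY PLAN` confirms the lookup uses that index instead of a scan, and `filter_suppressed` batches lookups so a bulk campaign issues one query per 500 addresses rather than one per contact.

```python
import sqlite3
from typing import List, Sequence


def open_suppression_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # PRIMARY KEY on a TEXT column gives SQLite a unique B-tree index on email.
    conn.execute("CREATE TABLE IF NOT EXISTS suppressions (email TEXT PRIMARY KEY, source TEXT NOT NULL)")
    return conn


def is_suppressed(conn: sqlite3.Connection, email: str) -> bool:
    row = conn.execute("SELECT 1 FROM suppressions WHERE email = ?", (email.strip().lower(),)).fetchone()
    return row is not None


def filter_suppressed(conn: sqlite3.Connection, emails: Sequence[str], chunk: int = 500) -> List[str]:
    """Return the addresses that are NOT suppressed, using one query per chunk."""
    keys = [e.strip().lower() for e in emails]
    found = set()
    for start in range(0, len(keys), chunk):
        batch = keys[start:start + chunk]
        placeholders = ",".join("?" * len(batch))  # only "?" markers are interpolated, never data
        sql = f"SELECT email FROM suppressions WHERE email IN ({placeholders})"
        found.update(row[0] for row in conn.execute(sql, batch))
    return [e for e, k in zip(emails, keys) if k not in found]


def lookup_plan(conn: sqlite3.Connection) -> str:
    rows = conn.execute("EXPLAIN QUERY PLAN SELECT 1 FROM suppressions WHERE email = ?", ("x",)).fetchall()
    return " ".join(str(row[-1]) for row in rows)  # the last column holds the plan detail text
```

The same principle carries over to PostgreSQL or MySQL: a unique index on the normalized email column, plus batched lookups for bulk sends.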
Why this severity: Low because an unindexed suppression table degrades performance incrementally as the list grows, rather than causing immediate correctness failures — but the degradation eventually makes pre-send verification impractical.
data-quality-list-hygiene.suppression-bounce.suppression-lookup-performance
Email dispatch pipelines with a queue layer (BullMQ, SQS, Celery) introduce a window between job creation and job execution. A contact who unsubscribes, or whose address hard-bounces, after a campaign job is enqueued but before the worker processes it will still receive the email if suppression is only checked at scheduling time. CAN-SPAM § 5 allows up to 10 business days to honor an opt-out, so a queue delay of minutes to hours is not itself a violation; the problem is that a check performed only at enqueue time never sees suppressions that arrive afterward, including for jobs scheduled days in advance, such as drip steps and delayed sends. GDPR Art. 21 right to object has the same gap: an objection written to the suppression table after the job was queued still results in a send if the worker does not re-check.
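A framework-agnostic sketch of the worker-side re-check, with a hypothetical job shape and a `send` callable standing in for the ESP call. The same handler body would run inside a BullMQ processor, an SQS consumer, or a Celery task; what matters is that suppression is read when the job executes, not only when it is enqueued.

```python
from typing import Callable, Set


def dispatch_job(job: dict, suppressed: Set[str], send: Callable[[str, str], None]) -> str:
    """Worker-side handler. Hypothetical job shape: {"email": ..., "campaign_id": ...}."""
    email = job["email"].strip().lower()
    # Read the live suppression set at execution time, so opt-outs and bounces
    # recorded after the job was enqueued are still honored.
    if email in suppressed:
        return "skipped_suppressed"
    send(email, job["campaign_id"])
    return "sent"
```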
Why this severity: Medium because the queue-time suppression gap is a timing vulnerability rather than a permanent miss — it creates a window for non-compliant sends proportional to campaign queue depth and processing lag.
data-quality-list-hygiene.suppression-bounce.suppression-at-queue-time