Email syntax validation beyond basic regex
Why it matters
A basic regex like /^[^@]+@[^@]+$/ accepts addresses that are syntactically invalid per RFC 5321 and RFC 5322: foo@bar (no TLD), a..b@example.com (consecutive dots), or .alice@example.com (leading dot). These addresses cannot receive email and will hard-bounce or be rejected at the MTA level, increasing your bounce rate. CWE-20 (Improper Input Validation) is the classification. Beyond deliverability, regex-only validation with no library invites injection of malformed strings that may cause downstream parsing errors in templates, CSV exports, or ESP API calls that expect well-formed RFC addresses.
Severity rationale
High because bare-regex validation systematically admits syntactically invalid addresses that will hard-bounce, compounding sender reputation damage with every campaign.
Remediation
Replace bare regex with a validation library that enforces RFC 5321 rules. Zod's z.string().email() covers the common cases; isemail with strict mode covers edge cases:
import { z } from 'zod'
const emailSchema = z.string().email()
const parsed = emailSchema.safeParse(rawEmail)
if (!parsed.success) {
return res.status(422).json({ error: 'Invalid email address format' })
}
// Or, for strict RFC 5321 compliance:
import { isEmail } from 'validator'
if (!isEmail(rawEmail, { allow_utf8_local_part: false, require_tld: true })) {
return res.status(422).json({ error: 'Invalid email address format' })
}
Apply at every ingest path (signup form handler, API endpoint, CSV import processor). Syntax validation runs synchronously and adds negligible latency — there is no reason to skip it.
Detection
-
ID:
syntax-validation -
Severity:
high -
What to look for: Enumerate all email ingest paths and for each, extract and quote the validation function or regex pattern used. A basic regex like
/^[^@]+@[^@]+$/will pass addresses likefoo@barora@b.cthat are technically syntactically invalid per RFC 5321. Look for a validation library (validator.jsisEmail(), theisemailpackage, or Zod'sz.string().email()) that enforces proper TLD presence, no consecutive dots, no leading/trailing dots in local-part, etc. A basic regex with no TLD enforcement does not count as pass. -
Pass criteria: At least 1 well-maintained validation library is used that enforces RFC 5321 / RFC 5322 rules beyond basic format matching. Quote the library name and import statement found.
-
Fail criteria: Only a basic regex is used, or no syntax validation exists before storage.
-
Skip (N/A) when: Email addresses are sourced exclusively from an OAuth provider that already validates them (and the raw user-supplied email string is never stored directly).
-
Detail on fail: Example:
"Email validation uses /[^@]+@[^@]+/ regex — allows addresses without valid TLDs or with consecutive dots"or"No email syntax validation in ingest handler" -
Remediation: Use a proper validation library:
import { z } from 'zod' const emailSchema = z.string().email() // Or with validator.js for more control: import { isEmail } from 'validator' if (!isEmail(rawEmail, { allow_utf8_local_part: false, require_tld: true })) { return res.status(422).json({ error: 'Invalid email address format' }) }Note: Zod's
z.string().email()uses a pragmatic RFC-aligned pattern. For strict RFC 5321 compliance, useisemailwith strict mode enabled.
External references
- cwe · CWE-20 — Improper Input Validation
- iso-25010:2011 · functional-correctness — Functional Correctness (functional suitability)
Taxons
History
- 2026-04-18·v1.0.0·Initial import from data-quality-list-hygiene·automated