ESP response codes mapped to internal status model

ab-002464 · sending-pipeline-infrastructure.esp-integration.response-code-mapping

Severity: medium · active

Why it matters

When a worker catches every ESP error in a single catch block and re-throws it into the retry pipeline, invalid email addresses are retried five times before being discarded. That burns send quota and generates five bounce events against your domain reputation. ESP response codes such as those returned by SendGrid and Mailgun carry very different meanings: a 400 (invalid address) is permanent, a 503 (service unavailable) is transient, and a 429 (rate limit) needs a long backoff. Treating all three identically is an instance of CWE-390 (detection of an error condition without action) and CWE-400 (uncontrolled resource consumption through unnecessary retries).

Severity rationale

Medium because misclassified errors waste send quota and degrade sender reputation rather than causing immediate data loss or a security breach.

Remediation

Create a classifyEspError function in lib/email/errors.ts and use it in the worker to route each failure class to the correct outcome:

export type EmailFailureKind = 'permanent' | 'transient' | 'rate_limited' | 'quota_exceeded'

export function classifyEspError(err: unknown): EmailFailureKind {
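  // Read the status from `code` or `statusCode`; `?? {}` avoids a TypeError
  // when null or undefined is thrown
  const e = (err ?? {}) as { code?: number; statusCode?: number }
  const code = e.code ?? e.statusCode
  if (code === 429) return 'rate_limited'                   // back off, then retry
  if (code && code >= 400 && code < 500) return 'permanent' // other 4xx: do not retry
  if (code && code >= 500) return 'transient'               // ESP-side failure: retry
  return 'transient' // No status code (e.g. network error): assume transient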
}

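In the job processor, return (acknowledge without retry) on permanent and re-throw on transient so BullMQ applies its backoff schedule. Rate-limited failures also need retrying, but after a longer delay than the default backoff; quota_exceeded calls for alerting ops and pausing the queue rather than retrying.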

Detection

ID: response-code-mapping
Severity: medium
What to look for: Examine the ESP error handling in the adapter or send path. Check whether ESP-specific error codes and HTTP status codes are translated into internal status values (e.g., permanent_failure, temporary_failure, rate_limited, invalid_recipient). Look for raw ESP exceptions being re-thrown without interpretation, or all errors treated identically (e.g., all failures trigger the same generic retry).
Pass criteria: ESP errors are caught and mapped to at least 3 distinct internal status categories (e.g., permanent, transient, rate_limited). Permanent failures (invalid address, unsubscribed) are not retried. Rate limit errors use a backoff of at least 5 seconds. Enumerate every error category defined and count them.
Fail criteria: Any of the following: all ESP errors trigger the same generic retry logic regardless of whether the failure is permanent or temporary; invalid address errors are retried until max attempts, wasting quota; rate limit errors cause an immediate retry instead of a backoff; or fewer than 3 error categories are defined.
Skip (N/A) when: The application delegates all error classification to the ESP's built-in retry handling — confirmed by explicit delegation in the queue configuration.
Detail on fail: "All SendGrid errors caught with a single try/catch and retried unconditionally — invalid email addresses retried 5 times before being discarded" or "No distinction between 4xx permanent failures and 5xx transient errors in ESP response handler"
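The pass criteria require rate-limit errors to back off for at least 5 seconds. A minimal sketch of a delay calculator that enforces that floor is shown below. It assumes a Retry-After value (in seconds) may be available from the ESP response, although how you read it is client-specific. `backoffDelayMs` and its constants are hypothetical and illustrative, and the `EmailFailureKind` type mirrors the one in lib/email/errors.ts:

```typescript
type EmailFailureKind = 'permanent' | 'transient' | 'rate_limited' | 'quota_exceeded'

// Illustrative constants (assumptions); tune them to your ESP's documented limits
const TRANSIENT_BASE_MS = 1_000
const RATE_LIMIT_FLOOR_MS = 5_000 // pass criteria: at least 5 s for 429s
const MAX_DELAY_MS = 10 * 60_000

/**
 * Hypothetical helper: delay before the next attempt, or null when the job
 * must not be retried automatically.
 * attemptsMade: attempts already made (1 after the first failure).
 * retryAfterSeconds: parsed Retry-After value, if the ESP supplied one.
 */
export function backoffDelayMs(
  kind: EmailFailureKind,
  attemptsMade: number,
  retryAfterSeconds?: number,
): number | null {
  switch (kind) {
    case 'permanent':
    case 'quota_exceeded':
      // Never auto-retry: drop the message, or pause the queue and alert ops
      return null
    case 'transient': {
      // Exponential backoff: 1 s, 2 s, 4 s, ... capped at MAX_DELAY_MS
      const exp = TRANSIENT_BASE_MS * 2 ** Math.max(0, attemptsMade - 1)
      return Math.min(exp, MAX_DELAY_MS)
    }
    case 'rate_limited': {
      // Ignore missing or non-finite Retry-After values
      const hinted =
        retryAfterSeconds !== undefined && Number.isFinite(retryAfterSeconds)
          ? retryAfterSeconds * 1_000
          : 0
      // Exponential backoff starting at the 5 s floor: 5 s, 10 s, 20 s, ...
      const exp = RATE_LIMIT_FLOOR_MS * 2 ** Math.max(0, attemptsMade - 1)
      // Honour a longer Retry-After hint; the floor still applies if it is shorter
      return Math.max(hinted, Math.min(exp, MAX_DELAY_MS))
    }
  }
}
```

In BullMQ, a value like this would typically come from a custom backoff strategy configured on the worker. Check the BullMQ documentation for the exact option names in your version.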

Remediation: Map ESP response codes to internal categories and handle each appropriately:

// lib/email/errors.ts
export type EmailFailureKind =
  | 'permanent'      // Do not retry: bad address, unsubscribed, blocked domain
  | 'transient'      // Retry with backoff: network issue, ESP 5xx
  | 'rate_limited'   // Retry after delay: 429 from ESP
  | 'quota_exceeded' // Alert ops, pause queue: daily send limit hit

export function classifyEspError(err: unknown): EmailFailureKind {
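  // Read the status from `code` or `statusCode`; `?? {}` avoids a TypeError
  // when null or undefined is thrown
  const e = (err ?? {}) as { code?: number; statusCode?: number }
  const code = e.code ?? e.statusCode

  if (code === 429) return 'rate_limited'
  if (code && code >= 400 && code < 500) return 'permanent'
  if (code && code >= 500) return 'transient'
  // 'quota_exceeded' is not derived from status codes here: how an ESP signals
  // an exhausted send quota is provider-specific and must be detected separately
  return 'transient' // Unknown: assume transient and retry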
}
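`classifyEspError` only reads the top-level `code` and `statusCode` fields. Client libraries differ in where they put the HTTP status: some use `status`, and HTTP clients such as axios nest it under `response.status`. Node network errors also carry a string `code` such as 'ECONNRESET'. A minimal sketch of a defensive extractor follows. The error shapes it checks are assumptions, so verify them against the ESP client you actually use; `extractHttpStatus` is a hypothetical helper:

```typescript
// Hypothetical helper: pull a numeric HTTP status out of an unknown thrown value.
// The shapes checked below are assumptions; confirm them for your ESP client.
export function extractHttpStatus(err: unknown): number | undefined {
  if (typeof err !== 'object' || err === null) return undefined
  const e = err as Record<string, unknown>
  const response = e.response as Record<string, unknown> | null | undefined

  const candidates: unknown[] = [
    e.statusCode,
    e.status,
    e.code, // numeric on some SDK errors, a string like 'ECONNRESET' on network errors
    response?.status,
    response?.statusCode,
  ]
  for (const c of candidates) {
    // Accept only plausible HTTP status codes; strings and out-of-range values are skipped
    if (typeof c === 'number' && Number.isInteger(c) && c >= 100 && c <= 599) return c
  }
  return undefined
}
```

The classifier could then use `const code = extractHttpStatus(err)` instead of reading `code` and `statusCode` directly. An error without a recognisable status still falls through to 'transient'.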

In the worker, use the classification:

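// Inside the job processor: esp, message, recipientId and
// markRecipientUndeliverable come from the surrounding code
try {
  await esp.send(message)
} catch (err) {
  const kind = classifyEspError(err)
  if (kind === 'permanent') {
    await markRecipientUndeliverable(recipientId, err)
    return // Acknowledge job: do not retry
  }
  // transient, rate_limited and quota_exceeded all reach BullMQ's default
  // retry here; rate_limited still needs a longer backoff and quota_exceeded
  // a queue pause, which this fragment does not implement
  throw err // Re-throw for BullMQ retry logic
}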
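The worker fragment only special-cases 'permanent', so 'rate_limited' and 'quota_exceeded' fall through to the generic retry. The following minimal sketch routes every kind explicitly. It assumes the side effects are injected so the routing can be unit-tested. `FailureDeps`, `FailureOutcome`, `handleSendFailure`, `pauseQueue` and `alertOps` are hypothetical names, not BullMQ APIs:

```typescript
type EmailFailureKind = 'permanent' | 'transient' | 'rate_limited' | 'quota_exceeded'

// Side effects are injected (hypothetical interface) so routing can be tested in isolation
interface FailureDeps {
  markRecipientUndeliverable(err: unknown): Promise<void>
  pauseQueue(): Promise<void>
  alertOps(message: string): Promise<void>
}

// What the processor should do with the current job after the handler runs
type FailureOutcome = 'acknowledge' | 'retry' | 'retry_after_backoff' | 'paused'

export async function handleSendFailure(
  kind: EmailFailureKind,
  err: unknown,
  deps: FailureDeps,
): Promise<FailureOutcome> {
  switch (kind) {
    case 'permanent':
      // Bad address, unsubscribed, blocked domain: record it and never retry
      await deps.markRecipientUndeliverable(err)
      return 'acknowledge'
    case 'transient':
      // Network issue or ESP 5xx: let the queue's normal backoff retry it
      return 'retry'
    case 'rate_limited':
      // 429: retry, but only after a longer delay (at least 5 s)
      return 'retry_after_backoff'
    case 'quota_exceeded':
      // Daily limit hit: stop sending and notify a human instead of retrying
      await deps.alertOps('ESP send quota exceeded; queue paused')
      await deps.pauseQueue()
      return 'paused'
    default: {
      // Compile-time exhaustiveness check: a new kind without a case fails to type-check
      const unreachable: never = kind
      throw new Error(`Unhandled failure kind: ${String(unreachable)}`)
    }
  }
}
```

In the processor, 'acknowledge' would map to returning normally. The other outcomes would map to re-throwing, with a longer delay for 'retry_after_backoff', so the current job is not lost while the queue is paused.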

External references

cwe · CWE-390 — Detection of Error Condition Without Action
iso-25010:2011 · reliability.fault-tolerance — Fault Tolerance
cwe · CWE-400 — Uncontrolled Resource Consumption

Taxons

error-resilience cost-efficiency

History

2026-04-18·v1.0.0·Initial import from sending-pipeline-infrastructure·automated