A deliverability drop (a bounce rate spike, a complaint rate above Gmail's 0.1% threshold, or an inbox placement collapse) requires immediate, structured triage. Without a runbook, operators improvise: they may pause all sends instead of isolating the affected campaign, spend 30 minutes finding the right dashboard query, or miss the SPF/DKIM check that would have identified the root cause in 2 minutes. NIST SP 800-53 IR-8 (Incident Response Plan) requires documented procedures for anticipated failure modes. RFC 5321 defines the SMTP reply codes (4yz for transient failures, 5yz for permanent failures) and the non-delivery notifications that underlie the bounce events this runbook must cover.
High because without a documented deliverability incident response, triage is slower, which lets complaint rates climb further past ESP thresholds during the response window.
Create docs/runbooks/deliverability-incident.md with at least four sections (detection thresholds, triage checklist, immediate actions, and escalation contacts):
# Deliverability Incident Runbook
## Detection
- Bounce rate exceeds 5% over 1-hour window
- Complaint rate above 0.1% (Gmail threshold)
## Triage
1. Check bounce breakdown: `SELECT domain, count(*) FROM bounces WHERE ...`
2. Review Postmaster Tools for Gmail/Yahoo reputation
## Immediate Actions
- Pause affected campaign via admin API
- Verify DNS records (SPF, DKIM, DMARC)
## Escalation
- On-call: [name] via PagerDuty
- ESP support: [contact info]
The runbook must reference at least one numeric threshold, such as "bounce rate exceeds 5%", to avoid the generic "contact ops" anti-pattern.
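The detection thresholds in the template can also be checked mechanically. The following is a minimal sketch, assuming per-window counts of sent messages, bounces, and complaints are already available from your sending logs. `WindowStats` and `breached_thresholds` are hypothetical names; only the 5% and 0.1% figures come from the template.

```python
from dataclasses import dataclass

# Thresholds taken from the runbook template; tune them to your own baseline.
BOUNCE_RATE_THRESHOLD = 0.05      # bounce rate exceeds 5% over a 1-hour window
COMPLAINT_RATE_THRESHOLD = 0.001  # complaint rate above 0.1% (Gmail threshold)


@dataclass(frozen=True)
class WindowStats:
    """Counts for one 1-hour sending window (hypothetical field names)."""
    sent: int
    bounces: int
    complaints: int


def breached_thresholds(stats: WindowStats) -> list[str]:
    """Return one human-readable alert per threshold the window breaches."""
    if stats.sent <= 0:
        return []  # nothing was sent, so no rate can be computed
    alerts = []
    bounce_rate = stats.bounces / stats.sent
    complaint_rate = stats.complaints / stats.sent
    if bounce_rate > BOUNCE_RATE_THRESHOLD:
        alerts.append(f"bounce rate {bounce_rate:.2%} exceeds 5%")
    if complaint_rate > COMPLAINT_RATE_THRESHOLD:
        alerts.append(f"complaint rate {complaint_rate:.2%} is above 0.1%")
    return alerts
```

An empty list means neither threshold is breached. Any returned alert would correspond to an entry point into the Triage section.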
ID: operational-resilience-email.incident-response.deliverability-drop-runbook
Severity: high
What to look for: A runbook document (in docs/, runbooks/, or a wiki link referenced in code) that describes, step by step, how to respond when deliverability drops: bounce rate spikes, complaint rate rises above the Gmail 0.1% threshold, or open rates collapse. The runbook should cover how to detect the drop, initial triage steps (which campaigns, which IP, which domain), the escalation path, and resolution actions (pause campaigns, check DNS, contact ESP support). This complements the Deliverability Engineering Audit's DNS monitoring requirements.
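The initial triage questions (which campaigns, which IP, which domain) amount to grouping bounce events along one dimension at a time. The sketch below illustrates that idea in memory. `BounceEvent` and its field names are assumptions for illustration, not a schema taken from any particular ESP or from the audited project.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BounceEvent:
    """One bounce record; the field names are assumptions for this sketch."""
    recipient: str
    campaign_id: str
    sending_ip: str

    @property
    def domain(self) -> str:
        # Everything after the last "@", lowercased (e.g. "gmail.com").
        return self.recipient.rsplit("@", 1)[-1].lower()


def bounce_breakdown(events: list[BounceEvent], dimension: str) -> list[tuple[str, int]]:
    """Count bounces per "domain", "campaign_id", or "sending_ip", largest first."""
    if dimension not in ("domain", "campaign_id", "sending_ip"):
        raise ValueError(f"unsupported dimension: {dimension}")
    counts = Counter(getattr(event, dimension) for event in events)
    return counts.most_common()
```

If one campaign or one recipient domain dominates the result, that suggests pausing only the affected campaign rather than all sends.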
Pass criteria: A deliverability incident runbook exists with at least 4 sections: detection steps, triage checklist, campaign pause procedure, and escalation contacts. Enumerate all sections present in the runbook. The runbook must include at least 1 specific threshold (e.g., "bounce rate exceeds 5%", "complaint rate above 0.1%"). It is NOT a pass when only a generic "escalate to ops" note exists with no specific triage steps.
Fail criteria: No deliverability runbook exists. Or only a generic "escalate to ops" note with no specific triage steps. Or the runbook has fewer than 4 sections.
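The structural parts of the pass and fail criteria (section count and presence of a numeric threshold) can be approximated with a simple text scan. This is a rough sketch, not the audit's actual implementation: `evaluate_runbook` is a hypothetical helper, it treats every `## ` heading as a section, it only recognizes thresholds written as percentages, and it does not verify that the sections cover the required topics.

```python
import re

MIN_SECTIONS = 4
# Matches a percentage such as "5%" or "0.1%".
THRESHOLD_PATTERN = re.compile(r"\d+(?:\.\d+)?\s*%")


def evaluate_runbook(markdown: str) -> dict:
    """Apply the section-count and numeric-threshold criteria to runbook text."""
    # Enumerate level-2 headings, as the pass criteria ask for a section list.
    sections = [
        line[3:].strip()
        for line in markdown.splitlines()
        if line.startswith("## ")
    ]
    thresholds = THRESHOLD_PATTERN.findall(markdown)
    return {
        "sections": sections,
        "thresholds": thresholds,
        "passed": len(sections) >= MIN_SECTIONS and len(thresholds) >= 1,
    }
```

A human reviewer would still need to confirm that the listed sections contain real triage steps rather than a generic "escalate to ops" note.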
Skip (N/A) when: The project sends no email — confirmed by the absence of any ESP SDK in package.json.
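The skip condition can be sketched as a lookup of declared dependencies against a list of ESP SDK package names. The list below is illustrative and far from exhaustive, and `email_packages_in` and `should_skip` are hypothetical helpers. A project could still send email through a package not listed here, so an empty result is a hint rather than proof.

```python
import json

# Illustrative, non-exhaustive list of npm packages published by or for ESPs.
KNOWN_ESP_PACKAGES = frozenset({
    "@sendgrid/mail",
    "postmark",
    "resend",
    "mailgun.js",
    "@aws-sdk/client-ses",
})


def email_packages_in(package_json_text: str) -> set[str]:
    """Return the known ESP packages declared in a package.json document."""
    manifest = json.loads(package_json_text)
    declared = set()
    for field in ("dependencies", "devDependencies", "optionalDependencies"):
        # A missing or null field contributes no package names.
        declared.update(manifest.get(field) or {})
    return declared & KNOWN_ESP_PACKAGES


def should_skip(package_json_text: str) -> bool:
    """True when no known ESP SDK is declared, i.e. the check is N/A."""
    return not email_packages_in(package_json_text)
```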
Detail on fail: "No deliverability incident runbook found — operators would need to improvise triage during an active incident" or "README mentions 'contact ESP support' but has no structured triage steps before escalation"
Remediation: Create docs/runbooks/deliverability-incident.md with the four sections shown in the runbook template above: detection thresholds, triage checklist, immediate actions, and escalation contacts.