Blast radius containment — pause affected campaigns only
Why it matters
When a campaign starts generating complaint spikes, the correct response is to pause that campaign, not to halt the entire email system. If the queue worker can only be stopped globally, pausing one broken campaign also takes down order confirmations, password resets, and MFA codes. NIST SP 800-53 IR-4 (Incident Handling) requires an incident handling capability that includes containment, and containment that halts all email causes avoidable collateral impact. The Campaign Orchestration & Sequencing Audit verifies sequence management; this check verifies that the pause mechanism is fine-grained enough to contain an incident without collateral damage.
Severity rationale
High because the absence of a per-campaign pause forces operators to choose between continuing a damaging campaign and halting all email, including time-critical transactional sends.
Remediation
Add a campaign status check at the start of each job in the queue worker, before any send logic runs:
    async function processEmailJob(job: Job) {
      // Look up the campaign before any send logic runs
      const campaign = await db.campaign.findUnique({ where: { id: job.data.campaignId } })
      if (campaign?.status === 'paused') {
        // Paused: skip this job without sending
        return { skipped: true, reason: 'campaign paused' }
      }
      // proceed with send
    }
Wire a PATCH /api/campaigns/:id endpoint that sets status: 'paused' without requiring a code deploy. A pause mechanism that requires a redeploy or infrastructure restart does not satisfy this check.
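The core of such an endpoint can be sketched independently of any web framework. The example below is a minimal illustration only: the in-memory campaigns map, the patchCampaignStatus function, the PatchResult shape, and the campaign ids are all hypothetical stand-ins for the real database and route handler.

```typescript
// Hypothetical sketch: framework-agnostic core of PATCH /api/campaigns/:id.
// The Map stands in for the campaign table; a real system would use its database.
type CampaignStatus = 'active' | 'paused'

interface Campaign {
  id: string
  status: CampaignStatus
}

interface PatchResult {
  httpStatus: number
  body: { id: string; status: CampaignStatus } | { error: string }
}

const campaigns = new Map<string, Campaign>([
  ['spring-sale', { id: 'spring-sale', status: 'active' }],
  ['newsletter', { id: 'newsletter', status: 'active' }],
])

// Type guard so that only known status values are ever written.
function isCampaignStatus(value: unknown): value is CampaignStatus {
  return value === 'active' || value === 'paused'
}

// Validates the request and flips the status flag on exactly one campaign.
function patchCampaignStatus(id: string, payload: unknown): PatchResult {
  const campaign = campaigns.get(id)
  if (!campaign) {
    return { httpStatus: 404, body: { error: 'campaign not found' } }
  }
  const requested = (payload as { status?: unknown } | null)?.status
  if (!isCampaignStatus(requested)) {
    return { httpStatus: 400, body: { error: "status must be 'active' or 'paused'" } }
  }
  campaign.status = requested // a data change only; no deploy or restart involved
  return { httpStatus: 200, body: { id: campaign.id, status: campaign.status } }
}
```

Calling patchCampaignStatus('spring-sale', { status: 'paused' }) changes only that campaign's record; the worker picks up the new status on the next job it reads for that campaign. In a real service this function would sit behind an authenticated route.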
Detection
- ID: blast-radius-containment
- Severity: high
- What to look for: Enumerate all campaign-level control mechanisms: a paused or status flag on individual campaign records, an API endpoint or admin action that sets this flag, and queue worker logic that checks the flag before processing a job. Count how many of these 3 components are present. The Campaign Orchestration & Sequencing Audit verifies sequence management; this check verifies that incident response does not require a full system halt.
- Pass criteria: A single campaign can be paused in under 60 seconds without affecting other campaigns. The pause mechanism is operable without a code deploy (e.g., a database flag checked by the worker, an admin UI action, or an API call). The worker must check the campaign status before processing each job, not after. Do NOT pass when the only pause mechanism requires a code deploy or infrastructure restart.
- Fail criteria: The only way to stop a problematic campaign is to halt the entire queue worker or stop all email sending, because no per-campaign pause mechanism exists. Alternatively, a pause mechanism exists but requires a code change and redeploy.
- Skip (N/A) when: The system only ever runs one campaign at a time and a full pause is equivalent, as confirmed by the application architecture.
- Detail on fail: "No per-campaign pause flag — stopping one campaign requires halting all queue workers" or "Campaign status is managed in code only — a pause requires a code change and redeploy"
- Remediation: Add a status check in the queue worker:

      async function processEmailJob(job: Job) {
        const campaign = await db.campaign.findUnique({ where: { id: job.data.campaignId } })
        if (campaign?.status === 'paused') {
          // Return without processing; job will be retried later or discarded
          return { skipped: true, reason: 'campaign paused' }
        }
        // proceed with send
      }

  Wire a PATCH /api/campaigns/:id endpoint that allows setting status: 'paused' without a deploy.
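One way to exercise the pass criteria is a small simulation that pauses one campaign mid-queue and confirms the other keeps sending. The sketch below is illustrative and synchronous for brevity; the campaignStatus map, processJob, simulateIncident, and the promo-a and promo-b campaign ids are hypothetical and not part of the audited system.

```typescript
// Hypothetical sketch: an in-memory model of the worker rule, used to
// check that pausing one campaign leaves other campaigns unaffected.
type CampaignStatus = 'active' | 'paused'

interface EmailJob {
  campaignId: string
  recipient: string
}

interface JobResult {
  campaignId: string
  sent: boolean
  reason?: string
}

// Stand-in for the campaign table.
const campaignStatus = new Map<string, CampaignStatus>([
  ['promo-a', 'active'],
  ['promo-b', 'active'],
])

// Mirrors the worker rule: read the status first, send only if not paused.
function processJob(job: EmailJob, outbox: string[]): JobResult {
  if (campaignStatus.get(job.campaignId) === 'paused') {
    return { campaignId: job.campaignId, sent: false, reason: 'campaign paused' }
  }
  outbox.push(`${job.campaignId}:${job.recipient}`) // stand-in for the real send
  return { campaignId: job.campaignId, sent: true }
}

// Pauses promo-a before the third job and counts sends per campaign.
function simulateIncident(): Record<string, number> {
  const queue: EmailJob[] = [
    { campaignId: 'promo-a', recipient: 'u1@example.com' },
    { campaignId: 'promo-b', recipient: 'u2@example.com' },
    { campaignId: 'promo-a', recipient: 'u3@example.com' },
    { campaignId: 'promo-b', recipient: 'u4@example.com' },
  ]
  const outbox: string[] = []
  const sentCounts: Record<string, number> = { 'promo-a': 0, 'promo-b': 0 }
  queue.forEach((job, index) => {
    if (index === 2) campaignStatus.set('promo-a', 'paused') // operator pauses promo-a
    if (processJob(job, outbox).sent) {
      sentCounts[job.campaignId] = (sentCounts[job.campaignId] ?? 0) + 1
    }
  })
  return sentCounts
}
```

In this run promo-a sends once before the pause and is skipped afterwards, while promo-b sends both of its jobs. Because the status is read at the start of each job, the pause takes effect on the very next job for that campaign.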
External references
- iso-25010:2011 · reliability.fault-tolerance — Reliability / Fault Tolerance — blast radius containment limits incident impact to affected campaigns only
- nist:rev5 · IR-4 — NIST 800-53 IR-4: Incident Handling
History
- 2026-04-18 · v1.0.0 · Initial import from operational-resilience-email · automated