Exponential backoff slows down retry frequency, but it does not stop new jobs from being accepted and attempted against a broken ESP. A circuit breaker adds the missing control: after N consecutive failures, it stops attempting sends entirely and waits for a cooldown before probing again. Without it, a prolonged ESP outage fills the queue with failed jobs that burn retry budget. CWE-400 applies because new jobs continue to consume processing resources against an endpoint that is confirmed unavailable. The circuit breaker is the difference between "we paused sends during the outage" and "we burned through all retry attempts before the outage resolved."
Low because exponential backoff provides partial protection, but without a circuit breaker, indefinite new-job ingestion during an outage still exhausts retry budget and worker capacity.
Wrap the ESP send call with a circuit breaker using a library like opossum in your send adapter (e.g., src/lib/email/providers/sendgrid.ts):
import CircuitBreaker from 'opossum'

const breaker = new CircuitBreaker(esp.send.bind(esp), {
  timeout: 5000,
  errorThresholdPercentage: 50,
  resetTimeout: 60_000 // 1 minute cooldown
})

breaker.on('open', () => {
  alertOps('ESP circuit breaker opened — sends paused for 60s', 'critical')
})

export const sendEmail = (msg: EmailMessage) => breaker.fire(msg)
This check may be skipped when ab-001981 (ESP failover) is already passing, as immediate failover to a secondary fills the circuit breaker role.
ID: operational-resilience-email.failure-recovery.circuit-breaker-on-esp-failures
Severity: low
What to look for: List all circuit breaker implementations around ESP API calls — either a library like opossum or cockatiel, or a custom implementation that tracks consecutive failure count and stops attempting sends after N failures, reopening after a cooldown. Count the number of ESP call sites protected by a circuit breaker. The circuit breaker prevents the queue from endlessly attempting sends against a broken ESP while burning through retries.
Pass criteria: ESP calls are wrapped in a circuit breaker that opens after no more than 10 consecutive failures and does not attempt new sends until a cooldown period of at least 30 seconds expires. The breaker must emit an alert or log when it opens. Report even on pass: "Circuit breaker threshold: [N] failures, cooldown: [M] seconds."
Fail criteria: No circuit breaker. ESP failures are handled purely through retry backoff, so the worker keeps attempting sends indefinitely as new jobs arrive.
Skip (N/A) when: The project has ESP failover that immediately switches to a secondary on any failure — confirmed by failover logic in the send path that fills the circuit breaker role.
Detail on fail: "No circuit breaker around ESP calls — consecutive failures will be retried indefinitely as new sends arrive" or "Backoff is configured but no circuit breaker — the queue will keep accepting and failing new jobs during a prolonged ESP outage"
Remediation: Implement a simple circuit breaker:
import CircuitBreaker from 'opossum'

const breaker = new CircuitBreaker(esp.send.bind(esp), {
  timeout: 5000,
  errorThresholdPercentage: 50,
  resetTimeout: 60_000 // 1 minute cooldown
})

breaker.on('open', () => {
  alertOps('ESP circuit breaker opened — sends paused for 60s', 'critical')
})

export const sendEmail = (msg: EmailMessage) => breaker.fire(msg)
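The "What to look for" section also accepts a custom implementation that tracks a consecutive-failure count and reopens after a cooldown. If adding the opossum dependency is not an option, such a breaker could be sketched as follows; this is a minimal illustration, and `SimpleBreaker`, its constructor parameters, and the half-open probing behavior are assumptions for the sketch, not code from any particular project:

```typescript
type BreakerState = 'closed' | 'open' | 'half-open'

class SimpleBreaker {
  private failures = 0
  private openedAt = 0

  constructor(
    private readonly maxFailures: number, // consecutive failures before opening
    private readonly cooldownMs: number,  // wait before allowing a probe
  ) {}

  // Wraps a single attempt. Rejects immediately while the breaker is open.
  async fire<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state() === 'open') {
      throw new Error('circuit open: ESP sends paused')
    }
    try {
      const result = await fn()
      this.failures = 0 // any success (including a half-open probe) closes the breaker
      return result
    } catch (err) {
      this.failures++
      if (this.failures >= this.maxFailures) {
        this.openedAt = Date.now() // open (or re-open) and start the cooldown
      }
      throw err
    }
  }

  state(): BreakerState {
    if (this.failures < this.maxFailures) return 'closed'
    return Date.now() - this.openedAt >= this.cooldownMs ? 'half-open' : 'open'
  }
}
```

Once the cooldown expires the breaker goes half-open and lets a single probe through: a success resets the failure count and closes it, while a failed probe re-opens it for another full cooldown. To satisfy the pass criteria above, instantiate it with `maxFailures` of 10 or fewer and `cooldownMs` of at least 30 seconds, and log or alert from the call site when `state()` transitions to open.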