Background jobs have explicit retry caps

ab-000247 · ai-slop-cost-bombs.job-hygiene.background-jobs-have-retry-cap

Severity: high · active

Why it matters

Background jobs without an explicit retry cap are poison-message traps. A job that fails consistently because of a logic bug, bad data, or a downstream API change will be retried for as long as the library, a queue-level default, or a hand-rolled requeue loop allows. Those defaults differ between libraries: BullMQ makes a single attempt unless attempts is set, while other libraries and platforms apply their own retry counts. When retries are effectively unbounded, the outcome is the one CWE-770 describes: they consume Redis storage (the job is never cleaned up), worker CPU (each retry runs the job's processing logic again), and can trigger cascading failures if the job calls external APIs. A retry storm also obscures monitoring: a dead-letter queue that fills silently hides the bug behind the original failure and delays the production fix.

Severity rationale

High because infinite retries from a single poison message consume continuous Redis and worker resources, masking the underlying bug indefinitely rather than surfacing it for repair.

Remediation

Set attempts and backoff at every queue.add() call site. Three attempts with exponential backoff give transient failures a chance to recover without burning resources on permanent ones.

// Bad: no explicit cap; retry behavior is left to library and queue defaults
await queue.add('send-welcome', { userId })

// Good: 3 attempts, exponential backoff
await queue.add('send-welcome', { userId }, {
  attempts: 3,
  backoff: { type: 'exponential', delay: 2000 },
  removeOnComplete: { count: 100 },
  removeOnFail: { count: 50 },
})
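
To make the cost of this configuration concrete, the sketch below computes the waits that attempts: 3 with a 2000 ms exponential backoff would produce. It assumes BullMQ's documented exponential formula, 2^(attemptsMade - 1) * delay; retryDelays is a hypothetical helper written for illustration, not a BullMQ function.

```typescript
// Assumption: BullMQ's built-in exponential strategy waits
// 2^(attemptsMade - 1) * delay milliseconds before the next attempt.
// retryDelays is a hypothetical helper, not part of BullMQ.
function retryDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  // The first attempt runs immediately; only attempts 2..N wait on backoff.
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(Math.round(Math.pow(2, attemptsMade - 1) * baseDelayMs));
  }
  return delays;
}

const delays = retryDelays(3, 2000); // [2000, 4000]
const totalWaitMs = delays.reduce((sum, d) => sum + d, 0); // 6000
```

Under that assumption, a permanently failing job is given up on after two retries and about six seconds of backoff, which bounds what a single poison message can cost.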

Also register a listener for the Worker's failed event so failures route to Sentry or your alerting stack. removeOnFail only trims failed jobs from Redis; on its own it discards older failures silently, which is the opposite of visibility.
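
A minimal sketch of such a listener follows. The FailedJobLike and FailedEmitter interfaces are stand-ins defined here so the example runs without BullMQ installed; they are meant to mirror the job fields (attemptsMade, opts.attempts) and the 'failed' event that a BullMQ Worker emits, and treating a Worker as compatible with them is an assumption. reportFailure is a hypothetical hook for Sentry or whatever alerting you use.

```typescript
// Stand-in shapes (assumptions) mirroring what BullMQ passes to 'failed' listeners.
interface FailedJobLike {
  id?: string;
  name: string;
  attemptsMade: number;
  opts: { attempts?: number };
}

type FailedListener = (job: FailedJobLike | undefined, err: Error) => void;

interface FailedEmitter {
  on(event: 'failed', listener: FailedListener): unknown;
}

// True once the job has used up its configured attempts (treated as 1 when unset).
function isFinalFailure(job: FailedJobLike): boolean {
  return job.attemptsMade >= (job.opts.attempts ?? 1);
}

// reportFailure is a hypothetical hook: wire it to Sentry or your alerting stack.
function attachFailureAlerts(
  worker: FailedEmitter,
  reportFailure: (message: string, err: Error) => void,
): void {
  worker.on('failed', (job, err) => {
    // The job may be undefined if it was removed before the event was handled.
    if (!job) {
      reportFailure('job failed but is no longer available', err);
      return;
    }
    // Alert only when retries are exhausted, not on every intermediate failure.
    if (isFinalFailure(job)) {
      const id = job.id ?? 'unknown id';
      reportFailure(`job ${job.name} (${id}) exhausted ${job.attemptsMade} attempts`, err);
    }
  });
}
```

Alerting only on the final failure keeps transient errors out of the pager while still surfacing every job that ran out of attempts.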

Detection

ID: background-jobs-have-retry-cap
Severity: high
What to look for: When a background job library is in package.json dependencies (bullmq, bull, agenda, inngest, @upstash/qstash, trigger.dev), walk source files for job-creation calls: queue.add(, inngest.createFunction(, client.publish(, trigger.run(. Count all job-creation call sites and verify each includes an explicit retry cap: attempts: N, retries: N, maxAttempts: N, retryLimit: N where N is at most 5.
Pass criteria: 100% of job-creation calls have a retry cap of at most 5. Report: "X job creation calls, Y with retry caps, 0 unbounded."
Fail criteria: At least 1 job-creation call has no retry config OR has a cap exceeding 5.
Skip (N/A) when: No job library in dependencies.
Detail on fail: "1 unbounded retry: src/lib/queue.ts queue.add('process-payment', data): no attempts cap, so retry behavior is left to library and queue defaults; a poison message can keep being retried, eating Redis and worker resources"
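
The check described above can be approximated with a rough text-matching sketch like the one below. It is not the rule's actual implementation: classifyCall, summarize, and MAX_CAP are hypothetical names, the input is assumed to be the source text of each job-creation call already extracted by some other step, and only numeric literals are recognized, so a cap supplied through a constant would be reported as unbounded.

```typescript
type CapStatus = 'capped' | 'over-limit' | 'unbounded';

// Retry-cap option names listed in the rule, followed by a numeric literal.
const RETRY_KEY = /\b(?:attempts|retries|maxAttempts|retryLimit)\s*:\s*(\d+)/;
const MAX_CAP = 5; // pass threshold taken from the rule text

// Classifies the text of one call, e.g. "queue.add('x', data, { attempts: 3 })".
function classifyCall(callText: string): CapStatus {
  const match = RETRY_KEY.exec(callText);
  if (!match) return 'unbounded';
  return Number(match[1]) <= MAX_CAP ? 'capped' : 'over-limit';
}

// Builds the report line in the format given under "Pass criteria".
function summarize(calls: string[]): { passed: boolean; overLimit: number; report: string } {
  const statuses = calls.map(classifyCall);
  const capped = statuses.filter((s) => s === 'capped').length;
  const unbounded = statuses.filter((s) => s === 'unbounded').length;
  const overLimit = statuses.filter((s) => s === 'over-limit').length;
  return {
    passed: capped === calls.length,
    overLimit,
    report: `${calls.length} job creation calls, ${capped} with retry caps, ${unbounded} unbounded.`,
  };
}
```

A real checker would more likely walk the syntax tree than match text, but the pass and fail logic would follow the same shape.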

Remediation: Without an explicit retry cap, a poison message (one that always fails) can be retried indefinitely. Set a cap:

// Bad: no explicit retry cap
await queue.add('process-payment', data)

// Good: 3 attempts max (2 retries) with exponential backoff
await queue.add('process-payment', data, {
  attempts: 3,
  backoff: { type: 'exponential', delay: 2000 },
})

External references

cwe · CWE-770 — Allocation of Resources Without Limits or Throttling
iso-25010:2011 · reliability

Taxons

cost-efficiency error-resilience

History

2026-04-18·v1.0.0·Initial import from ai-slop-cost-bombs·automated