
Queue workers scale horizontally

ab-001987 · operational-resilience-email.capacity-scaling.workers-scale-horizontally
Severity: low · active

Why it matters

A single-instance queue worker is both a capacity ceiling and a single point of failure. When send volume spikes (a product launch, a re-engagement campaign), that one worker becomes the bottleneck. This check maps to the ISO 25010 performance-efficiency.capacity characteristic, which concerns whether a system's maximum limits meet its requirements, so the worker tier must be able to scale with demand. The second failure mode is subtler: in-process state (a local Map of sent IDs, a module-level cache) causes duplicate sends or inconsistent behavior once multiple instances run, which is worse than not scaling at all.
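The duplicate-send failure mode can be reproduced in a few lines. The sketch below is illustrative only: `LocalStateWorker`, the `outbox` list, and the message ID are hypothetical names, and Python is used here purely for demonstration. It assumes an at-least-once queue that redelivers a message to a different instance.

```python
"""Simulate two worker instances that each keep their own sent-ID set."""


class LocalStateWorker:
    """Hypothetical worker that tracks sent IDs in process memory."""

    def __init__(self, name: str, outbox: list[str]) -> None:
        self.name = name
        self.outbox = outbox  # stands in for the real mail provider
        self.sent_ids: set[str] = set()  # in-process state, not shared across instances

    def handle(self, message_id: str) -> None:
        if message_id in self.sent_ids:
            return  # only catches repeats seen by THIS instance
        self.outbox.append(f"{self.name} sent {message_id}")
        self.sent_ids.add(message_id)


outbox: list[str] = []
worker_a = LocalStateWorker("worker-a", outbox)
worker_b = LocalStateWorker("worker-b", outbox)

# The queue redelivers msg-1 (for example after a missed acknowledgement),
# and the second delivery lands on a different instance.
worker_a.handle("msg-1")
worker_b.handle("msg-1")

print(len(outbox))  # 2: the recipient gets the same email twice
```

Each instance's check passes because neither can see the other's set, which is why adding replicas without removing local state makes things worse.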

Severity rationale

Low because single-instance workers are a capacity ceiling, not an immediate failure — the impact is gradual throughput degradation during volume spikes rather than an outright outage.

Remediation

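Configure horizontal scaling in your deployment manifest and ensure worker code is stateless. Note that deploy.replicas is honored by Docker Compose V2 and by Swarm, while the legacy docker-compose v1 CLI ignores the deploy key outside Swarm unless run with --compatibility. For Docker Compose: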

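services:
  email-worker:
    image: myapp/worker
    deploy:
      replicas: 3  # run three identical worker containers
    environment:
      # shared state (dedup sets, rate-limit counters) lives in Redis, not in the container
      - REDIS_URL=${REDIS_URL}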

Move all shared state — deduplication sets, rate limit counters, sent-ID tracking — out of process memory and into Redis or the database. Any in-process singleton that would conflict across worker instances causes this check to fail, regardless of replica count.
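One way to move sent-ID tracking out of process memory is an atomic "set if absent" against the shared store. The following is a minimal sketch, not a definitive implementation: `SharedStore`, `InMemorySharedStore`, `StatelessWorker`, the `email:sent:` key prefix, and the 24-hour TTL are all assumptions made for illustration. The in-memory class exists only so the example runs without a server; with Redis the same contract could be met by redis-py's `client.set(key, "1", nx=True, ex=ttl_seconds)`, which returns True when the key was created and None when it already existed.

```python
"""Stateless worker sketch: the dedup decision lives in a shared store."""
import threading
import time
from typing import Optional, Protocol


class SharedStore(Protocol):
    """Contract the worker needs from Redis or the database (hypothetical name)."""

    def set_if_absent(self, key: str, ttl_seconds: int) -> bool:
        """Atomically create key; return True only for the first caller."""
        ...


class InMemorySharedStore:
    """Stand-in so the sketch runs without a server. One object plays the
    role of the external store that every worker instance connects to."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._expires_at: dict[str, float] = {}

    def set_if_absent(self, key: str, ttl_seconds: int) -> bool:
        now = time.monotonic()
        with self._lock:
            expires_at: Optional[float] = self._expires_at.get(key)
            if expires_at is not None and expires_at > now:
                return False  # another instance already claimed this key
            self._expires_at[key] = now + ttl_seconds
            return True


class StatelessWorker:
    """Holds no per-message state; any instance can handle any message."""

    def __init__(self, name: str, store: SharedStore, outbox: list[str]) -> None:
        self.name = name
        self.store = store
        self.outbox = outbox  # stands in for the real mail provider call

    def handle(self, message_id: str) -> bool:
        claimed = self.store.set_if_absent(f"email:sent:{message_id}", ttl_seconds=86_400)
        if not claimed:
            return False  # duplicate delivery, skip the send
        self.outbox.append(f"{self.name} sent {message_id}")
        return True


store = InMemorySharedStore()
outbox: list[str] = []
worker_a = StatelessWorker("worker-a", store, outbox)
worker_b = StatelessWorker("worker-b", store, outbox)

first = worker_a.handle("msg-1")
second = worker_b.handle("msg-1")  # redelivery to a different instance
print(outbox)
```

Claiming the key before sending favors "at most once" delivery: if the send fails after the claim, the message is not retried unless the key is released. Whether that trade-off is acceptable depends on the workload.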

Detection

  • ID: operational-resilience-email.capacity-scaling.workers-scale-horizontally

  • Severity: low

  • What to look for: Check whether queue worker deployment configuration allows running multiple instances: container orchestration (Docker Compose with replicas, Kubernetes Deployment replicas, Railway/Fly.io instance count), or process manager configuration (PM2 cluster mode). The worker code itself should also be stateless — no in-process state that would conflict across instances.

  • Pass criteria: Worker deployment configuration supports at least 2 concurrent instances (Docker replicas, Kubernetes replicas, PM2 cluster mode). Worker code is stateless — no in-process singleton state that would conflict across instances. Count all in-process state patterns (local Maps, module-level caches, in-memory counters) — there must be 0.

  • Fail criteria: Workers are deployed as a single-instance process with no scaling configuration, or worker code holds in-process state (e.g., a local map of sent IDs) that breaks with multiple instances.

  • Skip (N/A) when: A single worker can handle the project's email volume indefinitely, and a documented rationale for this exists in code comments or the README.

  • Detail on fail: "Worker is deployed as a single process with no horizontal scaling configuration" or "Worker uses in-process sent-ID cache that would cause duplicate sends if run as multiple instances"

  • Remediation: For Docker Compose:

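    services:
      email-worker:
        image: myapp/worker
        deploy:
          replicas: 3  # run three identical worker containers
        environment:
          # shared state (dedup sets, rate-limit counters) lives in Redis, not in the container
          - REDIS_URL=${REDIS_URL}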

    Ensure all shared state (deduplication, rate limiting) lives in Redis or the database, not in process memory.
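The pass criteria above ask for a count of in-process state patterns. A rough way to approximate that count is a line-based scan of the worker source. This is a heuristic sketch under stated assumptions: it assumes the worker is written in JavaScript or TypeScript, treats only unindented declarations as module-level, and uses hypothetical patterns (`STATE_PATTERNS`) that would need tuning per codebase. It will miss some state and flag some harmless constants, so results need manual review.

```python
"""Heuristic scan for module-level mutable state in JS/TS worker source."""
import re

# Hypothetical patterns; adjust for the codebase being audited.
STATE_PATTERNS = [
    # Top-level `const cache = new Map()`, `new Set()`, or `new WeakMap()`,
    # with an optional TypeScript type annotation before the `=`.
    re.compile(r"^(?:export\s+)?(?:const|let|var)\s+\w+\s*(?::[^=]+)?=\s*new\s+(?:Map|Set|WeakMap)\b"),
    # Top-level reassignable counters such as `let sentCount = 0;`
    re.compile(r"^(?:export\s+)?(?:let|var)\s+\w+\s*(?::[^=]+)?=\s*\d+\s*;?\s*$"),
]


def find_in_process_state(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for suspected module-level state."""
    hits: list[tuple[int, str]] = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        # Patterns are anchored at column 0, so indented (function-local)
        # declarations are ignored.
        if any(pattern.match(line) for pattern in STATE_PATTERNS):
            hits.append((line_number, line.strip()))
    return hits


# Hypothetical worker source used only to exercise the scanner.
sample = """\
import { sendEmail } from "./mailer";
const sentIds = new Map<string, boolean>();
let sentCount = 0;

async function handle(job) {
  const local = new Map();
  sentCount += 1;
}
"""

for line_number, line in find_in_process_state(sample):
    print(f"line {line_number}: {line}")
```

On the sample, the scanner flags the module-level Map and counter but not the Map created inside the function, which matches the intent of the criteria: state scoped to a single job is fine, state that outlives a job is not.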
