Capacity alerts at 80% threshold

ab-001990 · operational-resilience-email.capacity-scaling.capacity-alerts-at-threshold

Severity: low · active

Why it matters

Capacity alerts at failure (queue full, ESP quota exceeded) give operators zero time to react — sends are already failing when the alert fires. An 80% threshold alert gives operators time to scale workers, contact the ESP about a quota increase, or delay a non-urgent campaign before the limit is hit. ISO 25010 reliability.operability requires that the system communicates approaching resource exhaustion, not just completed resource exhaustion. The Monitoring & Alerting category checks in this bundle verify that signals are being measured — this check verifies that pre-warning thresholds are configured on those signals.

Severity rationale

Low because pre-warning alerts improve operational comfort rather than correctness: sends degrade at 100% either way, but operators with early warning can act to avoid reaching 100%.

Remediation

Add a threshold check in the queue monitoring poller for at least one capacity metric — daily send quota is the most common:

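// Daily send limit; set this to the quota on your ESP plan.
const DAILY_QUOTA = 100_000
// Warn at 80% of the quota so operators can react before sends fail.
const WARN_RATIO = 0.8

const sentToday = await countSentToday()
if (sentToday >= DAILY_QUOTA * WARN_RATIO) {
  const percentUsed = Math.round(sentToday / DAILY_QUOTA * 100)
  const remaining = Math.max(0, DAILY_QUOTA - sentToday) // never report a negative count
  await alertOps(
    `Daily send quota at ${percentUsed}% - ${remaining} sends remaining`,
    'warning'
  )
}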

The threshold value must be explicitly defined in code or configuration. A bare magic number with no comment explaining that it is the limit does not satisfy this check.
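
One way to keep thresholds out of inline arithmetic is a small, documented table of limits and warning ratios that the poller evaluates on each run. The sketch below is illustrative only: `CAPACITY_THRESHOLDS`, `checkCapacity`, the metric keys, and every limit value are assumptions made for the example, not part of this check's definition.

```javascript
// Hypothetical threshold table: every limit and warning ratio is named and documented.
const CAPACITY_THRESHOLDS = {
  // Daily send quota on the ESP plan (assumed value).
  dailySendQuota: { limit: 100_000, warnRatio: 0.8 },
  // Queue worker memory limit in MB (assumed value).
  workerMemoryMb: { limit: 512, warnRatio: 0.8 },
  // Connection pool size (assumed value).
  connectionPool: { limit: 20, warnRatio: 0.8 },
}

// Returns one warning per metric whose current usage has reached its warning ratio.
function checkCapacity(usage, thresholds = CAPACITY_THRESHOLDS) {
  const warnings = []
  for (const [metric, { limit, warnRatio }] of Object.entries(thresholds)) {
    const current = usage[metric]
    if (typeof current !== 'number') continue // metric not measured in this poll
    if (current >= limit * warnRatio) {
      warnings.push({
        metric,
        percentUsed: Math.round((current / limit) * 100),
        remaining: Math.max(0, limit - current),
      })
    }
  }
  return warnings
}
```

With these assumed values, `checkCapacity({ dailySendQuota: 85_000, connectionPool: 10 })` produces a single warning for `dailySendQuota`, which the poller could then forward to its alerting call.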

Detection

ID: capacity-alerts-at-threshold
Severity: low
What to look for: Enumerate all capacity metrics that have pre-warning alert thresholds: daily send quota, queue worker memory limit, connection pool. Count the number of metrics with a threshold set at or below 80% of their configured limit. Check alerting configuration files, monitoring platform rules, or custom threshold checks in worker bootstrap code.
Pass criteria: At least 1 capacity metric (send quota, memory, connections) has an alert threshold set at or below 80% of the configured limit. The threshold value must be explicitly defined in code or configuration — not hardcoded as a magic number without documentation. Report the count even on pass: "N capacity metrics with pre-warning alerts."
Fail criteria: Alerts fire only at failure (queue worker crashes, ESP quota exceeded), with no advance warning at the 80% threshold, or thresholds exist but are set above 90%.
Skip (N/A) when: No capacity metrics are tracked (in which case the Monitoring & Alerting category checks capture this gap) — confirmed by the absence of metrics emission code.
Detail on fail: "No capacity pre-warning alerts configured — operator would only know quota is exhausted after sends start failing" or "Queue depth alert fires at 100% (full) — no early warning at 80%"

Remediation: Add a threshold alert in the queue monitoring poller:

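const DAILY_QUOTA = 100_000  // daily send limit; set to the quota on your ESP plan
const WARN_RATIO = 0.8       // warn at 80% of the quota
const sentToday = await countSentToday()
if (sentToday >= DAILY_QUOTA * WARN_RATIO) {
  const percentUsed = Math.round(sentToday / DAILY_QUOTA * 100)
  const remaining = Math.max(0, DAILY_QUOTA - sentToday)
  await alertOps(`Daily send quota at ${percentUsed}% - ${remaining} sends remaining`, 'warning')
}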
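
The pass, fail, and skip criteria above can be read as a small decision procedure over the thresholds an auditor finds. The following is a minimal sketch, assuming a hypothetical `evaluateCheck` helper and an input shape (`name`, `limit`, `alertAt`) invented for illustration. The criteria do not say how to treat thresholds between 80% and 90%; this sketch assumes they do not count toward a pass.

```javascript
// Highest alert position, as a fraction of the limit, that still counts as pre-warning.
const PRE_WARNING_RATIO = 0.8

// Hypothetical evaluator. `metrics` lists each tracked capacity metric with its
// configured limit and the value at which an alert fires (null if there is no alert).
function evaluateCheck(metrics) {
  if (metrics.length === 0) {
    return { result: 'skip', detail: 'No capacity metrics are tracked' }
  }
  // A metric is pre-warned when its alert fires at or below 80% of the limit.
  const preWarned = metrics.filter(
    (m) => m.alertAt !== null && m.alertAt <= m.limit * PRE_WARNING_RATIO
  )
  if (preWarned.length >= 1) {
    return {
      result: 'pass',
      detail: `${preWarned.length} capacity metrics with pre-warning alerts`,
    }
  }
  return { result: 'fail', detail: 'No capacity pre-warning alerts configured' }
}
```

For example, a quota of 100,000 with an alert at 75,000 alongside a queue that alerts only when full would pass, with one metric reported as pre-warned.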

External references

iso-25010:2011 · reliability.operability — Reliability / Operability — pre-warning at 80% threshold enables proactive capacity management

Taxons

observability

History

2026-04-18·v1.0.0·Initial import from operational-resilience-email·automated