Capacity alerts at 80% threshold
Why it matters
Capacity alerts that fire only at failure (queue full, ESP quota exceeded) give operators zero time to react: sends are already failing when the alert fires. An alert at an 80% threshold gives operators time to scale workers, contact the ESP about a quota increase, or delay a non-urgent campaign before the limit is hit. ISO 25010 reliability.operability requires that the system communicate approaching resource exhaustion, not just completed resource exhaustion. The Monitoring & Alerting category checks in this bundle verify that signals are being measured; this check verifies that pre-warning thresholds are configured on those signals.
Severity rationale
Low because the difference between having and lacking pre-warning alerts is operational comfort rather than correctness: sends degrade at 100% regardless, but operators with early warning can avoid reaching 100%.
Remediation
Add a threshold check in the queue monitoring poller for at least one capacity metric — daily send quota is the most common:
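const DAILY_QUOTA = 100_000 // ESP daily send limit (example value)
const QUOTA_WARNING_RATIO = 0.8 // pre-warning threshold: alert at 80% of DAILY_QUOTA

const sentToday = await countSentToday()
if (sentToday >= DAILY_QUOTA * QUOTA_WARNING_RATIO) {
  const percentUsed = Math.round((sentToday / DAILY_QUOTA) * 100)
  // Clamp at zero so the message stays sensible once the quota is exceeded
  const remaining = Math.max(0, DAILY_QUOTA - sentToday)
  await alertOps(
    `Daily send quota at ${percentUsed}%: ${remaining} sends remaining`,
    'warning'
  )
}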
The threshold value must be explicitly defined in code or configuration. A magic number with no name or comment identifying it as the warning threshold does not satisfy this check.
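One way to keep the threshold explicit is to read it from configuration and validate it at startup. The following is a minimal sketch under stated assumptions: the environment variable names `DAILY_SEND_QUOTA` and `QUOTA_WARNING_RATIO` and the helpers `loadQuotaConfig` and `quotaStatus` are hypothetical, chosen for illustration only.

```javascript
// Highest ratio that still counts as a pre-warning under this check (80%).
const MAX_WARNING_RATIO = 0.8

// Hypothetical loader: the env var names are illustrative assumptions.
function loadQuotaConfig(env) {
  const dailyQuota = Number(env.DAILY_SEND_QUOTA)
  if (!Number.isInteger(dailyQuota) || dailyQuota <= 0) {
    throw new Error('DAILY_SEND_QUOTA must be a positive integer')
  }

  // Fall back to the 80% default when no ratio is configured.
  const warningRatio = env.QUOTA_WARNING_RATIO === undefined
    ? MAX_WARNING_RATIO
    : Number(env.QUOTA_WARNING_RATIO)
  if (!(warningRatio > 0 && warningRatio <= MAX_WARNING_RATIO)) {
    throw new Error(`QUOTA_WARNING_RATIO must be in (0, ${MAX_WARNING_RATIO}]`)
  }

  return {
    dailyQuota,
    warningRatio,
    // Absolute send count at which the warning fires.
    warningAt: Math.floor(dailyQuota * warningRatio),
  }
}

// Returns 'warning' once usage reaches the configured threshold, else 'ok'.
function quotaStatus(sentToday, config) {
  return sentToday >= config.warningAt ? 'warning' : 'ok'
}
```

Rejecting a ratio above 0.8 at load time means a misconfigured threshold surfaces at startup instead of silently disabling the early warning.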
Detection
- ID: capacity-alerts-at-threshold
- Severity: low
- What to look for: Enumerate all capacity metrics that have pre-warning alert thresholds: daily send quota, queue worker memory limit, connection pool. Count the number of metrics with a threshold set at or below 80% of their configured limit. Check alerting configuration files, monitoring platform rules, or custom threshold checks in worker bootstrap code.
- Pass criteria: At least 1 capacity metric (send quota, memory, connections) has an alert threshold set at or below 80% of the configured limit. The threshold value must be explicitly defined in code or configuration, not hardcoded as a magic number without documentation. Report the count even on pass: "N capacity metrics with pre-warning alerts."
- Fail criteria: Alerts only fire at failure (queue worker crashes, ESP quota exceeded) with no advance warning at the 80% threshold. Or thresholds exist but are set above 90%.
- Skip (N/A) when: No capacity metrics are tracked (in which case the Monitoring & Alerting category checks capture this gap), confirmed by the absence of metrics emission code.
- Detail on fail: "No capacity pre-warning alerts configured — operator would only know quota is exhausted after sends start failing" or "Queue depth alert fires at 100% (full) — no early warning at 80%"
- Remediation: Add a threshold alert in the queue monitoring poller:
const DAILY_QUOTA = 100_000 // ESP daily send limit (example value)
const QUOTA_WARNING_RATIO = 0.8 // pre-warning threshold: alert at 80% of DAILY_QUOTA
const sentToday = await countSentToday()
if (sentToday >= DAILY_QUOTA * QUOTA_WARNING_RATIO) {
  const percentUsed = Math.round((sentToday / DAILY_QUOTA) * 100)
  const remaining = Math.max(0, DAILY_QUOTA - sentToday)
  await alertOps(`Daily send quota at ${percentUsed}%: ${remaining} sends remaining`, 'warning')
}
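The pass, fail, and skip criteria in the Detection list can be sketched as a small evaluator over an inventory of capacity metrics. This is an illustration only: the metric names, limits, and the `alertAt` field are made-up sample data, and `hasPreWarning` and `evaluateCapacityAlerts` are hypothetical helpers, not part of any existing tool.

```javascript
// A threshold counts as a pre-warning only at or below 80% of the limit.
const MAX_PREWARNING_RATIO = 0.8

// Sample inventory (assumed values): one metric per tracked capacity limit.
const capacityMetrics = [
  { name: 'daily_send_quota', limit: 100_000, alertAt: 80_000 },
  { name: 'worker_memory_mb', limit: 512, alertAt: 512 },    // alerts only at failure
  { name: 'db_connection_pool', limit: 20, alertAt: null },  // no alert configured
]

// True when the metric has an alert threshold at or below 80% of its limit.
function hasPreWarning(metric) {
  return typeof metric.alertAt === 'number'
    && metric.alertAt > 0
    && metric.alertAt <= metric.limit * MAX_PREWARNING_RATIO
}

// Applies the skip / pass / fail rules and always reports the count.
function evaluateCapacityAlerts(metrics) {
  if (metrics.length === 0) {
    return { result: 'skip', detail: 'No capacity metrics are tracked' }
  }
  const count = metrics.filter(hasPreWarning).length
  return {
    result: count >= 1 ? 'pass' : 'fail',
    detail: `${count} capacity metrics with pre-warning alerts`,
  }
}

console.log(evaluateCapacityAlerts(capacityMetrics))
```

With the sample inventory, only the daily send quota qualifies, so the check passes with a count of one.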
External references
- iso-25010:2011 · reliability.operability — Reliability / Operability — pre-warning at 80% threshold enables proactive capacity management
Taxons
History
- 2026-04-18·v1.0.0·Initial import from operational-resilience-email·automated