Capacity alerts at failure (queue full, ESP quota exceeded) give operators zero time to react — sends are already failing when the alert fires. An 80% threshold alert gives operators time to scale workers, contact the ESP about a quota increase, or delay a non-urgent campaign before the limit is hit. ISO 25010 reliability.operability requires that the system communicates approaching resource exhaustion, not just completed resource exhaustion. The Monitoring & Alerting category checks in this bundle verify that signals are being measured — this check verifies that pre-warning thresholds are configured on those signals.
Low because the gap between no pre-warning and pre-warning alerts is operational comfort rather than correctness — sends degrade at 100% regardless, but operators with early warning can prevent reaching 100%.
Add a threshold check in the queue monitoring poller for at least one capacity metric — daily send quota is the most common:
const DAILY_QUOTA = 100_000 // ESP plan hard limit; sends beyond this are rejected
const sentToday = await countSentToday()
if (sentToday > DAILY_QUOTA * 0.8) {
  await alertOps(
    `Daily send quota at ${Math.round(sentToday / DAILY_QUOTA * 100)}% — ${DAILY_QUOTA - sentToday} sends remaining`,
    'warning'
  )
}
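Where the poller body grows, the threshold logic can be factored into a pure function so it is unit-testable without an ESP client. A sketch under the same assumed DAILY_QUOTA; quotaWarning is a hypothetical helper name, not part of this bundle:

```typescript
// Sketch: the threshold check as a pure function. Returns the alert message
// once usage crosses the pre-warning threshold, or null while capacity is fine.
const DAILY_QUOTA = 100_000 // assumed ESP plan hard limit
const WARN_RATIO = 0.8      // pre-warning threshold (80% of the limit)

function quotaWarning(sentToday: number, quota: number = DAILY_QUOTA): string | null {
  if (sentToday <= quota * WARN_RATIO) return null
  const pct = Math.round((sentToday / quota) * 100)
  return `Daily send quota at ${pct}% — ${quota - sentToday} sends remaining`
}
```

The poller then reduces to `const msg = quotaWarning(await countSentToday()); if (msg) await alertOps(msg, 'warning')`.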
The threshold value must be explicitly defined in code or configuration — a bare magic number, with no comment explaining what limit it represents, does not satisfy this check.
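One way to satisfy the explicit-definition requirement is a small configuration module in which every threshold carries its limit, warning ratio, and rationale. A sketch with hypothetical metric names and illustrative values:

```typescript
// capacity-thresholds.ts: hypothetical config module. Each entry documents the
// limit it protects, so no threshold is an unexplained magic number.
interface CapacityThreshold {
  limit: number     // hard limit from the ESP contract or infra sizing
  warnRatio: number // pre-warning fires at limit * warnRatio; keep at or below 0.8
  rationale: string // why the limit exists, satisfying the documentation requirement
}

export const CAPACITY_THRESHOLDS: Record<string, CapacityThreshold> = {
  dailySendQuota: {
    limit: 100_000,
    warnRatio: 0.8,
    rationale: 'ESP plan hard cap; sends beyond it are rejected',
  },
  queueWorkerMemoryMb: {
    limit: 512,
    warnRatio: 0.75,
    rationale: 'Container memory limit; the worker is OOM-killed at 100%',
  },
}
```

An auditor (or the check itself) can then verify `warnRatio <= 0.8` mechanically instead of grepping for magic numbers.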
ID: operational-resilience-email.capacity-scaling.capacity-alerts-at-threshold
Severity: low
What to look for: Enumerate all capacity metrics with pre-warning alert thresholds: daily send quota, queue worker memory limit, connection pool size. Count the metrics whose threshold is set at or below 80% of their configured limit. Check alerting configuration files, monitoring platform rules, or custom threshold checks in worker bootstrap code.
Pass criteria: At least 1 capacity metric (send quota, memory, connections) has an alert threshold set at or below 80% of the configured limit. The threshold value must be explicitly defined in code or configuration — not hardcoded as a magic number without documentation. Report the count even on pass: "N capacity metrics with pre-warning alerts."
Fail criteria: Alerts only fire at failure (queue worker crashes, ESP quota exceeded) with no advance warning at the 80% threshold, or thresholds exist but are set above 90% of the limit.
Skip (N/A) when: No capacity metrics are tracked (in which case the Monitoring & Alerting category checks capture this gap) — confirmed by the absence of metrics emission code.
Detail on fail: "No capacity pre-warning alerts configured — operator would only know quota is exhausted after sends start failing" or "Queue depth alert fires at 100% (full) — no early warning at 80%"
Remediation: Add a threshold alert in the queue monitoring poller:
const DAILY_QUOTA = 100_000 // ESP plan hard limit; sends beyond this are rejected
const sentToday = await countSentToday()
if (sentToday > DAILY_QUOTA * 0.8) {
  await alertOps(
    `Daily send quota at ${Math.round(sentToday / DAILY_QUOTA * 100)}% — ${DAILY_QUOTA - sentToday} sends remaining`,
    'warning'
  )
}