Log-only alerting means a human must be watching logs at the exact moment an email system failure occurs. A Slack webhook or PagerDuty alert configured for queue backlog and ESP failure conditions changes the detection model from passive (someone notices) to active (the system notifies). ISO/IEC 25010 lists operability, the degree to which a system can be operated, monitored, and controlled, as a sub-characteristic of usability; pushing operational problems to operators, rather than relying on manual log monitoring, is the baseline that characteristic implies. Teams that rely on log scraping during on-call hours routinely miss email incidents for 30–90 minutes — long enough for spam-complaint rates to spike above Gmail's 0.1% threshold.
High because log-only alerting guarantees delayed incident response — production email failures go undetected until a human happens to look at logs or users report problems.
Wire push notifications to at least two alert conditions in a shared utility (e.g., src/lib/alerts.ts):
async function alertOps(message: string, severity: 'warning' | 'critical'): Promise<void> {
  const webhookUrl = process.env.SLACK_WEBHOOK_URL
  if (!webhookUrl) return // alerting is best-effort; fail open when unconfigured
  try {
    await fetch(webhookUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: `[${severity.toUpperCase()}] Email system: ${message}` })
    })
  } catch (err) {
    // Never let an alerting failure take down the email path that is reporting it.
    console.error('Failed to deliver Slack alert:', err)
  }
}
Call alertOps() on ESP failover events, bounce rate threshold breaches, and queue depth spikes. A Slack SDK installed but never called on error conditions does not satisfy this check.
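The threshold logic that decides when to fire alertOps() is easiest to verify when it is kept pure, so it can be unit-tested without a live webhook. A minimal sketch for the queue-depth condition; the threshold values and the buildQueueAlert helper are illustrative assumptions, not part of this check:

```typescript
// Hypothetical thresholds — tune these to your queue and ESP SLAs.
const QUEUE_DEPTH_WARNING = 500
const QUEUE_DEPTH_CRITICAL = 2000

type QueueAlert = { severity: 'warning' | 'critical'; message: string }

// Pure helper: map a queue depth to an alert, or null when healthy.
// Keeping this separate from the webhook call makes the thresholds testable.
function buildQueueAlert(depth: number): QueueAlert | null {
  if (depth >= QUEUE_DEPTH_CRITICAL) {
    return { severity: 'critical', message: `queue depth at ${depth} messages` }
  }
  if (depth >= QUEUE_DEPTH_WARNING) {
    return { severity: 'warning', message: `queue depth at ${depth} messages` }
  }
  return null
}
```

In the queue worker's periodic health check, forward any non-null result to the shared utility: `const alert = buildQueueAlert(depth); if (alert) await alertOps(alert.message, alert.severity)`.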
ID: operational-resilience-email.monitoring-alerting.alert-channels-configured
Severity: high
What to look for: Enumerate all alert channel integrations in the codebase — Slack webhooks, PagerDuty API calls, OpsGenie, email alerting, SMS via Twilio, or any push notification mechanism. For each channel found, verify it fires on at least 1 email system error condition (queue backlog, bounce rate spike, ESP failure). Count the number of distinct alert conditions wired to push channels.
Pass criteria: At least 1 external notification channel (Slack, PagerDuty, OpsGenie, email, SMS) is wired to fire on at least 2 defined alert conditions for the email system (e.g., queue backlog and ESP failure). Report even on pass: list each channel and the conditions it covers.
Fail criteria: All alerting is log-based only and no push notification fires when the email system hits an error condition, or a channel is configured but not wired to any email system alert condition.
Skip (N/A) when: The project is a personal side project with a single operator who monitors logs directly — confirmed by a solo contributor history and no team-facing infrastructure.
Detail on fail: "No alert channels found — all failures are logged to stdout with no push notification" or "Slack SDK is installed but no webhook calls are made on queue or ESP error conditions"
Remediation: Wire an alert on critical email system events in a shared utility (e.g., src/lib/alerts.ts):
async function alertOps(message: string, severity: 'warning' | 'critical'): Promise<void> {
  const webhookUrl = process.env.SLACK_WEBHOOK_URL
  if (!webhookUrl) return // alerting is best-effort; fail open when unconfigured
  try {
    await fetch(webhookUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: `[${severity.toUpperCase()}] Email system: ${message}` })
    })
  } catch (err) {
    // Never let an alerting failure take down the email path that is reporting it.
    console.error('Failed to deliver Slack alert:', err)
  }
}
Call alertOps() on ESP failover events, bounce rate threshold breaches, and queue depth spikes.