A scraper or API integration that stops producing records without raising an alert will silently starve your pipeline for days. You will not know until someone notices that list growth has plateaued or a manual check reveals the last ingest timestamp is a week old. The NIST CSF 2.0 DE.CM category (Continuous Monitoring) calls for monitoring assets to find anomalies and potentially adverse events, which extends to data ingestion pipelines. Without dead-source detection, acquisition outages stay invisible until they become business-impacting.
Low because silent failures affect future data acquisition rather than compromising existing records, but they can cause significant operational and business impact over time.
Add a scheduled health-check job that queries the maximum acquired_at per source and alerts when any source exceeds its expected silence threshold.
// Run hourly via cron or an Inngest scheduled function.
// Assumes a Postgres `db` client and a `sendAlert` notification helper.
async function checkSourceHealth() {
  const stale = await db.query(`
    SELECT source_id, MAX(acquired_at) AS last_seen
    FROM contacts
    GROUP BY source_id
    HAVING MAX(acquired_at) < NOW() - INTERVAL '24 hours'
  `)
  // Note: a source with no rows in `contacts` at all never appears in this
  // GROUP BY result; LEFT JOIN from a sources table to catch those too.
  if (stale.rows.length > 0) {
    await sendAlert('Dead data sources detected', stale.rows)
  }
}
Set the silence threshold per source type — scraping sources may warrant a 6-hour threshold, purchased-list imports a 7-day threshold.
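Per-type thresholds can be kept in a simple lookup and applied to the query results in application code. A minimal sketch, assuming rows shaped like `{ source_id, source_type, last_seen }` and illustrative threshold values — the `source_type` field and `findStaleSources` helper are not part of the schema above:

```javascript
// Hours of silence tolerated before a source is considered dead,
// keyed by source type. Values are illustrative defaults.
const SILENCE_THRESHOLD_HOURS = {
  scraper: 6,          // scrapers run frequently; silence is suspicious fast
  api: 24,             // API integrations typically sync daily
  purchased_list: 168, // purchased-list imports may arrive weekly
};

// Given rows of { source_id, source_type, last_seen }, return the
// sources whose silence exceeds their type's threshold.
function findStaleSources(rows, now = new Date()) {
  return rows.filter((row) => {
    const hours = SILENCE_THRESHOLD_HOURS[row.source_type] ?? 24; // fallback
    const silenceMs = now.getTime() - new Date(row.last_seen).getTime();
    return silenceMs > hours * 60 * 60 * 1000;
  });
}
```

With this shape, the SQL stays a plain per-source MAX(acquired_at) query and the policy lives in one place in code, so tightening a threshold does not require a schema change.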
ID: data-sourcing-provenance.source-management.dead-source-alerting
Severity: low
What to look for: Count all monitoring or health-check mechanisms that detect when a data source has gone silent — i.e., it normally delivers contacts but has produced zero records for an unexpectedly long period. This might be: a cron job that checks last-ingestion timestamp per source, an alerting rule based on ingestion volume metrics, or a health endpoint that surfaces per-source staleness.
Pass criteria: At least 1 mechanism exists to detect and alert on sources that have stopped producing data (e.g., a monitoring query that fires if a source has not delivered records within 24 hours, or an ops alert on ingestion pipeline errors).
Fail criteria: No dead source detection exists. A scraper or API integration could fail silently for days with no alert triggered.
Skip (N/A) when: The system has only form submissions (real-time, user-driven — silence is expected).
Detail on fail: "No dead source detection found — a failing scraper or API integration would go unnoticed until someone manually investigates".
Remediation: Add a health-check query run on a schedule:
// Run hourly via cron.
// Assumes a Postgres `db` client and a `sendAlert` notification helper.
async function checkSourceHealth() {
  const staleSources = await db.query(`
    SELECT source_id, MAX(acquired_at) AS last_seen
    FROM contacts
    GROUP BY source_id
    HAVING MAX(acquired_at) < NOW() - INTERVAL '24 hours'
  `)
  if (staleSources.rows.length > 0) {
    await sendAlert('Stale data sources detected', staleSources.rows)
  }
}