A scraper or API integration that stops producing records without raising an alert will silently starve your pipeline for days. You will not know until someone notices that list growth has plateaued or a manual check reveals the last ingest timestamp is a week old. The NIST CSF 2.0 DE.CM category (Continuous Monitoring) calls for monitoring assets to find anomalies and potentially adverse events, which extends to data ingestion pipelines. Without dead-source detection, acquisition outages stay invisible until they become business-impacting.
Low because silent failures affect future data acquisition rather than compromising existing records, but they can cause significant operational and business impact over time.
Add a scheduled health-check job that queries the maximum acquired_at per source and alerts when any source exceeds its expected silence threshold.
// Run hourly via cron or an Inngest scheduled function.
// Assumes a Postgres `db` client and a `sendAlert` notification helper.
async function checkSourceHealth() {
  const stale = await db.query(`
    SELECT source_id, MAX(acquired_at) AS last_seen
    FROM contacts
    GROUP BY source_id
    HAVING MAX(acquired_at) < NOW() - INTERVAL '24 hours'
  `)
  // Note: a source with no rows in `contacts` at all never appears in this
  // GROUP BY result; LEFT JOIN from a sources table to catch those too.
  if (stale.rows.length > 0) {
    await sendAlert('Dead data sources detected', stale.rows)
  }
}
Set the silence threshold per source type — scraping sources may warrant a 6-hour threshold, purchased-list imports a 7-day threshold.
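Per-type thresholds can be kept in a simple lookup and applied to the query results in application code. A minimal sketch, assuming rows shaped like `{ source_id, source_type, last_seen }` and illustrative threshold values — the `source_type` field and `findStaleSources` helper are not part of the schema above:

```javascript
// Hours of silence tolerated before a source is considered dead,
// keyed by source type. Values are illustrative defaults.
const SILENCE_THRESHOLD_HOURS = {
  scraper: 6,          // scrapers run frequently; silence is suspicious fast
  api: 24,             // API integrations typically sync daily
  purchased_list: 168, // purchased-list imports may arrive weekly
};

// Given rows of { source_id, source_type, last_seen }, return the
// sources whose silence exceeds their type's threshold.
function findStaleSources(rows, now = new Date()) {
  return rows.filter((row) => {
    const hours = SILENCE_THRESHOLD_HOURS[row.source_type] ?? 24; // fallback
    const silenceMs = now.getTime() - new Date(row.last_seen).getTime();
    return silenceMs > hours * 60 * 60 * 1000;
  });
}
```

With this shape, the SQL stays a plain per-source MAX(acquired_at) query and the policy lives in one place in code, so tightening a threshold does not require a schema change.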
ID: data-sourcing-provenance.source-management.dead-source-alerting
Severity: low
What to look for: Count all monitoring or health-check mechanisms that detect when a data source has gone silent — i.e., it normally delivers contacts but has produced zero records for an unexpectedly long period. This might be: a cron job that checks last-ingestion timestamp per source, an alerting rule based on ingestion volume metrics, or a health endpoint that surfaces per-source staleness.
Pass criteria: At least 1 mechanism exists to detect and alert on sources that have stopped producing data (e.g., a monitoring query that fires if a source has not delivered records within 24 hours, or an ops alert on ingestion pipeline errors).
Fail criteria: No dead source detection exists. A scraper or API integration could fail silently for days with no alert triggered.
Skip (N/A) when: The system has only form submissions (real-time, user-driven — silence is expected).
Detail on fail: "No dead source detection found — a failing scraper or API integration would go unnoticed until someone manually investigates".
Remediation: Add a health-check query run on a schedule:
// Run hourly via cron.
// Assumes a Postgres `db` client and a `sendAlert` notification helper.
async function checkSourceHealth() {
  const staleSources = await db.query(`
    SELECT source_id, MAX(acquired_at) AS last_seen
    FROM contacts
    GROUP BY source_id
    HAVING MAX(acquired_at) < NOW() - INTERVAL '24 hours'
  `)
  if (staleSources.rows.length > 0) {
    await sendAlert('Stale data sources detected', staleSources.rows)
  }
}