Source-level metrics tracked
Why it matters
Ingestion logs that lack a source_id dimension tell you what happened but not where. When one source starts producing bad records at scale (malformed emails, duplicate floods, elevated bounce rates), you cannot isolate it without manual log correlation. NIST CSF 2.0 DE.AE-3 calls for event data to be correlated across multiple sources, which is only practical when each event records the source it came from. Without a source dimension in the logs, the signal exists but is not actionable without hours of forensic work.
Severity rationale
Low because the gap degrades observability rather than correctness, but the operational cost compounds every time a source-level incident requires investigation.
Remediation
Add source_id and source_type as first-class fields on every structured log emitted during ingestion — both success and failure paths. Use a structured logger (pino, winston) so these fields are queryable in your log aggregation tool.
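// Emit on every record processed.
// Argument order here is winston-style: (message, fields). pino takes the
// fields object first: logger.info({ ...fields }, 'contact_ingested').
logger.info('contact_ingested', {
  source_id: source.id,
  source_type: source.type,
  contact_id: contact.id,
  status: 'success',
})

// Emit on every record rejected, so failures are attributable to a source too.
logger.warn('contact_ingestion_failed', {
  source_id: source.id,
  source_type: source.type,
  reason: 'validation_error',
  error: err.message,
})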
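One way to keep the source fields from being forgotten on either path is to bind them once per source. The following is a minimal sketch, not a prescribed implementation: withSource is a hypothetical helper, and it assumes a logger with winston-style info(message, fields) and warn(message, fields) methods. A stand-in logger that records calls in memory is used so the example runs without dependencies.

```javascript
// Hypothetical helper: wraps any logger exposing winston-style
// info(message, fields) and warn(message, fields) methods.
function withSource(logger, source) {
  const base = { source_id: source.id, source_type: source.type };
  return {
    // base is spread last so callers cannot overwrite the source fields
    info: (event, fields = {}) => logger.info(event, { ...fields, ...base }),
    warn: (event, fields = {}) => logger.warn(event, { ...fields, ...base }),
  };
}

// Stand-in logger (an assumption for this example) that records calls in memory.
const calls = [];
const fakeLogger = {
  info: (event, fields) => calls.push({ level: 'info', event, ...fields }),
  warn: (event, fields) => calls.push({ level: 'warn', event, ...fields }),
};

const log = withSource(fakeLogger, { id: 'src_42', type: 'csv_upload' });
log.info('contact_ingested', { contact_id: 'c_1', status: 'success' });
log.warn('contact_ingestion_failed', { reason: 'validation_error', error: 'bad email' });

console.log(calls);
```

Both pino and winston also provide a child() method that attaches fixed fields to every log line, which may be a better fit than a hand-written wrapper when one of those libraries is already in use.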
If you use a metrics platform (Datadog, PostHog), emit a counter keyed by source_id to enable dashboards and alerting by source.
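As an illustration of what a counter keyed by source_id provides, here is a dependency-free sketch. It assumes nothing about any vendor client: createSourceCounters, its methods, and the 0.5 threshold are hypothetical names and values chosen for the example, and the in-memory Map stands in for whatever metrics backend is actually used.

```javascript
// In-memory stand-in for a metrics counter tagged by source_id.
// A real metrics client would replace the Map; no vendor API is assumed.
function createSourceCounters() {
  const counts = new Map(); // source_id -> { success, failure }

  function increment(sourceId, status) {
    if (status !== 'success' && status !== 'failure') {
      throw new Error(`unknown status: ${status}`);
    }
    const entry = counts.get(sourceId) ?? { success: 0, failure: 0 };
    entry[status] += 1;
    counts.set(sourceId, entry);
  }

  // Share of records from this source that failed (0 if none seen).
  function errorRate(sourceId) {
    const entry = counts.get(sourceId);
    if (!entry) return 0;
    const total = entry.success + entry.failure;
    return total === 0 ? 0 : entry.failure / total;
  }

  // Sources whose failure share exceeds the threshold (hypothetical alert rule).
  function noisySources(threshold) {
    return [...counts.keys()].filter((id) => errorRate(id) > threshold);
  }

  return { increment, errorRate, noisySources };
}

const counters = createSourceCounters();
for (let i = 0; i < 3; i += 1) counters.increment('src_a', 'success');
counters.increment('src_b', 'success');
counters.increment('src_b', 'failure');
counters.increment('src_b', 'failure');

console.log(counters.noisySources(0.5)); // [ 'src_b' ]
```

Because every increment carries the source identifier, a misbehaving source shows up directly in the breakdown instead of being buried in an aggregate error rate.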
Detection
- ID: source-level-metrics
- Severity: low
- What to look for: Enumerate all structured log fields emitted during ingestion and count how many include source_id as a dimension. Look for logging or metrics instrumentation around data ingestion that captures per-source volume and error rates. This might be: structured logs with source_id included in every ingestion event, a metrics counter (Prometheus, Datadog, PostHog) broken down by source, or a simple database table that records per-source ingestion counts and error counts over time.
- Pass criteria: Ingestion events are logged or metered with source_id as a dimension; at least 1 structured log call includes source_id. Report the count of log calls with source_id even on pass.
- Fail criteria: Ingestion logging exists but does not include the source identifier, making it impossible to segment metrics by source without manual correlation.
- Skip (N/A) when: The system has a single data source.
- Detail on fail: "Ingestion logging found but source_id not included as a structured field — metrics cannot be segmented by source" or "No ingestion metrics or logging found".
- Remediation: Add source dimension to all ingestion log events:
  logger.info('contact_ingested', {
    source_id: source.id,
    source_type: source.type,
    contact_id: contact.id,
    status: 'success',
  })
  logger.warn('contact_ingestion_failed', {
    source_id: source.id,
    source_type: source.type,
    reason: 'validation_error',
    error: err.message,
  })
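The counting step under "What to look for" can be roughly approximated with a script. The sketch below is a heuristic under stated assumptions, not the check's actual implementation: the function names are hypothetical, it assumes the logger variable is literally named logger, and it balances parentheses naively, so a parenthesis inside a string literal will confuse it. Treat the counts as a starting point for manual review.

```javascript
// Collect the text of each logger.<level>(...) call by balancing parentheses.
// Naive: parentheses inside string literals are not handled.
function extractLoggerCalls(code) {
  const start = /\blogger\.(?:info|warn|error|debug)\(/g;
  const calls = [];
  let m;
  while ((m = start.exec(code)) !== null) {
    let depth = 1;
    let i = start.lastIndex; // first character after the opening "("
    while (i < code.length && depth > 0) {
      if (code[i] === '(') depth += 1;
      else if (code[i] === ')') depth -= 1;
      i += 1;
    }
    calls.push(code.slice(m.index, i));
  }
  return calls;
}

// Count calls that mention source_id and apply the "at least 1" pass rule.
function auditIngestionLogging(code) {
  const calls = extractLoggerCalls(code);
  const withSourceId = calls.filter((c) => /\bsource_id\b/.test(c)).length;
  return { total: calls.length, withSourceId, pass: withSourceId >= 1 };
}

// Sample input for illustration only.
const sample = `
logger.info('contact_ingested', { source_id: source.id, status: 'success' })
logger.warn('contact_ingestion_failed', { reason: 'validation_error' })
logger.info('batch_started')
`;

const report = auditIngestionLogging(sample);
console.log(report); // { total: 3, withSourceId: 1, pass: true }
```

The sample passes because one call includes source_id, but the report also shows that two of the three calls do not, which is why the pass criteria ask for the count to be reported even on pass.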
External references
- nist-csf:2.0 · DE.AE-3 — Event data are collected and correlated from multiple sources
- iso-25010:2011 · reliability.availability — Reliability / availability
History
- 2026-04-18 · v1.0.0 · Initial import from data-sourcing-provenance · automated