Source-level metrics tracked

ab-000880 · data-sourcing-provenance.source-management.source-level-metrics

Severity: low · active

Why it matters

Ingestion logs that lack a source_id dimension tell you what happened but not where. When one source starts producing bad records at scale (malformed emails, duplicate floods, elevated bounce rates), you cannot isolate it without manual log correlation. NIST CSF 2.0 DE.AE-3 calls for event data to be collected and correlated across sources, which only works if each event records the source it came from. Without that dimension the signal exists but takes hours of forensic work to act on.

Severity rationale

Low because the gap degrades observability rather than correctness, but the operational cost compounds every time a source-level incident requires investigation.

Remediation

Add source_id and source_type as first-class fields on every structured log emitted during ingestion — both success and failure paths. Use a structured logger (pino, winston) so these fields are queryable in your log aggregation tool.

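// Emit on every record processed (success path).
// Argument order shown is winston's (message, metadata). With pino, pass the
// fields object first: logger.info({ source_id: source.id, ... }, 'contact_ingested').
logger.info('contact_ingested', {
  source_id:   source.id,
  source_type: source.type,
  contact_id:  contact.id,
  status:      'success',
})

// Emit on every rejected record (failure path); err is the caught validation error.
logger.warn('contact_ingestion_failed', {
  source_id:   source.id,
  source_type: source.type,
  reason:      'validation_error',
  error:       err.message,
})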
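One way to keep source_id from being forgotten on some code path is to bind it once per source and log through the bound logger. The sketch below is a minimal illustration, not a prescribed implementation: createLogger and ingest are hypothetical names, the hand-rolled JSON logger stands in for pino or winston (both offer a child() method for the same purpose), and the email check is a placeholder for real validation.

```javascript
// Hypothetical minimal JSON logger; a stand-in for pino or winston.
function createLogger(write, bindings = {}) {
  const emit = (level) => (event, fields = {}) =>
    write(JSON.stringify({ level, event, ...bindings, ...fields }));
  return {
    info: emit('info'),
    warn: emit('warn'),
    // Returns a logger whose every entry also carries the extra bound fields.
    child: (extra) => createLogger(write, { ...bindings, ...extra }),
  };
}

// Hypothetical ingestion loop: bind the source once, then log both paths.
function ingest(source, contacts, logger) {
  const log = logger.child({ source_id: source.id, source_type: source.type });
  for (const contact of contacts) {
    // Placeholder validation, assumed for illustration only.
    if (typeof contact.email === 'string' && contact.email.includes('@')) {
      log.info('contact_ingested', { contact_id: contact.id, status: 'success' });
    } else {
      log.warn('contact_ingestion_failed', {
        contact_id: contact.id,
        reason: 'validation_error',
        error: 'email is missing or malformed',
      });
    }
  }
}

// Collect log lines in memory so the result is easy to inspect.
const lines = [];
const logger = createLogger((line) => lines.push(line));
ingest(
  { id: 'src_42', type: 'csv_upload' },
  [{ id: 'c1', email: 'a@example.com' }, { id: 'c2', email: 'not-an-email' }],
  logger,
);
console.log(lines.join('\n'));
```

Both emitted lines carry source_id and source_type even though neither call site names them, which is the property the detection check looks for.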

If you use a metrics platform (Datadog, PostHog), emit a counter keyed by source_id to enable dashboards and alerting by source.
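As a rough sketch of what a counter keyed by source_id makes possible, the example below keeps per-source success and failure counts and flags sources whose error rate crosses a threshold. SourceCounter, the source names, and the 0.25 threshold are all assumptions made for illustration; in practice the increment would go to your metrics platform's client rather than an in-memory Map.

```javascript
// Hypothetical in-memory counter keyed by source_id; a stand-in for a
// metrics client call tagged with the source.
class SourceCounter {
  constructor() {
    this.bySource = new Map(); // source_id -> { success, failure }
  }

  // status must be 'success' or 'failure'.
  increment(sourceId, status) {
    if (status !== 'success' && status !== 'failure') {
      throw new RangeError(`unknown status: ${status}`);
    }
    const entry = this.bySource.get(sourceId) ?? { success: 0, failure: 0 };
    entry[status] += 1;
    this.bySource.set(sourceId, entry);
  }

  // Fraction of records from this source that failed; 0 if none were seen.
  errorRate(sourceId) {
    const { success, failure } = this.bySource.get(sourceId) ?? { success: 0, failure: 0 };
    const total = success + failure;
    return total === 0 ? 0 : failure / total;
  }
}

const counter = new SourceCounter();
// Simulated traffic: src_a is healthy, src_b rejects most records.
for (let i = 0; i < 95; i += 1) counter.increment('src_a', 'success');
for (let i = 0; i < 5; i += 1) counter.increment('src_a', 'failure');
for (let i = 0; i < 20; i += 1) counter.increment('src_b', 'success');
for (let i = 0; i < 80; i += 1) counter.increment('src_b', 'failure');

const ERROR_RATE_THRESHOLD = 0.25; // illustrative value, not a recommendation
const noisy = ['src_a', 'src_b'].filter(
  (id) => counter.errorRate(id) > ERROR_RATE_THRESHOLD,
);
console.log(noisy); // [ 'src_b' ]
```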

Detection

ID: source-level-metrics
Severity: low
What to look for: Enumerate all structured log calls emitted during ingestion and count how many include source_id as a field. Look for logging or metrics instrumentation around data ingestion that captures per-source volume and error rates. This might be: structured logs with source_id included in every ingestion event, a metrics counter (Prometheus, Datadog, PostHog) broken down by source, or a simple database table that records per-source ingestion counts and error counts over time.
Pass criteria: Ingestion events are logged or metered with source_id as a dimension — at least 1 structured log call includes source_id. Report the count of log calls with source_id even on pass.
Fail criteria: No ingestion logging or metrics exist, or ingestion logging exists but does not include the source identifier, making it impossible to segment metrics by source without manual correlation.
Skip (N/A) when: The system has a single data source.
Detail on fail: "Ingestion logging found but source_id not included as a structured field — metrics cannot be segmented by source" or "No ingestion metrics or logging found".
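The count of log calls that include source_id can be approximated with a simple text scan, sketched below. This is a heuristic under stated assumptions, not a definitive checker: countSourceIdLogCalls is a hypothetical helper, it assumes the logger variable is literally named logger, and it ignores parentheses inside string literals, so its output is a starting point for manual review.

```javascript
// Heuristic scan: counts logger.<level>(...) calls in a source string and how
// many of them mention source_id somewhere in their arguments.
function countSourceIdLogCalls(code) {
  const callStart = /\blogger\.(?:debug|info|warn|error)\s*\(/g;
  let total = 0;
  let withSourceId = 0;
  while (callStart.exec(code) !== null) {
    // Walk forward from just after "(" to the parenthesis that closes the call.
    const argsStart = callStart.lastIndex;
    let depth = 1;
    let i = argsStart;
    while (i < code.length && depth > 0) {
      if (code[i] === '(') depth += 1;
      else if (code[i] === ')') depth -= 1;
      i += 1;
    }
    const args = code.slice(argsStart, i - 1);
    total += 1;
    if (/\bsource_id\b/.test(args)) withSourceId += 1;
  }
  return { total, withSourceId };
}

// Sample input: one call includes source_id, one does not.
const sample = `
logger.info('contact_ingested', { source_id: source.id, contact_id: contact.id })
logger.warn('contact_ingestion_failed', { reason: 'validation_error' })
`;

const report = countSourceIdLogCalls(sample);
console.log(`${report.withSourceId} of ${report.total} log calls include source_id`);
// prints "1 of 2 log calls include source_id"
```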

Remediation: Add source dimension to all ingestion log events:

logger.info('contact_ingested', {
  source_id: source.id,
  source_type: source.type,
  contact_id: contact.id,
  status: 'success',
})

logger.warn('contact_ingestion_failed', {
  source_id: source.id,
  source_type: source.type,
  reason: 'validation_error',
  error: err.message,
})

External references

nist-csf:2.0 · DE.AE-3 — Event data are collected and correlated from multiple sources
iso-25010:2011 · reliability.availability — Reliability / availability

Taxons

observability

History

2026-04-18·v1.0.0·Initial import from data-sourcing-provenance·automated
