Per-source quality scoring implemented

ab-000877 · data-sourcing-provenance.source-management.per-source-quality-score

Severity: medium · active

Why it matters

Per-contact quality flags (bounce, invalid email) are only useful if you can trace them back to the source that produced them. Without per-source quality scoring you cannot quarantine a bad purchased list, deprioritize a scraper hitting stale pages, or negotiate data credits with a vendor whose records are 40% invalid. ISO 25010:2011 (functional suitability, functional correctness) treats the correctness of results as a measurable quality characteristic. Without source-level aggregation, data rot goes undetected until deliverability collapses.

Severity rationale

Medium because the gap is an operational blind spot rather than an immediate data breach, but it allows quality degradation to accumulate undetected across entire sources.

Remediation

Add a data_sources table with a quality_score column, then populate it via a periodic aggregation query that computes bounce and invalid-email rates per source.

CREATE TABLE data_sources (
  id             TEXT PRIMARY KEY,
  source_type    TEXT NOT NULL,
  display_name   TEXT,
  quality_score  NUMERIC(5,2),   -- 0–100
  last_scored_at TIMESTAMPTZ
);

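-- Periodic update (run hourly or daily).
-- Score = percentage of contacts acquired in the last 30 days
-- that have a valid email and have not bounced.
-- Sources with no contacts in that window get a NULL score.
UPDATE data_sources ds
SET quality_score = (
      SELECT 100.0 * COUNT(*) FILTER (WHERE c.email_valid AND NOT c.bounced)
             / NULLIF(COUNT(*), 0)
      FROM contacts c
      WHERE c.source_id = ds.id
        AND c.acquired_at > NOW() - INTERVAL '30 days'
    ),
    last_scored_at = NOW();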
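
To make the scoring formula concrete, here is a minimal Python sketch of the same arithmetic done in application code. The `Contact` dataclass, the `score_sources` and `sources_below` helpers, and the threshold of 60 are hypothetical names and values chosen for illustration; only `source_id`, `email_valid`, and `bounced` come from the query above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Contact:
    # Field names mirror the columns used in the SQL above.
    source_id: str
    email_valid: bool
    bounced: bool


def score_sources(contacts: list[Contact]) -> dict[str, float]:
    """Return a 0-100 score per source: the percentage of its
    contacts that have a valid email and have not bounced."""
    totals: dict[str, int] = defaultdict(int)
    good: dict[str, int] = defaultdict(int)
    for c in contacts:
        totals[c.source_id] += 1
        if c.email_valid and not c.bounced:
            good[c.source_id] += 1
    # Only sources with at least one contact appear, so n is never zero.
    return {sid: round(100.0 * good[sid] / n, 2) for sid, n in totals.items()}


def sources_below(scores: dict[str, float], threshold: float) -> list[str]:
    """List source ids whose score is under the (hypothetical) threshold."""
    return sorted(sid for sid, score in scores.items() if score < threshold)


# Illustrative data: vendor_a has 3 good contacts out of 4,
# scraper_b has 2 good contacts out of 5.
SAMPLE = (
    [Contact("vendor_a", True, False)] * 3
    + [Contact("vendor_a", False, False)]
    + [Contact("scraper_b", True, False)] * 2
    + [Contact("scraper_b", True, True)] * 2
    + [Contact("scraper_b", False, False)]
)

if __name__ == "__main__":
    scores = score_sources(SAMPLE)
    print(scores)                       # vendor_a: 75.0, scraper_b: 40.0
    print(sources_below(scores, 60.0))  # ['scraper_b']
```

A contact counts as good only when both conditions hold, so a valid address that bounced lowers the score just as an invalid address does.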

Detection

ID: per-source-quality-score
Severity: medium
What to look for: Count all quality metrics tracked at the source level. Look for logic that computes or updates a quality score, acceptance rate, or health indicator at the source level — not just per-contact. This might be a source_quality_score on a sources table, a periodic job that calculates bounce rate or invalid-email rate by source, or a dashboard metric grouped by source_id. The key is whether the system can answer "which source is producing the most bad data?"
Pass criteria: At least 1 mechanism exists to track data quality at the source level: a sources table with a quality metric, a query or aggregation that segments quality by source, or an alerting threshold triggered by per-source bad data rates.
Fail criteria: Quality is only tracked at the individual contact level with no aggregation back to the source. There is no way to identify which source is producing bad contacts short of someone writing an ad hoc query by hand.
Skip (N/A) when: The system has only a single static data source.
Detail on fail: "No per-source quality tracking found — bounce rates and invalid records are not aggregated by source" or "Quality metrics exist but are global, not segmented by source".

Remediation: Add a data_sources table with a quality metric and update it periodically:

CREATE TABLE data_sources (
  id             TEXT PRIMARY KEY,
  source_type    TEXT NOT NULL,
  display_name   TEXT,
  quality_score  NUMERIC(5,2),  -- 0-100
  last_scored_at TIMESTAMPTZ
);

Then run a periodic query to update quality_score based on bounce/invalid rates for contacts from each source.
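
One way the periodic step could look end to end is sketched below in Python, using the standard-library `sqlite3` module so it runs without a database server. This is an illustration under assumptions, not the reference implementation: SQLite stands in for PostgreSQL (so `CASE` replaces `FILTER` and column types are simplified), the `contacts` table layout, the helper names `build_demo`, `rescore_sources`, and `worst_sources`, and the threshold of 80 are all hypothetical. Unlike the PostgreSQL statement earlier in this document, this variant leaves sources with no contacts in the 30-day window untouched instead of setting their score to NULL.

```python
import sqlite3

# Simplified SQLite schema; the contacts table is an assumed layout.
SCHEMA = """
CREATE TABLE data_sources (
  id             TEXT PRIMARY KEY,
  source_type    TEXT NOT NULL,
  display_name   TEXT,
  quality_score  REAL,
  last_scored_at TEXT
);
CREATE TABLE contacts (
  id          INTEGER PRIMARY KEY,
  source_id   TEXT NOT NULL REFERENCES data_sources(id),
  email_valid INTEGER NOT NULL,
  bounced     INTEGER NOT NULL,
  acquired_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

# Rescore only sources that have contacts acquired in the last 30 days.
RESCORE = """
UPDATE data_sources
SET quality_score = (
      SELECT 100.0 * SUM(CASE WHEN c.email_valid = 1 AND c.bounced = 0
                              THEN 1 ELSE 0 END) / COUNT(*)
      FROM contacts c
      WHERE c.source_id = data_sources.id
        AND c.acquired_at > datetime('now', '-30 days')
    ),
    last_scored_at = CURRENT_TIMESTAMP
WHERE EXISTS (
      SELECT 1 FROM contacts c
      WHERE c.source_id = data_sources.id
        AND c.acquired_at > datetime('now', '-30 days')
)
"""


def rescore_sources(conn: sqlite3.Connection) -> None:
    """Run the aggregation once; a scheduler would call this hourly or daily."""
    with conn:  # commits on success, rolls back on error
        conn.execute(RESCORE)


def worst_sources(conn: sqlite3.Connection, threshold: float) -> list[tuple]:
    """Return (id, score) for scored sources below the threshold, worst first."""
    return conn.execute(
        "SELECT id, quality_score FROM data_sources "
        "WHERE quality_score IS NOT NULL AND quality_score < ? "
        "ORDER BY quality_score ASC",
        (threshold,),
    ).fetchall()


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO data_sources (id, source_type) VALUES (?, ?)",
        [("purchased_list", "vendor"), ("web_form", "first_party"),
         ("idle_scraper", "scraper")],
    )
    recent = (
        [("purchased_list", 1, 0)] * 6   # valid, delivered
        + [("purchased_list", 0, 0)] * 2  # invalid email
        + [("purchased_list", 1, 1)] * 2  # valid but bounced
        + [("web_form", 1, 0)] * 4
    )
    conn.executemany(
        "INSERT INTO contacts (source_id, email_valid, bounced) VALUES (?, ?, ?)",
        recent,
    )
    # An old contact that falls outside the 30-day window.
    conn.execute(
        "INSERT INTO contacts (source_id, email_valid, bounced, acquired_at) "
        "VALUES ('idle_scraper', 1, 0, '2000-01-01 00:00:00')"
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo()
    rescore_sources(conn)
    print(worst_sources(conn, 80.0))  # [('purchased_list', 60.0)]
```

The `worst_sources` query is the piece that answers "which source is producing the most bad data?"; the same result could feed a dashboard or an alert on a per-source threshold.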

External references

iso-25010:2011 · functional-suitability.functional-correctness — Data quality / functional suitability

Taxons

data-integrity observability

History

2026-04-18·v1.0.0·Initial import from data-sourcing-provenance·automated