Deletion does not corrupt aggregate analytics

ab-000789 · compliance-consent-engine.data-subject-rights.analytics-integrity

Severity: low · active

Why it matters

GDPR Article 17(3) permits retention of data needed for archiving in the public interest, scientific or historical research, or statistical purposes, where erasure would seriously impair those purposes. Deleting the event rows that underpin aggregate metrics corrupts the historical record without serving the data subject's interests. CWE-459 (Incomplete Cleanup) describes cleanup that leaves data behind; the failure here is the reverse, cleanup that removes more than necessary. Hard-deleting email event rows during erasure can silently corrupt dashboard metrics (open rates, click rates, send volume charts) by removing the rows those calculations are computed from, when only the PII attached to the rows needed to go.
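To make the effect on a ratio metric concrete, here is a minimal in-memory sketch. The row shape, the helper names (openRate, hardDelete, sends) and the numbers are all made up for illustration and are not taken from the application's schema.

```typescript
// Hypothetical in-memory model of an email event table (not the real schema).
type EmailEventType = "sent" | "opened";

interface EmailEventRow {
  contactId: string;
  type: EmailEventType;
}

// Open rate = opened events / sent events; the "sent" rows are the denominator.
function openRate(rows: EmailEventRow[]): number {
  const sent = rows.filter((r) => r.type === "sent").length;
  const opened = rows.filter((r) => r.type === "opened").length;
  return sent === 0 ? 0 : opened / sent;
}

// Build n "sent" rows for one contact.
function sends(contactId: string, n: number): EmailEventRow[] {
  const out: EmailEventRow[] = [];
  for (let i = 0; i < n; i++) out.push({ contactId, type: "sent" });
  return out;
}

// Ten sends and three opens across three made-up contacts.
const sampleRows: EmailEventRow[] = [
  ...sends("c1", 4),
  ...sends("c2", 4),
  ...sends("c3", 2),
  { contactId: "c2", type: "opened" },
  { contactId: "c3", type: "opened" },
  { contactId: "c3", type: "opened" },
];

// Hard delete: every row for the erased contact disappears.
function hardDelete(rows: EmailEventRow[], contactId: string): EmailEventRow[] {
  return rows.filter((r) => r.contactId !== contactId);
}

const rateBefore = openRate(sampleRows);                  // 3 opens / 10 sends
const rateAfter = openRate(hardDelete(sampleRows, "c3")); // 1 open / 8 sends
```

In this toy data set, erasing contact c3 by hard delete moves the historical open rate from 3/10 to 1/8, even though nothing about past sending activity actually changed.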

Severity rationale

Low because aggregate corruption from over-deletion is a product reliability issue that becomes a compliance complication only when the corrupted metrics are relied on in regulatory or financial reporting.

Remediation

Anonymize event records instead of deleting them. Null out the PII fields while leaving the row intact so aggregate counts remain accurate:

// src/lib/compliance/deletion.ts — anonymize, do not delete, event rows
await db.emailEvent.updateMany({
  where: { contactId },
  data: {
    email: null,
    ipAddress: null,
    userAgent: null,
    // contactId: null  — only if FK allows null; otherwise use a sentinel 'erased' UUID
  }
})

After anonymization, a query like SELECT COUNT(*) FROM email_events WHERE occurred_at BETWEEN ... still returns the correct count, but the row carries no PII. Document this choice in src/lib/compliance/deletion.ts with a comment explaining why anonymization is used instead of deletion — future engineers must not refactor it to deleteMany.
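The count-preserving behaviour can be checked without a database. The sketch below is an in-memory analogue, not the project's implementation: StoredEmailEvent, anonymizeContactEvents, countBetween and the ERASED_CONTACT_ID sentinel are hypothetical names, and the sentinel branch assumes the foreign key does not allow null.

```typescript
// Hypothetical row shape; field names mirror the snippet above, not a confirmed schema.
interface StoredEmailEvent {
  id: number;
  contactId: string;
  email: string | null;
  ipAddress: string | null;
  userAgent: string | null;
  occurredAt: Date;
}

// Assumed sentinel for a NOT NULL foreign key; a placeholder contact row with this ID would have to exist.
const ERASED_CONTACT_ID = "00000000-0000-0000-0000-000000000000";

// Return a copy of the table with the contact's PII cleared and every row kept.
function anonymizeContactEvents(
  rows: StoredEmailEvent[],
  contactId: string
): StoredEmailEvent[] {
  return rows.map((row) =>
    row.contactId === contactId
      ? { ...row, contactId: ERASED_CONTACT_ID, email: null, ipAddress: null, userAgent: null }
      : row
  );
}

// Count rows in a time window: the in-memory analogue of the COUNT(*) query.
function countBetween(rows: StoredEmailEvent[], from: Date, to: Date): number {
  const lo = from.getTime();
  const hi = to.getTime();
  return rows.filter((r) => {
    const t = r.occurredAt.getTime();
    return t >= lo && t <= hi;
  }).length;
}
```

Calling countBetween on the table before and after anonymizeContactEvents should give the same number for any window, while the erased contact's rows no longer carry an email, IP address, or user agent.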

Detection

ID: analytics-integrity
Severity: low
What to look for: Check whether the erasure path anonymizes event records instead of hard-deleting them, preserving aggregate counts that the application depends on. If the application has dashboard metrics ("12,543 emails sent last month", "open rate: 28%"), those metrics may break if the underlying event rows are hard-deleted. Look for an anonymization step where email, ip_address, and other PII fields are cleared or replaced before the contact record itself is deleted, or a separate aggregate snapshot table that is not affected by contact-level erasure.
Pass criteria: Erasure anonymizes PII fields on event records (replacing email, IP, name with null or a placeholder) rather than hard-deleting rows, and aggregate queries still return correct counts after anonymization. Alternatively, aggregate metrics are pre-computed in PII-free snapshot tables. Enumerate every PII field and state whether it is anonymized or hard-deleted; at least 2 fields (email, IP address) must be anonymized.
Fail criteria: Erasure hard-deletes all event rows for a contact, causing aggregate dashboards to silently undercount historical activity. Using DELETE instead of anonymization on tables with aggregate dependencies does not count as pass.
Skip (N/A) when: The application has no analytics or reporting that depends on historical event counts.
Detail on fail: "processErasureRequest() calls db.emailEvent.deleteMany({ where: { contactId } }) — 180k events hard-deleted, causing historical open rate dashboard to drop from 28% to 24%"

Remediation: Anonymize instead of delete:

// Anonymize rather than hard-delete event records
await db.emailEvent.updateMany({
  where: { contactId },
  data: {
    email: null,
    ipAddress: null,
    userAgent: null,
    contactId: null,   // if FK allows null; otherwise use a sentinel 'erased' contact ID
  }
})
// The row (and its count contribution to aggregates) survives — only PII is removed
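The pass criteria above could be turned into an automated check along the following lines. This is a sketch under stated assumptions: AuditedEventRow, PII_FIELDS, findById and checkErasure are hypothetical names, the field list is illustrative, and "before" is assumed to hold only the erased contact's event rows as captured ahead of erasure.

```typescript
// Hypothetical row shape used only for this check.
interface AuditedEventRow {
  id: number;
  email: string | null;
  ipAddress: string | null;
  userAgent: string | null;
}

type PiiField = "email" | "ipAddress" | "userAgent";
const PII_FIELDS: PiiField[] = ["email", "ipAddress", "userAgent"];

interface ErasureCheck {
  rowsPreserved: boolean;       // every pre-erasure row still exists
  anonymizedFields: PiiField[]; // fields that are null on every surviving row
  pass: boolean;                // rows kept, and at least email and IP address cleared
}

function findById(rows: AuditedEventRow[], id: number): AuditedEventRow | undefined {
  return rows.filter((r) => r.id === id)[0];
}

// Compare the erased contact's rows before erasure with the table after erasure.
function checkErasure(before: AuditedEventRow[], after: AuditedEventRow[]): ErasureCheck {
  const rowsPreserved = before.every((r) => findById(after, r.id) !== undefined);
  const anonymizedFields = PII_FIELDS.filter((field) =>
    before.every((r) => {
      const survivor = findById(after, r.id);
      return survivor !== undefined && survivor[field] === null;
    })
  );
  const pass =
    rowsPreserved &&
    anonymizedFields.indexOf("email") !== -1 &&
    anonymizedFields.indexOf("ipAddress") !== -1;
  return { rowsPreserved, anonymizedFields, pass };
}
```

A hard delete leaves "after" without the contact's rows, so rowsPreserved is false and the check fails; an anonymizing update keeps the rows and reports each cleared field in anonymizedFields, which doubles as the enumeration the pass criteria ask for.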

External references

gdpr · Art. 17(3) — Right to erasure — exceptions for statistical/public interest purposes
cwe · CWE-459 — Incomplete cleanup — hard-delete corrupts aggregates instead of anonymising
iso-25010:2011 · functional-suitability.functional-correctness — Functional correctness — aggregate metrics remain accurate after erasure

Taxons

data-integrity privacy-consent

History

2026-04-18·v1.0.0·Initial import from compliance-consent-engine·automated
