Raw email event data (individual timestamped rows for each open, click, bounce, and unsubscribe) is the foundation for attribution debugging, engagement scoring, send-time optimization, and retrospective analysis when you change attribution models. Cron jobs that prune events older than 30 days in favor of aggregate counts permanently destroy that foundation: you can no longer reconstruct which contacts engaged during a specific window, rerun historical attribution with a different model, or answer 'why did this campaign's conversion rate differ from last quarter?' GDPR Art. 5(1)(e) (storage limitation) requires that personal data be kept no longer than necessary for its purpose, so in practice you need a documented retention period; the absence of one is itself a compliance gap, not just an analytics deficiency.
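As an illustration of what per-event rows make possible, the queries below are a sketch assuming PostgreSQL and the email_events columns named in this check's index definitions (campaign_id, contact_id, event_type, occurred_at). The campaign ID, the date window, and the event_type values 'open' and 'click' are hypothetical and not taken from any real schema.

```sql
-- Which contacts engaged with one campaign during a specific window?
-- Aggregate counts alone cannot answer this.
SELECT DISTINCT contact_id
FROM email_events
WHERE campaign_id = 42                      -- hypothetical ID; assumes an integer key
  AND event_type IN ('open', 'click')       -- assumed event_type values
  AND occurred_at >= TIMESTAMP '2024-01-01'
  AND occurred_at <  TIMESTAMP '2024-04-01';

-- Opens by hour of day, one possible input to send-time optimization.
-- The hour is taken in the session time zone if occurred_at is timestamptz.
SELECT EXTRACT(HOUR FROM occurred_at) AS hour_of_day,
       COUNT(*) AS opens
FROM email_events
WHERE event_type = 'open'
GROUP BY hour_of_day
ORDER BY hour_of_day;
```

Once rows older than 30 days are pruned, both queries can only see the most recent month.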
Severity is low because short retention degrades analytical capability incrementally rather than breaking current reporting. The data loss is irreversible, however: deleted events cannot be recovered when you later need them for retrospective analysis.
Retain raw events for at least 12 months and ensure the table is indexed for performant range queries. If storage is a concern, archive to a cold table rather than deleting:
-- Indexes required for campaign + time-range queries at scale
CREATE INDEX IF NOT EXISTS idx_email_events_campaign_type_occurred
ON email_events (campaign_id, event_type, occurred_at);
CREATE INDEX IF NOT EXISTS idx_email_events_contact_occurred
ON email_events (contact_id, occurred_at);
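-- Archive pattern: move, don't delete
-- Run both statements in one transaction so they share the same cutoff
-- (assuming PostgreSQL, where NOW() returns the transaction start time)
-- and a partial failure leaves nothing half-moved
BEGIN;
INSERT INTO email_events_archive
SELECT * FROM email_events
WHERE occurred_at < NOW() - INTERVAL '12 months';
DELETE FROM email_events
WHERE occurred_at < NOW() - INTERVAL '12 months';
COMMIT;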
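The archive statements above assume that email_events_archive already exists with the same column layout as email_events, which this check does not show. A minimal sketch of creating it, assuming PostgreSQL; the archive index name is hypothetical:

```sql
-- Assumption: PostgreSQL. Creates an empty archive table with the same
-- columns and column defaults as the hot table, so that
-- INSERT ... SELECT * lines up column for column.
CREATE TABLE IF NOT EXISTS email_events_archive
    (LIKE email_events INCLUDING DEFAULTS);

-- The archive is queried far less often than the hot table, so a single
-- time-range index may be enough. The index name is hypothetical.
CREATE INDEX IF NOT EXISTS idx_email_events_archive_occurred
    ON email_events_archive (occurred_at);
```

If columns are later added to email_events, the archive table needs the same change in the same order, or the SELECT * insert will fail or misalign.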
Document your retention period in docs/data-retention.md and in your privacy policy. The Data Quality & List Hygiene Audit covers how long-term event history feeds engagement scoring for contact records.
ID: campaign-analytics-attribution.reporting-pipeline.raw-event-retention
Severity: low
What to look for: Check whether raw email events (individual rows per open, click, bounce, unsubscribe) are retained in the database or if only aggregate summaries are kept. Check for data deletion policies or retention-limiting cron jobs that delete raw events after a short window (e.g., deleting events older than 30 days). Raw event data is necessary for ad-hoc analysis, debugging attribution discrepancies, and building time-of-day send optimization. Also check whether the schema is designed for event data scale (indexes on occurred_at, campaign_id, and contact_id).
Pass criteria: Raw email events are retained for at least 12 months. No deletion job prunes raw events on a short schedule. At least 2 indexes exist on the event table for querying by campaign and time range. Count the indexes on the event table and report the total even on pass.
Fail criteria: Raw events are deleted shortly after aggregation (fewer than 90 days retention). Only aggregate counts are retained long-term. No retention policy documented.
Skip (N/A) when: The project uses a third-party ESP and stores no raw event data locally.
Cross-reference: The Data Quality & List Hygiene Audit covers long-term data quality for contact records that depend on event history for engagement scoring.
Detail on fail: Example: "Cron job deletes email_events rows older than 30 days — cannot re-run historical attribution analysis" or "No indexes on email_events.campaign_id or occurred_at — queries on large event tables will be slow"
Remediation: Retain raw events for at least 12 months, document the retention period, and add indexes for campaign and time-range queries:
-- Ensure appropriate indexes for query performance
CREATE INDEX IF NOT EXISTS idx_email_events_campaign_type_occurred
ON email_events (campaign_id, event_type, occurred_at);
CREATE INDEX IF NOT EXISTS idx_email_events_contact_occurred
ON email_events (contact_id, occurred_at);
-- If you must archive, move to a cold storage table rather than delete
-- Keep last 12 months in the hot table, archive older rows.
-- One transaction keeps the cutoff identical for both statements
-- (assuming PostgreSQL, where NOW() returns the transaction start time):
BEGIN;
INSERT INTO email_events_archive SELECT * FROM email_events
WHERE occurred_at < NOW() - INTERVAL '12 months';
DELETE FROM email_events WHERE occurred_at < NOW() - INTERVAL '12 months';
COMMIT;
The Deliverability Engineering Audit covers bounce handling and list hygiene, both of which depend on raw bounce event data, in more detail.
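To check the pass criteria against a live database, queries along the following lines may help. This is a sketch assuming PostgreSQL and a table named email_events. The first query only shows how far back the data currently reaches; it cannot prove that no pruning job exists, and a project younger than 12 months will report false without being at fault.

```sql
-- Retention: how far back do raw events actually go?
SELECT MIN(occurred_at) AS oldest_event,
       MIN(occurred_at) <= NOW() - INTERVAL '12 months' AS covers_12_months
FROM email_events;

-- Indexes: list every index on the event table, for the count the
-- pass criteria ask you to report. Add a schemaname filter if the
-- same table name exists in more than one schema.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'email_events'
ORDER BY indexname;
```

The indexdef column shows the indexed columns, which makes it possible to confirm that campaign_id, contact_id, and occurred_at are covered rather than only counting rows.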