PII separated from identifiers; analytics uses pseudonymous IDs
Why it matters
Sending raw email addresses or full names to analytics platforms violates GDPR Art. 25 (data protection by design) and Art. 5(1)(b) (purpose limitation), because analytics processors are not authorized recipients of identifying PII under most privacy policies. OWASP Top 10 2021 A02 (Cryptographic Failures, formerly Sensitive Data Exposure) covers this class of unnecessary PII exposure. CWE-359 specifically covers exposure of private personal information to an unauthorized actor, which is exactly what happens when analytics.identify(user.email) fires. If a Segment, Mixpanel, or GA4 account is breached or subpoenaed, the attacker or regulator receives a map from behavioral data to real identities. ISO-27001:2022 A.5.34 requires the organization to identify and meet its privacy and PII-protection obligations, which in practice means embedding these controls in processing rather than bolting them on afterward.
Severity rationale
Critical because sending PII to third-party analytics processors creates an irrevocable exposure: once the data is transmitted, you cannot undo the transfer or control downstream retention.
Remediation
Replace direct PII in analytics calls with a salted SHA-256 hash. Add this to lib/analytics.ts and use it everywhere you call identify() or set user_id.
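// lib/analytics.ts
import { createHash } from 'crypto'

// One-way, consistent pseudonymous ID per user
export function pseudoId(email: string): string {
  const salt = process.env.ANALYTICS_SALT
  // Fail fast: with a bare `!`, a missing salt silently hashes email + "undefined"
  if (!salt) throw new Error('ANALYTICS_SALT is not set')
  // Normalize case and whitespace so the same address always yields the same ID
  const normalized = email.trim().toLowerCase()
  return createHash('sha256')
    .update(normalized + salt)
    .digest('hex')
    .slice(0, 16) // first 64 bits of the digest
}

// Segment / Amplitude / Mixpanel:
analytics.identify(pseudoId(user.email), {
  plan: user.plan,           // OK, not PII
  createdAt: user.createdAt, // OK, not PII
  // Never: email, name, phone, address
})

// GA4:
gtag('config', 'G-XXXXXXXX', { user_id: pseudoId(user.email) })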
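If you prefer a standard keyed construction over concatenating the salt onto the email, HMAC-SHA256 produces the same kind of one-way, consistent ID. This is a minimal sketch that swaps in Node's createHmac from the crypto module; the name pseudoIdHmac and the explicit key parameter are illustrative assumptions, and in production the key would come from a secret such as ANALYTICS_SALT.

```typescript
import { createHmac } from 'crypto'

// Keyed, one-way pseudonymous ID: HMAC-SHA256(key, normalized email).
// The key is passed in explicitly (assumption) so the function is easy to test;
// in production it would be read from a secret store or environment variable.
export function pseudoIdHmac(email: string, key: string): string {
  if (key.length === 0) throw new Error('HMAC key must not be empty')
  // Normalize so trivial variations of the same address map to one ID
  const normalized = email.trim().toLowerCase()
  return createHmac('sha256', key)
    .update(normalized)
    .digest('hex')
    .slice(0, 16) // first 64 bits, same length as pseudoId()
}

// Usage: both calls print the same ID because the inputs normalize identically
const demoKey = 'example-key-do-not-use'
console.log(pseudoIdHmac('Alice@Example.com', demoKey))
console.log(pseudoIdHmac(' alice@example.com ', demoKey))
```

With either construction, rotating the salt or key changes every user's ID, so plan for a one-time identity break in your analytics before rotating.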
Audit logs must reference userId (UUID), not email or name. Keep PII-to-ID correlation in a secure internal admin tool only, never in the log store itself.
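One way to enforce the audit-log rule in code is to build every entry through a single formatter that rejects PII-bearing fields. This is a minimal sketch; the AuditEvent shape, the forbidden-key list, and formatAudit are hypothetical names, not part of any particular logging library.

```typescript
// Hypothetical audit event: the actor is identified by internal UUID only
interface AuditEvent {
  at: string        // ISO-8601 timestamp
  userId: string    // internal UUID, never email or name
  action: string    // e.g. "profile.update"
  targetId?: string // UUID of the affected record, if any
}

// Keys that must never reach the log store (illustrative, extend as needed)
const FORBIDDEN_KEYS = ['email', 'name', 'fullName', 'phone', 'address']

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i

// Returns one JSON line for the log store, or throws if the event carries PII keys
export function formatAudit(event: AuditEvent & Record<string, unknown>): string {
  for (const key of FORBIDDEN_KEYS) {
    if (key in event) throw new Error(`audit event must not contain "${key}"`)
  }
  if (!UUID_RE.test(event.userId)) throw new Error('userId must be a UUID')
  // Copy only the allowed fields; undefined targetId is dropped by JSON.stringify
  return JSON.stringify({
    at: event.at,
    userId: event.userId,
    action: event.action,
    targetId: event.targetId,
  })
}
```

Routing all writes through one function gives reviewers a single place to audit, though it cannot catch PII typed into free-text fields such as action.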
Detection
- ID: pii-separation
- Severity: critical
- What to look for: Enumerate every relevant item. Examine the database schema and analytics implementation. Does the users table store email, phone, and full name alongside the user_id? That is acceptable: separation means not using PII as the identifier itself (for example, using the email address as the primary key or sending it directly to analytics). Check the analytics configuration: in GA4, look for gtag('config', 'G-XXXXXXXX', { user_id: ... }) and check which value is passed. It should be a UUID or hash, not an email or name. Check Segment, Mixpanel, and Amplitude identify() calls for the same pattern. Search for identify(user.email) or setUser(user.name); these are violations. Check audit logs (if present) to confirm that entries use IDs, not names or email addresses.
- Pass criteria: At least 1 of the following conditions is met. Analytics identify() calls use a pseudonymous ID (a UUID or a salted SHA-256 hash of the email, never the raw email). Audit logs reference user_id, not plaintext email or name. The application does not use the email address as a primary key or as a URL parameter that ends up in server logs.
- Fail criteria: Analytics receives raw email addresses or full names as the user identity. Audit logs contain plaintext PII. Email is used as a primary key and appears in URLs (e.g., /user/alice@example.com/profile).
- Do NOT pass when: The item exists only as a placeholder, stub, or TODO comment; partial implementation does not count as passing.
- Skip (N/A) when: The application has no analytics, no audit logging, and no user identification beyond a session-scoped token.
- Cross-reference: For broader data handling practices, the Data Protection audit covers data lifecycle management.
- Detail on fail: Specify the issue. Examples: "Segment identify() called with user.email as userId in src/analytics.ts.", "Audit logs include plaintext email addresses in user_action column.", or "GA4 user_id set to user.email in _app.tsx."
- Remediation: Hash or replace PII before sending it to analytics or logs, using the pseudoId() helper in lib/analytics.ts from the Remediation section above. Audit logs should reference userId (UUID) only. If you need to correlate a log entry back to an email during incident response, that lookup should happen in a secure internal admin tool, not be stored in the log itself.
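The source search under What to look for can be partly automated. This is a rough sketch assuming a TypeScript or JavaScript codebase; scanSource and scanDir are hypothetical helpers, and the regexes only catch the literal patterns named above (identify(), setUser(), or a GA4 user_id fed directly from an email, name, or phone property). A variable that already holds the email slips through, so treat hits as leads, not a complete audit.

```typescript
import { readdirSync, readFileSync, statSync } from 'fs'
import { join, extname } from 'path'

// Heuristic patterns: a PII property passed directly as the analytics identity
const PATTERNS: RegExp[] = [
  /\bidentify\(\s*[\w.]*\.(email|name|fullName|phone)\b/,
  /\bsetUser\(\s*[\w.]*\.(email|name|fullName|phone)\b/,
  /\buser_id\s*:\s*[\w.]*\.(email|name|fullName|phone)\b/,
]

export interface Finding {
  file: string
  line: number // 1-based
  text: string
}

// Scans one file's contents and reports every line matching a pattern
export function scanSource(file: string, source: string): Finding[] {
  const findings: Finding[] = []
  source.split('\n').forEach((text, i) => {
    if (PATTERNS.some((re) => re.test(text))) {
      findings.push({ file, line: i + 1, text: text.trim() })
    }
  })
  return findings
}

// Recursively walks a directory (skipping node_modules) and scans source files
export function scanDir(dir: string): Finding[] {
  const out: Finding[] = []
  for (const entry of readdirSync(dir)) {
    if (entry === 'node_modules') continue
    const path = join(dir, entry)
    if (statSync(path).isDirectory()) {
      out.push(...scanDir(path))
    } else if (['.ts', '.tsx', '.js', '.jsx'].includes(extname(path))) {
      out.push(...scanSource(path, readFileSync(path, 'utf8')))
    }
  }
  return out
}
```

For example, scanDir('src') returns one finding per suspicious line, with the file path and line number ready to paste into the Detail on fail message. A call such as identify(pseudoId(user.email)) is not flagged, because the email is wrapped before it reaches identify().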
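For the log and URL side of the fail criteria, a quick heuristic is to scan server or audit log lines for anything shaped like an email address, including a URL-encoded @ (%40) in request paths. This is a minimal sketch; linesWithEmail is a hypothetical helper, and the regex is deliberately loose rather than an RFC 5322 validator, so expect some false positives.

```typescript
// Loose email matcher; also accepts "%40", the URL-encoded form of "@"
const EMAIL_RE = /[A-Za-z0-9._%+-]+(?:@|%40)[A-Za-z0-9.-]+\.[A-Za-z]{2,}/

// Returns the 1-based numbers of log lines that appear to contain an email address
export function linesWithEmail(log: string): number[] {
  return log
    .split('\n')
    .map((line, i) => (EMAIL_RE.test(line) ? i + 1 : 0))
    .filter((n) => n > 0)
}
```

A hit such as GET /user/alice@example.com/profile usually means the route should key on the user's UUID instead; scrubbing the logs afterward leaves the URL design, and every proxy and CDN log it touches, unchanged.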
External references
- cwe · CWE-359 — Exposure of Private Personal Information to an Unauthorized Actor
- owasp:2021 · A02 — Cryptographic Failures
- gdpr · Art. 25 — Data protection by design and by default
- iso-27001:2022 · A.5.34 — Privacy and protection of PII
History
- 2026-04-18·v1.0.0·Initial import from data-protection·automated