GDPR Art. 5(1)(c) (data minimization) and Art. 25 (data protection by design and by default) require that personal data be limited to what is necessary for the stated purpose. AI-built apps routinely over-collect. A SaaS tool keeps a phone number field because the template had one, or an onboarding flow asks for date of birth with no age-gating logic behind it. Every unnecessary field increases breach impact: a database dump containing phone numbers for a tool that never uses them exposes data that should never have existed. CCPA §1798.100(a) and (c) likewise limit collection to the purposes disclosed at collection and to what is reasonably necessary and proportionate for those purposes. ISO/IEC 27001:2022 Annex A control 5.34 (privacy and protection of PII) requires meeting applicable privacy requirements, and minimization is one of them.
High because unnecessary data collection directly amplifies breach impact and constitutes a standalone GDPR violation under the minimization principle, regardless of how well the data is secured.
Audit every form and database column by asking: what feature breaks if this field is removed? Remove fields where the answer is "nothing," and defer optional fields until the feature that needs them actually ships.
// BEFORE: over-collection
type SignupForm = {
  email: string          // needed
  password: string       // needed
  phone: string          // no SMS feature: remove
  birthdate: string      // no age gate: remove
  gender: string         // no personalization: remove
  referralSource: string // required, but nothing depends on it: make optional
}
// AFTER: minimal
type SignupForm = {
  email: string
  password: string
  referralSource?: string // optional, aggregated only
}
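Changing the form type does not by itself stop collection. If the API handler still accepts whatever the client sends, an old client or a hand-written request can keep writing phone or birthdate values. The following is a minimal sketch of a server-side allowlist. It assumes the request body has already been JSON-decoded into a plain object, and `parseSignup` is a hypothetical helper, not an API of any particular framework.

```typescript
// Minimal signup shape, matching the AFTER type above.
type SignupForm = {
  email: string
  password: string
  referralSource?: string
}

// Hypothetical helper: copy only allowlisted fields out of an untrusted,
// already-decoded request body. Any other key (phone, birthdate, ...) is dropped.
function parseSignup(body: Record<string, unknown>): SignupForm {
  const { email, password, referralSource } = body
  if (typeof email !== "string" || typeof password !== "string") {
    throw new Error("email and password are required strings")
  }
  const form: SignupForm = { email, password }
  // Keep the optional field only when the user actually supplied a value.
  if (typeof referralSource === "string" && referralSource.trim() !== "") {
    form.referralSource = referralSource
  }
  return form
}

// A stale client still sends phone and birthdate; they never reach storage.
const parsed = parseSignup({
  email: "a@example.com",
  password: "s3cret",
  phone: "+15555550100",
  birthdate: "1990-01-01",
})
console.log(Object.keys(parsed)) // [ 'email', 'password' ]
```

Dropping unknown keys keeps stale clients working. Rejecting them with an error instead makes leftover fields easier to spot during the audit. In either case, the allowlist lives in one place next to the type.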
Run SELECT column_name FROM information_schema.columns WHERE table_name = 'users' and cross-reference each column against the code that reads it. Columns that are populated but never queried are candidates for removal.
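Part of the cross-reference step can be automated. The sketch below assumes you have exported the column names from the query above and loaded the application's source files as strings. `findUnreferencedColumns` is a hypothetical helper, and it is only a text heuristic.

```typescript
// Hypothetical audit helper: report columns whose name never appears as a
// whole word in any of the given source files.
function findUnreferencedColumns(
  columns: string[],
  sources: Record<string, string>,
): string[] {
  const allSource = Object.keys(sources)
    .map((file) => sources[file])
    .join("\n")
  return columns.filter((column) => {
    // Escape regex metacharacters, then match the name as a whole word.
    const escaped = column.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
    return !new RegExp(`\\b${escaped}\\b`).test(allSource)
  })
}

const columns = ["id", "email", "password_hash", "phone", "birthdate"]
const sources = {
  "src/auth.ts": "const user = await db.users.findOne({ email })\nverify(user.password_hash)",
  "src/db.ts": "SELECT id, email FROM users",
}
console.log(findUnreferencedColumns(columns, sources)) // [ 'phone', 'birthdate' ]
```

A name that never appears anywhere is a strong removal candidate. A match, however, does not prove the column is used. A column that is only ever written, for example in an INSERT, still matches. ORMs that map `birth_date` to `birthDate`, and queries built with `SELECT *`, also defeat a plain name search. Treat the output as a starting list for manual review.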
ID: data-protection.data-collection-consent.data-minimization
Severity: high
What to look for: Enumerate every relevant item. Review all forms in the application: signup, onboarding, profile, checkout, contact, and survey forms. For each field that collects personal information, trace whether any product feature actually uses it. Check the database schema for columns that are populated but never queried (orphaned columns). Common over-collection patterns in AI-built apps include a phone number field on a SaaS with no SMS feature, a date of birth on a tool with no age gate or age-relevant features, a gender field with no personalization logic, and "How did you hear about us" plus several demographic questions on a dev tool. Check required vs. optional designations as well. Fields marked required that the service does not need in order to function should be made optional or removed.
Pass criteria: All of the following conditions are met. All collected data serves a documented business purpose. No fields are collected "just in case." Optional fields are actually optional in the UI. The database schema contains no populated columns that application code never reads.
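One way to make "documented business purpose" checkable is to keep a per-field inventory next to the form and audit it mechanically. The sketch below assumes a hand-maintained inventory. `FieldSpec`, `Purpose`, and `auditFields` are hypothetical names, and the two purpose categories are an illustrative simplification.

```typescript
// Hypothetical purpose categories for a collected field.
type Purpose = "service-delivery" | "optional-feature"

type FieldSpec = {
  name: string
  requiredInUi: boolean
  purpose: Purpose | null // null = no feature reads this field
}

// Flag fields with no purpose, and required fields the service does not need.
function auditFields(fields: FieldSpec[]): string[] {
  const findings: string[] = []
  for (const field of fields) {
    if (field.purpose === null) {
      findings.push(`${field.name}: no documented purpose, remove`)
    } else if (field.requiredInUi && field.purpose !== "service-delivery") {
      findings.push(`${field.name}: required but not needed for service delivery, make optional`)
    }
  }
  return findings
}

const signupFields: FieldSpec[] = [
  { name: "email", requiredInUi: true, purpose: "service-delivery" },
  { name: "password", requiredInUi: true, purpose: "service-delivery" },
  { name: "referralSource", requiredInUi: true, purpose: "optional-feature" },
  { name: "phone", requiredInUi: true, purpose: null },
]

for (const finding of auditFields(signupFields)) console.log(finding)
// referralSource: required but not needed for service delivery, make optional
// phone: no documented purpose, remove
```

If the inventory lives in the repository next to the form, nobody can add a new field without also stating which feature reads it.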
Fail criteria: Forms collect fields the product doesn't use. Required fields on forms include data not needed for service delivery. Database contains personal data columns that are always null or never queried.
Skip (N/A) when: Never. Data minimization is always evaluable, so this check has no N/A condition.
Detail on fail: Specify the unnecessary data. Example: "Signup form requires phone number, but application has no SMS or phone-based feature. Phone column populated in users table, never queried." or "Onboarding form asks for birthdate and gender; no age-gating or personalization logic found in codebase."
Remediation: Audit all forms and the database schema together. For each personal data field, ask "what feature breaks if we remove this?" If the answer is "nothing," remove the field. If it's needed for a future feature, make it optional and defer collection until the feature ships:
// BEFORE: collecting unnecessary data
type SignupForm = {
  email: string          // needed
  password: string       // needed
  phone: string          // not needed: remove
  birthdate: string      // not needed: remove
  gender: string         // not needed: remove
  referralSource: string // not needed to sign up: make it optional
}
// AFTER: minimal collection
type SignupForm = {
  email: string
  password: string
  referralSource?: string // optional, not stored as PII
}
Run SELECT column_name FROM information_schema.columns WHERE table_name = 'users' to audit what is collected vs. what is used.
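The schema query lists what can be stored. The fail criteria above also cover personal data columns that are always null. The following is a minimal sketch that generates a population check for each column. It assumes PostgreSQL-style double-quoted identifiers (MySQL uses backticks unless ANSI_QUOTES is enabled). `buildPopulationQuery` is a hypothetical helper.

```typescript
// Hypothetical helper: build one query that counts non-null values per column.
// COUNT(col) ignores NULLs, so a result of 0 means the column was never populated.
function buildPopulationQuery(table: string, columns: string[]): string {
  // Double-quote identifiers (standard SQL), doubling any embedded quotes.
  const quote = (id: string) => `"${id.replace(/"/g, '""')}"`
  const counts = columns
    .map((col) => `COUNT(${quote(col)}) AS ${quote(col)}`)
    .join(", ")
  return `SELECT COUNT(*) AS total_rows, ${counts} FROM ${quote(table)};`
}

console.log(buildPopulationQuery("users", ["phone", "birthdate"]))
// SELECT COUNT(*) AS total_rows, COUNT("phone") AS "phone", COUNT("birthdate") AS "birthdate" FROM "users";
```

Any column whose count is 0 has never been populated and is a removal candidate. `COUNT(column)` skips NULLs but counts empty strings, so a column filled with `''` placeholders needs a separate check.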