Inserting unvalidated contact data — malformed emails, missing required provenance fields, incorrect data types — violates GDPR Art. 5(1)(d), which requires that personal data be accurate and kept up to date. OWASP A03 (Injection) and CWE-20 (Improper Input Validation) both apply when untrusted external data is written directly to the database without a validation gate. A single batch import with no schema check can flood your contacts table with garbage records that are expensive to identify and remove retroactively.
High because unvalidated ingestion paths allow malformed records — invalid emails, null provenance fields, wrong types — to corrupt the contacts table and undermine every downstream process that depends on data quality.
Add a Zod validation schema that every ingestion path must pass before calling any database insert. Failed validation should route to quarantine, not discard.
// src/lib/ingestion/schema.ts
import { z } from 'zod'
import isEmail from 'validator/lib/isEmail'

// `db` and `quarantine` are assumed to be provided elsewhere in the application.

export const contactIngestSchema = z.object({
  email: z.string().refine(isEmail, 'Invalid email format'),
  source_type: z.enum(['scraper', 'api', 'purchased', 'form', 'referral']),
  source_id: z.string().min(1),
  acquired_at: z.string().datetime(),
  first_name: z.string().optional(),
  last_name: z.string().optional(),
})

export async function ingestContact(raw: unknown) {
  const result = contactIngestSchema.safeParse(raw)
  if (!result.success) {
    // Keep the rejected payload and its per-field errors for review instead of discarding it.
    await quarantine(raw, result.error.flatten().fieldErrors)
    return
  }
  await db.contacts.create({ data: result.data })
}
This schema makes all three provenance fields (source_type, source_id, acquired_at) mandatory, so a missing one is caught at ingest rather than at a compliance audit.
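The snippet above calls a quarantine helper that is not defined here. A minimal sketch of what such a helper might look like follows, assuming an in-memory store; the names quarantineStore, QuarantinedRecord, and FieldErrors are hypothetical, and a real pipeline would more likely write to a dedicated table or queue.

```typescript
// Hypothetical shape for per-field validation messages; compatible with the
// object returned by Zod's error.flatten().fieldErrors.
type FieldErrors = Record<string, string[] | undefined>

// Hypothetical quarantine entry: the rejected payload plus why it was rejected.
interface QuarantinedRecord {
  raw: unknown          // original payload, kept verbatim for later review
  errors: FieldErrors   // validation messages keyed by field name
  quarantinedAt: string // ISO 8601 timestamp of the rejection
}

// Assumption: an in-memory array stands in for a quarantine table or queue.
const quarantineStore: QuarantinedRecord[] = []

async function quarantine(raw: unknown, errors: FieldErrors): Promise<void> {
  quarantineStore.push({ raw, errors, quarantinedAt: new Date().toISOString() })
}
```

Storing the raw payload alongside the field errors means a rejected record can be corrected and re-ingested later, which is the point of quarantining rather than discarding.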
ID: data-sourcing-provenance.ingestion-pipeline.schema-validation-on-ingest
Severity: high
What to look for: Enumerate all ingestion code paths and count how many validate before inserting. Examine code at the point where contacts are inserted into the database. Look for a validation step before the INSERT that checks at least 3 of: email format validity (regex or library), required field presence, data type conformity, and source field validity. Quote the actual validation library or schema found. This validation should happen in application code before the record reaches the database; relying solely on database constraints does not count as a pass.
Pass criteria: Count all ingestion paths and report the ratio: "N of N ingestion paths validate before insert." 100% of ingestion code validates incoming records against a schema before inserting. At least 3 validation rules are enforced per record.
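One way to produce the ratio line and the pass decision from the pass criteria is sketched below. The IngestionPath record and summarizePaths function are hypothetical names for illustration, and treating an empty list of paths as a non-pass is an assumption, not something the criteria state.

```typescript
// Hypothetical audit record for one ingestion code path.
interface IngestionPath {
  name: string                   // file or function where the insert happens
  validatesBeforeInsert: boolean // true if a schema check runs before the insert
  rulesEnforced: number          // distinct validation rules applied per record
}

const MIN_RULES = 3 // "at least 3 validation rules" from the pass criteria

function summarizePaths(paths: IngestionPath[]): { report: string; pass: boolean } {
  const validating = paths.filter(p => p.validatesBeforeInsert)
  const allValidate = validating.length === paths.length
  const enoughRules = paths.every(p => p.rulesEnforced >= MIN_RULES)
  return {
    report: `${validating.length} of ${paths.length} ingestion paths validate before insert.`,
    // Assumption: zero discovered paths is reported as a non-pass.
    pass: paths.length > 0 && allValidate && enoughRules,
  }
}
```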
Fail criteria: Records are inserted directly without application-level validation. Invalid emails (e.g., user@, not-an-email) would be inserted if the database constraint does not catch them.
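To confirm that the malformed samples named in the fail criteria are actually rejected, a quick dependency-free check can be sketched as follows. The regex here is a deliberately simple stand-in for a library check such as validator's isEmail, and looksLikeEmail is a hypothetical helper; neither is meant as a production-grade email validator.

```typescript
// Simplified stand-in for a library email check.
// Assumption: illustrative only, far looser than a real validator.
const SIMPLE_EMAIL = /^[^\s@]+@[^\s@]+\.[^\s@]+$/

function looksLikeEmail(value: unknown): boolean {
  return typeof value === 'string' && SIMPLE_EMAIL.test(value)
}

// The two malformed samples from the fail criteria, plus one well-formed address.
const samples = ['user@', 'not-an-email', 'ada@example.com']

// Collect every sample the check refuses; both malformed samples should land here.
const rejected = samples.filter(s => !looksLikeEmail(s))
```

If either malformed sample is missing from rejected, the ingestion path would store it, which is the failure this check describes.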
Skip (N/A) when: Never. Every ingestion pipeline should validate.
Detail on fail: "Ingestion code inserts contacts without validating email format — malformed emails could be stored" or "No schema validation step found before database insert in the ingestion pipeline".
Remediation: Add validation before inserting:
import { z } from 'zod'
import isEmail from 'validator/lib/isEmail'

// `db` and `quarantine` are assumed to be provided elsewhere in the application.

const contactIngestSchema = z.object({
  email: z.string().refine(isEmail, 'Invalid email format'),
  source_type: z.enum(['scraper', 'api', 'purchased', 'form', 'referral']),
  source_id: z.string().min(1),
  acquired_at: z.string().datetime(),
  first_name: z.string().optional(),
  last_name: z.string().optional(),
})

async function ingestContact(raw: unknown) {
  const result = contactIngestSchema.safeParse(raw)
  if (!result.success) {
    // Route failures to quarantine with their per-field errors; do not discard them.
    await quarantine(raw, result.error.flatten().fieldErrors)
    return
  }
  await db.contacts.create({ data: result.data })
}