Scraper configurations stored only in the database (targets, rate limits, user-agent strings) can be changed by anyone with database access, bypassing code review and leaving no audit trail. SLSA Build L2 and SSDF PW.4 both require that build and configuration artifacts go through version control. A silent config change that removes rate-limit constraints or adds a scraping-prohibited domain is a legal and operational risk that a reviewed, version-controlled change would have surfaced.
Low because the immediate data integrity risk is indirect — the gap is a governance control weakness rather than an exploitable vulnerability today.
Move base source configuration to a code file in src/config/ so that changes require a pull request and deploy. Dynamic overrides (pause flags, temporary rate-limit adjustments) can remain in the database as long as the canonical configuration lives in code.
// src/config/sources.ts
export const SCRAPER_SOURCES = [
  {
    id: 'linkedin-search',
    baseUrl: 'https://linkedin.com/search',
    rateLimitMs: 2000,
    robotsTxtEnforced: true,
    legalBasis: 'legitimate_interest',
  },
] as const
Load this array at startup and merge with any database overrides; the code definition always wins for security-sensitive fields like robotsTxtEnforced.
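A minimal sketch of that startup merge, under stated assumptions: the `SourceOverride` shape, the choice of which fields count as security-sensitive, and the `mergeSourceConfig` helper are all hypothetical names introduced for illustration, and overrides are assumed to arrive from the database as partial records keyed by source id.

```typescript
// Hypothetical sketch: type names, field list, and helper are illustrative assumptions.
type SourceConfig = {
  id: string
  baseUrl: string
  rateLimitMs: number
  robotsTxtEnforced: boolean
  legalBasis: string
}

// Assumed shape of a database override row: a source id plus any subset of fields.
type SourceOverride = { id: string } & Partial<Omit<SourceConfig, 'id'>>

function mergeSourceConfig(
  base: readonly SourceConfig[],
  overrides: readonly SourceOverride[],
): SourceConfig[] {
  const byId = new Map(overrides.map((o) => [o.id, o] as const))
  return base.map((source) => {
    const override = byId.get(source.id)
    if (!override) return { ...source }
    return {
      ...source,
      ...override,
      // Code-defined values always win for security-sensitive fields
      // (which fields belong here is an assumption of this sketch).
      baseUrl: source.baseUrl,
      robotsTxtEnforced: source.robotsTxtEnforced,
      legalBasis: source.legalBasis,
    }
  })
}
```

Because the result is built by iterating over the code-defined array, an override whose id matches no code-defined source is dropped, so a database row can adjust a field such as rateLimitMs but cannot introduce a new scraping target.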
ID: data-sourcing-provenance.source-management.source-config-versioned
Severity: low
What to look for: Count all source configuration definitions and classify each as "in code" or "in database." Look for scraper configurations, API endpoint definitions, and rate limit settings. Are they stored in code (TypeScript/JSON config files, environment-parameterized constants) or in the database as mutable rows? Configuration in code means it goes through version control and code review. Look for config files named sources.config.ts, scrapers/config.ts, or equivalent — not a sources admin table where staff can freely edit scraping targets.
Pass criteria: At least 1 configuration file exists in code. Scraper targets, API client configurations, and rate limit settings are defined in code (tracked in version control) rather than stored as mutable database rows that can be changed without review.
Fail criteria: Source configuration (scraping targets, API endpoints, rate limits) is stored only in the database with no corresponding code representation, making changes invisible to version control and code review.
Skip (N/A) when: The system uses only form submissions and referrals (no scraping or API-based sourcing).
Detail on fail: "Scraping targets stored as database rows with no code representation — config changes bypass version control" or "No source configuration files found in the codebase".
Remediation: Move source configurations to code:
// src/config/sources.ts
export const SCRAPER_SOURCES = [
  {
    id: 'linkedin-search',
    baseUrl: 'https://linkedin.com/search',
    rateLimitMs: 2000, // ms between requests
    robotsTxtEnforced: true,
  },
] as const
Dynamic overrides (e.g., pausing a source) can still live in the database, but the base configuration should be code-defined.
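As an illustration of a pause override layered on a code-defined base, the sketch below filters the base list by a set of paused ids. The `activeSources` helper and the `pausedSourceIds` parameter are hypothetical; the set stands in for whatever a database query for pause flags would return.

```typescript
// Hypothetical sketch: helper and parameter names are illustrative assumptions.
type BaseSource = {
  id: string
  baseUrl: string
  rateLimitMs: number
  robotsTxtEnforced: boolean
}

// Code-defined base configuration (mirrors the remediation example).
const BASE_SOURCES: readonly BaseSource[] = [
  {
    id: 'linkedin-search',
    baseUrl: 'https://linkedin.com/search',
    rateLimitMs: 2000,
    robotsTxtEnforced: true,
  },
]

// Returns the code-defined sources that are not currently paused.
// Ids in the paused set that match no code-defined source have no effect,
// so the database can disable a source but never add one.
function activeSources(
  base: readonly BaseSource[],
  pausedSourceIds: ReadonlySet<string>,
): BaseSource[] {
  return base.filter((source) => !pausedSourceIds.has(source.id))
}
```

With this split, pausing or resuming a source is an operational toggle, while any change to targets or rate limits still requires a pull request.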