All 20 checks with why-it-matters prose, severity, and cross-references to related audits.
CWE-778 (Insufficient Logging) and OWASP A09 (Security Logging & Monitoring Failures) both identify the absence of structured, machine-readable logs as a root cause of failed incident detection. When logging consists of ad-hoc `console.log` calls, production log platforms cannot parse, index, or alert on log data — a single exception swallowed in unstructured output can leave a breach undetected for days. NIST AU-2 requires that audit-relevant events be logged in a format that supports analysis; plain-text console output fails that bar. Structured JSON logs (with timestamp, level, message, and context fields) are what log aggregation platforms like Better Stack, Axiom, and Datadog index and query against.
Why this severity: High because unstructured logs disable the entire observability pipeline — alerting, search, and incident response all depend on parseable log entries.
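As a sketch, the shape these platforms index on is one JSON object per line carrying timestamp, level, message, and context fields; in practice you would reach for an established library such as pino rather than hand-rolling this:

```javascript
// Illustrative structured logger: emits one JSON object per line with
// the timestamp/level/message/context fields log platforms index on.
function makeLogger(stream = process.stdout) {
  const emit = (level, message, context = {}) =>
    stream.write(
      JSON.stringify({
        timestamp: new Date().toISOString(),
        level,
        message,
        ...context, // flat context fields become queryable columns
      }) + "\n"
    );
  return {
    info: (msg, ctx) => emit("info", msg, ctx),
    warn: (msg, ctx) => emit("warn", msg, ctx),
    error: (msg, ctx) => emit("error", msg, ctx),
  };
}

const logger = makeLogger();
logger.info("user signed in", { userId: "u_123", plan: "pro" });
```

Each line is now independently parseable, so a platform can index `userId` and alert on `level`.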
saas-logging.app-logging.structured-logging

CWE-779 (Logging of Excessive Data) and CWE-778 (Insufficient Logging) sit at opposite ends of the same failure: logging everything at one level either floods your log store with noise or misses the signal entirely. OWASP A09 and NIST AU-2 both require that logged events carry enough severity context to support triage. When `error` level is used for informational messages or `info` is used for caught exceptions, alerting rules misfire — high-severity issues never trigger a page, and on-call engineers chase false alarms. The business impact is delayed incident response: teams lose the ability to distinguish a 500-rate spike from routine debug output.
Why this severity: Medium because miscategorized log levels degrade alert fidelity without directly exposing data, but they predictably delay incident detection and response.
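One way to keep levels machine-actionable is to give them numeric severities so threshold filtering and alert rules can key on level reliably; the numbers below follow a common convention (pino and syslog use similar scales), not a requirement:

```javascript
// Numeric severities (a common convention) let a logger drop entries
// below a configured threshold and let alert rules key on level.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40 };

function shouldLog(entryLevel, minLevel) {
  return LEVELS[entryLevel] >= LEVELS[minLevel];
}

// A caught exception that failed a user action is an error, not info:
//   logger.error("payment capture failed", { err });  // can page on-call
// A routine lifecycle event is info, not error:
//   logger.info("cache warmed", { keys: 1200 });      // never pages
```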
saas-logging.app-logging.log-levels-appropriate

CWE-532 (Insertion of Sensitive Information into Log File) is a direct path to credential theft: password hashes, session tokens, and API keys logged to stdout end up in your log aggregation platform, which typically has far weaker access controls than your primary database. PCI-DSS Req 10.3.3 explicitly prohibits logging cardholder data in any form; GDPR Art. 32 requires appropriate safeguards for personal data at rest and in transit — your log store is both. A single `logger.info(req.body)` in a payment handler can land card numbers, CVVs, and billing addresses in a third-party SaaS log platform, creating a compliance violation that persists for your entire log retention window. OWASP A09 lists sensitive data in logs as a primary logging failure mode.
Why this severity: Critical because logged secrets, tokens, or PII remain exploitable for the full log retention period and may be exfiltrated from log infrastructure, which is rarely as hardened as your primary data store.
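A common mitigation is to redact known-sensitive keys before anything reaches the logger; the key list and the `redact` helper below are illustrative, and real deny lists are application-specific:

```javascript
// Strip known-sensitive fields (recursively) before logging.
// The key list here is illustrative, not exhaustive.
const SENSITIVE_KEYS = new Set([
  "password", "token", "apiKey", "authorization",
  "cardNumber", "cvv", "ssn",
]);

function redact(value) {
  if (value === null || typeof value !== "object") return value;
  if (Array.isArray(value)) return value.map(redact);
  return Object.fromEntries(
    Object.entries(value).map(([k, v]) =>
      SENSITIVE_KEYS.has(k) ? [k, "[REDACTED]"] : [k, redact(v)]
    )
  );
}

// Usage: logger.info("signup", redact(req.body)) -- never raw req.body.
```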
saas-logging.app-logging.no-sensitive-data-logs

NIST AU-2 requires logging of events sufficient to support incident investigation; PCI-DSS Req 10.2 specifically mandates logging of all access to audit trails and system components, including HTTP method, status, and timing. Without request-level logs capturing method, path, status code, and duration, you cannot answer the most basic production questions: which endpoints are returning 500s, which are slow, and whether a latency spike started before or after a deploy. OWASP A09 identifies the absence of request logging as a direct enabler of undetected attacks — repeated 401s, scan patterns, and error spikes are invisible without a request log stream. Application-level logs that capture only business logic events are insufficient.
Why this severity: High because missing request logs make it impossible to detect error spikes, latency regressions, or attack patterns in production — the entire incident detection foundation is absent.
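As a sketch, an Express-style middleware can capture method, path, status, and duration in one structured entry per request; the field names are illustrative and should match whatever schema your log platform expects:

```javascript
// Express-style middleware: one structured log entry per request,
// emitted when the response finishes. Field names are illustrative.
function requestLogger(logger) {
  return (req, res, next) => {
    const start = process.hrtime.bigint();
    res.on("finish", () => {
      const durationMs = Number(process.hrtime.bigint() - start) / 1e6;
      logger.info("request", {
        method: req.method,
        path: req.url,
        status: res.statusCode,
        durationMs: Math.round(durationMs),
      });
    });
    next();
  };
}

// Usage (assuming an Express app): app.use(requestLogger(logger));
```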
saas-logging.app-logging.request-logging

NIST AU-3 requires that logged records contain sufficient information to establish what happened, when, who was involved, and what was affected. CWE-778 covers the failure mode directly: when errors are caught and logged with only `err.message`, the stack trace is discarded and the context of what triggered the error is lost. In practice this means that when a payment failure or auth error occurs at 3 AM, the engineer on call has a string like `'Unexpected token in JSON'` and nothing else to work from — no file, no line, no user ID, no operation. The missing stack trace typically adds 20–60 minutes to incident resolution. OWASP A09 calls out insufficient error context as a core logging failure.
Why this severity: High because incomplete error logs remove the evidence trail needed to diagnose production incidents, directly extending mean time to resolution.
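A sketch of the fix: serialize the full Error, preserving name, message, and stack, together with operation context, instead of logging only `err.message`; the context field names here are illustrative:

```javascript
// Serialize an Error into a structured entry that keeps the stack
// trace and operational context, not just err.message.
function serializeError(err, context = {}) {
  return {
    level: "error",
    message: err.message,
    errorName: err.name,
    stack: err.stack, // the file/line trail the on-call engineer needs
    ...context,       // e.g. userId, operation, requestId (illustrative)
  };
}

try {
  JSON.parse("{oops"); // simulate a failure deep in a handler
} catch (err) {
  const entry = serializeError(err, {
    operation: "parseWebhook",
    userId: "u_42",
  });
  // logger.error would emit `entry` as one structured line
}
```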
saas-logging.app-logging.error-logging-stack-context

CWE-779 (Logging of Excessive Data) identifies verbose debug output in production as a logging misconfiguration with real costs: inflated log ingestion bills, slower log shipping, and inadvertent exposure of internal state that debug-level entries often contain. NIST AU-11 covers log retention and volume management — producing debug output in production that you don't need directly increases retention costs without increasing observability value. The secondary risk is that debug output frequently captures internal parameter values, function inputs, and intermediate state that are inappropriate for a production log store. A hardcoded `level: 'debug'` in your logger config means your production environment is behaving like a development instance.
Why this severity: Low because debug logging in production inflates cost and occasionally leaks internal state, but does not directly enable attack vectors or data loss.
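The usual remedy is to derive the level from the environment instead of hardcoding it; `LOG_LEVEL` and the fallback rules below are illustrative conventions:

```javascript
// Resolve the log level from the environment instead of hardcoding
// 'debug'. LOG_LEVEL as an explicit override is a common convention.
function resolveLogLevel(env = process.env) {
  if (env.LOG_LEVEL) return env.LOG_LEVEL; // explicit operator override
  return env.NODE_ENV === "production" ? "info" : "debug";
}

// e.g. const logger = makeLogger({ level: resolveLogLevel() });
```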
saas-logging.app-logging.no-debug-production

CWE-778 and OWASP A09 both identify the absence of monitoring as a prerequisite for undetected failures. Uptime monitoring is the most basic layer: without an external service probing your application on a 30–60 second interval, you learn your app is down from user complaints — typically 15–60 minutes after the outage began. For a SaaS, every minute of undetected downtime is lost revenue and accumulated churn risk. NIST AU-6 (Audit Review, Analysis, and Reporting) requires continuous monitoring of system activity; uptime monitoring is the minimum viable implementation of this requirement for a public-facing service. The 'never skip' status of this check reflects that no deployed application is exempt — static sites go down too.
Why this severity: Critical because without external uptime monitoring, downtime goes undetected until users report it, maximizing the revenue and reputation impact of every outage.
saas-logging.monitoring.uptime-monitoring

PCI-DSS Req 10.7 requires that failures of critical security controls are detected and reported promptly; OWASP A09 lists the absence of error rate alerting as a primary logging and monitoring failure. Without an error tracking service, you have no signal when your application starts throwing 500s at scale — users silently churn while a deployment bug or dependency failure goes unnoticed. CWE-778 covers this: insufficient logging of error conditions leaves the system operating in a degraded state without any operator awareness. Error rate alerting ("10 errors in 5 minutes → page on-call") is the automated trip wire that converts application errors into human-actionable incidents before they become outages.
Why this severity: High because unmonitored error rates allow deployment bugs and dependency failures to degrade user experience for extended periods before detection.
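The trip wire itself can be sketched as a sliding window; in production this logic belongs in an error tracker such as Sentry rather than in application code, and the thresholds below are illustrative:

```javascript
// Sliding-window trip wire: "N errors in M minutes -> alert".
// Thresholds are illustrative; use your error tracker's alert rules
// in production rather than rolling your own.
function makeErrorWindow({ maxErrors = 10, windowMs = 5 * 60 * 1000 } = {}) {
  const timestamps = [];
  return function recordError(now = Date.now()) {
    timestamps.push(now);
    // drop errors that have aged out of the window
    while (timestamps.length && timestamps[0] <= now - windowMs) {
      timestamps.shift();
    }
    return timestamps.length >= maxErrors; // true -> fire the alert
  };
}
```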
saas-logging.monitoring.error-rate-alerting

NIST AU-2 requires that systems log events sufficient to support analysis of system performance and security posture. Without performance monitoring, latency regressions introduced by a deploy are invisible until they affect enough users to show up in support tickets. P95 response times, transaction durations, and route-level latency data are the signals that distinguish "one slow request" from "our checkout flow is degraded for everyone." This check covers the gap between uptime monitoring (is the app up?) and performance monitoring (is it fast?). A SaaS losing 20% of users at checkout because the payment route is timing out has the same revenue impact as downtime — it just takes longer to surface.
Why this severity: Medium because performance regressions without monitoring degrade user experience and conversion rates, but they rarely cause data loss or security exposure directly.
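The P95 computation itself is a small reduction over a window of request durations; the nearest-rank method sketched here is one of several percentile conventions:

```javascript
// Nearest-rank percentile over a window of request durations (ms).
// P95 is the value below which 95% of requests fall.
function percentile(durationsMs, p) {
  const sorted = [...durationsMs].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, idx)];
}
```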
saas-logging.monitoring.performance-monitoring

NIST AU-6 requires continuous review of audit records and system performance. For self-hosted deployments, memory leaks, disk-filling log files, and CPU runaway are the most common causes of unexpected outages — and they are invisible without infrastructure-level monitoring. A Node.js process with a memory leak gradually degrades over hours until OOM kills it; without a memory usage alert, the first signal is the application going down. This check applies only to self-hosted deployments because serverless platforms (Vercel, Netlify, Cloudflare Workers) manage infrastructure scaling automatically and do not expose these metrics as meaningful operator concerns.
Why this severity: Medium because unmonitored infrastructure metrics allow memory leaks and disk exhaustion to cause outages on self-hosted deployments, with no early warning.
saas-logging.monitoring.disk-memory-cpu

NIST AU-2 requires that systems generate audit records for events sufficient to support incident investigation, and a health check endpoint is the single most important hook for external monitoring and orchestration systems. Without a dedicated health endpoint, uptime monitors are forced to check your homepage — which may return 200 even when your database is unreachable or your API is broken. Load balancers and container orchestrators (ECS, Kubernetes, Fly.io) also depend on health endpoints to route traffic and restart failed instances. A missing health check means your monitoring tools and infrastructure cannot reliably determine whether your application is actually functional.
Why this severity: Medium because without a health endpoint, uptime monitors check the wrong signal and load balancers cannot automatically isolate degraded instances.
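A minimal sketch of a health handler that probes real dependencies rather than only process liveness; the check names are illustrative, and real database probes would be asynchronous:

```javascript
// Health handler: run each dependency probe, return 200 only if all
// pass, 503 otherwise. Check names are illustrative; real probes
// (database ping, cache ping) would be async.
function healthHandler(checks) {
  const results = {};
  let healthy = true;
  for (const [name, check] of Object.entries(checks)) {
    try {
      check(); // a probe throws if its dependency is unreachable
      results[name] = "ok";
    } catch {
      results[name] = "failed";
      healthy = false;
    }
  }
  return { status: healthy ? 200 : 503, body: results };
}

// e.g. app.get("/healthz", (req, res) => {
//   const { status, body } = healthHandler({ db: checkDb });
//   res.status(status).json(body);
// });
```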
saas-logging.monitoring.health-check-endpoint

PCI-DSS Req 10.2 mandates logging of all access to payment-related system components, including who performed actions and when. SOC 2 CC7.2 requires detection and logging of security-relevant activities. CWE-778 covers the failure directly: sensitive operations performed without an audit trail leave you unable to answer the most common post-incident questions — "when did this role change happen?", "who approved this data export?", "what did this admin do?" Application logs are insufficient because they are ephemeral, mutable, and not designed for forensic analysis. OWASP A09 identifies missing audit trails as a primary monitoring failure. A breach investigation without an audit log typically takes 3–5x longer and often cannot establish the timeline of events at all.
Why this severity: Critical because sensitive operations without an audit trail cannot be forensically reconstructed after a security incident, and the absence violates PCI-DSS Req 10.2 and SOC 2 CC7.2 directly.
saas-logging.audit-trail.audit-log-sensitive-ops

PCI-DSS Req 10.3.2 explicitly requires that audit log files be protected to prevent unauthorized modifications. NIST AU-9 requires protection of audit information from unauthorized access, modification, and deletion. An audit log that a privileged user or a compromised service account can modify is not a trustworthy audit trail — an attacker who can delete audit events to cover their tracks has neutralized your forensic capability entirely. SOC 2 CC7.2 requires integrity controls on security-relevant records. The most common failure mode is an `audit_events` table created with the same full-CRUD RLS policy as application tables, making its records as mutable as any other data in the system.
Why this severity: Medium because a mutable audit log weakens forensic integrity and fails PCI-DSS Req 10.3.2 and NIST AU-9, but exploitation requires privileged access already compromised.
saas-logging.audit-trail.audit-logs-immutable

NIST AU-11 (Audit Record Retention) requires that audit records be retained for a period consistent with the organization's data classification and applicable regulations. PCI-DSS Req 10.7.1 mandates a minimum one-year audit log retention. GDPR Art. 5(1)(e) requires that personal data not be kept longer than necessary for its purpose — meaning log retention also has a ceiling, not just a floor. Without a defined policy, teams either store logs indefinitely (accruing cost and compliance risk from retaining stale PII) or lose logs before an investigation needs them. A retention decision made during an incident is almost always wrong in one direction or the other.
Why this severity: Low because undefined retention causes cost overruns and compliance gaps, but the immediate security impact is indirect unless an investigation actively needs logs that were deleted.
saas-logging.audit-trail.log-retention-policy

OWASP A09 and NIST AU-7 (Audit Record Reduction and Report Generation) both require that logs be accessible for analysis — which means persisted, indexed, and queryable, not just emitted to stdout. On serverless platforms like Vercel, function logs visible in the dashboard are ephemeral and retained for only a short window (typically 1–24 hours) without a log drain configured. A structured JSON logger producing perfect output is useless if the output disappears before you need it. When an incident happens, you need to query log history across a time range, filter by user ID, or trace a specific error across multiple function invocations — none of which is possible with transient platform logs.
Why this severity: Medium because logs that are not persisted to a searchable store are effectively unavailable for incident investigation, making structured logging moot.
saas-logging.observability.logs-searchable

NIST AU-2 requires that audit-relevant events — including performance events — be captured at sufficient resolution to support analysis. Aggregate server-level response time hides where latency is actually coming from. If your P95 is 800ms but it originates entirely from one `/api/reports/export` endpoint, aggregate metrics tell you nothing actionable. Per-route latency data is what enables you to write a latency-based SLO ("checkout route P99 < 500ms"), catch regressions in a specific endpoint after a deploy, and prioritize which routes to optimize. This check goes beyond `performance-monitoring` (any timing data) to require route-level attribution — the difference between knowing "something is slow" and knowing "which endpoint is slow."
Why this severity: Medium because without per-route timing, latency regressions in specific endpoints are invisible in aggregate metrics, delaying optimization and SLO definition.
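Route-level attribution can be sketched as durations aggregated per route, with percentiles computed per key; the nearest-rank percentile here is one convention among several:

```javascript
// Aggregate request durations keyed by route so a regression on a
// single endpoint is visible instead of drowned in the aggregate.
function makeRouteTimings() {
  const byRoute = new Map();
  return {
    record(route, durationMs) {
      if (!byRoute.has(route)) byRoute.set(route, []);
      byRoute.get(route).push(durationMs);
    },
    p95(route) {
      const d = [...(byRoute.get(route) ?? [])].sort((a, b) => a - b);
      if (d.length === 0) return null;
      return d[Math.max(0, Math.ceil(0.95 * d.length) - 1)];
    },
  };
}
```

In practice an APM agent does this for you; the sketch only shows why per-route keys, not a single global histogram, are what make a per-endpoint SLO expressible.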
saas-logging.observability.api-response-times-tracked

NIST AU-2 requires that systems log events necessary to support performance analysis. Database query performance is the most common hidden cause of SaaS slowdowns at scale: N+1 query patterns that execute in milliseconds against a development database with 100 rows execute in seconds against production with 100,000 rows. Without slow query monitoring, these regressions are invisible until P99 latency degrades or the database CPU spikes. Unlike application-level performance monitoring, query monitoring requires either ORM-level hooks or database-level configuration — the application server cannot observe query timing without instrumenting the database client directly.
Why this severity: Low because query performance issues are typically latency-degradation problems rather than security exposures, but undetected slow queries cause cascading outages as connection pools saturate.
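The client-side instrumentation hook can be sketched as a wrapper that times every query and logs slow ones; `runQuery`, the 200ms threshold, and the injectable clock are illustrative, and a real database client would be asynchronous:

```javascript
// Wrap a query function so every call is timed and slow calls are
// logged. runQuery and the 200ms threshold are illustrative; a real
// client is async (await the query before taking the end time).
function withSlowQueryLog(runQuery, logger, thresholdMs = 200, now = Date.now) {
  return function timedQuery(sql, params) {
    const start = now();
    const result = runQuery(sql, params);
    const durationMs = now() - start;
    if (durationMs >= thresholdMs) {
      logger.warn("slow query", { sql, durationMs });
    }
    return result;
  };
}
```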
saas-logging.observability.db-query-performance

NIST AU-6 requires ongoing monitoring of system activity, which includes the health of external dependencies your system relies on. A Stripe outage that takes down your checkout flow is functionally indistinguishable from your own application failing — from the user's perspective, payments are broken. The difference is your response: if you know Stripe is the cause within two minutes, you update your status page and notify users; if you spend 45 minutes debugging your own code first, your response is slower and your customers are angrier. Third-party service health monitoring is free (status page subscriptions) and takes minutes to configure — the gap between "configured" and "not configured" is purely awareness.
Why this severity: Low because third-party outages are outside your control, but monitoring them significantly reduces time-to-correct-diagnosis during incidents involving external dependencies.
saas-logging.observability.third-party-health-monitored

NIST AU-3 requires that audit records contain sufficient information to establish the source and outcome of logged events — and when a single user request spans multiple log entries across middleware, API handlers, and database calls, a correlation ID is the thread that ties them together. Without it, debugging a reported user error requires manual timestamp correlation across log entries that may have subsecond overlaps with other concurrent requests. The practical impact: an investigation that takes 5 minutes with correlation IDs takes 30–60 minutes without them. OWASP A09 identifies insufficient log context as a monitoring failure; a request ID is the minimum contextual field that turns a pile of log lines into a traceable sequence.
Why this severity: Low because missing correlation IDs extend debugging time significantly but do not expose data or enable attacks directly.
saas-logging.observability.correlation-ids-in-requests

NIST AU-3 requires that audit records be formatted consistently so they can be reliably parsed and analyzed. Inconsistent logging — some files using `userId`, others `user_id`, some emitting JSON, others emitting plain strings — defeats the purpose of structured logging. A query for `userId = 'abc'` in your log platform returns no results for log entries that used `user_id` instead. Inconsistent format also means your log platform's parser may fail to extract structured fields from some entries, causing them to appear as unindexed raw strings. The cost of inconsistency compounds over time: every developer who adds a new log entry makes a slightly different field naming decision, and the drift accelerates.
Why this severity: Low because log format inconsistency degrades query reliability and incident response efficiency, but does not directly expose vulnerabilities or data.
saas-logging.observability.log-format-consistent

Run this audit in your AI coding tool (Claude Code, Cursor, Bolt, etc.) and submit results here for scoring and benchmarks.