Structured logging, metrics, tracing, alerting, and incident-response coverage — the visibility layer for operators and responders.
The visibility layer: can an operator see what happened?
In scope. Structured logging hygiene, metrics emission, distributed tracing with correlation IDs, alert rules and thresholds, monitoring coverage, incident-response runbooks, log-level discipline, security-event logging (auth failures, privilege changes) that serves forensics, log-retention for operations (as distinct from regulatory obligations).
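As an illustration of the in-scope items, here is a minimal sketch of a structured security-event log line that carries a correlation ID. It uses only the Python standard library. The logger name `auth`, the event name `auth_failure`, and the field names `correlation_id` and `user_id` are assumptions chosen for the example, not a prescribed schema.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
            # Values passed through `extra=` become attributes on the record.
            "correlation_id": getattr(record, "correlation_id", None),
            "user_id": getattr(record, "user_id", None),
        }
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("auth")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def record_auth_failure(logger: logging.Logger, user_id: str, correlation_id: str) -> None:
    """Emit a security event that a responder can later search and correlate."""
    logger.warning(
        "auth_failure",
        extra={"correlation_id": correlation_id, "user_id": user_id},
    )


# Usage: capture the output in memory and parse it back.
buffer = io.StringIO()
log = build_logger(buffer)
cid = str(uuid.uuid4())
record_auth_failure(log, "alice", cid)
entry = json.loads(buffer.getvalue())
```

Because every field is a key rather than text embedded in a message string, a responder can filter on `event` or join on `correlation_id` across services without parsing free-form text.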
Not in scope. Acting on failure: that's error-resilience. Regulatory audit-trail obligations: those carry regulatory-conformance as the primary taxon, with observability as secondary. Log-aggregation infrastructure setup (where the pipeline receiving logs is the defect locus): that's operational-readiness.
Distinct because. Observability is seeing the failure; error-resilience is surviving the failure. A try/catch that logs and degrades carries both: the log call is the observability half and the fallback is the error-resilience half. A pattern about "auth failures not logged" is observability. A pattern about "no circuit breaker around an external dependency" is error-resilience.
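The "logs and degrades" case can be sketched as follows, with comments marking which line belongs to which taxon. The function names, the logger name `pricing`, and the fallback value are hypothetical; `fetch_rate` is a stand-in that always fails so the fallback path is visible.

```python
import logging

logger = logging.getLogger("pricing")  # hypothetical logger name

DEFAULT_RATE = 1.0  # hypothetical fallback used when the dependency is down


def fetch_rate(currency: str) -> float:
    """Stand-in for a call to an external dependency; always fails here."""
    raise ConnectionError(f"rate service unreachable for {currency}")


def rate_or_default(currency: str, fetch=fetch_rate) -> float:
    """Return the live rate, or a default if the dependency fails."""
    try:
        return fetch(currency)
    except ConnectionError:
        # Observability: record what failed, including the traceback.
        logger.exception("rate_lookup_failed currency=%s", currency)
        # Error-resilience: degrade to a fallback instead of propagating.
        return DEFAULT_RATE
```

Deleting the `logger.exception` line would leave the resilience intact but create an observability defect: the service would keep running while operators could not see that it was serving defaults.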
Conceptual sub-structure. Logging, metrics, tracing, alerting, audit-trail-for-ops.