Error-handling paths, retry and backoff logic, circuit breakers, graceful degradation, fallback behavior, and crash recovery.
The survival layer: does the system keep working when things go wrong?
In scope. Error-handling paths, retry strategies with backoff and max-attempts, circuit breakers around flaky dependencies, graceful degradation, fallback behavior, timeout handling, partial-failure handling, UI error boundaries, crash recovery, idempotent handling of replays, error-swallowing detection.
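As an illustration of "retry strategies with backoff and max-attempts", here is a minimal sketch. The names `TransientError` and `retry_with_backoff`, the default values, and the choice of full jitter are assumptions made for the example, not part of this definition.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry_with_backoff(operation, max_attempts=4, base_delay=0.1,
                       max_delay=2.0, sleep=time.sleep):
    """Call operation(), retrying transient failures with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Retry budget exhausted: surface the failure instead of looping forever.
                raise
            # Exponential backoff capped at max_delay, with full jitter so that
            # many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

The two properties a reviewer would look for are both visible here: the loop is bounded by `max_attempts`, and only errors marked as transient are retried, so permanent failures propagate immediately.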
Not in scope. Seeing that an error happened (that's observability). Cost effects of retry storms (carry both: error-resilience for the recovery logic, cost-efficiency for the unbounded retry). Structural stubs or scaffolding reaching production (that's placeholder-hygiene).
Distinct because. The defect is that the system breaks down under failure conditions, independent of whether operators can see it. A pattern about "unhandled promise rejection crashes worker" is error-resilience. A pattern about "error happens silently, not logged" is observability.
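The distinction can be sketched with a hypothetical worker loop, written in Python rather than with promises. The function names and the `on_error` hook are assumptions for illustration. The first version has the error-resilience defect: one failing job stops the whole batch. The second isolates failures; whether its hook actually records anything is a separate, observability question.

```python
def run_worker_fragile(jobs, handle):
    """Resilience defect: the first failing job aborts the whole batch."""
    results = []
    for job in jobs:
        # Any exception raised by handle() propagates and stops the loop.
        results.append(handle(job))
    return results


def run_worker_resilient(jobs, handle, on_error):
    """Isolates each job so that one failure does not take down the rest."""
    results, failed = [], []
    for job in jobs:
        try:
            results.append(handle(job))
        except Exception as exc:  # broad on purpose: this is the worker's boundary
            failed.append(job)
            # Reporting hook. Whether it logs anything is an observability concern.
            on_error(job, exc)
    return results, failed
```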
Conceptual sub-structure. Error-handling paths, retry/backoff, circuit-breaking, graceful degradation, fallback behavior.
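Two of these sub-areas, circuit-breaking and fallback behavior, can be shown together in a minimal sketch. The class name `CircuitBreaker`, its thresholds, and the injected `clock` are assumptions for illustration; production breakers usually add thread safety and finer-grained half-open handling.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency for a while and serves a fallback instead."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: skip the dependency entirely and degrade gracefully.
                return fallback()
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                # Open the circuit, or re-open it after a failed trial call.
                self.opened_at = self.clock()
            return fallback()
        # Success: close the circuit and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller might wrap a flaky lookup as `breaker.call(fetch_live_price, lambda: cached_price)`, where both names are hypothetical: while the circuit is open the cached value is served without touching the dependency.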