Disaster recovery procedures documented, tested quarterly, with RTO and RPO targets

ab-001655 · infrastructure-hardening.monitoring-incident-response.disaster-recovery

Severity: infoactive

Why it matters

An undocumented disaster recovery process is an untested one — and an untested recovery process will fail at the worst possible moment. Without explicit RTO and RPO targets, there is no definition of success during a recovery, no basis for SLA commitments, and no mechanism to detect that backups are unrestorable before they are needed (NIST 800-53 CP-2, CP-4, ISO 27001 A.5.30). CIS AWS 2.1.1 requires documented recovery procedures. The most common failure mode is not missing backups but unencrypted or untested backups that cannot be restored under real incident conditions.

Severity rationale

Info because the absence of DR procedures does not expose data directly, but it guarantees extended outages and potential permanent data loss when an incident does occur.

Remediation

Create docs/disaster-recovery.md documenting numeric RTO/RPO targets, backup procedures, and recovery steps. Set up automated backups with Velero for Kubernetes resources:

# Install Velero with S3 backend
velero install \
  --provider aws \
  --bucket my-dr-backups \
  --backup-location-config region=us-east-1 \
  --use-volume-snapshots=true

# Schedule daily backups retained for 30 days
velero schedule create daily-cluster-backup \
  --schedule="0 2 * * *" \
  --ttl 720h

Document in docs/disaster-recovery.md:

## Recovery Objectives
- **RTO:** 4 hours (time to restore service)
- **RPO:** 1 hour (maximum data loss window)

## Quarterly DR Test
Date of last test: <date>
Restoration verified by: <name>

Backups must be encrypted (Velero uses KMS by default with AWS) and stored in a separate account or region from the primary environment.

Detection

ID: disaster-recovery
Severity: info
What to look for: List all disaster recovery documentation files (docs/disaster-recovery.md, runbooks, backup policies). Check for backup configuration (Velero schedules, cloud provider snapshots, database backup cron jobs). Count the required DR elements: documented procedures, encrypted backups, RTO target, RPO target, and evidence of quarterly testing. At least 4 of these 5 elements should be present.
Pass criteria: Disaster recovery procedures are documented in a dedicated file. Backups are encrypted and automated. Both RTO (Recovery Time Objective) and RPO (Recovery Point Objective) targets are explicitly defined with numeric values (e.g., RTO: 4 hours, RPO: 1 hour). Evidence of quarterly DR testing exists. At least 4 of 5 required elements are present. Report: "X of 5 DR elements documented."
Fail criteria: No disaster recovery documentation found, backups are not encrypted, or fewer than 4 of 5 required elements (documentation, encrypted backups, RTO target, RPO target, quarterly testing) are present.
Skip (N/A) when: Application is stateless with no persistent data requiring recovery.
Cross-reference: The centralized-logging check ensures logs are preserved for post-incident analysis, which is a prerequisite for effective disaster recovery.
Detail on fail: Quote the documentation gap. Example: "No disaster recovery documentation found in docs/ directory. Backup procedures and recovery steps are not recorded." or "Velero backups configured but no RTO/RPO targets defined and no evidence of quarterly testing — 2 of 5 elements"

Remediation: Document and implement disaster recovery. Example with Velero:

# Install Velero
velero install --provider aws --bucket my-backups

# Create schedule
velero schedule create daily-backup --schedule="0 2 * * *"

# Test recovery
velero restore create --from-backup daily-backup-<date>

Document in docs/disaster-recovery.md:

# Disaster Recovery Plan

**RTO:** 4 hours
**RPO:** 1 hour (hourly backups)

## Backup Procedure
- Velero schedules daily backups at 2 AM UTC
- Backups encrypted with AWS KMS
- Retained for 30 days

## Recovery Procedure
1. Restore from Velero backup
2. Verify application health
3. Run smoke tests
4. Update DNS if needed

## Testing
- Quarterly disaster recovery drill (scheduled for 1st Friday of each quarter)

External references

nist:rev5 · CP-2 — Contingency Plan
nist:rev5 · CP-4 — Contingency Plan Testing
iso-27001:2022 · A.5.30 — ICT Readiness for Business Continuity
external · CIS-AWS-2.1.1 — CIS AWS Foundations Benchmark §2.1.1 — Ensure S3 Bucket Policy is set to deny HTTP requests

Taxons

operational-readiness

History

2026-04-18·v1.0.0·Initial import from infrastructure-hardening·automated

Why it matters

Remediation

Create docs/disaster-recovery.md documenting numeric RTO/RPO targets, backup procedures, and recovery steps. Set up automated backups with Velero for Kubernetes resources:

# Install Velero with S3 backend
velero install \
  --provider aws \
  --bucket my-dr-backups \
  --backup-location-config region=us-east-1 \
  --use-volume-snapshots=true

# Schedule daily backups retained for 30 days
velero schedule create daily-cluster-backup \
  --schedule="0 2 * * *" \
  --ttl 720h

Document in docs/disaster-recovery.md:

## Recovery Objectives
- **RTO:** 4 hours (time to restore service)
- **RPO:** 1 hour (maximum data loss window)

## Quarterly DR Test
Date of last test: <date>
Restoration verified by: <name>

Backups must be encrypted (Velero uses KMS by default with AWS) and stored in a separate account or region from the primary environment.

Detection

ID: disaster-recovery
Severity: info
What to look for: List all disaster recovery documentation files (docs/disaster-recovery.md, runbooks, backup policies). Check for backup configuration (Velero schedules, cloud provider snapshots, database backup cron jobs). Count the required DR elements: documented procedures, encrypted backups, RTO target, RPO target, and evidence of quarterly testing. At least 4 of these 5 elements should be present.
Pass criteria: Disaster recovery procedures are documented in a dedicated file. Backups are encrypted and automated. Both RTO (Recovery Time Objective) and RPO (Recovery Point Objective) targets are explicitly defined with numeric values (e.g., RTO: 4 hours, RPO: 1 hour). Evidence of quarterly DR testing exists. At least 4 of 5 required elements are present. Report: "X of 5 DR elements documented."
Fail criteria: No disaster recovery documentation found, backups are not encrypted, or fewer than 4 of 5 required elements (documentation, encrypted backups, RTO target, RPO target, quarterly testing) are present.
Skip (N/A) when: Application is stateless with no persistent data requiring recovery.
Cross-reference: The centralized-logging check ensures logs are preserved for post-incident analysis, which is a prerequisite for effective disaster recovery.
Detail on fail: Quote the documentation gap. Example: "No disaster recovery documentation found in docs/ directory. Backup procedures and recovery steps are not recorded." or "Velero backups configured but no RTO/RPO targets defined and no evidence of quarterly testing — 2 of 5 elements"

Remediation: Document and implement disaster recovery. Example with Velero:

# Install Velero
velero install --provider aws --bucket my-backups

# Create schedule
velero schedule create daily-backup --schedule="0 2 * * *"

# Test recovery
velero restore create --from-backup daily-backup-<date>

Document in docs/disaster-recovery.md:

# Disaster Recovery Plan

**RTO:** 4 hours
**RPO:** 1 hour (hourly backups)

## Backup Procedure
- Velero schedules daily backups at 2 AM UTC
- Backups encrypted with AWS KMS
- Retained for 30 days

## Recovery Procedure
1. Restore from Velero backup
2. Verify application health
3. Run smoke tests
4. Update DNS if needed

## Testing
- Quarterly disaster recovery drill (scheduled for 1st Friday of each quarter)