Background job failures are logged and retried

ab-002303 · saas-error-handling.graceful-degradation.background-job-retry

Severity: low · active

Why it matters

Background jobs that fail silently, especially those processing payments, sending billing notifications, or syncing shared data, can keep failing for hours before anyone notices. CWE-391 (Unchecked Error Condition) applies directly: a Vercel Cron route that throws without a try/catch leaves no application-level log and has no retry path. CWE-778 (Insufficient Logging) applies when errors are caught but not recorded. ISO 25010 reliability.fault-tolerance requires that background faults be detectable and recoverable. When payment notification jobs or benchmark snapshot jobs fail silently, users feel the impact directly (missed invoices, stale data) while operators receive no signal.

Severity rationale

Low for general background jobs, but elevated for cron jobs handling payments or notifications where silent failures cause direct user-visible impact with no retry path.

Remediation

Wrap every Vercel Cron route handler in try/catch with explicit logging — these routes have no automatic retry, so logging is the only failure signal.

```
// app/api/cron/digest/route.ts
import { NextResponse } from 'next/server'

export async function GET() {
  try {
    await sendWeeklyDigest()
    return NextResponse.json({ success: true })
  } catch (error) {
    // Log with a job-specific prefix so the failure can be found in the function logs
    console.error('[Cron: digest] Failed:', error)
    return NextResponse.json({ error: 'Digest failed' }, { status: 500 })
  }
}
```
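
When a project has several cron routes, the same try/catch can be factored into one wrapper so that every failure is logged with the job name, the error message, and the elapsed time. The following is a minimal sketch, not an established API: `withCronLogging` and `jsonResponse` are hypothetical names, and the handler returns a standard Web `Response` rather than `NextResponse` so the snippet stays self-contained.

```typescript
// Hypothetical helper: wraps a cron job body so every failure is logged with context.
type CronJob = () => Promise<void>

function jsonResponse(body: unknown, status: number): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { 'content-type': 'application/json' },
  })
}

function withCronLogging(jobName: string, job: CronJob) {
  return async function handler(): Promise<Response> {
    const startedAt = Date.now()
    try {
      await job()
      return jsonResponse({ success: true }, 200)
    } catch (error) {
      // Record which job failed, why, and how long it ran before failing.
      console.error(`[Cron: ${jobName}] Failed`, {
        message: error instanceof Error ? error.message : String(error),
        stack: error instanceof Error ? error.stack : undefined,
        durationMs: Date.now() - startedAt,
      })
      return jsonResponse({ error: `${jobName} failed` }, 500)
    }
  }
}
```

Under these assumptions, a route file would export the wrapped job, for example `export const GET = withCronLogging('digest', sendWeeklyDigest)`.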

For jobs that require reliable retry (payment processing, email delivery), migrate from Vercel Cron to Inngest or Trigger.dev — both provide automatic retry with exponential backoff and a UI for monitoring failed jobs without manual log inspection.
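
Where migrating is not yet possible, a small in-process retry with exponential backoff can serve as a stopgap. The sketch below is a hand-rolled illustration, not how Inngest or Trigger.dev implement retries; `withRetry`, `RetryOptions`, and the option names are hypothetical.

```typescript
interface RetryOptions {
  retries: number     // number of retries after the first failed attempt
  baseDelayMs: number // delay before the first retry; doubles on each later retry
}

async function withRetry<T>(
  label: string,
  fn: () => Promise<T>,
  { retries, baseDelayMs }: RetryOptions,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn()
    } catch (error) {
      lastError = error
      // Log every failed attempt so partial failures remain visible.
      console.error(`[${label}] attempt ${attempt + 1} of ${retries + 1} failed:`, error)
      if (attempt < retries) {
        const delayMs = baseDelayMs * 2 ** attempt
        await new Promise((resolve) => setTimeout(resolve, delayMs))
      }
    }
  }
  // All attempts failed: rethrow so the caller's try/catch can log and return a 500.
  throw lastError
}
```

Retries of this kind run inside the same invocation, so they count against the function's execution time limit and are lost if the process is terminated. That is why a durable job runner remains the better fit for payment processing and email delivery.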

Detection

ID: background-job-retry
Severity: low
Note: While this check is severity low for general background jobs, cron jobs that process payments, send notifications, or affect shared data (e.g., benchmark snapshots) represent higher risk when they fail silently. The detail field should call out payment/notification cron jobs specifically when they lack retry logic.
What to look for: Search for background job processing: queue consumers, cron job handlers, webhook processors, event handlers (Inngest, QStash, BullMQ, Trigger.dev, Vercel Cron + API routes). For each: (1) Is there a try/catch wrapping the job handler body? (2) Is the error logged with enough context to identify which job failed and why? (3) Does the job framework support automatic retry on failure, and is retry configured (not left at the default "no retry")? (4) For Vercel Cron routes: are failures logged since there is no automatic retry?
Pass criteria: Count all background job handlers, cron routes, and webhook consumers. Pass if background jobs have try/catch error handling with logging AND either: the framework provides automatic retry (Inngest and Trigger.dev retry by default; BullMQ retries only when the job's attempts option is set above 1), or explicit retry logic is configured with at least 1 retry attempt. Report the count: "X of Y background job handlers have error handling with retry."
Fail criteria: Fail if job handlers have no error handling and let exceptions propagate silently. Fail if errors are caught but not logged. Must not pass when Vercel Cron routes have no error handling — these fail silently with no retry path.
Skip (N/A) when: The project has no background jobs, cron jobs, or async job queues. Signal: no job queue dependencies (Inngest, BullMQ, QStash, Trigger.dev, Upstash, etc.) in package.json, no cron configuration in vercel.json, no webhook consumers.
Detail on fail: "Vercel cron route at app/api/cron/digest/route.ts has no try/catch; sendWeeklyDigest() failures are silent with no retry path". Max 500 chars.
Remediation: Background jobs that fail silently are invisible time bombs. A job that sends billing notifications, processes payments, or syncs data may fail for hours before anyone notices.

For Vercel Cron routes (which have no automatic retry), wrap in try/catch with logging:
```
// app/api/cron/digest/route.ts
import { NextResponse } from 'next/server'

export async function GET() {
  try {
    await sendWeeklyDigest()
    return NextResponse.json({ success: true })
  } catch (error) {
    console.error('[Cron: digest] Failed:', error)
    // Optionally alert your monitoring service
    return NextResponse.json({ error: 'Digest failed' }, { status: 500 })
  }
}
```
For higher-reliability needs, use Inngest or Trigger.dev, which provide automatic retry with exponential backoff and a UI for monitoring failed jobs.
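
The skip signals listed under Detection (job queue dependencies in package.json, cron configuration in vercel.json) can be checked mechanically. The following is a rough sketch under stated assumptions: `findBackgroundJobSignals` is a hypothetical helper, the package list is illustrative and should be extended for the stack being audited, and webhook consumers still need manual inspection.

```typescript
// Illustrative list of job-queue packages; not exhaustive.
const JOB_QUEUE_PACKAGES = ['inngest', 'bullmq', '@upstash/qstash', '@trigger.dev/sdk']

interface PackageJson {
  dependencies?: Record<string, string>
  devDependencies?: Record<string, string>
}

interface VercelJson {
  crons?: { path: string; schedule: string }[]
}

// Returns the signals suggesting the project has background jobs.
// An empty result means the dependency and cron signals are absent.
function findBackgroundJobSignals(pkg: PackageJson, vercel: VercelJson): string[] {
  const deps = { ...pkg.dependencies, ...pkg.devDependencies }
  const signals = JOB_QUEUE_PACKAGES
    .filter((name) => name in deps)
    .map((name) => `dependency: ${name}`)
  for (const cron of vercel.crons ?? []) {
    signals.push(`vercel cron: ${cron.path}`)
  }
  return signals
}
```

In practice the two arguments would come from parsing the project's package.json and vercel.json. Each cron path returned is a route handler to inspect for try/catch and logging.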

External references

cwe · CWE-391 — Unchecked Error Condition
cwe · CWE-778 — Insufficient Logging
iso-25010:2011 · reliability.fault-tolerance

Taxons

error-resilience observability

History

2026-04-18·v1.0.0·Initial import from saas-error-handling·automated
