AI APIs are susceptible to transient failures: 429 Too Many Requests during traffic spikes, 503 Service Unavailable during provider incidents, and occasional 500 errors. Without retry logic, every transient error immediately surfaces to the user as a failed request — even though the same call would succeed 2 seconds later. Without exponential backoff, retries that fire immediately in a tight loop amplify rate limit pressure at exactly the wrong time, causing cascading failures across all requests rather than absorbing the spike gracefully. ISO 25010 reliability requires predictable behavior under transient failure conditions.
Medium because missing or improperly implemented retry logic converts transient API errors into user-visible failures and can worsen rate limit pressure under load.
Use the Vercel AI SDK's built-in maxRetries parameter, which applies exponential backoff automatically. For custom SDK wrappers, implement backoff explicitly with a bounded retry count.
// Vercel AI SDK — exponential backoff built in
const result = await streamText({
  model: openai("gpt-4o"),
  messages,
  maxTokens: 1000,
  maxRetries: 3, // SDK applies exponential backoff between attempts
});
// Custom wrapper for non-SDK paths
async function withBackoff<T>(fn: () => Promise<T>, maxAttempts = 3, baseDelay = 1000): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try { return await fn(); }
    catch (err) {
      if (attempt === maxAttempts) throw err;
      // Exponential backoff: baseDelay, 2×baseDelay, 4×baseDelay, ...
      await new Promise(r => setTimeout(r, baseDelay * Math.pow(2, attempt - 1)));
    }
  }
  throw new Error("Unreachable"); // loop always returns or throws
}
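One subtlety of the wrapper above: with maxAttempts = 3 there are only two waits, since the final attempt either returns or throws. The schedule can be checked with a small helper (illustrative only, not part of the wrapper itself):

```typescript
// Delays the wrapper sleeps between attempts: baseDelay * 2^(attempt - 1).
// No delay follows the final attempt — it either succeeds or rethrows.
function backoffDelays(maxAttempts: number, baseDelay: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < maxAttempts; attempt++) {
    delays.push(baseDelay * Math.pow(2, attempt - 1));
  }
  return delays;
}

console.log(backoffDelays(3, 1000)); // [1000, 2000] — two waits for three attempts
console.log(backoffDelays(4, 1000)); // [1000, 2000, 4000]
```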
Verify by inspecting each streamText or generateText call for maxRetries, or by simulating a 429 response and observing delayed retry attempts in server logs.
ID: ai-token-optimization.streaming-performance.retry-with-backoff
Severity: medium
What to look for: Check for retry logic on AI API calls. Look for the maxRetries parameter in Vercel AI SDK calls (which enables built-in exponential backoff). Check for custom retry wrappers that catch 429 Too Many Requests or 5xx errors. Look for tight while loops or for loops that immediately retry without delay. Also check if the application has no retry logic at all, meaning transient API errors immediately surface to the user. Count all instances found and enumerate each.
Pass criteria: Retries are implemented with exponential backoff — each retry waits progressively longer before the next attempt (e.g., 1s, 2s, 4s). The Vercel AI SDK applies exponential backoff automatically when maxRetries is set. The number of retries is bounded (typically 2-4). At least 1 implementation must be confirmed.
Fail criteria: No retry logic exists (transient errors immediately fail the request), OR retry logic retries immediately in a tight loop without delay, which exacerbates rate limit pressure and can cause cascading failures.
Skip (N/A) when: The project uses a middleware or gateway layer (Cloudflare AI Gateway, LangSmith, custom proxy) that handles retries transparently and the application code does not need to implement its own. Signal: An API gateway or proxy is configured between the application and the AI provider, with retry configuration visible in its settings.
Detail on fail: "Retry logic missing or retries immediately — transient failures hit users or worsen rate limits"
Remediation: AI APIs are susceptible to transient errors (429, 500, 503). Without retries, these errors immediately bubble to the user. Without backoff, aggressive retries worsen rate limit problems by hammering the API at exactly the wrong time.
// Vercel AI SDK — built-in exponential backoff via maxRetries
const result = await streamText({
  model: openai("gpt-4o"),
  messages,
  maxTokens: 1000,
  maxRetries: 3, // SDK applies exponential backoff automatically
});
// Custom retry with backoff (if not using Vercel AI SDK)
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelay = 1000
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      const delay = baseDelay * Math.pow(2, attempt - 1);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
  throw new Error("Unreachable");
}
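The wrapper above retries on every error, including non-transient ones like 400 Bad Request. A stricter variant retries only 429 and 5xx responses and fails fast otherwise. This is a sketch: the status property on the error is an assumption about how the calling code surfaces HTTP status codes, so adapt it to the actual error shape in use.

```typescript
// Assumption: the thrown error carries the HTTP status as `err.status`.
function isTransient(err: unknown): boolean {
  const status = (err as { status?: number })?.status;
  return status === 429 || (status !== undefined && status >= 500 && status < 600);
}

async function withSelectiveBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelay = 1000
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Non-transient errors (e.g. 400, 401) rethrow immediately — retrying
      // them wastes attempts and delays the user-visible failure.
      if (!isTransient(err) || attempt === maxAttempts) throw err;
      await new Promise(resolve => setTimeout(resolve, baseDelay * Math.pow(2, attempt - 1)));
    }
  }
  throw new Error("Unreachable");
}
```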
Verify by inspecting the Vercel AI SDK call for maxRetries, or by simulating a 500 error and observing delayed retry attempts in the network tab.