LLM API calls are the most expensive per-request operations in a modern web stack — GPT-4o costs $2.50/M input tokens and $10/M output tokens, and a single summarization endpoint with no caching pays that cost on every request, even for identical inputs. OWASP LLM10 (Unbounded Consumption) specifically identifies missing caching as a cost-amplification risk for LLM-backed applications. At modest scale, 1,000 daily requests to an uncached summarization endpoint can cost $50–$200/day in token charges. The fix is additive — unstable_cache wraps any async function — and reduces cost to near-zero for duplicate inputs without changing the API contract.
Medium because the financial impact scales with traffic rather than being immediately exploitable, but a viral moment on an uncached LLM endpoint can produce four-figure API bills overnight.
Wrap the LLM call in unstable_cache (Next.js) or a Redis-backed cache keyed to a hash of the input. For deterministic inputs, a 24-hour TTL is usually appropriate.
import { unstable_cache } from 'next/cache'
import OpenAI from 'openai'

const openai = new OpenAI()

// unstable_cache includes the serialized function arguments in the cache
// key, so identical `text` inputs hit the cache instead of the API.
const cachedSummarize = unstable_cache(
  async (text: string) => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: `Summarize: ${text}` }],
    })
    return res.choices[0]?.message.content ?? ''
  },
  ['summarize'],
  { revalidate: 86400 }, // 24-hour TTL for deterministic inputs
)

export async function POST(req: Request) {
  const { text } = await req.json()
  const summary = await cachedSummarize(text)
  return Response.json({ summary })
}
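The Redis-backed option mentioned above follows the same shape. A minimal sketch, written against a tiny client interface so it is not tied to one library (ioredis's `set(key, value, 'EX', ttlSeconds)` matches this signature); `summarizeUpstream`, `redisCached`, and the `summarize:` key prefix are illustrative names, not part of any API:

```typescript
import { createHash } from 'node:crypto'

// Minimal client interface covering the two calls we need; ioredis's
// set(key, value, 'EX', ttlSeconds) has this shape.
interface KV {
  get(key: string): Promise<string | null>
  set(key: string, value: string, mode: 'EX', ttlSec: number): Promise<unknown>
}

// summarizeUpstream stands in for the real LLM call.
function redisCached(
  kv: KV,
  summarizeUpstream: (text: string) => Promise<string>,
  ttlSec = 86_400,
) {
  return async (text: string): Promise<string> => {
    // Hash the input so arbitrarily long prompts map to a fixed-size key.
    const key = 'summarize:' + createHash('sha256').update(text).digest('hex')
    const hit = await kv.get(key)
    if (hit !== null) return hit // cache hit: zero token cost
    const fresh = await summarizeUpstream(text) // cache miss: pay once
    await kv.set(key, fresh, 'EX', ttlSec)
    return fresh
  }
}
```

Because the key is a content hash rather than the raw text, this also works for inputs too long or too messy to embed in a Redis key directly.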
Token usage patterns are covered in depth by the dedicated AI Token Optimization audit (ai-token-optimization).
ID: ai-slop-cost-bombs.cache-idempotency.expensive-ops-cached-or-flagged
Severity: medium
What to look for: Walk source files for expensive-operation call sites: calls to LLM SDKs (openai, @anthropic-ai/sdk, @google/generative-ai), expensive third-party APIs (Stripe, Resend, SendGrid, Twilio in non-mutating contexts), or known-expensive computations. Count every call site. For each, verify that either the surrounding function (or a function it calls before the expensive op) performs a cache lookup (cache.get(, redis.get(, kv.get(, unstable_cache(, revalidate:, or a memoization wrapper), or the call sits inside a route that explicitly opts out of caching (cache: 'no-store', streamText(, useChat( server-side).
Pass criteria: 100% of expensive operations are either cached or explicitly opt out. Report: "X expensive operation call sites, Y cached or fresh-flagged, 0 silently-uncached."
Fail criteria: At least 1 expensive operation call has no cache and no opt-out flag.
Skip (N/A) when: No expensive operation libraries detected in dependencies.
Cross-reference: For deeper LLM token cost analysis, the AI Token Optimization audit (ai-token-optimization) covers token usage patterns and caching strategies for language models in detail.
Detail on fail: "1 uncached LLM call: app/api/summarize/route.ts calls openai.chat.completions.create() with no cache — every request to the same input pays the full token cost"
Remediation: Caching cuts repeat-request costs to zero. Wrap expensive operations in unstable_cache:
// Bad: every request pays full LLM cost
export async function POST(req: Request) {
  const { text } = await req.json()
  const summary = await openai.chat.completions.create({ ... })
  return Response.json({ summary })
}
// Good: unstable_cache keys on the serialized arguments, so identical
// inputs are served from cache
import { unstable_cache } from 'next/cache'

const summarize = unstable_cache(
  async (text: string) => {
    return openai.chat.completions.create({ ... })
  },
  ['summarize'],
  { revalidate: 86400 },
)

export async function POST(req: Request) {
  const { text } = await req.json()
  const summary = await summarize(text)
  return Response.json({ summary })
}
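Outside Next.js, the same idea can be sketched as a small memoizer keyed to a SHA-256 hash of the input; `cacheByHash` is an illustrative name, not a library API. Storing the promise rather than the resolved value also coalesces concurrent duplicate requests (the "viral moment" case) into a single upstream call:

```typescript
import { createHash } from 'node:crypto'

// Framework-agnostic sketch: memoize an async function by a SHA-256 hash
// of its string input, with a TTL in milliseconds.
type Entry<T> = { value: Promise<T>; expires: number }

function cacheByHash<T>(
  fn: (text: string) => Promise<T>,
  ttlMs: number,
): (text: string) => Promise<T> {
  const store = new Map<string, Entry<T>>()
  return (text) => {
    const key = createHash('sha256').update(text).digest('hex')
    const hit = store.get(key)
    if (hit && hit.expires > Date.now()) return hit.value
    // Caching the promise itself means concurrent requests for the same
    // input share one in-flight call. A production version would also
    // evict rejected promises and expired entries instead of growing
    // the map forever.
    const value = fn(text)
    store.set(key, { value, expires: Date.now() + ttlMs })
    return value
  }
}
```

Wrap the LLM call with it (`cacheByHash(callLLM, 86_400_000)` for a 24-hour TTL) and identical inputs pay the token cost once per TTL window, matching the unstable_cache behavior above without a framework dependency.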