Expensive operations are cached or marked as fresh-required
Why it matters
LLM API calls are typically the most expensive per-request operations in a modern web stack. GPT-4o costs $2.50/M input tokens and $10/M output tokens, and a summarization endpoint with no caching pays that cost on every request, even for identical inputs. OWASP LLM10 (Unbounded Consumption) specifically identifies missing caching as a cost-amplification risk for LLM-backed applications. At modest scale, 1,000 daily requests to an uncached summarization endpoint can cost $50–$200/day in token charges. The fix is additive: in Next.js, unstable_cache wraps any async function and reduces cost to near zero for duplicate inputs without changing the API contract.
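The daily figure can be sanity-checked with a little arithmetic. The sketch below uses the GPT-4o prices quoted above; the per-request token counts (20,000 input, 500 output) and the share of distinct inputs (200 of 1,000) are assumptions chosen for illustration, not measurements.

```typescript
// Hypothetical workload shape; real token counts depend on your prompts.
interface Workload {
  requestsPerDay: number
  inputTokensPerRequest: number
  outputTokensPerRequest: number
}

// GPT-4o prices quoted in the text, in USD per million tokens.
const INPUT_USD_PER_M = 2.5
const OUTPUT_USD_PER_M = 10

// tokens * (USD per million tokens) gives millionths of a dollar,
// so divide by 1e6 once at the end to get USD.
function dailyCostUsd(w: Workload): number {
  const perRequestMicroUsd =
    w.inputTokensPerRequest * INPUT_USD_PER_M +
    w.outputTokensPerRequest * OUTPUT_USD_PER_M
  return (w.requestsPerDay * perRequestMicroUsd) / 1e6
}

// Assumed: 1,000 requests/day, each a long document with a short summary.
const uncachedUsd = dailyCostUsd({
  requestsPerDay: 1000,
  inputTokensPerRequest: 20000,
  outputTokensPerRequest: 500,
})

// Assumed: only 200 of those inputs are distinct, so a cache pays for 200 calls.
const cachedUsd = dailyCostUsd({
  requestsPerDay: 200,
  inputTokensPerRequest: 20000,
  outputTokensPerRequest: 500,
})

console.log({ uncachedUsd, cachedUsd }) // { uncachedUsd: 55, cachedUsd: 11 }
```

Under these assumptions the uncached endpoint lands at the low end of the range above, and the saving is proportional to the fraction of repeated inputs.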
Severity rationale
Medium because the financial impact scales with traffic rather than being immediately exploitable, but a viral moment on an uncached LLM endpoint can produce four-figure API bills overnight.
Remediation
Wrap the LLM call in unstable_cache (Next.js) or a Redis-backed cache keyed to a hash of the input. For deterministic inputs, a 24-hour TTL is usually appropriate.
import { unstable_cache } from 'next/cache'
import OpenAI from 'openai'

const openai = new OpenAI()

// unstable_cache includes the serialized arguments in the cache key,
// so identical `text` values share one cached summary.
const cachedSummarize = unstable_cache(
  async (text: string) => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: `Summarize: ${text}` }],
    })
    return res.choices[0]?.message.content ?? ''
  },
  ['summarize'],
  { revalidate: 86400 }, // 24-hour TTL, in seconds
)

export async function POST(req: Request) {
  const { text } = await req.json()
  const summary = await cachedSummarize(text)
  return Response.json({ summary })
}
For token usage patterns in depth, see the dedicated AI Token Optimization audit (ai-token-optimization).
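Outside Next.js, the "cache keyed to a hash of the input" option mentioned above might look roughly like the following. This is a minimal sketch, not a production implementation: KeyValueStore, MemoryStore, cacheKey, and withCache are hypothetical names introduced here, the in-memory store stands in for Redis, and the summarizer is a stub rather than a real LLM call.

```typescript
import { createHash } from 'crypto'

// Minimal async key-value interface (hypothetical); a thin wrapper around a
// Redis client could satisfy it.
interface KeyValueStore {
  get(key: string): Promise<string | null>
  set(key: string, value: string, ttlSeconds: number): Promise<void>
}

// In-memory stand-in for Redis, for illustration only.
class MemoryStore implements KeyValueStore {
  private entries = new Map<string, { value: string; expiresAt: number }>()

  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key)
    if (!entry) return null
    if (Date.now() >= entry.expiresAt) {
      this.entries.delete(key) // expired: drop it and report a miss
      return null
    }
    return entry.value
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 })
  }
}

// Hash the input so arbitrarily long text maps to a fixed-length key.
function cacheKey(namespace: string, input: string): string {
  const digest = createHash('sha256').update(input).digest('hex')
  return `${namespace}:${digest}`
}

// Wrap an expensive async function with a read-through cache.
function withCache(
  store: KeyValueStore,
  namespace: string,
  ttlSeconds: number,
  expensive: (input: string) => Promise<string>,
): (input: string) => Promise<string> {
  return async (input) => {
    const key = cacheKey(namespace, input)
    const hit = await store.get(key)
    if (hit !== null) return hit // cache hit: no expensive call
    const fresh = await expensive(input)
    await store.set(key, fresh, ttlSeconds)
    return fresh
  }
}

// Usage with a stub in place of the LLM call (swap in the real client).
let llmCalls = 0
const summarize = withCache(new MemoryStore(), 'summarize', 86400, async (text) => {
  llmCalls += 1
  return `summary of ${text.length} characters`
})
```

Calling summarize twice with the same text invokes the stub once; a different text produces a different key and a second call. Note that two concurrent misses for the same key would both call the expensive function in this sketch, which a real deployment may want to guard against.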
Detection
- ID: expensive-ops-cached-or-flagged
- Severity: medium
- What to look for: Walk source files for expensive operation call sites: calls to LLM SDKs (openai, @anthropic-ai/sdk, @google/generative-ai), expensive third-party APIs (Stripe, Resend, SendGrid, Twilio in non-mutating contexts), or known-expensive computations. Count all expensive operation call sites. For each, verify that either the surrounding function (or a function it calls before the expensive op) includes a cache lookup (cache.get(, redis.get(, kv.get(, unstable_cache(, revalidate:, a memoization wrapper) or the call is inside a route that explicitly opts out (cache: 'no-store', streamText(, useChat( server-side).
- Pass criteria: 100% of expensive operations are either cached or explicitly opt out. Report: "X expensive operation call sites, Y cached or fresh-flagged, 0 silently-uncached."
- Fail criteria: At least 1 expensive operation call has no cache and no opt-out flag.
- Skip (N/A) when: No expensive operation libraries detected in dependencies.
- Cross-reference: For deeper LLM token cost analysis, the AI Token Optimization audit (ai-token-optimization) covers token usage patterns and caching strategies for language models in detail.
- Detail on fail: "1 uncached LLM call: app/api/summarize/route.ts calls openai.chat.completions.create() with no cache, so every request with the same input pays the full token cost"
- Remediation: Caching cuts repeat-request costs to near zero. Wrap expensive operations in unstable_cache:

  // Bad: every request pays full LLM cost
  export async function POST(req: Request) {
    const { text } = await req.json()
    const summary = await openai.chat.completions.create({ ... })
    return Response.json({ summary })
  }

  // Good: cached; unstable_cache keys on the arguments, so identical text is served from cache
  import { unstable_cache } from 'next/cache'

  const summarize = unstable_cache(
    async (text: string) => {
      return openai.chat.completions.create({ ... })
    },
    ['summarize'],
    { revalidate: 86400 }
  )

  export async function POST(req: Request) {
    const { text } = await req.json()
    const summary = await summarize(text)
    return Response.json({ summary })
  }
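The "What to look for" procedure can be approximated with a simple text scan. The sketch below is a rough heuristic under stated assumptions, not the audit's actual implementation: the marker lists are taken from the item above, the call-site patterns are illustrative (they assume clients named openai and anthropic), and the whole file is treated as the "surrounding function", which is coarser than the rule describes.

```typescript
// Illustrative call-site patterns (assumed client variable names).
const EXPENSIVE_CALLS = ['openai.chat.completions.create(', 'anthropic.messages.create(']
// Markers listed in "What to look for".
const CACHE_MARKERS = ['cache.get(', 'redis.get(', 'kv.get(', 'unstable_cache(', 'revalidate:']
const OPT_OUT_MARKERS = ["cache: 'no-store'", 'streamText(', 'useChat(']

interface ScanReport {
  total: number // expensive call sites found
  covered: number // call sites in files with a cache or opt-out marker
  uncached: string[] // files containing silently-uncached call sites
  summary: string // report line in the format given under "Pass criteria"
}

// Count non-overlapping occurrences of a literal substring.
function countOccurrences(source: string, needle: string): number {
  return source.split(needle).length - 1
}

// files maps a path to its source text (hypothetical input shape).
function scan(files: Record<string, string>): ScanReport {
  let total = 0
  let covered = 0
  const uncached: string[] = []
  for (const [path, source] of Object.entries(files)) {
    const sites = EXPENSIVE_CALLS.reduce((n, call) => n + countOccurrences(source, call), 0)
    if (sites === 0) continue
    total += sites
    // Simplification: any marker anywhere in the file covers all its call sites.
    const hasMarker = [...CACHE_MARKERS, ...OPT_OUT_MARKERS].some((m) => source.includes(m))
    if (hasMarker) covered += sites
    else uncached.push(path)
  }
  const summary =
    `${total} expensive operation call sites, ${covered} cached or fresh-flagged, ` +
    `${total - covered} silently-uncached.`
  return { total, covered, uncached, summary }
}
```

The check passes when uncached is empty. A file-level scan like this can miss an uncached call that happens to share a file with a cached one, so a real audit would still need to inspect each call site.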
External references
- iso-25010:2011 · performance-efficiency
Taxons
History
- 2026-04-18·v1.0.0·Initial import from ai-slop-cost-bombs·automated