LLM API calls are the most expensive per-request operations in a modern web stack — GPT-4o costs $2.50/M input tokens and $10/M output tokens, and a single summarization endpoint with no caching pays that cost on every request, even for identical inputs. OWASP LLM10 (Unbounded Consumption) specifically identifies missing caching as a cost-amplification risk for LLM-backed applications. At modest scale, 1,000 daily requests to an uncached summarization endpoint can cost $50–$200/day in token charges. The fix is additive — unstable_cache wraps any async function — and reduces cost to near-zero for duplicate inputs without changing the API contract.
Medium because the financial impact scales with traffic rather than being immediately exploitable, but a viral moment on an uncached LLM endpoint can produce four-figure API bills overnight.
Wrap the LLM call in unstable_cache (Next.js) or a Redis-backed cache keyed to a hash of the input. For deterministic inputs, a 24-hour TTL is usually appropriate.
import { unstable_cache } from 'next/cache'
import OpenAI from 'openai'

const openai = new OpenAI()

// unstable_cache includes the serialized function arguments in the cache
// key, so identical `text` inputs hit the cache instead of the API.
const cachedSummarize = unstable_cache(
  async (text: string) => {
    const res = await openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: `Summarize: ${text}` }],
    })
    return res.choices[0]?.message.content ?? ''
  },
  ['summarize'],
  { revalidate: 86400 }, // 24-hour TTL for deterministic inputs
)

export async function POST(req: Request) {
  const { text } = await req.json()
  const summary = await cachedSummarize(text)
  return Response.json({ summary })
}
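The Redis-backed option mentioned above follows the same shape. A minimal sketch, written against a tiny client interface so it is not tied to one library (ioredis's `set(key, value, 'EX', ttlSeconds)` matches this signature); `summarizeUpstream`, `redisCached`, and the `summarize:` key prefix are illustrative names, not part of any API:

```typescript
import { createHash } from 'node:crypto'

// Minimal client interface covering the two calls we need; ioredis's
// set(key, value, 'EX', ttlSeconds) has this shape.
interface KV {
  get(key: string): Promise<string | null>
  set(key: string, value: string, mode: 'EX', ttlSec: number): Promise<unknown>
}

// summarizeUpstream stands in for the real LLM call.
function redisCached(
  kv: KV,
  summarizeUpstream: (text: string) => Promise<string>,
  ttlSec = 86_400,
) {
  return async (text: string): Promise<string> => {
    // Hash the input so arbitrarily long prompts map to a fixed-size key.
    const key = 'summarize:' + createHash('sha256').update(text).digest('hex')
    const hit = await kv.get(key)
    if (hit !== null) return hit // cache hit: zero token cost
    const fresh = await summarizeUpstream(text) // cache miss: pay once
    await kv.set(key, fresh, 'EX', ttlSec)
    return fresh
  }
}
```

Because the key is a content hash rather than the raw text, this also works for inputs too long or too messy to embed in a Redis key directly.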
Token usage patterns are covered in depth by the dedicated AI Token Optimization audit (ai-token-optimization).
ID: ai-slop-cost-bombs.cache-idempotency.expensive-ops-cached-or-flagged
Severity: medium
What to look for: Walk source files for expensive-operation call sites: calls to LLM SDKs (openai, @anthropic-ai/sdk, @google/generative-ai), expensive third-party APIs (Stripe, Resend, SendGrid, Twilio in non-mutating contexts), or known-expensive computations. Count every call site. For each, verify that either the surrounding function (or a function it calls before the expensive op) performs a cache lookup (cache.get(, redis.get(, kv.get(, unstable_cache(, revalidate:, or a memoization wrapper), or the call sits inside a route that explicitly opts out of caching (cache: 'no-store', streamText(, useChat( server-side).
Pass criteria: 100% of expensive operations are either cached or explicitly opt out. Report: "X expensive operation call sites, Y cached or fresh-flagged, 0 silently-uncached."
Fail criteria: At least 1 expensive operation call has no cache and no opt-out flag.
Skip (N/A) when: No expensive operation libraries detected in dependencies.
Cross-reference: For deeper LLM token cost analysis, the AI Token Optimization audit (ai-token-optimization) covers token usage patterns and caching strategies for language models in detail.
Detail on fail: "1 uncached LLM call: app/api/summarize/route.ts calls openai.chat.completions.create() with no cache — every request to the same input pays the full token cost"
Remediation: Caching cuts repeat-request costs to zero. Wrap expensive operations in unstable_cache:
// Bad: every request pays full LLM cost
export async function POST(req: Request) {
  const { text } = await req.json()
  const summary = await openai.chat.completions.create({ ... })
  return Response.json({ summary })
}
// Good: unstable_cache keys on the serialized arguments, so identical
// inputs are served from cache
import { unstable_cache } from 'next/cache'

const summarize = unstable_cache(
  async (text: string) => {
    return openai.chat.completions.create({ ... })
  },
  ['summarize'],
  { revalidate: 86400 },
)

export async function POST(req: Request) {
  const { text } = await req.json()
  const summary = await summarize(text)
  return Response.json({ summary })
}
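Outside Next.js, the same idea can be sketched as a small memoizer keyed to a SHA-256 hash of the input; `cacheByHash` is an illustrative name, not a library API. Storing the promise rather than the resolved value also coalesces concurrent duplicate requests (the "viral moment" case) into a single upstream call:

```typescript
import { createHash } from 'node:crypto'

// Framework-agnostic sketch: memoize an async function by a SHA-256 hash
// of its string input, with a TTL in milliseconds.
type Entry<T> = { value: Promise<T>; expires: number }

function cacheByHash<T>(
  fn: (text: string) => Promise<T>,
  ttlMs: number,
): (text: string) => Promise<T> {
  const store = new Map<string, Entry<T>>()
  return (text) => {
    const key = createHash('sha256').update(text).digest('hex')
    const hit = store.get(key)
    if (hit && hit.expires > Date.now()) return hit.value
    // Caching the promise itself means concurrent requests for the same
    // input share one in-flight call. A production version would also
    // evict rejected promises and expired entries instead of growing
    // the map forever.
    const value = fn(text)
    store.set(key, { value, expires: Date.now() + ttlMs })
    return value
  }
}
```

Wrap the LLM call with it (`cacheByHash(callLLM, 86_400_000)` for a 24-hour TTL) and identical inputs pay the token cost once per TTL window, matching the unstable_cache behavior above without a framework dependency.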