Frontier models like GPT-4o cost 10–50x more per token than their lightweight counterparts, and are often no better at simple tasks. Generating a document title, classifying sentiment, or summarizing a short paragraph does not require frontier-level reasoning. Every request where gpt-4o runs a classification that gpt-4o-mini handles equally well incurs roughly a 17x cost premium with zero quality benefit. NIST AI RMF MAP 5.1 requires that model selection be matched to operational requirements; using the most powerful model for every task uniformly fails this requirement.
High, because using frontier models for all tasks, including simple ones, imposes a 10–50x cost multiplier on routine, high-volume operations that cheaper models handle equally well.
Define a TASK_MODELS map that explicitly assigns each task type to an appropriate model tier. Import from it at every call site instead of hardcoding a single model string.
// src/lib/ai/models.ts
export const TASK_MODELS = {
  // Simple tasks — fast, cheap
  titleGeneration: "gpt-4o-mini",
  classification: "gpt-4o-mini",
  summarization: "gpt-4o-mini",
  sentimentAnalysis: "gpt-4o-mini",
  // Complex tasks — full reasoning capability
  codeGeneration: "gpt-4o",
  complexReasoning: "gpt-4o",
  multiStepPlanning: "gpt-4o",
} as const;
// src/app/api/title/route.ts
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
import { TASK_MODELS } from "@/lib/ai/models";

const { text } = await generateText({
  model: openai(TASK_MODELS.titleGeneration),
  prompt: `Generate a concise title for: ${content}`,
  maxTokens: 50,
});
Verify by reviewing each API route's model parameter and confirming that simple classification, labeling, or summarization tasks are not using frontier models.
ID: ai-token-optimization.token-efficiency.model-selection-by-complexity
Severity: high
What to look for: Check every AI API call site for the model parameter. Determine whether the same top-tier model (e.g., gpt-4o, claude-3-5-sonnet, gemini-1.5-pro) is used for every task, or whether the codebase selects models based on task complexity. Look for route-specific model selection, a model routing function, environment variable overrides, or comments distinguishing lightweight vs. heavyweight tasks. Count every AI API call and enumerate which model each uses. Classify whether the model choice is appropriate for the task complexity.
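One of the patterns this check accepts as evidence of deliberate selection is a model routing function with an environment variable override. A minimal sketch follows; the tier names, default models, and the AI_MODEL_SIMPLE / AI_MODEL_COMPLEX variable names are illustrative assumptions, not part of the rule.

```typescript
// Hedged sketch: route by declared task complexity, with an optional
// per-tier environment override (illustrative variable names).
type Complexity = "simple" | "complex";

const DEFAULT_MODEL: Record<Complexity, string> = {
  simple: "gpt-4o-mini", // high-volume labeling, titles, summaries
  complex: "gpt-4o", // code generation, multi-step reasoning
};

function selectModel(complexity: Complexity): string {
  // e.g. AI_MODEL_SIMPLE=gpt-4o-mini pins the lightweight tier in one place
  const override = process.env[`AI_MODEL_${complexity.toUpperCase()}`];
  return override ?? DEFAULT_MODEL[complexity];
}
```

Centralizing the choice this way means an audit only has to inspect one function rather than every call site.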
Pass criteria: The codebase uses lightweight models (e.g., gpt-4o-mini, claude-3-haiku, gemini-1.5-flash) for simple, high-volume tasks such as title generation, classification, sentiment analysis, or summarization — and reserves frontier models for complex reasoning tasks. At least one implementation must be confirmed.
Fail criteria: The most expensive available model is used for all tasks uniformly, including trivially simple operations that a cheaper model handles equally well.
Skip (N/A) when: The application has a single AI use case that genuinely requires a frontier model, and there are no simple subtasks that could be delegated. Signal: Only one AI call site exists in the entire codebase, handling one task type.
Detail on fail: "Frontier model used for all tasks including simple ones — 10–50x cost overhead on routine operations"
Remediation: Frontier models cost 10–50x more per token than their lighter counterparts and are often no better at simple tasks. Routing by complexity is one of the highest-ROI optimizations in AI applications.
// src/lib/ai/models.ts
export const TASK_MODELS = {
  // Simple tasks — fast, cheap
  titleGeneration: "gpt-4o-mini",
  classification: "gpt-4o-mini",
  summarization: "gpt-4o-mini",
  sentimentAnalysis: "gpt-4o-mini",
  // Complex tasks — full reasoning capability
  codeGeneration: "gpt-4o",
  complexReasoning: "gpt-4o",
  multiStepPlanning: "gpt-4o",
} as const;
// src/app/api/title/route.ts
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";
import { TASK_MODELS } from "@/lib/ai/models";

const { text } = await generateText({
  model: openai(TASK_MODELS.titleGeneration),
  prompt: `Generate a concise title for: ${content}`,
  maxTokens: 50,
});
Verify by reviewing each API route's model parameter and confirming that trivially simple tasks are not using frontier models.
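Beyond manual review, the pass criteria can be enforced continuously with a small guard test. This is a hedged sketch: it redefines the TASK_MODELS map inline so the snippet is self-contained, where a real test would import it from src/lib/ai/models, and the lightweight-model list is an assumption drawn from the pass criteria above.

```typescript
// Guard test sketch: fail if any simple, high-volume task is routed to a
// frontier model. In a real repo, import TASK_MODELS from src/lib/ai/models
// instead of redefining it here.
const TASK_MODELS = {
  titleGeneration: "gpt-4o-mini",
  classification: "gpt-4o-mini",
  summarization: "gpt-4o-mini",
  sentimentAnalysis: "gpt-4o-mini",
  codeGeneration: "gpt-4o",
  complexReasoning: "gpt-4o",
  multiStepPlanning: "gpt-4o",
} as const;

const SIMPLE_TASKS = [
  "titleGeneration",
  "classification",
  "summarization",
  "sentimentAnalysis",
] as const;

// Lightweight tiers accepted by the pass criteria.
const LIGHTWEIGHT = new Set<string>([
  "gpt-4o-mini",
  "claude-3-haiku",
  "gemini-1.5-flash",
]);

// Any simple task pointing at a non-lightweight model is a violation.
const violations = SIMPLE_TASKS.filter((task) => !LIGHTWEIGHT.has(TASK_MODELS[task]));
```

Wired into CI, this turns the check from a one-time audit into a standing invariant: a future edit that points title generation at a frontier model fails the build.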