Token counts are the primary cost driver for every AI API integration, yet they are invisible unless explicitly captured. Without logging prompt_tokens, completion_tokens, and the model name per request, you cannot answer basic operational questions: which feature costs the most, which users consume disproportionate tokens, whether a prompt change improved efficiency, or whether usage is trending toward a budget ceiling. NIST AI RMF MEASURE 2.5 calls for monitoring AI system resource consumption. With unlogged token usage, you discover cost problems only on the monthly bill.
High, because unlogged token usage makes cost monitoring, quota enforcement, and abuse detection operationally impossible; problems surface only on billing statements.
Capture the usage object from every AI response and persist it alongside the request. Include model name so you can break down costs by model tier.
// src/app/api/chat/route.ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await generateText({
  model: openai("gpt-4o"),
  messages,
  maxTokens: 1000,
});

// Persist usage immediately, alongside the request context
await db.insert("ai_request_logs").values({
  session_id: sessionId,
  user_id: userId ?? null,
  model: "gpt-4o",
  prompt_tokens: result.usage.promptTokens,
  completion_tokens: result.usage.completionTokens,
  total_tokens: result.usage.totalTokens,
  created_at: new Date(),
});
If a dedicated database table is premature, use structured console logging: console.log(JSON.stringify({ event: "ai_request", model, ...result.usage })). Verify by making an AI request and confirming the log output contains a record with token counts.
ID: ai-token-optimization.caching-cost.token-usage-logging
Severity: high
What to look for: Check whether the usage object from AI API responses is captured and persisted. In the OpenAI SDK, this is completion.usage.prompt_tokens and completion.usage.completion_tokens. In the Vercel AI SDK, usage.promptTokens and usage.completionTokens are available on the result object from generateText. Look for these fields being written to a database table, sent to an analytics service (PostHog, Datadog, Segment, Axiom), or logged to a structured logging output. Also check for log statements that include token counts. Count all instances found and enumerate each.
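The field-name difference between the two SDKs (snake_case in the OpenAI SDK, camelCase in the Vercel AI SDK) is easy to miss when reviewing. A minimal sketch of extracting the OpenAI-shaped usage object into a flat log record — the LogRecord type and extractUsage helper here are illustrative names, not part of either SDK:

```typescript
// Shape of the usage object returned by the OpenAI SDK (snake_case fields).
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Flat record suitable for a database row or a structured log line.
interface LogRecord {
  event: "ai_request";
  model: string;
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Maps a chat completion response onto a log record; returns null when
// usage is absent (e.g. streaming responses without usage reporting enabled).
function extractUsage(completion: { model: string; usage?: OpenAIUsage }): LogRecord | null {
  if (!completion.usage) return null;
  return {
    event: "ai_request",
    model: completion.model,
    prompt_tokens: completion.usage.prompt_tokens,
    completion_tokens: completion.usage.completion_tokens,
    total_tokens: completion.usage.total_tokens,
  };
}
```

Keeping the extraction in a pure helper like this makes it trivial to unit-test with a mocked response and to reuse across routes.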
Pass criteria: Token usage statistics (prompt_tokens, completion_tokens, total_tokens) are captured from every AI response and persisted or forwarded to a monitoring destination. The model name is also logged alongside token counts. At least 1 implementation must be confirmed.
Fail criteria: The usage object returned by the AI API is not read, discarded, or never referenced. Token consumption is invisible to the application after the call completes.
Skip (N/A) when: No AI API integration is detected.
Signal: No AI SDK dependencies in package.json.
Detail on fail: "Token usage not logged — no visibility into cost per request or per user"
Remediation: You cannot manage costs you cannot see. Token logging is the prerequisite for every downstream optimization: identifying expensive features, setting user quotas, detecting abuse, and forecasting bills.
// src/app/api/chat/route.ts
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const result = await generateText({
  model: openai("gpt-4o"),
  messages,
  maxTokens: 1000,
});

// Log usage immediately after the call
await db.insert("ai_request_logs").values({
  session_id: sessionId,
  user_id: userId ?? null,
  model: "gpt-4o",
  prompt_tokens: result.usage.promptTokens,
  completion_tokens: result.usage.completionTokens,
  total_tokens: result.usage.totalTokens,
  created_at: new Date(),
});
Alternatively, use structured console logging if a full database table is premature:
console.log(JSON.stringify({
  event: "ai_request",
  model: "gpt-4o",
  prompt_tokens: result.usage.promptTokens,
  completion_tokens: result.usage.completionTokens,
  total_tokens: result.usage.totalTokens,
}));
Verify by making an AI request and checking the database or log output for a record containing token counts.
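Once token counts are persisted, per-request cost attribution becomes a pure function of the log. A minimal sketch, assuming an illustrative price table — the PRICE_PER_1M_TOKENS map and estimateCostUsd helper are hypothetical names, and the prices are examples to verify against your provider's current published rates:

```typescript
// Illustrative per-million-token prices in USD; check your provider's
// current pricing page before relying on these numbers.
const PRICE_PER_1M_TOKENS: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
};

// Converts logged token counts for one request into an estimated dollar cost.
function estimateCostUsd(model: string, promptTokens: number, completionTokens: number): number {
  const price = PRICE_PER_1M_TOKENS[model];
  if (!price) return 0; // unknown model: flag separately rather than silently pricing at zero
  return (promptTokens * price.input + completionTokens * price.output) / 1_000_000;
}

// Example: 1,000 prompt tokens + 500 completion tokens on gpt-4o
// = (1000 * 2.5 + 500 * 10) / 1e6 = 0.0075 USD
```

Running this over the ai_request_logs table, grouped by model or user_id, answers the cost-attribution questions the check exists to enable.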