Response metadata is available for debugging and observability
Why it matters
Without logging the model name, token usage, and latency of each AI call, there is no operational visibility into cost trajectories, quality degradation, or performance regressions. A model version bump by the provider, a prompt change that doubles token usage, or a latency spike that degrades user experience will go undetected until users report problems. NIST AI RMF MEASURE-2.7 calls for mechanisms that measure AI system performance on an ongoing basis. ISO 25010:2011 lists analysability under maintainability, which depends on system behavior being observable. This logging is low-effort to add and eliminates an entire class of invisible production failures.
Severity rationale
Info because the absence of observability does not directly cause user-visible failures, but it keeps every other AI reliability issue (cost overruns, quality degradation, latency regressions) invisible until it becomes critical.
Remediation
Add minimal structured logging to every AI API call:
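// Start the clock before the request so latency covers the full round trip
const t0 = Date.now()
const response = await openai.chat.completions.create({ model: process.env.OPENAI_MODEL ?? 'gpt-4o', ... })
// Emit one JSON object per call so log tooling can parse the fields
console.log(JSON.stringify({
  model: response.model, // the model the provider reports for this response
  promptTokens: response.usage?.prompt_tokens,
  completionTokens: response.usage?.completion_tokens,
  finishReason: response.choices[0]?.finish_reason,
  latencyMs: Date.now() - t0
}))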
For production, route this telemetry to Helicone, Braintrust, or LangSmith rather than console.log. Also ensure the model name is configured via process.env.OPENAI_MODEL rather than hardcoded so version changes are tracked through config rather than code diffs.
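To avoid repeating the logging at every call site, the pattern above can be folded into one helper. The following is a minimal sketch, not part of any SDK: `loggedCall`, `toRecord`, `CompletionLike`, and `AiCallRecord` are hypothetical names introduced here, and `CompletionLike` declares only the response fields the sketch reads. It also records failed calls, which the snippet above does not cover.

```typescript
// Only the response fields this sketch reads; the real SDK type has more.
interface CompletionLike {
  model: string;
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
  choices: Array<{ finish_reason?: string | null }>;
}

// Hypothetical log record; field names mirror the snippet above.
interface AiCallRecord {
  model: string;
  promptTokens?: number;
  completionTokens?: number;
  finishReason?: string | null;
  latencyMs: number;
  ok: boolean;
}

// Pure mapping from a successful response to the fields worth logging.
function toRecord(response: CompletionLike, latencyMs: number): AiCallRecord {
  return {
    model: response.model,
    promptTokens: response.usage?.prompt_tokens,
    completionTokens: response.usage?.completion_tokens,
    finishReason: response.choices[0]?.finish_reason,
    latencyMs,
    ok: true,
  };
}

// Hypothetical wrapper: times the call and sends one record to the sink,
// whether the call succeeds or throws. The default sink writes JSON to stdout.
async function loggedCall<T extends CompletionLike>(
  requestedModel: string,
  call: () => Promise<T>,
  sink: (record: AiCallRecord) => void = (r) => console.log(JSON.stringify(r)),
): Promise<T> {
  const t0 = Date.now();
  try {
    const response = await call();
    sink(toRecord(response, Date.now() - t0));
    return response;
  } catch (err) {
    // No response exists on failure, so log the requested model instead.
    sink({ model: requestedModel, latencyMs: Date.now() - t0, ok: false });
    throw err;
  }
}
```

A call site would then wrap the existing request, for example `loggedCall(model, () => openai.chat.completions.create({ model, ... }))`. Passing the sink as a parameter is a design choice that lets the same helper write to stdout in development and to an observability platform in production.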
Detection
- ID: response-metadata-exposed
- Severity: info
- What to look for: Enumerate all relevant files and check whether the application captures and logs (server-side, not to the client) AI response metadata: model name used, token counts (prompt_tokens, completion_tokens), latency, finish_reason, and request ID. Look for any observability setup: LangSmith, Helicone, Braintrust, OpenTelemetry integration with AI SDK, or custom logging of response.usage. Check whether the model name is hardcoded or configurable via an environment variable.
- Pass criteria: Application logs at least model name, token usage, and latency server-side. Model name is configurable via environment variable or config rather than buried in code.
- Fail criteria: No response metadata logging exists at all for a production application, making it impossible to debug cost, latency, or quality issues.
- Skip (N/A) when: Application is a prototype or personal tool where operational observability is not yet a concern.
- Detail on fail: "No token usage or model metadata logged after AI calls in api/chat/route.ts — no observability into cost or performance" (max 500 chars)
- Remediation: Add minimal observability to AI calls:
  const startTime = Date.now()
  const response = await openai.chat.completions.create({ ... })
  console.log({
    model: response.model,
    promptTokens: response.usage?.prompt_tokens,
    completionTokens: response.usage?.completion_tokens,
    finishReason: response.choices[0]?.finish_reason,
    latencyMs: Date.now() - startTime
  })
  For production, consider a dedicated AI observability platform (Helicone, Braintrust, or LangSmith) to track quality and cost over time.
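The "What to look for" and "Pass criteria" items can be roughly approximated with a text scan over each source file. The sketch below is a heuristic only, under stated assumptions: `inspectSource`, `meetsPassCriteria`, and `Finding` are hypothetical names, and the regular expressions are guesses at common coding patterns, not an exhaustive detector. It will miss logging done through wrappers or observability platforms, so a negative result still needs manual review.

```typescript
// Hypothetical result of scanning one source file.
interface Finding {
  logsUsage: boolean;       // reads usage.prompt_tokens or a sibling field
  logsModel: boolean;       // reads <something>.model from a response
  measuresLatency: boolean; // takes a timestamp that could bound the call
  hardcodedModel: boolean;  // passes a string literal as `model:`
}

// Heuristic scan: each regex is an assumption about how the code is written.
function inspectSource(source: string): Finding {
  return {
    logsUsage: /\busage\??\.(prompt_tokens|completion_tokens|total_tokens)\b/.test(source),
    logsModel: /\w\.model\b/.test(source),
    measuresLatency: /\b(Date|performance)\.now\(\)/.test(source),
    hardcodedModel: /\bmodel:\s*['"`][^'"`]+['"`]/.test(source),
  };
}

// Mirrors the pass criteria: model, token usage, and latency are logged,
// and the model name is not a bare string literal.
function meetsPassCriteria(f: Finding): boolean {
  return f.logsUsage && f.logsModel && f.measuresLatency && !f.hardcodedModel;
}
```

A file that reads the model from `process.env.OPENAI_MODEL` with a fallback is not flagged as hardcoded, because the literal does not directly follow `model:`.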
External references
- iso-25010:2011 · maintainability — Analysability
- nist-ai-rmf:1.0 · MEASURE-2.7 — AI system performance or assurance criteria are measured
Taxons
History
- 2026-04-18·v1.0.0·Initial import from ai-response-quality·automated