Without logging model name, token usage, and latency on AI calls, there is no operational visibility into cost trajectories, quality degradation, or performance regressions. A provider-side model version bump, a prompt change that doubles token usage, or a latency spike degrading user experience will go undetected until users report problems. NIST AI RMF MEASURE-2.7 requires that AI systems have mechanisms for ongoing performance measurement, and the maintainability characteristic of ISO/IEC 25010:2011 requires that system behavior be observable. The observability infrastructure is low-effort to add and eliminates an entire class of invisible production failures.
Info because the absence of observability does not directly cause user-visible failures, but it makes every other AI reliability issue — cost overruns, quality degradation, latency regressions — invisible until they become critical.
Add minimal structured logging to every AI API call:
const t0 = Date.now()
const response = await openai.chat.completions.create({ model: process.env.OPENAI_MODEL ?? 'gpt-4o', ... })
// Log telemetry server-side only (never send it to the client)
console.log(JSON.stringify({
  model: response.model,
  promptTokens: response.usage?.prompt_tokens,
  completionTokens: response.usage?.completion_tokens,
  finishReason: response.choices[0]?.finish_reason,
  latencyMs: Date.now() - t0
}))
For production, route this telemetry to a dedicated platform such as Helicone, Braintrust, or LangSmith rather than console.log. Also configure the model name via process.env.OPENAI_MODEL rather than hardcoding it, so version changes are tracked through config rather than code diffs.
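The pattern above can be factored into a small reusable helper so that call sites do not repeat the timing and logging boilerplate. This is an illustrative sketch, not part of any SDK: withTelemetry, ChatResponse, and Telemetry are hypothetical names, and the types cover only the fields this rule cares about.

```typescript
// Hypothetical helper (not part of the OpenAI SDK): wraps a chat-completion
// call and emits a structured telemetry record alongside the response.
type ChatResponse = {
  model: string
  usage?: { prompt_tokens: number; completion_tokens: number }
  choices: { finish_reason?: string }[]
}

type Telemetry = {
  model: string
  promptTokens?: number
  completionTokens?: number
  finishReason?: string
  latencyMs: number
}

async function withTelemetry(
  call: () => Promise<ChatResponse>,
  // Default sink is console.log; swap in a platform exporter for production.
  log: (t: Telemetry) => void = (t) => console.log(JSON.stringify(t))
): Promise<ChatResponse> {
  const t0 = Date.now()
  const response = await call()
  log({
    model: response.model,
    promptTokens: response.usage?.prompt_tokens,
    completionTokens: response.usage?.completion_tokens,
    finishReason: response.choices[0]?.finish_reason,
    latencyMs: Date.now() - t0,
  })
  return response
}
```

A call site then becomes withTelemetry(() => openai.chat.completions.create({ ... })), so every AI call is logged without duplicating the instrumentation.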
ID: ai-response-quality.response-management.response-metadata-exposed
Severity: info
What to look for: Enumerate all relevant files and check whether the application captures and logs (server-side, not to the client) AI response metadata: model name used, token counts (prompt_tokens, completion_tokens), latency, finish_reason, and request ID. Look for any observability setup: LangSmith, Helicone, Braintrust, OpenTelemetry integration with an AI SDK, or custom logging of response.usage. Check whether the model name is hardcoded or configurable via environment variable.
Pass criteria: Application logs at least model name, token usage, and latency server-side. Model name is configurable via environment variable or config rather than buried in code.
Fail criteria: No response metadata logging exists at all for a production application — impossible to debug cost, latency, or quality issues.
Skip (N/A) when: Application is a prototype or personal tool where operational observability is not yet a concern.
Detail on fail: "No token usage or model metadata logged after AI calls in api/chat/route.ts — no observability into cost or performance" (max 500 chars)
Remediation: Add minimal observability to AI calls:
const startTime = Date.now()
const response = await openai.chat.completions.create({ ... })
console.log({
  model: response.model,
  promptTokens: response.usage?.prompt_tokens,
  completionTokens: response.usage?.completion_tokens,
  finishReason: response.choices[0]?.finish_reason,
  latencyMs: Date.now() - startTime
})
For production, consider a dedicated AI observability platform (Helicone, Braintrust, or LangSmith) to track quality and cost over time.
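As one concrete route, Helicone's documented integration works by repointing the OpenAI client at its logging proxy, so model, token, and latency data are captured without touching call sites. A sketch under stated assumptions: HELICONE_API_KEY is an assumed environment variable, and the base URL should be verified against current Helicone documentation.

```typescript
import OpenAI from 'openai'

// Sketch: route OpenAI traffic through Helicone's proxy so each call's
// model, token usage, and latency are recorded by the platform.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: 'https://oai.helicone.ai/v1', // Helicone gateway (verify against their docs)
  defaultHeaders: {
    'Helicone-Auth': `Bearer ${process.env.HELICONE_API_KEY}`, // assumed env var
  },
})
```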