All 20 checks with why-it-matters prose, severity, and cross-references to related audits.
Without progressive streaming, users stare at a frozen interface for the entire generation window, which routinely runs ten to forty seconds on long-context prompts. Abandonment rates on non-streaming chat interfaces regularly exceed fifty percent after ten seconds of dead air. Streaming is the single largest perceived-performance lever in any AI chat product and directly maps to user-experience outcomes; a non-streaming interface feels broken even when the underlying model latency is identical to a streaming competitor.
Why this severity: Critical because a non-streaming chat interface is the dominant cause of abandonment and perceived product failure in AI apps.
ai-chat-visibility.response-display.response-streaming-progressive

AI models produce markdown-formatted output by default: bold, italics, bulleted lists, headers, and inline code are the native vocabulary of modern LLM responses. Rendering that output as plain text means users see raw `**asterisks**` and `## hashes` cluttering every answer, destroying readability and forcing users to mentally parse formatting tokens. This also degrades accessibility by collapsing semantic structure into an undifferentiated wall of text, failing the intent of WCAG 2.2 SC 1.3.1 (Info and Relationships).
Why this severity: High because unreadable output is functionally broken even when the model response is technically correct.
ai-chat-visibility.response-display.markdown-rendering

Code is one of the highest-value outputs of an AI chat product, and unstyled code is close to useless: variable names, keywords, and strings blur together, indentation collapses when copied, and language cues disappear. Developer-facing AI tools lose the entire code-review, debug, and learning use case without syntax highlighting. A monospace font alone is not enough; colorized tokens are what make code visually scannable and differentiate the product from a plain text box.
Why this severity: High because unhighlighted code materially degrades every developer-facing use case of the product.
ai-chat-visibility.response-display.code-block-formatting

Fixed-height message bubbles silently amputate long AI responses, hiding answers that the user paid tokens to generate. Users have no indication that content is missing, which creates a trust problem: they believe the AI gave a short or incomplete answer when in fact the UI is simply clipping it. This is especially damaging for technical content like long code blocks, step-by-step instructions, or multi-paragraph analysis where the hidden portion often contains the conclusion or the critical step.
Why this severity: High because clipped content creates silent data loss from the user's perspective and undermines trust in every response.
ai-chat-visibility.response-display.long-response-scroll

In any non-trivial conversation the user will scroll up to re-read earlier answers, copy code, or check context, and without a scroll-to-bottom affordance they miss new streaming responses entirely. Messages arrive silently off-screen and the user assumes the AI is still thinking or has failed. The industry convention established by ChatGPT, Claude, and Gemini is either auto-scroll on new message or a floating jump-to-latest button, and users have learned to expect it.
Why this severity: Low because users can manually scroll, but the friction compounds across every long conversation in the product.
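The auto-scroll-versus-jump-button decision reduces to one question: is the user already at (or near) the bottom? A minimal sketch of that check as a pure function, assuming a scrollable message container; the 100px slack value and the function name are illustrative, not a spec:

```typescript
// Decide whether to auto-scroll when a new token or message arrives.
// Auto-scroll only if the user is already near the bottom, so reading
// an earlier message is never interrupted. The 100px slack is an
// assumed default, not a standard.
function shouldAutoScroll(
  scrollTop: number,    // current scroll offset of the container
  clientHeight: number, // visible height of the container
  scrollHeight: number, // total content height
  slackPx = 100,
): boolean {
  const distanceFromBottom = scrollHeight - (scrollTop + clientHeight);
  return distanceFromBottom <= slackPx;
}
```

When this returns false during streaming, the conventional fallback is to show the floating jump-to-latest button instead of yanking the viewport.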
ai-chat-visibility.response-display.scroll-to-bottom-button

Users copy AI responses constantly — to paste code into an editor, share an answer in a work chat, or archive an explanation. Forcing manual text selection is a per-message papercut that accumulates across the lifetime of the product, and on mobile it is actively painful: text selection on long messages frequently selects the wrong range or triggers system menus. A per-message copy button is table-stakes for any chat interface that expects repeat use.
Why this severity: Low because the workaround is manual selection, but the absence is immediately noticed against every reference product in the category.
ai-chat-visibility.response-display.copy-response-button

Between message submit and the first streamed token there is typically a one-to-three-second first-token latency, and with no loading indicator the UI appears frozen. Users commonly click send again, refresh the page, or abandon the session entirely, each of which either wastes tokens or loses the user. A visible typing indicator also serves an accessibility role via `role="status"` announcements, satisfying WCAG 2.2 SC 4.1.3 (Status Messages) for screen reader users.
Why this severity: High because the first-token latency window is long enough that a frozen UI causes duplicate submits and session abandonment.
ai-chat-visibility.loading-and-streaming.typing-indicator

Without a stop button the user is held hostage by every generation they regret starting: a prompt that is clearly going off the rails, a response already too long, a hallucination cascading into more hallucinations. They cannot terminate the stream, cannot re-prompt, and the product burns their tokens and wall-clock time on output they will immediately delete. This is one of the most-cited usability gaps in AI chat products and directly affects token-spend efficiency for paying users.
Why this severity: High because users cannot recover from a bad generation and the product wastes their tokens on discarded output.
ai-chat-visibility.loading-and-streaming.stop-generation-button

Configuring streaming at the API level but awaiting the full response on the frontend is a silent no-op: the server dutifully streams tokens into a reader that buffers them all before a single setState call, producing the exact frozen-UI experience that streaming was supposed to eliminate. This bug pattern is extremely common when developers half-migrate from non-streaming to streaming and forget to update the consuming hook or component state logic.
Why this severity: High because the server cost of streaming is paid while delivering zero perceived-performance benefit to users.
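The fix is to surface each chunk as it arrives rather than only after the stream closes. A sketch of incremental consumption of a fetch response body, assuming a UTF-8 text stream; `onDelta` is a hypothetical callback that appends to UI state:

```typescript
// Consume a streaming response body incrementally. Each decoded chunk
// is handed to onDelta so the UI can append tokens as they arrive.
// The buggy half-migration omits onDelta and only uses the returned
// `full` string after the loop finishes.
async function readStream(
  body: ReadableStream<Uint8Array>,
  onDelta: (text: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters split across chunks intact
    const chunk = decoder.decode(value, { stream: true });
    full += chunk;
    onDelta(chunk);
  }
  return full;
}
```

In React terms, `onDelta` is where the per-chunk setState (or equivalent) must happen; a single setState after `readStream` resolves reproduces the frozen UI.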
ai-chat-visibility.loading-and-streaming.streaming-incremental

An always-enabled send button during generation lets users submit multiple messages in rapid succession, producing race conditions where two in-flight completions interleave in the UI, one overwrites the other's state, or both consume tokens against the same conversation. This also hides the current app state from the user, who has no visual signal that a generation is already in progress and assumes their first click did not register.
Why this severity: Low because the damage is mostly wasted tokens and occasional UI glitches rather than data loss or security impact.
ai-chat-visibility.loading-and-streaming.input-disabled-during-generation

When an application enforces per-user message quotas or subscription-tier rate limits but shows no indicator, users experience silent, opaque walls — a request goes out, nothing comes back, and there is no signal explaining why. This erodes trust and generates support tickets that cost real time. The ISO/IEC 25010:2011 performance-efficiency dimension treats predictability as a quality attribute: users cannot manage their own consumption if the system conceals it. Cost-efficiency also degrades on the operator side when users retry aggressively against an invisible limit, amplifying API spend without delivering additional value.
Why this severity: Informational because the failure harms user experience and increases support load but does not expose data or create a security vulnerability.
ai-chat-visibility.loading-and-streaming.token-usage-indicator

A silently frozen chat UI after an API failure leaves users without any signal: they do not know whether to retry, wait, or refresh. Worse, mishandled error paths frequently leak raw error objects — stack traces, internal route names, or upstream provider error messages — directly into the rendered output, which CWE-209 (Generation of Error Message Containing Sensitive Information) identifies as an information-disclosure risk. In subscription products, a broken interface with no error message or retry affordance is an immediate churn signal. The combination of user confusion, potential data exposure, and broken UI state makes silent failure the worst possible error-handling outcome.
Why this severity: Critical because a missing error handler can freeze the UI with `isLoading` stuck as true and may expose raw error objects containing sensitive internal details to end users.
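Both halves of this failure have small fixes: map every thrown value to a safe, pre-written message instead of rendering the error object, and clear the loading flag in a `finally` so it can never stick. A sketch under those assumptions; `toUserMessage` and the copy strings are illustrative names, not a fixed API:

```typescript
// Map any thrown value to a safe, user-facing message (CWE-209: never
// render err.message — it can carry stack frames, internal routes, or
// upstream provider details). Log the raw error server-side instead.
function toUserMessage(err: unknown): string {
  if (err instanceof Error && err.name === "AbortError") {
    return "Generation stopped.";
  }
  return "Something went wrong generating a response. Please try again.";
}

// In the send handler (sketch): the finally block guarantees isLoading
// cannot be left stuck at true on any error path.
// try { await send(); }
// catch (e) { setError(toUserMessage(e)); }
// finally { setIsLoading(false); }
```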
ai-chat-visibility.error-handling.api-error-user-message

Rate limit errors are one of the most common failure modes in AI chat products — per-user quotas, per-IP throttles, and upstream provider 429s all route through the same code path. Showing users a generic "Something went wrong" message for a 429 makes the product look broken when in fact the user has simply hit a quota and the correct action is to wait or upgrade. This is a direct conversion lever: rate-limited users on a paid tier should see an upgrade prompt, not an error toast.
Why this severity: High because rate limits are an expected, recurring failure mode and generic errors convert resolvable friction into churn.
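In practice this is a small status-code switch in the error path. A sketch, assuming the frontend sees the HTTP status; the copy strings and the `showUpgrade` flag are illustrative:

```typescript
// Route HTTP status codes to distinct user-facing treatments so a 429
// reads as a quota hit (with an upgrade CTA), not a product failure.
interface ErrorDisplay {
  message: string;
  showUpgrade: boolean; // drive an upgrade prompt for quota errors
}

function displayForStatus(status: number): ErrorDisplay {
  if (status === 429) {
    return {
      message:
        "You've reached your message limit. Wait a bit, or upgrade for a higher quota.",
      showUpgrade: true,
    };
  }
  if (status >= 500) {
    return {
      message: "The AI service is having trouble right now. Please retry.",
      showUpgrade: false,
    };
  }
  return { message: "Something went wrong. Please try again.", showUpgrade: false };
}
```

If the 429 response carries a `Retry-After` header, surfacing the wait time in the message is a natural extension.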
ai-chat-visibility.error-handling.rate-limit-message

AI responses are non-deterministic and fail in soft ways all the time — hallucinations, cut-off streams, refusals on benign prompts, partial answers that drift off topic. Without a regenerate button the only recovery is to retype the prompt verbatim, which users will not do; they will instead give up on the answer or, worse, act on the bad one. Regenerate is the standard recovery affordance established by every reference chat product in the category.
Why this severity: High because soft-fail responses are common and the alternative recovery path (retyping) is friction users will not pay.
ai-chat-visibility.error-handling.regenerate-response-button

Network errors mid-stream — Wi-Fi dropouts, CDN hiccups, tunnel reconnects on mobile — will freeze the UI with `isLoading` stuck true if the error handler only catches HTTP-status errors and ignores `TypeError: Failed to fetch`. The user sees an animated typing indicator spinning forever with no recovery path short of a hard refresh, which also loses the partial response they had already generated. This is a routine failure mode on mobile networks and must be handled distinctly from API errors.
Why this severity: High because a stuck UI with no retry path is indistinguishable from a broken product and happens routinely on mobile.
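The distinguishing signal is that a network drop makes `fetch` *reject* with a `TypeError`, whereas an HTTP failure makes it *resolve* with `res.ok === false`. A sketch of classifying the two paths; `isNetworkError` and `showRetryBanner` are illustrative names:

```typescript
// Distinguish a network drop from an HTTP-level failure so the
// mid-stream dropout path gets its own retry affordance.
// Browsers reject fetch with a TypeError on network failure
// ("Failed to fetch" in Chromium; Firefox uses different wording,
// so match on the error type, never the message string).
function isNetworkError(err: unknown): boolean {
  return err instanceof TypeError;
}

// In the handler (sketch):
// try {
//   const res = await fetch("/api/chat", { method: "POST", signal });
//   if (!res.ok) setError(displayForHttpStatus(res.status));
// } catch (e) {
//   if (isNetworkError(e)) showRetryBanner(); // offer reconnect + retry
//   else setError(String(e) /* map to a safe message first */);
// } finally {
//   setIsLoading(false); // never leave the typing indicator spinning
// }
```

Preserving the partial response text before showing the retry banner avoids the second loss the prose describes.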
ai-chat-visibility.error-handling.network-error-recovery

A single-line `input type="text"` makes it impossible to paste a multi-line code snippet, error trace, log excerpt, or long-form question — which is precisely the content users want to discuss with an AI. They will either truncate their question, paste it as one giant line that destroys formatting, or abandon the interaction. This is the most common prompt-quality issue in AI chat products and maps directly to lower response quality and lower user satisfaction.
Why this severity: High because the input element itself caps the maximum useful prompt quality the product can ever receive.
ai-chat-visibility.input-and-history.multiline-input

The Vercel AI SDK's `useChat` hook stores messages exclusively in React component state — a page refresh destroys the entire conversation with no recovery path. For any application where users build on prior context (debugging sessions, multi-step workflows, ongoing support threads), this is a reliability failure under ISO/IEC 25010:2011: the system cannot maintain its service when subjected to a routine browser event. Users who lose history mid-workflow may lose irreplaceable problem context. Data integrity also demands that committed user interactions not vanish silently; a conversation a user considers saved is, in effect, corrupted when it disappears.
Why this severity: Critical because in-memory-only storage guarantees total data loss on every page refresh, destroying user work with no recovery mechanism.
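A minimal mitigation is to mirror messages into durable storage on every change and rehydrate on mount. A sketch with the storage backend injected (so the same logic works against `localStorage`, `sessionStorage`, or a server-backed store); the key name and `Message` shape are assumptions for illustration:

```typescript
// Persist the conversation outside React state so a refresh is
// recoverable. StringStore abstracts over localStorage-like backends.
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface StringStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "chat:messages"; // assumed key name

function saveMessages(store: StringStore, msgs: Message[]): void {
  store.setItem(STORAGE_KEY, JSON.stringify(msgs));
}

function loadMessages(store: StringStore): Message[] {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return [];
  try {
    return JSON.parse(raw) as Message[];
  } catch {
    return []; // corrupted entry: fall back to an empty conversation
  }
}
```

With `useChat`, the usual wiring is to call `saveMessages` in an effect keyed on the messages array and feed `loadMessages` into the hook's initial-messages option; for multi-device or long-term history, a server-side store replaces `localStorage` behind the same interface.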
ai-chat-visibility.input-and-history.conversation-history-persisted

Forcing users to mouse over to a send button for every message is both slow and jarring — it breaks the typing flow that everyone has been trained on by every messaging product in existence. Keyboard power users will bounce off a chat interface that requires a click per submit, and for tablet and laptop users the friction accumulates across hundreds of messages per session. This also affects accessibility: users who rely on keyboard-only navigation per WCAG 2.2 SC 2.1.1 (Keyboard) must have a keyboard submit path.
Why this severity: High because keyboard-only submit is both a productivity and an accessibility requirement and its absence is immediately felt.
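The established convention is Enter to submit, Shift+Enter for a newline, and no submit while an IME composition is in progress (otherwise CJK users submit mid-character). A sketch of that decision as a pure function; the function name is an assumption:

```typescript
// Enter submits; Shift+Enter inserts a newline; an active IME
// composition never submits. Matches the convention set by mainstream
// chat products.
function shouldSubmitOnKey(
  key: string,
  shiftKey: boolean,
  isComposing: boolean,
): boolean {
  return key === "Enter" && !shiftKey && !isComposing;
}

// In the textarea (sketch):
// onKeyDown={(e) => {
//   if (shouldSubmitOnKey(e.key, e.shiftKey, e.nativeEvent.isComposing)) {
//     e.preventDefault(); // keep the newline out of the cleared input
//     submit();
//   }
// }}
```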
ai-chat-visibility.input-and-history.keyboard-submit

Without an empty state, a first-time user lands on a blank canvas with a single input box and no signal about what the AI can do, how to phrase a prompt, or what kinds of answers to expect. First-session activation — the user typing a first successful prompt — is the single most important conversion metric for any AI chat product, and a cold empty state is the largest drag on that metric. Example prompts also help set quality expectations and steer new users away from low-quality first attempts.
Why this severity: Low because returning users are unaffected, but first-session activation is disproportionately hurt by a cold empty canvas.
ai-chat-visibility.input-and-history.empty-state-guidance

When the app enforces an input length limit via a `maxLength` attribute or server-side truncation, users have no way to see how close they are to the ceiling until their prompt is silently cut off or rejected. Long prompts — the exact high-intent content that benefits most from an AI — are the most likely to hit this limit, and silent truncation produces degraded or confusing responses that the user attributes to model quality rather than input handling. A visible counter at the threshold eliminates that failure mode entirely.
Why this severity: Low because the limit is usually large, but silent truncation of long prompts is especially damaging when it does occur.
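The "counter at the threshold" behavior reduces to three derived values from the current length and the limit. A sketch; the 80% visibility threshold and field names are illustrative assumptions:

```typescript
// Show the counter only as the user approaches the limit, so short
// prompts stay uncluttered; flag over-limit input so submit can be
// blocked instead of silently truncated.
interface CounterState {
  remaining: number;  // characters left (negative when over)
  visible: boolean;   // render the counter at all?
  overLimit: boolean; // style as an error and block submit
}

function charCounter(length: number, maxLength: number, showAt = 0.8): CounterState {
  return {
    remaining: maxLength - length,
    visible: length >= maxLength * showAt,
    overLimit: length > maxLength,
  };
}
```

Blocking submit on `overLimit` (rather than relying on `maxLength` alone) also covers the paste case, where the browser may clip the pasted text without any feedback.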
ai-chat-visibility.input-and-history.input-character-count