When a user navigates away mid-generation or realizes their prompt was wrong, the model keeps generating, and keeps billing you, for tokens no one will ever read. Without an abort mechanism you pay for the full completion on every abandoned session: at $0.01 per 1,000-token completion, a 5% abandonment rate on 10,000 daily requests burns $5/day on output that was never seen. OWASP LLM10 (Unbounded Consumption), CWE-770, and NIST AI RMF MANAGE 1.3 all require application-level controls to terminate runaway resource consumption.
High: without abort support, navigating away or canceling a request does not stop generation, and the model keeps billing for output no user will ever see.
Wire the stop() function from useChat to a visible cancel button that appears while generation is in progress. The Vercel AI SDK propagates the abort signal to the provider automatically.
// src/components/chat.tsx
import { useChat } from "ai/react";
export function Chat() {
const { messages, input, handleInputChange, handleSubmit, isLoading, stop } = useChat();
return (
<form onSubmit={handleSubmit}>
{messages.map(m => <div key={m.id}>{m.content}</div>)}
<input value={input} onChange={handleInputChange} disabled={isLoading} />
{isLoading
? <button type="button" onClick={stop}>Stop generating</button>
: <button type="submit">Send</button>
}
</form>
);
}
For manual implementations using the OpenAI SDK directly, pass an AbortSignal in the request options (the second argument to chat.completions.create, not the completion params) and call controller.abort() on cancel. Verify by starting a long generation, clicking Stop, and confirming server logs show a partial token count rather than the full expected output.
ID: ai-token-optimization.streaming-performance.abort-cancel-mechanism
Severity: high
What to look for: In the frontend, check for a "Stop" or "Cancel" button in chat components that calls stop() from useChat, or manually calls controller.abort() on an AbortController instance. In the API route, check whether the request signal is passed to the AI SDK call (abortSignal: req.signal or the SDK automatically using request signal). Also check whether the backend continues generating after a client disconnect (no signal propagation = tokens wasted after the user navigates away). Count all instances found and enumerate each.
Pass criteria: Users can cancel an in-progress AI generation, AND the backend AI SDK call is terminated when the client disconnects or cancels. The Vercel AI SDK's useChat hook exposes a stop() function that handles this automatically when wired to a button. At least 1 implementation must be confirmed.
Fail criteria: No stop button exists in the UI, or the backend continues generating (and billing) even when the client closes the connection or navigates away.
Skip (N/A) when: All AI responses are very short (under 5 seconds to complete) such that a cancel mechanism provides no practical value. Signal: All AI calls demonstrably produce only short outputs (token counting shows average completions under 100 tokens), or all AI calls are in non-interactive background jobs.
Cross-reference: The streaming-partial-render check verifies the rendering pipeline that needs graceful degradation when abort is triggered.
Detail on fail: "No abort mechanism — model continues generating (and billing) after user navigates away"
Remediation: If a user navigates away mid-generation or realizes their prompt was wrong, the model should stop. Without abort support, you pay for every token generated regardless of whether it was ever seen.
// Frontend — expose stop() from useChat
import { useChat } from "ai/react";
export function Chat() {
const { messages, input, handleInputChange, handleSubmit, isLoading, stop } = useChat();
return (
<div>
{messages.map(m => <div key={m.id}>{m.content}</div>)}
<form onSubmit={handleSubmit}>
<input value={input} onChange={handleInputChange} disabled={isLoading} />
{isLoading
? <button type="button" onClick={stop}>Stop generating</button>
: <button type="submit">Send</button>
}
</form>
</div>
);
}
The Vercel AI SDK automatically propagates the abort signal from the client to the AI provider when stop() is called. For manual implementations:
// Manual AbortController: the signal belongs in the request options
// (second argument to create), not in the completion params object.
const controller = new AbortController();
const stream = await openai.chat.completions.create(
  { model: "gpt-4o", messages, stream: true },
  { signal: controller.signal },
);
// To cancel:
controller.abort();
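The same cancellation must also reach the server-side model call; with the Vercel AI SDK that means forwarding the request's signal (the abortSignal: req.signal pattern noted above). A provider-free sketch of the forwarding pattern, where generateTokens is a hypothetical stand-in for the SDK call:

```typescript
// Sketch: forward the incoming request's AbortSignal so a client disconnect
// stops server-side generation. `generateTokens` stands in for a real AI SDK
// call (e.g. streamText({ ..., abortSignal: req.signal })).
async function generateTokens(max: number, signal: AbortSignal): Promise<string[]> {
  const tokens: string[] = [];
  for (let i = 0; i < max; i++) {
    if (signal.aborted) break; // stop as soon as the client is gone
    tokens.push(`tok${i}`);
    await Promise.resolve();   // yield, standing in for streaming latency
  }
  return tokens;
}

async function handler(req: Request): Promise<Response> {
  // Web-standard Requests expose an AbortSignal that fires on disconnect;
  // passing it through is the whole fix.
  const tokens = await generateTokens(1000, req.signal);
  return new Response(tokens.join(" "));
}
```

The route path and token format are illustrative; the point is that the handler never starts a generation it cannot tear down.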
Verify by starting a long generation, clicking Stop, and checking server logs — token usage for that request should show a partial count, not the full expected output count.
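That partial-versus-full difference can be sketched without any provider dependency; countTokens below is a hypothetical stand-in for a streamed completion:

```typescript
// Provider-free simulation of the verification step: an aborted run should
// record a partial token count, an uncancelled run the full count.
async function countTokens(total: number, signal?: AbortSignal): Promise<number> {
  let emitted = 0;
  for (let i = 0; i < total; i++) {
    if (signal?.aborted) break; // generation halts here once cancelled
    emitted++;
    await Promise.resolve();    // yield between "tokens"
  }
  return emitted;
}

const full = await countTokens(1000); // no cancel: full output

const controller = new AbortController();
const running = countTokens(1000, controller.signal);
queueMicrotask(() => controller.abort()); // cancel almost immediately
const partial = await running;
console.log(full, partial); // partial lands far below full
```

In a real deployment the same comparison comes from the provider's usage logs: the cancelled request's completion_tokens should be a fraction of an uncancelled run's.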