A paperclip or microphone icon in your chat input is a promise. When the user clicks it and nothing happens — or when your marketing copy says "ask questions about your screenshots" but the image payload is never wired into the API call — trust collapses faster than if the feature had never been advertised. Hallucinated affordances are the UI equivalent of a broken link in the nav, and they are especially damaging on AI products where users are already uncertain about capability boundaries.
This is rated medium severity because it erodes trust and drives churn, but it does not directly expose data or break the core text flow.
Either remove the dead affordance or wire it up completely. If you intend to ship multi-modal input later, disable the icon with a disabled attribute and a tooltip ("Image input coming soon") rather than rendering a clickable element with no handler. For a working image path with the Vercel AI SDK, pass experimental_attachments to append() in src/components/chat/composer.tsx.
await append({ role: 'user', content: text }, { experimental_attachments: attachments })
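If you take the "disable it" path, one way to make dead affordances structurally impossible is to gate every input control behind a single capability map, so the composer can never render an affordance whose pipeline does not exist. A minimal sketch — the Modality names and the affordanceProps helper are illustrative, not from the codebase under review:

```typescript
// One source of truth for which input modalities actually work end to end.
// An affordance renders enabled only when its capability is true, so an icon
// with no handler cannot ship by accident.
type Modality = 'text' | 'image' | 'voice'

const capabilities: Record<Modality, boolean> = {
  text: true,
  image: false, // flip to true only when the attachment pipeline ships
  voice: false,
}

// Props for the affordance button: disabled + tooltip when the path is not ready.
function affordanceProps(modality: Modality): { disabled: boolean; title?: string } {
  return capabilities[modality]
    ? { disabled: false }
    : { disabled: true, title: `${modality} input coming soon` }
}
```

Spreading affordanceProps('image') onto the paperclip button keeps the UI and the implementation in lockstep as modalities ship.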
ID: ai-ux-patterns.core-interaction.multi-modal-input
Severity: medium
What to look for: Enumerate all UI affordances that imply multi-modal input: file attachment icons, camera icons, microphone icons, drag-and-drop zones, <input type="file"> elements. For each affordance found, verify whether it has a working handler (click handler, file processing logic, multipart API payload construction). Cross-reference with the AI provider SDK being used — if the model supports vision, check whether image payloads are correctly structured in API calls. Extract and quote any marketing copy mentioning image, file, or voice capabilities.
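The first step of that enumeration can be triaged mechanically. A toy sketch — regex-based, so it yields leads rather than verdicts, and a real audit would inspect the JSX AST:

```typescript
// Flag <input type="file"> elements in JSX source that carry no onChange
// handler in the same tag. Regex matching cannot see handlers attached
// elsewhere, so treat every hit as a candidate for manual review.
export function findDeadAffordances(source: string): string[] {
  const findings: string[] = []
  for (const match of source.matchAll(/<input[^>]*type="file"[^>]*\/?>/g)) {
    if (!/onChange=/.test(match[0])) findings.push(match[0])
  }
  return findings
}
```

Running this over the component sources surfaces file inputs to cross-check against the API payload construction.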
Pass criteria: Either (a) no multi-modal features are mentioned or implied in any user-facing copy, OR (b) multi-modal features are mentioned and a corresponding working implementation exists in the codebase. Every advertised modality must have a working implementation.
Fail criteria: The application's copy or UI (e.g., a paperclip icon with no handler, an "upload image" placeholder) implies multi-modal support but no actual implementation is found.
Skip (N/A) when: The project is a pure text-only AI application with no mention of image, file, or voice capabilities in any user-facing UI or marketing copy.
Detail on fail: "UI shows a file attachment icon in the chat input but no file input handler or multipart API payload construction found".
Remediation: If multi-modal features are planned but not ready, remove or clearly disable the UI affordance. A paperclip icon that does nothing erodes trust faster than no icon at all.
For image input with the Vercel AI SDK, read files as data URLs — blob: URLs from URL.createObjectURL live only in the browser, so the server (and therefore the model) can never fetch them:

import { useState } from 'react'
import type { Attachment } from 'ai'

const [attachments, setAttachments] = useState<Attachment[]>([])

<input
  type="file"
  accept="image/*"
  onChange={(e) => {
    const file = e.target.files?.[0]
    if (!file) return
    const reader = new FileReader()
    reader.onload = () =>
      setAttachments(prev => [...prev, { name: file.name, contentType: file.type, url: reader.result as string }])
    reader.readAsDataURL(file)
  }}
/>

// Include the attachments in message submission via the request options
await append({ role: 'user', content: userText }, { experimental_attachments: attachments })