
No AI-blocking meta robots directives

ab-001520 · geo-readiness.ai-crawler-access.no-ai-blocking-meta-tags
Severity: high · Status: active

Why it matters

The noai and noimageai meta robots directives instruct large language models and image-generation systems to exclude your pages from their training corpora and retrieval indexes. Shipping these directives on public marketing pages silently removes your product from ChatGPT, Gemini, Perplexity, and Copilot answers — competitors without the tag get cited in your place, and prospects asking AI assistants for recommendations in your category will never hear your name.

Severity rationale

High because the directive fully excludes the page from AI citation surfaces with no partial-credit fallback and no visible warning.

Remediation

Remove the noai and noimageai values from every public page's robots metadata and from any X-Robots-Tag header set in middleware or server config. In Next.js App Router, audit each layout's metadata.robots export, and scope these directives only to admin or user-generated-content routes if they are needed at all.

// Public pages: allow indexing and do not add noai/noimageai
export const metadata = {
  robots: { index: true, follow: true },
}
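If some routes genuinely must opt out (admin, user-generated content), the header can be scoped in middleware instead of applied site-wide. A minimal sketch, assuming an /admin section that should stay AI-blocked — the path and matcher are assumptions, not part of this rule:

```typescript
// middleware.ts — sketch: only /admin routes receive the AI-blocking header.
// Public routes never match the matcher, so they ship no X-Robots-Tag at all.
import { NextResponse } from 'next/server'
import type { NextRequest } from 'next/server'

export function middleware(request: NextRequest) {
  const response = NextResponse.next()
  response.headers.set('X-Robots-Tag', 'noindex, noai, noimageai')
  return response
}

// Restrict the middleware to the non-public section
export const config = { matcher: ['/admin/:path*'] }
```

Scoping via the matcher keeps the blocking logic in one place, so a later route rename cannot silently extend the header to marketing pages.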

Detection

  • ID: geo-readiness.ai-crawler-access.no-ai-blocking-meta-tags

  • Severity: high

  • What to look for: Search across all page layouts and page files for meta robots directives containing noai or noimageai. Count all layout files and page files that set meta robots directives. In Next.js App Router, check metadata exports for robots properties. Also check for X-Robots-Tag headers in middleware or server config that include AI-blocking directives. Before evaluating, extract and quote the exact robots metadata configuration from the root layout file, if present.

  • Pass criteria: Count all occurrences of noai and noimageai across all layout and page files. The count must be 0 on public content pages. Scoped directives on non-public pages (admin, user-generated content) are acceptable — pass with a note listing which non-public pages have the directives. Report even on pass: "Scanned X layout/page files — 0 noai/noimageai directives found on public pages."

  • Fail criteria: At least 1 blanket noai or noimageai directive found on public-facing content pages (homepage, marketing pages, blog, docs). Report: "Found noai/noimageai on X public pages: [list routes]".

  • Skip (N/A) when: Never — applies to all web projects.

  • Detail on fail: "Meta robots 'noai' directive found in root layout — blocks AI systems from processing all page content across all routes" or "X-Robots-Tag header includes 'noimageai' for all routes — 0 of N public pages are AI-accessible"

  • Remediation: The noai and noimageai meta robots directives tell AI systems not to use your content. Remove these from public content pages:

    // Remove noai from metadata
    export const metadata = {
      robots: {
        index: true,
        follow: true,
        // Do NOT include: noai: true, noimageai: true
      },
    }
    

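The detection steps above can be scripted. A minimal sketch in Node + TypeScript that walks an App Router tree and flags layout/page files mentioning the blocking directives — the 'app' directory name and file-name patterns are assumptions (adjust for src/app or a pages/ setup):

```typescript
// Walk a directory tree and collect layout/page files whose contents
// mention noai or noimageai, mirroring the pass/fail counting above.
import { existsSync, readdirSync, readFileSync, statSync } from 'node:fs'
import { join } from 'node:path'

const BLOCKING = /\bnoai\b|\bnoimageai\b/

export function scanForAiBlocking(dir: string, hits: string[] = []): string[] {
  for (const name of readdirSync(dir)) {
    const path = join(dir, name)
    if (statSync(path).isDirectory()) {
      scanForAiBlocking(path, hits)
    } else if (/^(layout|page)\./.test(name) && BLOCKING.test(readFileSync(path, 'utf8'))) {
      hits.push(path)
    }
  }
  return hits
}

// Report in the rule's pass/fail shape when an app/ tree is present
if (existsSync('app')) {
  const hits = scanForAiBlocking('app')
  console.log(hits.length === 0
    ? 'PASS: 0 noai/noimageai directives found'
    : `FAIL: noai/noimageai found in ${hits.length} file(s): ${hits.join(', ')}`)
}
```

Note this only covers file contents; X-Robots-Tag headers set by a CDN or reverse proxy still need to be checked at the HTTP layer.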