Blocking AI crawlers in robots.txt cuts your content out of every AI-powered search engine, citation engine, and LLM knowledge base that respects the robots.txt standard. When GPTBot, ClaudeBot, or Google-Extended can't crawl your pages, your content doesn't appear in ChatGPT answers, Perplexity citations, or Google AI Overviews — regardless of how good that content is. For SaaS and content businesses, this directly suppresses top-of-funnel discovery from the fastest-growing traffic source. Blocking even one of the 10 major AI crawler user agents creates a measurable gap in your AI findability coverage.
Critical because even a single blocked AI crawler user agent fully excludes your content from that platform's citation and knowledge retrieval pipeline.
Remove Disallow rules targeting any of the 10 major AI crawler user agents from public/robots.txt or app/robots.ts. If the file is generated dynamically, audit the rules array for any entry whose userAgent matches GPTBot, ClaudeBot, PerplexityBot, CCBot, or similar.
// app/robots.ts
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // A single wildcard rule is sufficient — do NOT add per-bot Disallow entries
      { userAgent: '*', allow: '/', disallow: ['/api/', '/auth/'] },
    ],
    sitemap: 'https://example.com/sitemap.xml',
  }
}
If you need to block AI training crawlers selectively, target only training-specific agents (e.g., CCBot) while explicitly allowing citation-oriented ones like GPTBot and ClaudeBot.
ID: geo-readiness.ai-crawler-access.ai-crawlers-not-blocked
Severity: critical
What to look for: Examine robots.txt (static file at public/robots.txt or generated via app/robots.ts or equivalent) for user agent rules targeting AI crawlers. Count all User-agent / Disallow rule pairs in the file. The 10 major AI crawler user agents are: GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, Claude-User, PerplexityBot, Bytespider, CCBot, Google-Extended, and FacebookBot (used for AI training). For each of the 10 major crawlers, check both for explicit Disallow rules and for the absence of an Allow rule when a general Disallow: / is set. Report: "X of 10 major AI crawlers are blocked."
Pass criteria: Count all Disallow rules that target major AI crawler user agents. None of the 10 major AI crawler user agents may be blocked — robots.txt must contain no Disallow rules targeting any of them. A generic User-agent: * / Allow: / without AI-specific blocks passes. Blocking a single non-major bot while allowing all 10 major agents also passes. Do NOT pass when even 1 of the 10 major AI crawlers has a Disallow rule — partial blocking still degrades AI visibility.
Fail criteria: robots.txt contains User-agent: GPTBot / Disallow: / or similar blanket block for any of the 10 major AI crawlers listed above. Blocking even 1 major AI crawler is a fail — detail should name which crawler is blocked. A generic Disallow: / for all user agents also fails (blocks AI crawlers along with everything else). Report: "X of 10 major AI crawlers blocked: [list names]".
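The counting logic above can be sketched as a small audit helper. The function below is a hypothetical illustration (the name blockedAiCrawlers and its parsing rules are assumptions, not part of any shipped tool); it detects only blanket Disallow: / blocks per the fail criteria, and treats a wildcard group with Disallow: / as blocking all 10 crawlers.

```typescript
// Hypothetical helper: given raw robots.txt text, return which of the
// 10 major AI crawler user agents are hit by a blanket "Disallow: /".
const MAJOR_AI_CRAWLERS = [
  'GPTBot', 'OAI-SearchBot', 'ChatGPT-User', 'ClaudeBot', 'Claude-User',
  'PerplexityBot', 'Bytespider', 'CCBot', 'Google-Extended', 'FacebookBot',
]

function blockedAiCrawlers(robotsTxt: string): string[] {
  const blocked = new Set<string>()
  let currentAgents: string[] = []  // user agents in the current rule group
  let lastWasAgent = false
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.split('#')[0].trim()  // strip comments and whitespace
    if (!line) continue
    const [field, ...rest] = line.split(':')
    const key = field.trim().toLowerCase()
    const value = rest.join(':').trim()
    if (key === 'user-agent') {
      // Consecutive User-agent lines share one group; otherwise a new group starts.
      if (!lastWasAgent) currentAgents = []
      currentAgents.push(value)
      lastWasAgent = true
    } else {
      lastWasAgent = false
      if (key === 'disallow' && value === '/') {
        for (const agent of currentAgents) {
          // A generic "Disallow: /" under "*" blocks AI crawlers too.
          if (agent === '*') MAJOR_AI_CRAWLERS.forEach((a) => blocked.add(a))
          const match = MAJOR_AI_CRAWLERS.find(
            (a) => a.toLowerCase() === agent.toLowerCase(),
          )
          if (match) blocked.add(match)
        }
      }
    }
  }
  return MAJOR_AI_CRAWLERS.filter((a) => blocked.has(a))
}
```

Usage for the report string: const names = blockedAiCrawlers(txt); then emit `${names.length} of 10 major AI crawlers blocked: ${names.join(', ')}`.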
Skip (N/A) when: No robots.txt found — this is caught by the SEO Fundamentals audit. Note in detail: "No robots.txt found — cannot verify AI crawler access. See SEO Fundamentals audit."
Detail on fail: Name the specific AI crawlers being blocked. Example: "2 of 10 major AI crawlers blocked: GPTBot and CCBot — both have Disallow: / rules, preventing all content indexing" or "1 of 10 major AI crawlers blocked: Google-Extended — prevents Google's AI systems from using content"
Cross-reference: For a deeper analysis of robots.txt structure, crawlability rules, and sitemap configuration, the Advanced SEO audit covers this in detail.
Remediation: AI crawlers need access to your content to cite it. If you've blocked AI crawlers in robots.txt, remove the blocking rules:
# Allow AI crawlers (remove any Disallow rules for these)
User-agent: GPTBot
Allow: /
User-agent: ClaudeBot
Allow: /
In Next.js App Router with app/robots.ts, ensure your rules don't target AI user agents:
export default function robots() {
  return {
    rules: [
      { userAgent: '*', allow: '/', disallow: ['/api', '/auth'] },
      // Do NOT add disallow rules for GPTBot, ClaudeBot, etc.
    ],
  }
}
If you have privacy concerns about AI training, consider allowing crawlers for search/citation while blocking training-specific ones selectively — but understand this reduces your AI visibility.
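That selective compromise might look like the sketch below in app/robots.ts, using CCBot as the training-focused example from earlier (the MetadataRoute type import from next is omitted so the snippet stands alone):

```typescript
// app/robots.ts (sketch): block only the training-focused CCBot while
// keeping citation-oriented crawlers (GPTBot, ClaudeBot, PerplexityBot) allowed.
export default function robots() {
  return {
    rules: [
      // Specific group: CCBot gets a blanket Disallow.
      { userAgent: 'CCBot', disallow: '/' },
      // Everyone else keeps full access apart from private routes.
      { userAgent: '*', allow: '/', disallow: ['/api/', '/auth/'] },
    ],
    sitemap: 'https://example.com/sitemap.xml',
  }
}
```

The specific CCBot group takes precedence over the wildcard group for that bot, so the trade-off stays contained to training crawlers.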