AI crawlers are not blocked in robots.txt
Why it matters
Blocking AI crawlers in robots.txt cuts your content out of every AI-powered search engine, citation engine, and LLM knowledge base that respects the Robots Exclusion Protocol (robots-txt-spec). When GPTBot, ClaudeBot, or Google-Extended can't crawl your pages, your content doesn't appear in ChatGPT answers, Perplexity citations, or AI Overviews, regardless of how good that content is. For SaaS and content businesses, this directly suppresses top-of-funnel discovery from a fast-growing traffic source. Blocking even one of the 10 major AI crawler user agents leaves a gap in your AI findability coverage.
Severity rationale
Critical because even a single blocked AI crawler user agent fully excludes your content from that platform's citation and knowledge retrieval pipeline.
Remediation
Remove Disallow rules targeting any of the 10 major AI crawler user agents from public/robots.txt or app/robots.ts. If you generate the file dynamically, audit the rules array for any entry whose userAgent matches one of the agents listed under Detection, such as GPTBot, ClaudeBot, PerplexityBot, or CCBot.
// app/robots.ts
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      // A single wildcard rule is sufficient; do NOT add per-bot Disallow entries
      { userAgent: '*', allow: '/', disallow: ['/api/', '/auth/'] },
    ],
    sitemap: 'https://example.com/sitemap.xml',
  }
}
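For a dynamically generated file, the audit of the rules array could be automated roughly as follows. This is a minimal sketch, not a Next.js API: the Rule type, the AI_AGENTS constant, and the findAiDisallowRules helper are hypothetical names introduced for illustration, and the agent list is copied from the Detection section.

```typescript
// Hypothetical shape of one entry in the rules array (declared locally as an assumption).
type Rule = {
  userAgent?: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
};

// The 10 major AI crawler user agents named in the Detection section.
const AI_AGENTS: string[] = [
  'GPTBot', 'OAI-SearchBot', 'ChatGPT-User', 'ClaudeBot', 'ClaudeBot-User',
  'PerplexityBot', 'Bytespider', 'CCBot', 'Google-Extended', 'FacebookBot',
];

// Normalize a string-or-array field to an array.
function toArray(value?: string | string[]): string[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Return the AI agents that appear in any rule carrying a non-empty disallow entry.
function findAiDisallowRules(rules: Rule[]): string[] {
  const known = AI_AGENTS.map((a) => a.toLowerCase());
  const flagged: string[] = [];
  for (const rule of rules) {
    const disallows = toArray(rule.disallow).filter((p) => p !== '');
    if (disallows.length === 0) continue;
    for (const agent of toArray(rule.userAgent)) {
      const isAiAgent = known.indexOf(agent.toLowerCase()) !== -1;
      if (isAiAgent && flagged.indexOf(agent) === -1) flagged.push(agent);
    }
  }
  return flagged;
}
```

Calling findAiDisallowRules on the rules array before returning it from robots() gives a list of agent names to review; an empty list means no per-bot Disallow entries were found. Wildcard rules are deliberately ignored here, since the sketch only looks for entries that name an AI crawler.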
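If you need to block AI training crawlers selectively, target only training-specific agents (e.g., CCBot) while explicitly allowing citation-oriented ones like GPTBot and ClaudeBot. Note that blocking any of the 10 agents listed under Detection, CCBot included, still fails this check.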
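A selective configuration might look like the sketch below. It assumes CCBot is the only agent you treat as training-only and reuses the private paths from the example above; RobotsRule, PRIVATE_PATHS, and buildSelectiveRules are hypothetical names, and the local type only mirrors the rule objects shown earlier.

```typescript
// Local stand-in for the rule objects used in app/robots.ts (an assumption for this sketch).
type RobotsRule = {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
};

const PRIVATE_PATHS: string[] = ['/api/', '/auth/'];
const CITATION_AGENTS: string[] = ['GPTBot', 'ClaudeBot']; // agents named in the text
const TRAINING_ONLY_AGENTS: string[] = ['CCBot'];          // assumption: adjust to your policy

function buildSelectiveRules(): RobotsRule[] {
  return [
    // Default group for every crawler without a named group.
    { userAgent: '*', allow: '/', disallow: PRIVATE_PATHS },
    // Under RFC 9309 a crawler that matches a named group does not fall back
    // to the '*' group, so the private paths are repeated here.
    { userAgent: CITATION_AGENTS, allow: '/', disallow: PRIVATE_PATHS },
    // Blanket block for the training-only agents.
    { userAgent: TRAINING_ONLY_AGENTS, disallow: '/' },
  ];
}
```

In app/robots.ts the returned array would be used as the rules value. Under the criteria in the Detection section, this configuration is still reported as 1 of 10 major AI crawlers blocked.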
Detection
- ID: ai-crawlers-not-blocked
- Severity: critical
- What to look for: Examine robots.txt (static file at public/robots.txt, or generated via app/robots.ts or equivalent) for user agent rules targeting AI crawlers. Count all User-agent / Disallow rule pairs in the file. The 10 major AI crawler user agents are: GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, ClaudeBot-User, PerplexityBot, Bytespider, CCBot, Google-Extended, FacebookBot (used for AI training). For each of the 10 major crawlers, check for both explicit Disallow rules and the absence of Allow rules when a general Disallow: / is set. Report: "X of 10 major AI crawlers are blocked."
- Pass criteria: Count all Disallow rules that target major AI crawler user agents. None of the 10 may be blocked: robots.txt has no Disallow rules targeting any of the 10 major AI crawler user agents listed above. A generic User-agent: * / Allow: / without AI-specific blocks passes. Blocking a non-major bot while allowing the 10 major ones listed above also passes. Do NOT pass when even 1 of the 10 major AI crawlers has a Disallow rule; partial blocking still degrades AI visibility.
- Fail criteria: robots.txt contains User-agent: GPTBot / Disallow: / or a similar blanket block for any of the 10 major AI crawlers listed above. Blocking even 1 major AI crawler is a fail; the detail should name which crawler is blocked. A generic Disallow: / for all user agents also fails (it blocks AI crawlers along with everything else). Report: "X of 10 major AI crawlers blocked: [list names]".
- Skip (N/A) when: No robots.txt found; this is caught by the SEO Fundamentals audit. Note in detail: "No robots.txt found — cannot verify AI crawler access. See SEO Fundamentals audit."
- Detail on fail: Name the specific AI crawlers being blocked. Example: "2 of 10 major AI crawlers blocked: GPTBot and CCBot — both have Disallow: / rules, preventing all content indexing" or "1 of 10 major AI crawlers blocked: Google-Extended — prevents Google's AI systems from using content"
- Cross-reference: The Advanced SEO audit covers robots.txt structure, crawlability rules, and sitemap configuration in more depth.
- Remediation: AI crawlers need access to your content to cite it. If you've blocked AI crawlers in robots.txt, remove the blocking rules:

    # Allow AI crawlers (remove any Disallow rules for these)
    User-agent: GPTBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

  In Next.js App Router with app/robots.ts, ensure your rules don't target AI user agents:

    export default function robots() {
      return {
        rules: [
          { userAgent: '*', allow: '/', disallow: ['/api', '/auth'] },
          // Do NOT add disallow rules for GPTBot, ClaudeBot, etc.
        ],
      }
    }

  If you have privacy concerns about AI training, consider allowing crawlers for search/citation while blocking training-specific ones selectively, but understand this reduces your AI visibility.
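The detection steps above can be approximated in code. The following is a simplified sketch rather than the audit's actual implementation: parseGroups, isBlocked, and auditAiCrawlers are hypothetical names, and an agent counts as blocked only when its applicable group contains Disallow: / with no Allow: / (partial path blocks and wildcard patterns are ignored). Group selection follows the RFC 9309 rule that a crawler uses the groups naming it and falls back to the * group only when none do.

```typescript
const MAJOR_AI_AGENTS: string[] = [
  'GPTBot', 'OAI-SearchBot', 'ChatGPT-User', 'ClaudeBot', 'ClaudeBot-User',
  'PerplexityBot', 'Bytespider', 'CCBot', 'Google-Extended', 'FacebookBot',
];

type Group = { agents: string[]; allow: string[]; disallow: string[] };

// Split robots.txt into groups: consecutive User-agent lines share the rules that follow.
function parseGroups(robotsTxt: string): Group[] {
  const groups: Group[] = [];
  let current: Group | null = null;
  let lastWasAgent = false;
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.split('#')[0].trim(); // drop comments
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') {
      if (current === null || !lastWasAgent) {
        current = { agents: [], allow: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if (field === 'allow' || field === 'disallow') {
      // An empty value (e.g. "Disallow:") restricts nothing, so it is skipped.
      if (current !== null && value !== '') {
        if (field === 'allow') current.allow.push(value);
        else current.disallow.push(value);
      }
      lastWasAgent = false;
    }
    // Other fields such as Sitemap are ignored.
  }
  return groups;
}

// Blanket-block test for one agent, using its named groups or else the '*' groups.
function isBlocked(groups: Group[], agent: string): boolean {
  const name = agent.toLowerCase();
  let matching = groups.filter((g) => g.agents.indexOf(name) !== -1);
  if (matching.length === 0) {
    matching = groups.filter((g) => g.agents.indexOf('*') !== -1);
  }
  let allowRoot = false;
  let disallowRoot = false;
  for (const g of matching) {
    if (g.allow.indexOf('/') !== -1) allowRoot = true;
    if (g.disallow.indexOf('/') !== -1) disallowRoot = true;
  }
  return disallowRoot && !allowRoot;
}

// Produce the blocked list and a report line in the format used by this check.
function auditAiCrawlers(robotsTxt: string): { blocked: string[]; report: string } {
  const groups = parseGroups(robotsTxt);
  const blocked = MAJOR_AI_AGENTS.filter((a) => isBlocked(groups, a));
  const total = MAJOR_AI_AGENTS.length;
  const report = blocked.length === 0
    ? `0 of ${total} major AI crawlers are blocked.`
    : `${blocked.length} of ${total} major AI crawlers blocked: ${blocked.join(', ')}`;
  return { blocked, report };
}
```

Passing the contents of public/robots.txt to auditAiCrawlers returns the blocked agent names in the order listed above, and a pass corresponds to an empty blocked array. Real crawlers may match user agent tokens more loosely than the exact, case-insensitive comparison used here.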
External references
- external · robots-txt-spec — Robots Exclusion Protocol (RFC 9309)
Taxons
History
- 2026-04-18·v1.0.0·Initial import from geo-readiness·automated