Cloudflare Turnstile, reCAPTCHA, and user-agent-sniffing middleware that challenge every request will block OpenAI's GPTBot, Google-Extended, PerplexityBot, and ClaudeBot alongside malicious traffic. When your blog and docs return a JavaScript challenge instead of HTML, AI systems record the page as uncrawlable and stop retrying — your content vanishes from generative answers even though it loads fine for humans.
Severity is medium because the impact depends on how broadly the bot challenge is scoped across routes.
Restrict bot-protection middleware to API, auth, and form-submission routes using an explicit matcher, and leave content pages unprotected at the application layer. If a CDN or edge firewall enforces challenges, allowlist known AI crawler user agents or exclude content-only paths. Update middleware.ts:
// middleware.ts — scope bot protection to API and auth routes only
export const config = {
  matcher: ['/api/:path*', '/auth/:path*'],
}
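If a challenge must stay on some shared routes, the middleware can also skip it for known AI crawlers. A minimal sketch, assuming a plain substring match on the user-agent header; the token list covers the crawlers named above, and `isKnownAICrawler` is a hypothetical helper (verify each token against the vendor's published crawler documentation):

```typescript
// User-agent tokens for the AI crawlers this check cares about.
// Assumption: substring matching is sufficient; verify tokens with each vendor.
const AI_CRAWLER_TOKENS = ['GPTBot', 'Google-Extended', 'PerplexityBot', 'ClaudeBot'];

// Returns true when the user-agent string identifies a known AI crawler.
function isKnownAICrawler(userAgent: string): boolean {
  return AI_CRAWLER_TOKENS.some((token) => userAgent.includes(token));
}
```

Note that user agents are trivially spoofed, so this allowlist only makes sense for skipping challenges on low-risk content routes, not for granting access to anything sensitive.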
ID: geo-readiness.ai-crawler-access.no-aggressive-bot-blocking
Severity: medium
What to look for: Count all bot-blocking mechanisms in the codebase: imports of CAPTCHA libraries (reCAPTCHA, hCaptcha, Cloudflare Turnstile), middleware files that inspect user agents, and any "verify you're human" interstitial components. For each mechanism found, determine which routes it applies to — content pages vs. API/auth routes. Bot protection on API routes, auth pages, or form submissions is acceptable and expected.
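The import-counting step above can be sketched as a simple pattern scan over source text. The package-name patterns below are common examples, not a verified or exhaustive list; a real checker would also walk middleware files and interstitial components:

```typescript
// Import patterns that typically indicate a CAPTCHA or bot-challenge library.
// Assumption: these package-name fragments are illustrative, not exhaustive.
const CAPTCHA_IMPORT_PATTERNS: RegExp[] = [
  /react-google-recaptcha/, // reCAPTCHA wrapper
  /hcaptcha/i,              // hCaptcha packages
  /turnstile/i,             // Cloudflare Turnstile packages
];

// Counts how many known bot-blocking libraries a source file imports.
function countCaptchaImports(source: string): number {
  return CAPTCHA_IMPORT_PATTERNS.filter((pattern) => pattern.test(source)).length;
}
```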
Pass criteria: Count all CAPTCHA, JavaScript challenge, and bot verification mechanisms. The count of such mechanisms on public content pages (homepage, marketing, blog, docs) must be 0. Bot protection scoped only to API routes, auth flows, or form submissions passes. If no bot-blocking code is found in the codebase at all, pass — note that CDN-level configuration (Cloudflare, Vercel) cannot be verified from code alone. Report even on pass: "Found X bot-blocking mechanisms total — 0 apply to public content pages."
Fail criteria: At least 1 CAPTCHA or challenge mechanism applies to public content page routes. Middleware that blocks or challenges requests based on user agent string for all routes including content pages. Report: "X bot-blocking mechanisms found on content routes: [list mechanisms and affected routes]".
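Applying the pass/fail criteria requires classifying each affected route as public content or as an acceptable protected route. A minimal sketch, assuming prefix matching and the hypothetical prefixes below (adjust to the project's actual route layout):

```typescript
// Route prefixes where bot protection is acceptable per the pass criteria.
// Assumption: '/api/' and '/auth/' are the project's protected prefixes.
const PROTECTED_PREFIXES = ['/api/', '/auth/'];

// Treats any path outside the protected prefixes as a public content route,
// which must have zero CAPTCHA/challenge mechanisms to pass this check.
function isPublicContentRoute(path: string): boolean {
  return !PROTECTED_PREFIXES.some((prefix) => path.startsWith(prefix));
}
```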
Skip (N/A) when: Never.
Detail on fail: "Cloudflare Turnstile challenge applied via middleware to all routes including /blog and /docs — AI crawlers cannot access 2 content sections" or "reCAPTCHA gate on homepage prevents automated content access — 1 bot-blocking mechanism on 1 content route"
Remediation: Bot protection should target form submissions and API endpoints, not content pages. Scope your middleware to exclude public content routes:
// middleware.ts — exclude content pages from bot challenges
export const config = {
  matcher: ['/api/:path*', '/auth/:path*'], // Only protect API and auth
}
CDN-level bot protection (Cloudflare, Vercel Firewall) should be configured to allow known AI crawler user agents or to skip challenges for content-only pages.
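As one illustration, a Cloudflare WAF skip rule could match crawler user agents with an expression like the following (syntax per Cloudflare's Rules language; the exact field names and rule action depend on the dashboard version, so treat this as a sketch to adapt, not a drop-in configuration):

```
(http.user_agent contains "GPTBot") or (http.user_agent contains "ClaudeBot")
```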