robots.txt and sitemap are accessible
Why it matters
Googlebot, Bingbot, and AI crawlers (GPTBot, ClaudeBot, PerplexityBot) check robots.txt first to learn what they're allowed to fetch, and they discover deep URLs through sitemap.xml. When both are missing, crawlers have to follow links one hop at a time from the homepage, so deep pages get crawled late or not at all. For GEO (Generative Engine Optimization), an unreachable sitemap makes it less likely that your pages reach AI answer sets.
Severity rationale
Info because crawlers will eventually find most pages anyway, but missing files slow indexing and hurt AI discoverability.
Remediation
Ship robots.txt and sitemap.xml at the site root. For Next.js, generate both from the metadata files app/robots.ts and app/sitemap.ts so URLs stay in sync with routes. For static sites, place these files in public/:
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
Verify both return HTTP 200 in production, not a SPA-rewritten HTML response.
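For the Next.js route mentioned above, a minimal sketch of the two metadata files might look like the following. The `RobotsFile` and `SitemapEntry` types are local stand-ins for Next.js's `MetadataRoute.Robots` and `MetadataRoute.Sitemap`, and `SITE_URL` and `ROUTES` are hypothetical placeholders. In a real project each function would be the default export of its own file.

```typescript
// Local stand-ins for Next.js's MetadataRoute.Robots and MetadataRoute.Sitemap,
// so this sketch type-checks without the `next` package installed.
type RobotsFile = {
  rules: { userAgent: string; allow?: string; disallow?: string };
  sitemap?: string;
};
type SitemapEntry = { url: string; lastModified?: Date };

// Assumption: placeholder origin and route list; replace with your own.
const SITE_URL = "https://yoursite.com";
const ROUTES = ["/", "/about", "/blog"];

// In a Next.js project this would be the default export of app/robots.ts.
function robots(): RobotsFile {
  return {
    rules: { userAgent: "*", allow: "/" },
    sitemap: `${SITE_URL}/sitemap.xml`,
  };
}

// In a Next.js project this would be the default export of app/sitemap.ts.
function sitemap(): SitemapEntry[] {
  return ROUTES.map((path) => ({
    url: new URL(path, SITE_URL).toString(),
    lastModified: new Date(),
  }));
}
```

Deriving both files from the same `SITE_URL` keeps the Sitemap: line in robots.txt pointing at the sitemap that is actually served.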
Detection
- ID: robots-sitemap
- Severity: info
- What to look for: Check the HTTP status code of {BASE}/robots.txt (must be 200). Check the HTTP status code of {BASE}/sitemap.xml (must be 200). If sitemap.xml returns non-200, also parse the robots.txt body for a Sitemap: directive referencing an alternate sitemap URL. Count all sitemap references found across both the default path and robots.txt directives. Count how many of these 2 resources (robots.txt, sitemap) are accessible.
- Pass criteria: Both of the following 2 conditions are met: (a) /robots.txt returns HTTP 200, AND (b) either /sitemap.xml returns HTTP 200 or the robots.txt body contains at least 1 Sitemap: directive pointing to a valid URL. Report which resources were found and their HTTP status codes.
- Fail criteria: Either /robots.txt returns a non-200 status, or no sitemap is discoverable (neither /sitemap.xml returns 200 nor robots.txt references a sitemap URL).
- Skip (N/A) when: The site is a single-page application with no multi-page content to index (detected via SPA Detection above).
- Cross-reference: For comprehensive crawlability analysis including XML sitemap structure and robots.txt directives, the SEO Advanced audit (seo-advanced) provides deeper coverage.
- Detail on fail: "robots.txt returns 404" or "No sitemap.xml found and no sitemap referenced in robots.txt"
- Remediation: Search engines use robots.txt to understand crawl permissions and sitemap.xml to discover pages. Create both files in your public root:
    # public/robots.txt
    User-agent: *
    Allow: /
    Sitemap: https://yoursite.com/sitemap.xml
  For Next.js, generate a sitemap with the next-sitemap package or the built-in app/sitemap.ts route. For static sites, use a build plugin or create sitemap.xml manually listing all page URLs.
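The pass and fail criteria above can be sketched as a small checker. This is an illustrative sketch, not the audit's actual implementation: the names `sitemapDirectives`, `evaluate`, and `checkSite` are hypothetical, "valid URL" is assumed to mean an absolute http(s) URL, and `checkSite` assumes a runtime with a global `fetch` (for example Node 18 or later).

```typescript
type Verdict = { pass: boolean; detail: string };

// Collect absolute http(s) URLs from `Sitemap:` lines in a robots.txt body.
// The field name is matched case-insensitively.
function sitemapDirectives(robotsBody: string): string[] {
  const urls: string[] = [];
  for (const line of robotsBody.split(/\r?\n/)) {
    const match = /^\s*sitemap\s*:\s*(\S+)/i.exec(line);
    if (!match) continue;
    try {
      const url = new URL(match[1] ?? "");
      if (url.protocol === "http:" || url.protocol === "https:") {
        urls.push(url.href);
      }
    } catch {
      // Not an absolute URL, so it does not count as a valid reference.
    }
  }
  return urls;
}

// Pure decision logic: (a) robots.txt is 200, AND
// (b) sitemap.xml is 200 or robots.txt references a sitemap URL.
function evaluate(
  robotsStatus: number,
  robotsBody: string,
  sitemapStatus: number,
): Verdict {
  if (robotsStatus !== 200) {
    return { pass: false, detail: `robots.txt returns ${robotsStatus}` };
  }
  if (sitemapStatus === 200 || sitemapDirectives(robotsBody).length > 0) {
    return {
      pass: true,
      detail: `robots.txt 200, sitemap.xml ${sitemapStatus}`,
    };
  }
  return {
    pass: false,
    detail: "No sitemap.xml found and no sitemap referenced in robots.txt",
  };
}

// Network wrapper: fetches both default paths under `base` and applies evaluate().
async function checkSite(base: string): Promise<Verdict> {
  const robotsRes = await fetch(new URL("/robots.txt", base));
  const sitemapRes = await fetch(new URL("/sitemap.xml", base));
  const robotsBody = robotsRes.status === 200 ? await robotsRes.text() : "";
  return evaluate(robotsRes.status, robotsBody, sitemapRes.status);
}
```

Keeping `evaluate` free of network calls makes the criteria easy to unit-test with canned status codes and bodies. A call such as `checkSite("https://yoursite.com")` would then report the verdict for a live site.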
Taxons
History
- 2026-04-18·v1.0.0·Initial import from site-health-check·automated