robots.txt and sitemap are accessible

ab-002535 · site-health-check.seo-discoverability.robots-sitemap

Severity: info · active

Why it matters

Googlebot, Bingbot, and AI crawlers such as GPTBot, ClaudeBot, and PerplexityBot check robots.txt first to learn what they are allowed to fetch, and they discover deep URLs through sitemap.xml. When both files are missing, crawlers have to follow links one hop at a time from the homepage, so deep pages get crawled late or not at all. For GEO (Generative Engine Optimization), an unreachable sitemap makes it less likely that your pages reach AI answer sets.

Severity rationale

Info because crawlers will eventually find most pages anyway, but missing files slow indexing and hurt AI discoverability.

Remediation

Ship robots.txt and sitemap.xml at the site root. For Next.js, generate them from app/sitemap.ts and app/robots.ts so URLs stay in sync with routes. For static sites, place these files in public/:

User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml

Verify both return HTTP 200 in production, not a SPA-rewritten HTML response.
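
For the Next.js route, a minimal sketch of what app/sitemap.ts and app/robots.ts might return is shown here. It assumes a hand-maintained route list; `BASE_URL`, `ROUTES`, `buildSitemap`, and `buildRobots` are hypothetical names, and the local types stand in for `MetadataRoute.Sitemap` and `MetadataRoute.Robots` from `next` so the block runs on its own.

```typescript
// Local stand-ins for the shapes Next.js expects from app/sitemap.ts and
// app/robots.ts (MetadataRoute.Sitemap and MetadataRoute.Robots in `next`).
type SitemapEntry = { url: string; lastModified?: Date };
type RobotsConfig = {
  rules: { userAgent: string; allow: string };
  sitemap: string;
};

// Hypothetical values: replace with your production origin and real routes.
export const BASE_URL = "https://yoursite.com";
export const ROUTES = ["/", "/pricing", "/blog/hello-world"];

// Join an origin and a root-relative path without doubling the slash.
function absoluteUrl(baseUrl: string, path: string): string {
  return baseUrl.replace(/\/+$/, "") + path;
}

// Body of app/sitemap.ts: one absolute URL per route.
export function buildSitemap(baseUrl: string, routes: string[]): SitemapEntry[] {
  return routes.map((path) => ({
    url: absoluteUrl(baseUrl, path),
    lastModified: new Date(),
  }));
}

// Body of app/robots.ts: allow everything and point at the sitemap.
export function buildRobots(baseUrl: string): RobotsConfig {
  return {
    rules: { userAgent: "*", allow: "/" },
    sitemap: absoluteUrl(baseUrl, "/sitemap.xml"),
  };
}

// In a Next.js project each file would default-export a zero-argument function:
//   app/sitemap.ts -> export default function sitemap() { return buildSitemap(BASE_URL, ROUTES); }
//   app/robots.ts  -> export default function robots() { return buildRobots(BASE_URL); }
```

Deriving both files from the same base URL keeps the Sitemap reference in robots.txt from drifting away from the sitemap's real location.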

Detection

ID: robots-sitemap
Severity: info
What to look for: Check the HTTP status code of {BASE}/robots.txt (must be 200). Check the HTTP status code of {BASE}/sitemap.xml (must be 200). If sitemap.xml returns non-200, also parse the robots.txt body for a Sitemap: directive referencing an alternate sitemap URL. Count all sitemap references found across both the default path and robots.txt directives. Count how many of these 2 resources (robots.txt, sitemap) are accessible.
Pass criteria: Both of the following 2 conditions are met: (a) /robots.txt returns HTTP 200, AND (b) either /sitemap.xml returns HTTP 200 or the robots.txt body contains at least 1 Sitemap: directive pointing to a valid URL. Report which resources were found and their HTTP status codes.
Fail criteria: Either /robots.txt returns a non-200 status, or no sitemap is discoverable (neither /sitemap.xml returns 200 nor robots.txt references a sitemap URL).
Skip (N/A) when: The site is a single-page application with no multi-page content to index (detected via SPA Detection above).
Cross-reference: For comprehensive crawlability analysis including XML sitemap structure and robots.txt directives, the SEO Advanced audit (seo-advanced) provides deeper coverage.
Detail on fail: "robots.txt returns 404" or "No sitemap.xml found and no sitemap referenced in robots.txt"
Remediation: Search engines use robots.txt to understand crawl permissions and sitemap.xml to discover pages. Create both files in your public root:
```
# public/robots.txt
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
```
For Next.js, generate a sitemap with the next-sitemap package or the built-in app/sitemap.ts route. For static sites, use a build plugin or create sitemap.xml manually, listing all page URLs.
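
The pass and fail criteria above can be sketched as a pure function over the fetched results. This is an illustration, not the audit's actual implementation: `parseSitemapDirectives` and `evaluateRobotsSitemap` are hypothetical names, and "valid URL" is assumed here to mean an absolute http(s) URL.

```typescript
type RobotsProbe = { status: number; body: string };
type CheckResult = { pass: boolean; detail: string; sitemapRefs: string[] };

// Collect URLs from `Sitemap:` lines in a robots.txt body.
// Assumption: a "valid URL" is an absolute http(s) URL.
export function parseSitemapDirectives(robotsBody: string): string[] {
  const refs: string[] = [];
  for (const line of robotsBody.split(/\r?\n/)) {
    const url = /^\s*sitemap\s*:\s*(\S+)/i.exec(line)?.[1];
    if (url && /^https?:\/\//i.test(url)) refs.push(url);
  }
  return refs;
}

// Apply the criteria: robots.txt must return 200, and a sitemap must be
// reachable at /sitemap.xml or referenced from robots.txt.
export function evaluateRobotsSitemap(
  robots: RobotsProbe,
  sitemapStatus: number,
): CheckResult {
  const robotsOk = robots.status === 200;
  const sitemapRefs = [
    ...(sitemapStatus === 200 ? ["/sitemap.xml"] : []),
    ...(robotsOk ? parseSitemapDirectives(robots.body) : []),
  ];

  if (!robotsOk) {
    return { pass: false, detail: `robots.txt returns ${robots.status}`, sitemapRefs };
  }
  if (sitemapRefs.length === 0) {
    return {
      pass: false,
      detail: "No sitemap.xml found and no sitemap referenced in robots.txt",
      sitemapRefs,
    };
  }
  return {
    pass: true,
    detail: `robots.txt returns 200; sitemap found at ${sitemapRefs.join(", ")}`,
    sitemapRefs,
  };
}
```

The status codes and robots.txt body would come from HTTP requests to {BASE}/robots.txt and {BASE}/sitemap.xml. Keeping the evaluation separate from the fetching makes the criteria easy to test.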

Taxons

findability

History

2026-04-18·v1.0.0·Initial import from site-health-check·automated
