
Sitemap URL is referenced in robots.txt

ab-002506 · seo-fundamentals.discoverability.sitemap-in-robots
Severity: medium · active

Why it matters

The Sitemap: directive in robots.txt is how crawlers other than Googlebot (Bing, Yandex, DuckDuckGo, Perplexity, ChatGPT's crawler, and other GEO/AI indexers) discover your sitemap without you submitting it through a separate dashboard for each one. Google Search Console covers Googlebot alone. If the directive is missing, those crawlers fall back to link discovery only, so an orphaned route (one with no inbound links) may never get indexed outside Google. One line of config makes the sitemap discoverable by every search engine and AI system that honors the directive.

Severity rationale

Medium because Google can still find the sitemap when it has been submitted through Search Console, but non-Google crawlers and AI indexers have no equivalent pointer and may miss it entirely.

Remediation

Append a Sitemap: line to your robots.txt with the absolute HTTPS URL of your sitemap. If you are on Next.js, create app/robots.ts so the directive stays in version control alongside the sitemap.

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

If you ship multiple sitemaps (for example, one per content type), include one Sitemap: directive per file. The sitemaps.org protocol allows more than one Sitemap: line in a single robots.txt.
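
For the Next.js route mentioned above, a minimal sketch of app/robots.ts might look like the following. It assumes the placeholder domain from the example and a single sitemap. The RobotsConfig type is a local stand-in so the snippet stands alone; in a real project you would import MetadataRoute from "next" and return MetadataRoute.Robots instead. The SITE_URL constant is a hypothetical name, not part of Next.js.

```typescript
// app/robots.ts (sketch). In a real Next.js project, use the framework type:
//   import type { MetadataRoute } from "next";
//   export default function robots(): MetadataRoute.Robots { ... }
// The local type below only mirrors the fields used here (an assumption for
// illustration) so the sketch compiles without the framework installed.
type RobotsConfig = {
  rules: { userAgent: string; allow: string };
  sitemap: string | string[];
};

// Placeholder origin taken from the example above; replace with your domain.
const SITE_URL = "https://yoursite.com";

export default function robots(): RobotsConfig {
  return {
    // Equivalent to "User-agent: *" followed by "Allow: /".
    rules: { userAgent: "*", allow: "/" },
    // List one absolute URL per sitemap file you ship.
    sitemap: [`${SITE_URL}/sitemap.xml`],
  };
}
```

With several sitemaps, add one absolute URL per file to the sitemap array. Next.js should then write one Sitemap: line for each entry, but confirm this against the generated /robots.txt in your own build.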

Detection

  • ID: seo-fundamentals.discoverability.sitemap-in-robots

  • Severity: medium

  • What to look for: If both robots.txt and sitemap.xml exist, check whether the robots.txt contains a Sitemap: directive pointing to the sitemap URL.

  • Pass criteria: Count all Sitemap: directives in the robots.txt file. The robots.txt file must contain at least 1 Sitemap: line with a valid URL pointing to the sitemap. The URL must begin with https:// or http:// and end with a sitemap file extension (.xml, .xml.gz).

  • Fail criteria: robots.txt exists and sitemap exists, but robots.txt contains 0 Sitemap: directives referencing the sitemap.

  • Skip (N/A) when: Either robots.txt or sitemap does not exist (those are caught by their own checks).

  • Detail on fail: "robots.txt exists but does not include a Sitemap: directive. Search engines may not discover your sitemap automatically."

  • Remediation: Add a Sitemap: directive to your robots.txt:

    Sitemap: https://yoursite.com/sitemap.xml
    

    This helps search engines find your sitemap without relying on Google Search Console or other manual submission.
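
The pass, fail, and skip rules above can be sketched as a small TypeScript function. This is an illustration under stated assumptions, not the audit's actual implementation: the function and type names are hypothetical, the robots.txt body is passed in as a string (null when the file does not exist), and a Sitemap: line whose URL fails validation is treated the same as a missing directive.

```typescript
// Hypothetical names throughout; a sketch of the criteria listed above.
type CheckResult = "pass" | "fail" | "skip";

// Return the value of every "Sitemap:" line. The field name is matched
// case-insensitively and trailing "# comments" are dropped first.
function extractSitemapUrls(robotsTxt: string): string[] {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*$/, "").trim())
    .filter((line) => /^sitemap\s*:/i.test(line))
    .map((line) => line.slice(line.indexOf(":") + 1).trim());
}

// Pass criteria: URL starts with http:// or https:// and ends in .xml or .xml.gz.
function isValidSitemapUrl(url: string): boolean {
  return /^https?:\/\//i.test(url) && /\.xml(\.gz)?$/i.test(url);
}

function checkSitemapInRobots(
  robotsTxt: string | null, // null means robots.txt does not exist
  sitemapExists: boolean,
): CheckResult {
  // Skip (N/A): missing files are reported by their own checks.
  if (robotsTxt === null || !sitemapExists) return "skip";
  const validUrls = extractSitemapUrls(robotsTxt).filter(isValidSitemapUrl);
  return validUrls.length >= 1 ? "pass" : "fail";
}
```

For example, checkSitemapInRobots("User-agent: *\nAllow: /", true) falls through to "fail" because no Sitemap: line is present, while passing null for the robots.txt body yields "skip".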
