
robots.txt is present

ab-002590 · project-snapshot.seo.has-robots-txt
Severity: low · active

Why it matters

robots.txt is the first URL every mainstream crawler requests on a domain: Googlebot, Bingbot, GPTBot, ClaudeBot, PerplexityBot, and every security scanner all fetch /robots.txt before touching any other page. When it is missing, crawlers default to "allow everything," which is usually harmless but silently gives up the ability to point them at the sitemap (Sitemap: https://example.com/sitemap.xml) and to keep admin, preview, or staging paths out of the crawl.

A far more damaging failure this check catches is the AI-generated Disallow: / that deindexes the entire site. It happens when a scaffold template ships with a restrictive "development" robots file that nobody flips before launch, and it can take weeks to recover rankings once Google re-crawls. AI coding tools generate this file inconsistently: some include a permissive default, others omit it entirely, and a few inherit framework boilerplate that blocks legitimate paths.
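The dangerous template default looks like this; left in place at launch, it tells every crawler to stay away from every path:

# Typical scaffold "development" robots.txt; deindexes the whole site if shipped
User-agent: *
Disallow: /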

Severity rationale

Low because an absent `robots.txt` defaults to permissive crawling and does not block indexing, but a misconfigured one (`Disallow: /` left in from a template) can quietly deindex the entire site — which is why this check also flags over-blocking on pass.

Remediation

For a Next.js App Router project, create app/robots.ts:

import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  // Allow all crawlers and advertise the sitemap; swap in the production domain.
  return { rules: [{ userAgent: '*', allow: '/' }], sitemap: 'https://example.com/sitemap.xml' }
}
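Next.js serves this route at /robots.txt; the rules above should render roughly as:

User-Agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml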

Deeper remediation guidance and cross-reference coverage for this check live in the seo-fundamentals Pro audit; run it after applying this fix for a more exhaustive pass on the same topic.

Detection

  • ID: project-snapshot.seo.has-robots-txt
  • Severity: low
  • What to look for: Check for app/robots.ts, app/robots.txt, public/robots.txt, or framework equivalents. The file should explicitly allow or disallow paths, not be empty (see the detection sketch after this list).
  • Pass criteria: A robots file exists with at least one User-agent: directive and one Allow: or Disallow: line.
  • Fail criteria: Missing entirely, or present but empty.
  • Skip (N/A) when: Project is API / CLI / library.
  • Do NOT pass when: robots.txt blocks the entire site (Disallow: /) without a clear reason — quote the directive in the detail.
  • Report even on pass: "robots.txt at {path}; rules: {first User-agent and rule}."
  • Detail on fail: "No robots.txt or robots.ts found".
  • Remediation: Create app/robots.ts as shown in the Remediation section above.
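
A minimal sketch of the detection logic above, assuming a Node.js context; the candidate paths mirror "What to look for," while the function name and result shape are illustrative rather than the audit tool's actual implementation:

import { existsSync, readFileSync } from 'node:fs'
import { join } from 'node:path'

// Locations the check looks in, per "What to look for" above.
const CANDIDATES = ['app/robots.ts', 'app/robots.txt', 'public/robots.txt']

function checkRobots(projectRoot: string): { status: 'pass' | 'fail'; detail: string } {
  const found = CANDIDATES.map((p) => join(projectRoot, p)).find((p) => existsSync(p))
  if (!found) return { status: 'fail', detail: 'No robots.txt or robots.ts found' }

  // A dynamic app/robots.ts would have to be evaluated to inspect its rules;
  // this static sketch treats its presence as a pass.
  if (found.endsWith('.ts')) return { status: 'pass', detail: `robots route at ${found}` }

  const body = readFileSync(found, 'utf8')
  const hasAgent = /^user-agent:/im.test(body)
  const hasRule = /^(allow|disallow):/im.test(body)
  if (!hasAgent || !hasRule) {
    return { status: 'fail', detail: `${found} is present but empty or has no directives` }
  }

  // Do NOT pass when the whole site is blocked; quote the directive in the detail.
  if (/^disallow:\s*\/\s*$/im.test(body)) {
    return { status: 'fail', detail: `${found} blocks the entire site: "Disallow: /"` }
  }

  const firstRules = body
    .split(/\r?\n/)
    .filter((l) => /^(user-agent|allow|disallow):/i.test(l))
    .slice(0, 2)
    .join('; ')
  return { status: 'pass', detail: `robots.txt at ${found}; rules: ${firstRules}` }
}

Run against a project root, a pass returns the path and first rules, matching the "Report even on pass" format above.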
