robots.txt exists and is properly configured

ab-002508 · seo-fundamentals.discoverability.robots-txt

Severity: high · active

Why it matters

A missing robots.txt means crawlers operate without guidance: Googlebot and other search engine crawlers treat every URL as crawlable by default, including staging pages, admin routes, and duplicate parameter URLs that waste crawl budget and split ranking signals across near-identical pages. A misconfigured Disallow: / under User-agent: * stops compliant crawlers from fetching any page on the site; because robots.txt governs crawling rather than indexing, pages are not removed instantly, but they typically fall out of search results once their content can no longer be refreshed. RFC 9309 standardizes the Robots Exclusion Protocol, so a file that follows it is interpreted consistently by compliant crawlers, while one that does not may be read differently by each search engine.

Severity rationale

High because a missing robots.txt wastes crawl budget on non-canonical URLs, and a Disallow: / misconfiguration under User-agent: * can remove the entire site from search results.

Remediation

Create public/robots.txt with at minimum an allow-all rule and a sitemap reference:

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

In Next.js App Router, generate it programmatically from app/robots.ts so the sitemap URL stays in sync with your environment:

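// app/robots.ts
import type { MetadataRoute } from 'next'

// Fall back to the placeholder origin so an unset variable never yields "undefined/sitemap.xml".
// NEXT_PUBLIC_SITE_URL is expected to hold the origin without a trailing slash.
const siteUrl = process.env.NEXT_PUBLIC_SITE_URL ?? 'https://yoursite.com'

export default function robots(): MetadataRoute.Robots {
  return {
    // Allow everything except the admin and API routes.
    rules: { userAgent: '*', allow: '/', disallow: ['/admin/', '/api/'] },
    sitemap: `${siteUrl}/sitemap.xml`,
  }
}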

Verify the live output at https://yoursite.com/robots.txt after deploying. Never set Disallow: / in production.
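
One way to make that post-deploy check repeatable is sketched below. It is a minimal sketch rather than a definitive implementation: the helper names sitemapUrls and sitemapsMatchOrigin are hypothetical (not part of Next.js), and it assumes the robots.txt body has already been fetched as a string and that the expected origin has no trailing slash.

```typescript
// Hypothetical helpers for a post-deploy check; the names are illustrative
// and not part of Next.js.

// Collect the URL from every "Sitemap:" line. Field names are matched
// case-insensitively and anything after "#" is treated as a comment.
function sitemapUrls(body: string): string[] {
  const urls: string[] = [];
  for (const raw of body.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, '').trim();
    const match = /^sitemap\s*:\s*(\S+)/i.exec(line);
    const url = match ? match[1] : undefined;
    if (url) urls.push(url);
  }
  return urls;
}

// True when at least one Sitemap line exists and every one of them starts
// with the expected origin (assumed to have no trailing slash).
function sitemapsMatchOrigin(body: string, expectedOrigin: string): boolean {
  const urls = sitemapUrls(body);
  const prefix = expectedOrigin + '/';
  return urls.length > 0 && urls.every((u) => u.indexOf(prefix) === 0);
}
```

Passing in the body served at https://yoursite.com/robots.txt together with the production origin flags the case where the site URL variable was unset at build time and the sitemap line came out as undefined/sitemap.xml.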

Detection

ID: robots-txt
Severity: high
What to look for: Check for public/robots.txt (static) or an API route/config that generates robots.txt (e.g., app/robots.ts in Next.js App Router). Verify it contains at least a User-agent directive.
Pass criteria: A robots.txt file exists (static or generated) with at least 1 User-agent directive and at least 1 Allow or non-blanket Disallow rule. The file must not block all crawlers from all content (i.e., must not consist solely of Disallow: /). Count the number of directives in the file.
Fail criteria: No robots.txt file exists, or the robots.txt blocks all crawlers from the entire site (User-agent: * / Disallow: /).
Skip (N/A) when: Never — every public web project should have a robots.txt.
Detail on fail: "No robots.txt found in public/ directory or as a generated route" or "robots.txt blocks all crawlers with 'Disallow: /'"
Cross-reference: For deeper crawlability, indexation, and robots.txt analysis, the Advanced SEO audit covers this in detail.
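
The pass and fail criteria above could be automated roughly as follows. This is a sketch under stated assumptions, not the audit's actual implementation: parseRobots, auditRobotsTxt, and the Verdict shape are hypothetical names, a null body stands for "no file found", and cases the criteria leave open are resolved by assumption (a wildcard group with Disallow: / plus a non-empty Allow is not treated as a blanket block, and the third failure message is invented for illustration).

```typescript
// Hypothetical checker; names and the Verdict shape are illustrative.
type Rule = { field: 'allow' | 'disallow'; path: string };
type Group = { agents: string[]; rules: Rule[] };
type Verdict = { pass: boolean; directiveCount: number; detail: string };

// Split the file into groups: consecutive User-agent lines share a group,
// and a User-agent line that follows rule lines starts a new group.
function parseRobots(body: string): { groups: Group[]; directiveCount: number } {
  const groups: Group[] = [];
  let current: Group | undefined;
  let directiveCount = 0;
  for (const raw of body.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, '').trim(); // drop comments
    const match = /^([A-Za-z-]+)\s*:\s*(.*)$/.exec(line);
    if (!match) continue;
    const field = (match[1] ?? '').toLowerCase();
    const value = (match[2] ?? '').trim();
    directiveCount += 1; // counts every "field: value" line, Sitemap included
    if (field === 'user-agent') {
      if (!current || current.rules.length > 0) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value);
    } else if ((field === 'allow' || field === 'disallow') && current) {
      current.rules.push({ field, path: value });
    }
  }
  return { groups, directiveCount };
}

function auditRobotsTxt(body: string | null): Verdict {
  if (body === null) {
    return { pass: false, directiveCount: 0, detail: 'No robots.txt found in public/ directory or as a generated route' };
  }
  const { groups, directiveCount } = parseRobots(body);
  const isBlanket = (r: Rule): boolean => r.field === 'disallow' && r.path === '/';
  // Assumption: a wildcard group is a blanket block only when it has
  // "Disallow: /" and no non-empty Allow rule beside it.
  const blocksEveryone = groups.some(
    (g) =>
      g.agents.indexOf('*') !== -1 &&
      g.rules.some(isBlanket) &&
      !g.rules.some((r) => r.field === 'allow' && r.path !== ''),
  );
  if (blocksEveryone) {
    return { pass: false, directiveCount, detail: "robots.txt blocks all crawlers with 'Disallow: /'" };
  }
  // Count Allow rules and Disallow rules other than a bare "/".
  const usefulRules = groups.reduce((n, g) => n + g.rules.filter((r) => !isBlanket(r)).length, 0);
  if (groups.length === 0 || usefulRules === 0) {
    // Hypothetical message: the source lists no detail string for this case.
    return { pass: false, directiveCount, detail: 'robots.txt has no User-agent group with an Allow or non-blanket Disallow rule' };
  }
  return { pass: true, directiveCount, detail: 'ok' };
}
```

Parsing into groups first keeps the blanket-block test scoped to the User-agent: * group, so a file that blocks a single named bot entirely is not reported as blocking all crawlers.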

Remediation: robots.txt tells search engine crawlers which parts of your site they may crawl. Create a public/robots.txt:

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

In Next.js App Router, you can generate it dynamically:

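// app/robots.ts
import type { MetadataRoute } from 'next'

// Next.js serves the object returned here as /robots.txt.
export default function robots(): MetadataRoute.Robots {
  return {
    rules: { userAgent: '*', allow: '/' },
    sitemap: 'https://yoursite.com/sitemap.xml',
  }
}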

External references

external · RFC-9309 — Robots Exclusion Protocol

Taxons

findability

History

2026-04-18·v1.0.0·Initial import from seo-fundamentals·automated