robots.txt exists and is properly configured
Why it matters
A missing robots.txt means crawlers operate without guidance. Googlebot and other search engine crawlers will crawl everything they can reach by default, including staging pages, admin routes, and duplicate parameter URLs that waste crawl budget and can cause duplicate-content problems. A misconfigured Disallow: / blocks all compliant crawlers from the entire site. Pages then drop out of search results or appear only as bare URLs without a description. RFC 9309 formalizes the robots.txt protocol, so a file that follows its syntax is interpreted consistently by compliant crawlers; lines a parser cannot understand may simply be ignored.
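To make the Disallow: / failure mode concrete, here is a minimal TypeScript sketch of how an RFC 9309 crawler decides whether a path is allowed under one group of rules. The longest matching pattern wins, and an Allow wins a tie with an equally long Disallow. It assumes the group for the crawler's user agent has already been selected, supports the `*` and `$` special characters, and skips percent-encoding normalization. `Rule`, `patternToRegExp`, and `isAllowed` are illustrative names, not part of any library.

```typescript
// A single Allow/Disallow rule from one robots.txt group (illustrative type).
type Rule = { kind: "allow" | "disallow"; path: string };

// Convert a robots.txt path pattern into a RegExp anchored at the start of the path.
// "*" matches any run of characters; a trailing "$" anchors the end of the path.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*");
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// Decide whether `path` may be crawled under the given group's rules.
// Specificity is approximated by pattern length; on a tie, allow wins.
function isAllowed(rules: Rule[], path: string): boolean {
  // RFC 9309 treats /robots.txt itself as implicitly allowed.
  if (path === "/robots.txt") return true;
  let best: Rule | undefined = undefined;
  for (const rule of rules) {
    if (rule.path === "") continue; // an empty Allow/Disallow value matches nothing here
    if (!patternToRegExp(rule.path).test(path)) continue;
    if (
      best === undefined ||
      rule.path.length > best.path.length ||
      (rule.path.length === best.path.length && rule.kind === "allow")
    ) {
      best = rule;
    }
  }
  // No matching rule means the path is allowed.
  return best === undefined || best.kind === "allow";
}
```

With only a blanket rule, `isAllowed([{ kind: "disallow", path: "/" }], "/pricing")` returns false because every path starts with "/". Adding `{ kind: "allow", path: "/pricing" }` makes that path allowed again, because the longer pattern is the more specific match.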
Severity rationale
High because a missing robots.txt wastes crawl budget on non-canonical URLs, and a `Disallow: /` misconfiguration can remove the entire site from search results.
Remediation
Create public/robots.txt with at minimum an allow-all rule and a sitemap reference:
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
In Next.js App Router, generate it programmatically from app/robots.ts so the sitemap URL stays in sync with your environment:
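// app/robots.ts
import type { MetadataRoute } from 'next'
// NEXT_PUBLIC_SITE_URL must be set at build time; if it is unset,
// the sitemap line renders as "undefined/sitemap.xml".
export default function robots(): MetadataRoute.Robots {
  return {
    // Allow everything except the admin UI and API routes.
    rules: { userAgent: '*', allow: '/', disallow: ['/admin/', '/api/'] },
    sitemap: `${process.env.NEXT_PUBLIC_SITE_URL}/sitemap.xml`,
  }
}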
Verify the live output at https://yoursite.com/robots.txt after deploying. Never set Disallow: / in production.
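The post-deploy check can be scripted as well. The sketch below assumes Node.js 18 or later for the global `fetch`; `interpretStatus`, `hasBlanketDisallowLine`, and `checkLiveRobots` are illustrative names. The status mapping follows RFC 9309's access rules: a 4xx response lets crawlers fetch anything, while a 5xx response makes them assume the whole site is disallowed, which has the same effect as shipping Disallow: /.

```typescript
// How a crawler following RFC 9309 treats the HTTP status of /robots.txt.
type RobotsFetchOutcome = "parse-rules" | "allow-all" | "disallow-all" | "unknown";

function interpretStatus(status: number): RobotsFetchOutcome {
  if (status >= 200 && status < 300) return "parse-rules"; // file is read and its rules apply
  if (status >= 400 && status < 500) return "allow-all"; // "unavailable": crawler MAY access everything
  if (status >= 500 && status < 600) return "disallow-all"; // "unreachable": crawler MUST assume complete disallow
  return "unknown"; // fetch() follows redirects by default, so 3xx rarely reaches here
}

// True if any line is a bare "Disallow: /" (optionally followed by a comment).
// A quick smoke test, not a full parser: it ignores which User-agent group the line is in.
function hasBlanketDisallowLine(body: string): boolean {
  return /^[ \t]*disallow[ \t]*:[ \t]*\/[ \t]*(#[^\r\n]*)?\r?$/im.test(body);
}

// Fetch the deployed robots.txt and report whether it looks safe for crawlers.
async function checkLiveRobots(siteUrl: string): Promise<boolean> {
  const res = await fetch(new URL("/robots.txt", siteUrl));
  const outcome = interpretStatus(res.status);
  console.log(`GET /robots.txt -> HTTP ${res.status} (${outcome})`);
  if (outcome !== "parse-rules") {
    console.warn(`robots.txt is not served as a normal file; crawlers will treat this as ${outcome}.`);
    return false;
  }
  if (hasBlanketDisallowLine(await res.text())) {
    console.warn("Found a blanket 'Disallow: /' line; check which User-agent group it belongs to.");
    return false;
  }
  return true;
}
```

Call it as `checkLiveRobots("https://yoursite.com")` from a post-deploy step and fail the pipeline when it resolves to false. Keeping the status mapping separate from the network call makes it easy to test without a live site.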
Detection
- ID: robots-txt
- Severity: high
- What to look for: Check for public/robots.txt (static) or an API route/config that generates robots.txt (e.g., app/robots.ts in Next.js App Router). Verify it contains at least one User-agent directive.
- Pass criteria: A robots.txt file exists (static or generated) with at least 1 User-agent directive and at least 1 Allow or non-blanket Disallow rule. The file must not block all crawlers from all content (i.e., it must not consist solely of Disallow: /). Count the number of directives in the file.
- Fail criteria: No robots.txt file exists, or the robots.txt blocks all crawlers from the entire site (User-agent: * followed by Disallow: /).
- Skip (N/A) when: Never; every public web project should have a robots.txt.
- Detail on fail: "No robots.txt found in public/ directory or as a generated route" or "robots.txt blocks all crawlers with 'Disallow: /'"
- Cross-reference: For deeper crawlability, indexation, and robots.txt analysis, the Advanced SEO audit covers this in detail.
- Remediation: robots.txt tells search engine crawlers which parts of your site they may crawl. Create a public/robots.txt:
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
In Next.js App Router, you can generate it dynamically:
// app/robots.ts
import type { MetadataRoute } from 'next'
export default function robots(): MetadataRoute.Robots {
  return {
    rules: { userAgent: '*', allow: '/' },
    sitemap: 'https://yoursite.com/sitemap.xml',
  }
}
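The pass and fail criteria can be approximated with a small parser. This TypeScript sketch assumes the robots.txt text has already been loaded (from public/robots.txt, or from the response of a generated route) and that null means no file was found. `evaluateRobots`, `RobotsVerdict`, and `Group` are hypothetical names. It applies the fail pattern literally (a `User-agent: *` group containing `Disallow: /` with no non-empty `Allow`), so it does not judge subtler cases such as per-bot groups.

```typescript
// Result of checking robots.txt text against the pass/fail criteria (hypothetical shape).
type RobotsVerdict = {
  pass: boolean;
  reason: string;
  directiveCount: number; // non-comment "key: value" lines, including Sitemap
};

// A group: the user agents it applies to plus its Allow/Disallow rules.
type Group = { agents: string[]; rules: { kind: "allow" | "disallow"; path: string }[] };

function evaluateRobots(text: string | null): RobotsVerdict {
  if (text === null) {
    return { pass: false, reason: "No robots.txt found in public/ directory or as a generated route", directiveCount: 0 };
  }
  const groups: Group[] = [];
  let current: Group | null = null;
  let directiveCount = 0;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments and whitespace
    const colon = line.indexOf(":");
    if (colon === -1) continue; // blank or malformed line
    directiveCount++;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (key === "user-agent") {
      // Consecutive User-agent lines share a group; one that follows rules starts a new group.
      if (current === null || current.rules.length > 0) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
    } else if ((key === "allow" || key === "disallow") && current !== null) {
      current.rules.push({ kind: key === "allow" ? "allow" : "disallow", path: value });
    }
  }
  if (groups.length === 0) {
    return { pass: false, reason: "No User-agent directive found", directiveCount };
  }
  // Fail: a "*" group with Disallow: / and no non-empty Allow to carve anything out.
  const blocksAll = groups.some(
    (g) =>
      g.agents.includes("*") &&
      g.rules.some((r) => r.kind === "disallow" && r.path === "/") &&
      !g.rules.some((r) => r.kind === "allow" && r.path !== ""),
  );
  if (blocksAll) {
    return { pass: false, reason: "robots.txt blocks all crawlers with 'Disallow: /'", directiveCount };
  }
  // Pass needs at least one Allow or a Disallow that is not the blanket "/".
  const hasUsableRule = groups.some((g) =>
    g.rules.some((r) => r.kind === "allow" || r.path !== "/"),
  );
  return hasUsableRule
    ? { pass: true, reason: "robots.txt present with at least one usable rule", directiveCount }
    : { pass: false, reason: "No Allow or non-blanket Disallow rule found", directiveCount };
}
```

The reason strings reuse the "Detail on fail" messages, so a report can print them directly. For generated routes such as app/robots.ts, the text to pass in would come from the built site's /robots.txt response rather than from a file on disk.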
External references
- external · RFC-9309 — Robots Exclusion Protocol
Taxons
History
- 2026-04-18·v1.0.0·Initial import from seo-fundamentals·automated