A missing robots.txt means crawlers operate without guidance: Googlebot and other search engines crawl everything by default, including staging pages, admin routes, and duplicate parameter URLs that waste crawl budget and create duplicate-content problems (Google applies no formal duplicate-content penalty, but duplicates split ranking signals across URLs). A misconfigured `Disallow: /` blocks all compliant crawlers from fetching any page on the site; blocked pages cannot be recrawled, their snippets and descriptions disappear, and rankings collapse as the index goes stale. RFC 9309 formalizes the robots.txt protocol; a file that does not follow it leaves crawler behavior undefined and inconsistent across search engines.
High, because a missing robots.txt wastes crawl budget on non-canonical URLs, and a `Disallow: /` misconfiguration effectively removes the entire site from search results.
Create public/robots.txt with at minimum an allow-all rule and a sitemap reference:
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
In Next.js App Router, generate it programmatically from app/robots.ts so the sitemap URL stays in sync with your environment:
// app/robots.ts
import type { MetadataRoute } from 'next'

export default function robots(): MetadataRoute.Robots {
  return {
    rules: { userAgent: '*', allow: '/', disallow: ['/admin/', '/api/'] },
    sitemap: `${process.env.NEXT_PUBLIC_SITE_URL}/sitemap.xml`,
  }
}
Verify the live output at https://yoursite.com/robots.txt after deploying. Never set Disallow: / in production.
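That verification step can also be automated in CI: fetch the deployed /robots.txt body and fail the build if it blocks all crawlers. Below is a minimal sketch; the helper name is ours, and the parsing is deliberately simplified (it ignores RFC 9309 longest-match precedence and treats each User-agent line as starting a fresh group):

```typescript
// Simplified check: does this robots.txt body block all crawlers from
// the entire site? Pure function, so it can run against the fetched
// body of https://yoursite.com/robots.txt in a post-deploy smoke test.
function blocksAllCrawlers(body: string): boolean {
  let inWildcardGroup = false;
  let wildcardDisallowAll = false;
  let wildcardAllowAll = false;
  for (const raw of body.split(/\r?\n/)) {
    const line = raw.split('#')[0].trim(); // strip comments and whitespace
    if (!line) continue;
    const idx = line.indexOf(':');
    if (idx < 0) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === 'user-agent') {
      // Simplification: each User-agent line starts a new group.
      inWildcardGroup = value === '*';
    } else if (field === 'disallow') {
      if (inWildcardGroup && value === '/') wildcardDisallowAll = true;
    } else if (field === 'allow') {
      if (inWildcardGroup && value === '/') wildcardAllowAll = true;
    }
  }
  return wildcardDisallowAll && !wildcardAllowAll;
}

console.log(blocksAllCrawlers('User-agent: *\nDisallow: /'));       // true
console.log(blocksAllCrawlers('User-agent: *\nAllow: /'));          // false
console.log(blocksAllCrawlers('User-agent: *\nDisallow: /admin/')); // false
```

In CI this would be wired to `fetch(`${SITE_URL}/robots.txt`)` and a non-zero exit when the check returns true; the pure helper keeps the logic testable without network access.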
ID: seo-fundamentals.discoverability.robots-txt
Severity: high
What to look for: Check for public/robots.txt (static) or an API route/config that generates robots.txt (e.g., app/robots.ts in Next.js App Router). Verify it contains at least a User-agent directive.
Pass criteria: A robots.txt file exists (static or generated) with at least 1 User-agent directive and at least 1 Allow or non-blanket Disallow rule. The file must not block all crawlers from all content (i.e., must not consist solely of Disallow: /). Count the number of directives in the file.
Fail criteria: No robots.txt file exists, or the robots.txt blocks all crawlers from the entire site (User-agent: * with Disallow: /).
Skip (N/A) when: Never — every public web project should have a robots.txt.
Detail on fail: "No robots.txt found in public/ directory or as a generated route" or "robots.txt blocks all crawlers with 'Disallow: /'"
Cross-reference: For deeper crawlability, indexation, and robots.txt analysis, the Advanced SEO audit covers this in detail.
Remediation: robots.txt tells search engine crawlers which parts of your site they may crawl (to control indexing, use noindex meta tags or headers instead). Create a public/robots.txt:
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
In Next.js App Router, you can generate it dynamically:
// app/robots.ts
export default function robots() {
  return {
    rules: { userAgent: '*', allow: '/' },
    sitemap: 'https://yoursite.com/sitemap.xml',
  }
}
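The pass and fail criteria above can be sketched as a small checker. The function and type names are illustrative, not part of any existing tool, and the parsing is deliberately simplified: any blanket `Disallow: /` counts as a failure regardless of which User-agent group it belongs to.

```typescript
// Hypothetical evaluator for the rule's pass criteria: at least one
// User-agent directive, at least one permissive rule (Allow, or a
// Disallow whose path is not the blanket "/"), and no blanket
// Disallow: / anywhere in the file.
interface RobotsVerdict {
  pass: boolean;
  userAgents: number;
  permissiveRules: number;
}

function evaluateRobots(body: string): RobotsVerdict {
  let userAgents = 0;
  let permissiveRules = 0;
  let blanketDisallow = false;
  for (const raw of body.split(/\r?\n/)) {
    const line = raw.split('#')[0].trim(); // strip comments
    if (!line) continue;
    const idx = line.indexOf(':');
    if (idx < 0) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === 'user-agent') userAgents += 1;
    else if (field === 'allow') permissiveRules += 1;
    else if (field === 'disallow') {
      if (value === '/') blanketDisallow = true;
      else permissiveRules += 1; // includes empty "Disallow:" (allow all)
    }
  }
  const pass = userAgents >= 1 && permissiveRules >= 1 && !blanketDisallow;
  return { pass, userAgents, permissiveRules };
}

console.log(evaluateRobots('User-agent: *\nAllow: /'));
// { pass: true, userAgents: 1, permissiveRules: 1 }
console.log(evaluateRobots('User-agent: *\nDisallow: /').pass); // false
```

A real implementation would also resolve app/robots.ts output (the generated file, not the source) before evaluating, since the directives only exist at request time.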