All 22 checks with why-it-matters prose, severity, and cross-references to related audits.
Blocking AI crawlers in robots.txt cuts your content out of every AI-powered search engine, citation engine, and LLM knowledge base that respects the Robots Exclusion Protocol. When GPTBot, ClaudeBot, or Google-Extended can't crawl your pages, your content doesn't appear in ChatGPT answers, Perplexity citations, or AI Overviews — regardless of how good that content is. For SaaS and content businesses, this directly suppresses top-of-funnel discovery from the fastest-growing traffic source. Even blocking one of the 10 major AI crawler user agents creates a measurable gap in your AI findability coverage.
Why this severity: Critical because even a single blocked AI crawler user agent fully excludes your content from that platform's citation and knowledge retrieval pipeline.
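As a quick self-check, Python's standard-library `urllib.robotparser` can replay a robots.txt policy against the major AI crawler tokens. A minimal sketch, using an illustrative policy and a hypothetical URL (in practice you would load the live file with `set_url()` and `read()`):

```python
from urllib.robotparser import RobotFileParser

# Illustrative policy: one AI crawler blocked, everything else allowed
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

for agent in ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"):
    allowed = parser.can_fetch(agent, "https://example.com/blog/post")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Any `blocked` line for an agent you did not intend to exclude is exactly the gap this check flags.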
geo-readiness.ai-crawler-access.ai-crawlers-not-blocked

The `noai` and `noimageai` meta robots directives instruct large language models and image-generation systems to exclude your pages from their training corpora and retrieval indexes. Shipping these directives on public marketing pages silently removes your product from ChatGPT, Gemini, Perplexity, and Copilot answers — competitors without the tag get cited in your place, and prospects asking AI assistants for recommendations in your category will never hear your name.
Why this severity: High because the directive fully excludes the page from AI citation surfaces with no partial-credit fallback and no visible warning.
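A rough way to scan rendered HTML for these directives is a regex pass over robots meta tags. A sketch (the helper name is mine, and a production check should also inspect the `X-Robots-Tag` response header, where the same directives can be set):

```python
import re

def has_ai_blocking_meta(html: str) -> bool:
    """Return True if a robots meta tag carries a noai/noimageai directive."""
    for tag in re.finditer(r'<meta[^>]+name=["\']robots["\'][^>]*>', html, re.I):
        content = re.search(r'content=["\']([^"\']*)["\']', tag.group(0), re.I)
        if content and re.search(r'\bno(?:image)?ai\b', content.group(1), re.I):
            return True
    return False

print(has_ai_blocking_meta('<meta name="robots" content="index, noai">'))    # True
print(has_ai_blocking_meta('<meta name="robots" content="index, follow">'))  # False
```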
geo-readiness.ai-crawler-access.no-ai-blocking-meta-tags

AI crawlers from OpenAI, Anthropic, Google, and Perplexity overwhelmingly fetch raw HTML without executing JavaScript — a single-page app that renders content in `useEffect` presents them with an empty `<div id="root"></div>` and nothing to index or cite. The same empty shell also harms search rankings and first-paint metrics, compounding the visibility loss across both SEO and GEO surfaces simultaneously.
Why this severity: High because client-only rendering makes primary content completely invisible to non-JS-executing crawlers across every scraped page.
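One way to approximate what a non-JS crawler sees is to strip scripts and tags from the raw HTML and count the words left over. A heuristic sketch (the function name and the 20-word threshold are arbitrary choices of mine, not part of any crawler's actual logic):

```python
import re

def looks_like_empty_shell(html: str) -> bool:
    """True if the raw HTML carries almost no renderable text."""
    no_scripts = re.sub(r'<(script|style)[^>]*>.*?</\1>', ' ', html,
                        flags=re.I | re.S)
    text = re.sub(r'<[^>]+>', ' ', no_scripts)
    return len(text.split()) < 20  # arbitrary threshold for "effectively empty"

spa = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
ssr = ('<html><body><h1>Acme Audit</h1><p>'
       + 'Twenty-two checks across four categories. ' * 5
       + '</p></body></html>')
print(looks_like_empty_shell(spa))  # True  (client-only shell)
print(looks_like_empty_shell(ssr))  # False (server-rendered content)
```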
geo-readiness.ai-crawler-access.content-server-rendered

Cloudflare Turnstile, reCAPTCHA, and user-agent-sniffing middleware that challenge every request will block OpenAI's GPTBot, Google-Extended, PerplexityBot, and ClaudeBot alongside malicious traffic. When your blog and docs return a JavaScript challenge instead of HTML, AI systems record the page as uncrawlable and stop retrying — your content vanishes from generative answers even though it loads fine for humans.
Why this severity: Medium because impact is conditional on how broadly the bot challenge is scoped across routes.
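If you do need challenges on some routes, a common mitigation is to exempt known AI-crawler user agents before issuing the challenge. A sketch of the matching step (the token list is illustrative, and user-agent strings are trivially spoofable, so production setups usually also verify requests against each vendor's published IP ranges):

```python
# Illustrative tokens for the crawlers named above; not an exhaustive list
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "Google-Extended", "PerplexityBot")

def should_skip_challenge(user_agent: str) -> bool:
    """Skip the JS challenge for requests identifying as a known AI crawler."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)

print(should_skip_challenge("Mozilla/5.0; compatible; GPTBot/1.2"))        # True
print(should_skip_challenge("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))  # False
```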
geo-readiness.ai-crawler-access.no-aggressive-bot-blocking

When an AI system is asked "what is Acme?" it extracts the answer from the first clear definitional sentence it can find on your homepage. A hero that reads "Reimagine Your Workflow" gives it nothing to quote, so the assistant either hallucinates a description, cites a competitor, or refuses to answer. Without a concrete name + category + function statement, your product is structurally uncitable regardless of how much downstream content exists.
Why this severity: Critical because a missing definitional statement makes the site uncitable at the root, blocking every downstream AI answer.
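As a sketch of the pattern in markup, the tagline can stay as long as a quotable definitional sentence sits beside it in the first rendered text; the product name, category, and wording below are hypothetical:

```html
<header>
  <h1>Reimagine Your Workflow</h1>
  <!-- Definitional statement: name + category + function, quotable as-is -->
  <p>Acme is a workflow-automation platform that turns repetitive ticket
     triage into auditable, configurable pipelines.</p>
</header>
```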
geo-readiness.content-citability.entity-self-definition

Generative engines cite by extracting 2-4 sentence paragraphs that stand alone as complete answers to user questions. A site built entirely from headings, bullet lists, and marketing phrases gives the model nothing extractable — even if every claim is true, none of it can be quoted. The result is a site that ranks for search but never appears in AI-generated summaries, briefings, or answer boxes.
Why this severity: Critical because without extractable paragraphs, no AI system can cite the site regardless of other content quality.
geo-readiness.content-citability.direct-answer-paragraphs

FAQ sections are among the most heavily weighted sources in retrieval-augmented generation pipelines — ChatGPT, Perplexity, and Gemini disproportionately pull Q-and-A content when answering user questions because the format maps directly onto their output shape. A site with no FAQ, or with fewer than three substantive questions, is invisible to the single highest-leverage GEO surface available.
Why this severity: High because FAQ content is the highest-leverage single GEO surface and its absence forfeits the largest citation opportunity.
geo-readiness.content-citability.faq-content-present

AI systems rank citation candidates on specificity — a claim like "20-40 checks per audit, 62/100 average score on first run" is quotable and gets surfaced, while "powerful, industry-leading, seamless" is discarded as marketing noise. Sites dominated by unsubstantiated superlatives lose citation share to competitors whose pages contain named mechanisms, numeric outcomes, and concrete comparisons, even when the underlying product is weaker.
Why this severity: High because vague marketing claims are systematically filtered out of AI answers in favor of specific, evidence-backed ones.
geo-readiness.content-citability.claim-evidence-pattern

Large language models preferentially cite the source that defines a concept, not the sites that repeat it — if you name and explain a methodology, framework, or classification, every downstream mention triangulates back to your site as the authority. A site with no original concepts is structurally indistinguishable from hundreds of competitors in the model's latent space, so citations flow elsewhere.
Why this severity: Medium because original concepts compound authority over time but are not strictly required for baseline citation.
geo-readiness.content-citability.unique-concepts

Numbers, percentages, counts, and dates are inherently citable — AI systems extract them verbatim and attribute them to your site, giving you durable presence in generative answers. Content that stays purely qualitative ("many features," "fast response," "comprehensive coverage") leaves the model nothing measurable to hand back to a user, so your page is skipped in favor of a competitor page with hard numbers.
Why this severity: Medium because specific data points materially increase citation rate but are complementary to definitional and FAQ content.
geo-readiness.content-citability.specific-data-points

When a user asks an AI assistant "how does X work?" the assistant reaches for explanatory process content — step-by-step mechanisms, workflows, and cause-and-effect descriptions. A site with no "how it works" narrative surrenders that entire query class to competitors, and the downstream effect is missed placements in comparison tables, recommendation lists, and tutorial-style answers where intent runs highest.
Why this severity: Low because explanatory content is complementary rather than foundational for AI citation.
geo-readiness.content-citability.how-it-works-explanation

Generative engines infer topical authority from page count and angle diversity on a single subject — a site with three substantive pages covering different facets of its domain is weighted as expert, while a two-page site (home + pricing) is treated as a thin landing presence. Shallow sites get skipped in retrieval even when individual pages are strong, because the model lacks the signal that the operator has deep domain knowledge.
Why this severity: Critical because topical depth is a foundational authority signal and its absence caps citation rate across every other dimension.
geo-readiness.authority-expertise.topical-depth

AI systems — including ChatGPT, Claude, and Perplexity — weight source credibility when deciding whether to cite content. Anonymous content with no identified author, team, or organization scores lower on the E-E-A-T signals (Experience, Expertise, Authoritativeness, Trustworthiness) that both Google and LLMs use to rank citation-worthy sources. Missing Person or Organization schema (schema.org vocabulary) means structured authority data is absent from the machine-readable layer entirely. For AI-generated answers, attribution-free content is harder to trust and less likely to appear in citations — regardless of content quality.
Why this severity: High because absent authorship signals directly reduce AI citation rates; content without traceable authority is systematically deprioritized by LLMs evaluating source trustworthiness.
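A minimal sketch of the missing machine-readable layer, as JSON-LD embedded in a `<script type="application/ld+json">` tag; every name, title, and URL below is a placeholder:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example post title",
  "author": {
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Founder"
  },
  "publisher": {
    "@type": "Organization",
    "name": "Acme",
    "url": "https://example.com"
  }
}
```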
geo-readiness.authority-expertise.authorship-signals

AI systems reward sites that demonstrate subject-matter expertise through depth — a single 800-word explanatory post or a 500-word documentation page signals authority far more than twenty 100-word feature blurbs. Shallow marketing content places you in the "generic vendor" bucket where citations go to the competitor who has written a methodology doc, even if that competitor ships a weaker product.
Why this severity: High because depth indicators are the primary expertise signal AI systems use to rank sources for citation.
geo-readiness.authority-expertise.expertise-depth

AI systems actively downrank undated content because they cannot assess freshness, and they strongly prefer citing sources with visible "published" or "last updated" timestamps. Blog posts and documentation without visible dates get passed over in favor of dated equivalents, even when the undated content is newer — the model has no way to know, so it plays it safe by citing what it can verify.
Why this severity: Medium because date visibility affects freshness assessment but not baseline discoverability.
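A sketch of timestamps that are visible to readers and unambiguous to crawlers, using the HTML `<time>` element (the dates shown are illustrative):

```html
<p>Published: <time datetime="2025-01-08">January 8, 2025</time></p>
<p>Last updated: <time datetime="2025-06-12">June 12, 2025</time></p>
```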
geo-readiness.authority-expertise.publication-dates-visible

AI systems assess trustworthiness partly through operator transparency — a site with a dedicated `/about` page identifying who runs the product ranks higher on E-E-A-T signals that feed into citation decisions. A site with no about page looks anonymous, which generative engines treat as a trust risk and deprioritize when answering high-stakes queries about purchasing, compliance, or professional advice.
Why this severity: Low because operator identification contributes to trust signals but is not gating for most AI citations.
geo-readiness.authority-expertise.about-page-exists

AI models trace citations back to the origin of a claim, not the hundredth site to repeat it. Original benchmarks, case studies, methodology docs, and proprietary datasets mark your site as a source rather than a downstream aggregator, which measurably increases how often generative engines attribute facts to you rather than to a competitor who republished similar content.
Why this severity: Low because originality signals compound reputation slowly but are not required for baseline citation.
geo-readiness.authority-expertise.original-content

AI crawlers use heading text as the primary signal of what a page section covers — headings like "Features," "Benefits," "Overview," and "Resources" convey zero topical information and provide no basis for the model to retrieve that section when answering a related query. Descriptive headings with product-specific nouns are what anchor a section in the model's topical map and get it cited.
Why this severity: High because generic headings strip topical context that AI systems rely on for section-level retrieval.
geo-readiness.ai-readable-structure.descriptive-headings

AI crawlers (and human readers) make snap relevance decisions from the opening 150 words of a page — if the product name and a functional description do not appear there, the page is scored as low-signal and skipped. Leading with testimonial carousels, customer logos, or aspirational taglines forfeits the narrow attention window where citation decisions are made, regardless of what content appears later.
Why this severity: High because delayed key information causes AI systems to discard the page before reaching substantive content.
geo-readiness.ai-readable-structure.key-info-front-loaded

FAQ content is one of the highest-value citation targets for AI systems — LLMs actively extract Q&A pairs to answer user questions directly. Without FAQPage schema (schema.org `FAQPage` + `Question` types), your FAQ is visible to humans but invisible as structured data to AI indexers. This means AI-powered search features like Google AI Overviews and Perplexity's related questions bypass your answers in favor of competitor pages that have the markup. The gap is binary: schema present means structured indexing; schema absent means the AI system has to guess at content structure, and usually doesn't.
Why this severity: Medium because FAQ schema absence reduces AI citation probability for high-intent Q&A content, but doesn't break site functionality or expose a security risk.
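A minimal sketch of the markup with one question shown; the question and answer text are placeholders to be replaced with your actual FAQ content:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What does a GEO readiness audit check?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "It runs checks across crawler access, content citability, authority signals, and AI-readable structure."
      }
    }
  ]
}
```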
geo-readiness.ai-readable-structure.faq-schema-markup

AI systems use `datePublished` and `dateModified` from Article and BlogPosting schema (schema.org vocabulary) to assess content freshness when deciding whether to cite it. Content without schema dates is treated as undated — LLMs answering time-sensitive questions will deprioritize undated articles in favor of sources that explicitly declare recency. For topics where currency matters (security advisories, pricing, API docs, tutorials), missing dates actively suppresses citation rates. This also affects Google AI Overviews, which surfaces fresher content preferentially when date signals are available.
Why this severity: Low because missing article dates reduce AI citation preference for time-sensitive content but don't block indexing or expose security or data-loss risk.
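A sketch of the two date properties on a `BlogPosting` (values illustrative); the same properties apply unchanged to `Article`:

```json
{
  "@context": "https://schema.org",
  "@type": "BlogPosting",
  "headline": "Example post title",
  "datePublished": "2025-01-08",
  "dateModified": "2025-06-12"
}
```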
geo-readiness.ai-readable-structure.article-dates-in-schema

AI systems parse structured, chunked content with far higher extraction fidelity than dense walls of text — a 500-word section under a single heading gets compressed into a vague summary, while three 150-word subsections under descriptive subheadings get quoted directly. Poorly chunked content reduces both citation rate and quote accuracy, feeding back into lower perceived authority for the site as a whole.
Why this severity: Info because content chunking affects extraction fidelity but rarely blocks citation outright.
geo-readiness.ai-readable-structure.content-well-chunked

Run this audit in your AI coding tool (Claude Code, Cursor, Bolt, etc.) and submit results here for scoring and benchmarks.