A perfect on-page AEO (answer engine optimisation) build is invisible if AI crawlers can't reach your origin. This repo provides a copy-paste
`robots.txt`, a Cloudflare edge checklist, and a one-command verifier so that ChatGPT, Perplexity, Gemini, Copilot and Claude can actually read, and cite, your site.
Two silent failure modes keep well-optimised sites out of AI answers:
- `robots.txt` omission. AI crawlers use their own user-agents (`OAI-SearchBot`, `PerplexityBot`, `ClaudeBot`, …). If your rules don't address them, edge cases and over-broad `Disallow` blocks can shut them out.
- The edge layer. CDNs increasingly block AI crawlers by default: Cloudflare's managed "Block AI Scrapers and Crawlers" rule returns HTTP 403, and Pay-Per-Crawl returns 402, to AI bots on many onboardings. Your `robots.txt` can say Allow while the edge silently turns the crawler away before it ever reaches origin.

This repo addresses both.
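
As a quick check for the first failure mode, the sketch below reports which AI crawler tokens a local `robots.txt` names in a `User-agent` line. It is an illustration, not part of this repo: the function names `names_bot` and `report_groups` are hypothetical, the bot list is limited to agents mentioned in this README, and the script does not evaluate the `Allow`/`Disallow` rules themselves.

```shell
# Sketch only, not a file shipped in this repo.
# Reports whether a robots.txt names each AI crawler in a User-agent line.
# It does NOT evaluate Allow/Disallow rules.
AI_BOTS="OAI-SearchBot ChatGPT-User PerplexityBot ClaudeBot"

# names_bot FILE BOT: succeeds if FILE has a "User-agent: BOT" line
# (case-insensitive, tolerant of surrounding whitespace).
names_bot() {
  grep -qi "^[[:space:]]*user-agent:[[:space:]]*${2}[[:space:]]*\$" "$1"
}

# report_groups FILE: print one line per bot in AI_BOTS.
report_groups() {
  for bot in $AI_BOTS; do
    if names_bot "$1" "$bot"; then
      echo "$bot: named"
    else
      echo "$bot: not named (falls back to User-agent: *)"
    fi
  done
}

# Only run when a robots.txt path is passed on the command line.
if [ "$#" -gt 0 ]; then
  report_groups "$1"
fi
```

A bot reported as not named is governed by whatever the `User-agent: *` group says, which is where an over-broad `Disallow` does its damage.
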
| File | Purpose |
|---|---|
| `robots.txt` | Annotated allowlist for the AI search/citation crawlers (drop in, edit the `Sitemap:` line) |
| `cloudflare/SETUP.md` | Allow AI crawlers at the Cloudflare edge; disable the 402 trap |
| `verify.sh` | Curl your site as each AI bot and print the status code (expect 200) |

- Copy `robots.txt` to your site root, update the `Sitemap:` URL.
- Work through `cloudflare/SETUP.md` (or your CDN's equivalent).
- Confirm: `./verify.sh https://yourdomain.com`

Every row should read `200`. A `402` means Pay-Per-Crawl is blocking; `403` means a WAF/bot rule is.

Read the result correctly. `verify.sh` sends each bot's user-agent from your machine, so it tests user-agent-level blocking only. Real crawlers are also validated by IP/ASN; a `403` is a strong signal to investigate, but confirm in your CDN's bot analytics or server logs that verified bots get `200`.

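
The logic of such a check can be sketched in a few lines of shell. This is not the repo's `verify.sh`, only an assumption about how a check of this kind could look: it sends the bare product token as the user-agent (real crawlers send longer strings), and the names `probe`, `explain_status` and `check_site` are hypothetical.

```shell
# Sketch only, not the repo's verify.sh.
# Assumption: the bare product token stands in for each vendor's full
# user-agent string.
BOTS="OAI-SearchBot ChatGPT-User PerplexityBot ClaudeBot"

# Map an HTTP status code to the reading given in this README.
explain_status() {
  case "$1" in
    200) echo "ok: reachable" ;;
    402) echo "blocked: Pay-Per-Crawl" ;;
    403) echo "blocked: WAF or bot rule" ;;
    *)   echo "investigate: unexpected status" ;;
  esac
}

# probe USER_AGENT URL: request URL and print only the HTTP status code.
probe() {
  curl -s -o /dev/null -w '%{http_code}' -A "$1" "$2"
}

# check_site URL: probe the URL once per bot and print a small report.
check_site() {
  for bot in $BOTS; do
    code=$(probe "$bot" "$1")
    printf '%-16s %s  %s\n' "$bot" "$code" "$(explain_status "$code")"
  done
}

# Only touch the network when a URL is passed on the command line.
if [ "$#" -gt 0 ]; then
  check_site "$1"
fi
```

Like `verify.sh`, this only exercises user-agent matching from your own IP, so the same caveat about verified bots applies.
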
The baseline robots.txt allows the bots that get you cited. Some teams want
citation visibility but opt out of model-training corpora. To do that, keep the
search/citation bots (OAI-SearchBot, ChatGPT-User, PerplexityBot, ClaudeBot)
allowed, and set the training-oriented agents (GPTBot, Google-Extended,
Applebot-Extended, CCBot) to Disallow: /. Comments in robots.txt mark which is
which. Note Google-Extended governs Gemini/Vertex training only — it does not
affect Googlebot or Google Search indexing.
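
The split described above could be generated as follows. This is a sketch rather than the repo's `robots.txt`: the two agent lists are copied from the paragraph above, and `emit_rules` is a hypothetical helper name.

```shell
# Sketch only: prints a robots.txt fragment that allows the citation bots
# and disallows the training-oriented agents. The agent lists come from
# this README and may lag behind what vendors publish.
CITATION_BOTS="OAI-SearchBot ChatGPT-User PerplexityBot ClaudeBot"
TRAINING_BOTS="GPTBot Google-Extended Applebot-Extended CCBot"

emit_rules() {
  # One group per citation bot, fully allowed.
  for bot in $CITATION_BOTS; do
    printf 'User-agent: %s\nAllow: /\n\n' "$bot"
  done
  # One group per training agent, fully disallowed.
  for bot in $TRAINING_BOTS; do
    printf 'User-agent: %s\nDisallow: /\n\n' "$bot"
  done
}

emit_rules
```

Redirect the output to a scratch file and compare it with the annotated `robots.txt` in this repo before deploying anything.
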
Maintained by Flow.OS — AI visibility & first-party attribution for premium brands across Australia & APAC.
Free: check whether AI engines can see your brand → AI Visibility Score at https://flowos.digital
License: MIT.