A maintained, machine-readable reference of the bots that crawl the web for AI engines: what each one is for (training vs. search/citation vs. user-action), its `robots.txt` token, and the official docs. Built so you can make an informed choice about who reads your site, not a blanket allow/block.
"Should I allow GPTBot?" is the wrong question on its own. Different bots do different jobs:
- search / citation bots (e.g. `OAI-SearchBot`, `PerplexityBot`) are how you get cited in AI answers; usually you want these.
- training bots (e.g. `GPTBot`, `Google-Extended`, `CCBot`) feed model corpora; allowing them is a separate decision, and declining is legitimate.
- user-action bots (e.g. `ChatGPT-User`, `Claude-User`) fetch a page because a person asked; blocking these can break features users requested.
This dataset makes that distinction explicit and keeps the user-agent tokens in one place that other tools can import.
As data, import `crawlers.json`:

    import crawlers from "./crawlers.json" with { type: "json" };
    const citationBots = crawlers.filter(c => c.purpose.includes("search"));

Generate a robots.txt from the dataset:

    node build-robots.mjs                               # allow every listed AI crawler
    node build-robots.mjs --purpose search,user-action  # allow citation bots, leave training out

Pipe it where you need it: `node build-robots.mjs --purpose search,user-action > robots-ai.txt`.
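
To give a feel for how purpose-based filtering could turn dataset entries into robots.txt rules, here is a minimal sketch. It is not the code in `build-robots.mjs`: the `buildRobotsTxt` helper and the `sampleCrawlers` array are hypothetical, and it assumes `purpose` is stored as an array of strings.

```javascript
// Illustrative entries only; the real data lives in crawlers.json.
// Assumption: `purpose` is an array of strings.
const sampleCrawlers = [
  { name: "GPTBot", user_agent_token: "GPTBot", purpose: ["training"] },
  { name: "OAI-SearchBot", user_agent_token: "OAI-SearchBot", purpose: ["search"] },
  { name: "ChatGPT-User", user_agent_token: "ChatGPT-User", purpose: ["user-action"] },
];

// Hypothetical helper: emit one Allow group per crawler whose purposes
// overlap the requested ones. An empty `purposes` list selects every crawler.
function buildRobotsTxt(crawlers, purposes = []) {
  const selected = purposes.length === 0
    ? crawlers
    : crawlers.filter(c => c.purpose.some(p => purposes.includes(p)));
  return selected
    .map(c => `User-agent: ${c.user_agent_token}\nAllow: /\n`)
    .join("\n");
}

console.log(buildRobotsTxt(sampleCrawlers, ["search", "user-action"]));
```

This sketch only emits `Allow` groups. Whether the real script also writes `Disallow` rules for the purposes you leave out is not stated here, so check its output before deploying it.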
Pairs with `ai-crawler-allowlist`, which covers the edge/CDN layer (e.g. Cloudflare's HTTP 402 trap) where most AI bots are actually blocked.
Each entry in crawlers.json:
| Field | Meaning |
|---|---|
| `name` | Common bot name |
| `operator` | Company that runs it |
| `user_agent_token` | The token to match in robots.txt |
| `purpose` | One or more of `training`, `search`, `user-action` |
| `respects_robots_txt` | Whether the operator states it honours robots.txt |
| `docs` | Official documentation, where published |
| `notes` | Caveats worth knowing |
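
If you import the data into your own tooling, a small shape check can catch drift early. The sketch below is an assumption-laden illustration, not part of this repository: `validateEntry` is a hypothetical helper, the example entry's `docs` and `notes` values are placeholders, and the types of `respects_robots_txt`, `docs`, and `notes` are left unchecked because they are not specified above.

```javascript
const PURPOSES = ["training", "search", "user-action"];

// Hypothetical validator: returns a list of problems, empty when the
// entry matches the fields described in the table.
function validateEntry(entry) {
  const problems = [];
  for (const field of ["name", "operator", "user_agent_token"]) {
    if (typeof entry[field] !== "string" || entry[field].trim() === "") {
      problems.push(`${field} must be a non-empty string`);
    }
  }
  // Accept either a single purpose string or an array of them (assumption).
  const purposes = Array.isArray(entry.purpose) ? entry.purpose : [entry.purpose];
  if (purposes.length === 0 || !purposes.every(p => PURPOSES.includes(p))) {
    problems.push(`purpose must be one or more of ${PURPOSES.join(", ")}`);
  }
  return problems;
}

// Illustrative entry; docs and notes are placeholder values.
const example = {
  name: "GPTBot",
  operator: "OpenAI",
  user_agent_token: "GPTBot",
  purpose: ["training"],
  respects_robots_txt: true,
  docs: "https://example.com/bot-docs",
  notes: "",
};

console.log(validateEntry(example)); // prints []
console.log(validateEntry({ ...example, purpose: ["indexing"] }));
```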
Crawler behaviour and tokens change. This file is maintained, but verify against the
operator's official docs before relying on it — and please open a PR with evidence if you
spot drift (see CONTRIBUTING.md).
Maintained by Flow.OS — AI visibility & first-party attribution for premium brands across Australia & APAC. Free AI Visibility Score at https://flowos.digital
CC BY 4.0 — reuse the data with attribution to Flow.OS (flowos.digital).