AI Crawler User-Agents

A maintained, machine-readable reference of the bots that crawl the web for AI engines — what each one is for (training vs. search/citation vs. user-action), its robots.txt token, and the official docs. Built so you can make an informed choice about who reads your site, not a blanket allow/block.

Why this exists

"Should I allow GPTBot?" is the wrong question on its own. Different bots do different jobs:

- Search / citation bots (e.g. OAI-SearchBot, PerplexityBot) are how you get cited in AI answers; usually you want these.
- Training bots (e.g. GPTBot, Google-Extended, CCBot) feed model corpora. This is a separate decision, and a legitimate one to decline.
- User-action bots (e.g. ChatGPT-User, Claude-User) fetch a page because a person asked; blocking these can break features users requested.

This dataset makes that distinction explicit and keeps the user-agent tokens in one place that other tools can import.
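As a rough illustration of that distinction, the sketch below groups user-agent tokens by purpose. It assumes `purpose` is an array of strings, and the three inline entries are illustrative stand-ins built from the bot names mentioned above, not the contents of crawlers.json. The `groupByPurpose` helper is hypothetical and not part of this repository.

```javascript
// Illustrative entries only; the real data lives in crawlers.json.
const sampleCrawlers = [
  { name: "GPTBot", user_agent_token: "GPTBot", purpose: ["training"] },
  { name: "OAI-SearchBot", user_agent_token: "OAI-SearchBot", purpose: ["search"] },
  { name: "ChatGPT-User", user_agent_token: "ChatGPT-User", purpose: ["user-action"] },
];

// Hypothetical helper: collect user-agent tokens under each purpose.
// A bot that lists several purposes appears under each of them.
function groupByPurpose(crawlers) {
  const groups = { training: [], search: [], "user-action": [] };
  for (const crawler of crawlers) {
    for (const purpose of crawler.purpose) {
      // Tolerate purposes this sketch does not know about yet.
      (groups[purpose] ??= []).push(crawler.user_agent_token);
    }
  }
  return groups;
}

const groups = groupByPurpose(sampleCrawlers);
console.log(groups);
```

Grouping first makes the three decisions (citation, training, user-action) visible side by side before you decide what to allow.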

Use it

As data — import crawlers.json:

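// Requires a runtime that supports import attributes (recent Node.js versions do).
import crawlers from "./crawlers.json" with { type: "json" };
// Keep only the bots whose purpose includes search/citation.
const citationBots = crawlers.filter(c => c.purpose.includes("search"));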

Generate a robots.txt from the dataset:

node build-robots.mjs                              # allow every listed AI crawler
node build-robots.mjs --purpose search,user-action # allow citation bots, leave training out

Redirect the output wherever you need it, for example: node build-robots.mjs --purpose search,user-action > robots-ai.txt
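For readers who want to see the idea behind the generator, here is a minimal sketch of how purpose filtering could map onto robots.txt groups. It is not the source of build-robots.mjs: the `buildRobotsTxt` function, its output layout, and the inline entries are all assumptions made for illustration, and `purpose` is assumed to be an array of strings.

```javascript
// Illustrative entries only; the real data lives in crawlers.json.
const demoCrawlers = [
  { user_agent_token: "GPTBot", purpose: ["training"] },
  { user_agent_token: "OAI-SearchBot", purpose: ["search"] },
  { user_agent_token: "ChatGPT-User", purpose: ["user-action"] },
];

// Hypothetical helper, NOT the repository's build-robots.mjs.
// Emits one "User-agent / Allow" group per crawler that has at least
// one of the allowed purposes. Other crawlers are simply left out.
function buildRobotsTxt(crawlers, allowedPurposes) {
  const wanted = new Set(allowedPurposes);
  const blocks = crawlers
    .filter((c) => c.purpose.some((p) => wanted.has(p)))
    .map((c) => `User-agent: ${c.user_agent_token}\nAllow: /`);
  return blocks.join("\n\n") + "\n";
}

console.log(buildRobotsTxt(demoCrawlers, ["search", "user-action"]));
```

Note that in this sketch a bot listing both `training` and `search` would be allowed when `search` is requested. Whether the real script treats mixed-purpose bots the same way is worth checking in build-robots.mjs itself.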

Pairs with ai-crawler-allowlist, which covers the edge/CDN layer (e.g. Cloudflare's HTTP 402 trap) where most AI bots are actually blocked.

Schema

Each entry in crawlers.json:

Field	Meaning
`name`	Common bot name
`operator`	Company that runs it
`user_agent_token`	The token to match in `robots.txt`
`purpose`	One or more of `training`, `search`, `user-action`
`respects_robots_txt`	Whether the operator states it honours `robots.txt`
`docs`	Official documentation, where published
`notes`	Caveats worth knowing
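If you consume the file in your own tooling, a small shape check can catch drift early. The sketch below is one possible validator, under stated assumptions: `purpose` is an array, `respects_robots_txt` is a boolean, and `docs` and `notes` are optional. The field types are not confirmed by the table above, and `validateEntry` and the `ExampleBot` entry are hypothetical.

```javascript
const PURPOSES = new Set(["training", "search", "user-action"]);

// Hypothetical validator: returns a list of problems.
// An empty list means the entry matches the assumed shape.
function validateEntry(entry) {
  const problems = [];
  for (const field of ["name", "operator", "user_agent_token"]) {
    if (typeof entry[field] !== "string" || entry[field].trim() === "") {
      problems.push(`${field} must be a non-empty string`);
    }
  }
  if (!Array.isArray(entry.purpose) || entry.purpose.length === 0) {
    problems.push("purpose must be a non-empty array");
  } else {
    for (const p of entry.purpose) {
      if (!PURPOSES.has(p)) problems.push(`unknown purpose: ${p}`);
    }
  }
  // Assumption: the flag is stored as a boolean.
  if (typeof entry.respects_robots_txt !== "boolean") {
    problems.push("respects_robots_txt must be a boolean");
  }
  return problems;
}

// Fictional entry, used only to exercise the validator.
const exampleEntry = {
  name: "ExampleBot",
  operator: "Example Corp",
  user_agent_token: "ExampleBot",
  purpose: ["search"],
  respects_robots_txt: true,
  notes: "Placeholder entry for illustration.",
};

console.log(validateEntry(exampleEntry));
```

Returning a list of problems rather than throwing on the first one makes it easier to report every issue in a single pass over the file.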

Accuracy

Crawler behaviour and tokens change. This file is maintained, but verify against the operator's official docs before relying on it — and please open a PR with evidence if you spot drift (see CONTRIBUTING.md).

Built by Flow.OS

Maintained by Flow.OS — AI visibility & first-party attribution for premium brands across Australia & APAC. Free AI Visibility Score at https://flowos.digital

License

CC BY 4.0 — reuse the data with attribution to Flow.OS (flowos.digital).
