robots.txt and AI crawlers: what to allow
AI engines use their own crawlers. Here is what the major ones are, how to control them in robots.txt, and the trade-offs of blocking them.
AI engines crawl the web with their own user agents, separate from the classic search bots. Your robots.txt decides which of them can read your content, and that decision now affects whether you appear in AI answers.
Know the crawlers
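- GPTBot: OpenAI's crawler for collecting content that may be used to train its models.
- OAI-SearchBot: OpenAI's crawler for ChatGPT search; it determines whether your pages can be surfaced and linked in ChatGPT search results.
- PerplexityBot: indexes pages so Perplexity can surface and cite them in its answers.
- Google-Extended: not a separate crawler but a robots.txt token. It controls whether content Google has already crawled may be used for Gemini training and grounding; it does not affect inclusion in Google Search.
- ClaudeBot and others: ClaudeBot is Anthropic's crawler, and other AI providers ship their own agents too.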
How to control them
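robots.txt rules are grouped by user agent: each group starts with one or more User-agent lines, followed by the Allow and Disallow rules for those agents. A crawler follows the group that names it and falls back to the User-agent: * group only when no group does. That lets you keep classic search open while setting separate rules for each AI crawler. It also means a blanket Disallow: / under User-agent: * can quietly remove you from AI answers, because it applies to every AI crawler you have not named.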
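As a rough illustration of per-agent rules, the sketch below embeds a sample robots.txt and checks it with Python's standard urllib.robotparser. The policy itself (blocking GPTBot, allowing OAI-SearchBot and PerplexityBot), the example.com URLs, and the helper name can_crawl are all assumptions made for the example, not a recommendation. The standard-library parser is only a stand-in for how a compliant crawler might read the file; real crawlers can resolve overlapping Allow and Disallow rules differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: keep the training crawler out, let the
# answer-engine crawlers in, and leave everyone else on the default group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /admin/
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


page = "https://example.com/blog/post"
print(can_crawl(ROBOTS_TXT, "GPTBot", page))         # False: its own group blocks it
print(can_crawl(ROBOTS_TXT, "OAI-SearchBot", page))  # True: its own group allows it
print(can_crawl(ROBOTS_TXT, "ClaudeBot", page))      # True: falls back to the * group
```

ClaudeBot has no group of its own here, so it inherits the User-agent: * rules: it can read the blog but not anything under /admin/.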
The trade-off
Blocking AI crawlers protects content from being used without a click, but it also makes you invisible in the AI answers more people now rely on. For most publishers and businesses, visibility in those answers is worth more than the protection.
- Decide which AI engines you want to appear in.
- Allow their crawlers explicitly in robots.txt.
- Block only the agents whose use you genuinely object to.
- Re-check after every change, since one typo, such as Disallow: / where you meant Disallow: /private/, can block everything.
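The re-check step in the list above can be automated. Here is a minimal sketch, again using Python's urllib.robotparser, that reports which AI crawlers a given robots.txt blocks from a URL. The agent list, the two sample files, and the function name blocked_agents are hypothetical choices for illustration; swap in the agents you actually care about.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical watch list: the AI crawlers you want to keep reachable.
AI_AGENTS = ("GPTBot", "OAI-SearchBot", "PerplexityBot", "ClaudeBot")


def blocked_agents(robots_txt: str, url: str, agents=AI_AGENTS) -> list[str]:
    """Return the agents from `agents` that may NOT fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# What was intended: hide one directory from everyone.
INTENDED = "User-agent: *\nDisallow: /private/\n"
# What a truncated path does instead: hide the whole site from everyone.
TYPO = "User-agent: *\nDisallow: /\n"

home = "https://example.com/"
print(blocked_agents(INTENDED, home))  # []
print(blocked_agents(TYPO, home))      # ['GPTBot', 'OAI-SearchBot', 'PerplexityBot', 'ClaudeBot']
```

Running a check like this after each robots.txt edit turns a silent mistake into a visible list of blocked agents.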
If you block the crawler, you opt out of the answer. Choose deliberately.
Use the SEO Pine robots.txt generator to build clean, per-agent rules, and the AI Visibility Scanner to confirm AI crawlers can reach you.