You can manually block AI crawlers on your site by adding rules to your robots.txt file.
This guide provides step-by-step instructions for manually editing your robots.txt file, including copy-and-paste examples.
Using the Raptive Ads plugin? It’s even easier—check out our guide on how to block common AI crawlers with our WordPress plugin.
Should I block or allow AI crawlers on my site? It depends on your goals. Read our latest article, which explains which bots to block, which to allow, and how those choices can affect your traffic and revenue. It also breaks down how these decisions fit into the bigger picture of AI, search, and future compensation for creators. Read: How blocking AI bots paves the way for fair compensation
How to block AI crawlers
1 - Locate your robots.txt file
Your robots.txt file is typically found at yoursite.com/robots.txt.
If your site doesn't have one, you can create a new robots.txt file in your site's root directory.
2 - Add crawler rules
Copy and paste the AI crawler rules below for the bots you'd like to block.
User-agent: Amazonbot
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: Applebot-Extended
Disallow: /
User-agent: Bytespider
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: cohere-ai
Disallow: /
User-agent: Diffbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: ImagesiftBot
Disallow: /
User-agent: omgili
Disallow: /
User-agent: omgilibot
Disallow: /
User-agent: PiplBot
Disallow: /
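If you'd rather not type each stanza by hand, a short script can generate the same rules from a list of user agents. This is just a convenience sketch in Python; it produces exactly the stanzas listed above, and you can trim the list to only the bots you want to block.

```python
# Sketch: generate robots.txt "Disallow" stanzas for a list of AI crawler
# user agents. Edit AI_BOTS to match the bots you want to block.
AI_BOTS = [
    "Amazonbot", "anthropic-ai", "Applebot-Extended", "Bytespider",
    "CCBot", "Claude-Web", "ClaudeBot", "cohere-ai", "Diffbot",
    "FacebookBot", "GPTBot", "ImagesiftBot", "omgili", "omgilibot",
    "PiplBot",
]

# One "User-agent" / "Disallow: /" pair per bot.
rules = "\n".join(f"User-agent: {bot}\nDisallow: /" for bot in AI_BOTS)
print(rules)
```

Paste the printed output into your robots.txt file, keeping any existing rules you already have in place.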
3 - Save and verify
After you’ve saved the changes to your robots.txt file, view the updated file by clearing any caches and adding /robots.txt to the end of your site's URL in a browser window. You can also use Google’s robots.txt checker for extra verification.
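You can also confirm the rules behave as intended with Python's built-in `urllib.robotparser` module, which interprets robots.txt the same way well-behaved crawlers do. This sketch parses the rules from a string (abbreviated to one bot); for a live check you could instead call `set_url()` with your own site's /robots.txt address and `read()`.

```python
from urllib.robotparser import RobotFileParser

# The rules you added to robots.txt (abbreviated here to a single bot).
rules = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# GPTBot is blocked from every path on the site...
print(parser.can_fetch("GPTBot", "https://example.com/any-page"))     # False
# ...while agents without a matching rule are unaffected.
print(parser.can_fetch("Googlebot", "https://example.com/any-page"))  # True
```

A `False` result for a blocked bot means the rule is written correctly; anything else usually points to a typo in the user agent name or the `Disallow` line.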
If you don’t feel comfortable editing your site’s robots.txt file or aren’t sure how to do this, reach out to your web developer for a helping hand.
What are these user agents?
- Amazonbot: Amazon’s web crawler used to collect data for improving Alexa, search, and other AI-driven services
- anthropic-ai: An older user agent associated with AI company Anthropic that may still appear in logs
- Applebot-Extended: Apple’s crawler used for AI models and features beyond standard Applebot search indexing
- Bytespider: ByteDance’s web crawler (used by TikTok and related services) to collect data for search, recommendation, and AI model training
- CCBot: The user agent for Common Crawl, which maintains an open repository of web crawl data
- Claude-Web: An older Anthropic user agent that may still appear in logs
- ClaudeBot: Anthropic’s current web crawler, used to collect data for training its Claude models
- cohere-ai: Cohere’s crawler used to train and enhance language models
- Diffbot: A crawler that extracts structured data from web pages to power knowledge graphs and AI applications
- FacebookBot: Meta’s user agent that crawls public web pages to improve language models for speech recognition technology
- GPTBot: The user agent for OpenAI’s web crawler, which may use the pages it crawls to improve future models
- ImagesiftBot: Used for image recognition and moderation, helping train systems that classify and filter visual content
- omgili: Omgili’s crawler used to index online discussions (forums, message boards) for search and data analysis
- omgilibot: Omgili’s companion crawler that gathers forum and discussion-based content for indexing and analytics
- PiplBot: A user agent that collects documents from the web to build a searchable index
What does it mean to "block" AI crawlers?
Adding an entry to your site’s robots.txt file tells these crawlers not to crawl your site going forward. It doesn’t prevent them from accessing your site, but it’s the way each company has documented for a site to “opt out” and the industry-standard method for declaring which crawlers you permit to access your site.
Many large publishers like The New York Times, Wall Street Journal, Vox, and Reuters have already blocked most or all of these AI crawlers using this method.
Can AI crawlers ignore my block request?
Some badly behaved crawlers may ignore your block request. That’s where you’ll need support from companies that can help enforce it:
- Content Delivery Network (CDN): A service that quickly and reliably delivers your website’s content, such as Cloudflare, Fastly, or Akamai. Many CDNs also offer security features, including bot detection and blocking.
- Cybersecurity firms: Companies specializing in bot management, like Human Security and DataDome, offer more sophisticated tools to identify and block unwanted bot traffic, including AI scrapers trying to disguise themselves as legitimate visitors.
I'm using the Raptive Ads WordPress plugin—why do I need to manually update my robots.txt file?
The Raptive Ads WordPress plugin can automatically update robots.txt entries for sites with a virtual robots.txt file, which is generated upon request. If your site has a physical robots.txt file, you’ll need to add the entries manually.
How to add robots.txt entries with Yoast
Many sites use the Yoast SEO plugin to update robots.txt entries. Just copy and paste the entries into Yoast’s robots.txt editor (found under Yoast SEO → Tools → File editor).
How to add robots.txt entries using the File Manager plugin
The File Manager plugin is another easy way to update your robots.txt file.
Step 1: Install and activate the WP File Manager plugin from the WordPress plugin repository.
Step 2: Navigate to the WP File Manager plugin in your WordPress dashboard, right-click on the robots.txt file name, and select "Code Editor."
Step 3: Add the necessary AI crawler lines to the file and save your changes.
*Note: the list pictured below is not comprehensive. Please reference the list above for the latest recommended crawlers to disallow.
[Optional] Step 4: If you no longer need the File Manager plugin, you can deactivate and delete it from your website.