The Raptive Ads WordPress plugin (version 3.6.0 and above) makes it easy to block AI crawlers from companies like OpenAI, Facebook, Anthropic, and Google that scrape and train on your content without consent or compensation.
- Help Raptive advocate for you. By refusing to allow AI companies to continue to crawl and scrape your content, you’re lending strength to our fight for content creators and giving us more data points to use in our advocacy efforts.
- Make your position clear. Join other creators and publishers like The New York Times in drawing a clear line in the sand for AI companies, dictating how they may or may not access your content library.
This guide provides step-by-step instructions on how to enable these settings to protect your content. Not using WordPress or the Raptive Ads plugin? Check out our guide on how to manually block common AI crawlers.
How it works
In the Raptive Ads plugin settings, you’ll find a new section that helps you block AI crawlers by adding them to your site’s robots.txt file as disallowed user agents, following each AI company’s instructions. The blocking goes into effect on June 3, 2024.
Most of these settings are turned on by default. You can control which AI crawlers are blocked at any time through the WordPress plugin settings.
Blocking automatically enabled:
- anthropic-ai
- Claude-Web
- CCBot
- FacebookBot
- GPTBot
- PiplBot
Blocking not automatically enabled:
- Google-Extended
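To make the idea of a "disallowed user agent" concrete, here is a minimal Python sketch of what such robots.txt entries typically look like. The crawler names come from this guide; the function name and the exact text layout are assumptions for illustration, not the plugin's actual output.

```python
# Crawler names are taken from this guide. The exact text the Raptive Ads
# plugin writes to robots.txt is not documented here, so this layout is an
# assumption based on the standard robots.txt format.
AI_CRAWLERS = [
    "anthropic-ai",
    "Claude-Web",
    "CCBot",
    "FacebookBot",
    "GPTBot",
    "PiplBot",
]


def build_block_rules(user_agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_block_rules(AI_CRAWLERS))
```

Each group names one crawler and pairs it with `Disallow: /`, which asks that crawler to stay away from every path on the site.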
An important note about Google-Extended
Google-Extended is the user agent Google uses to access content to train Gemini and other AI products. Google-Extended doesn’t have anything to do with Google Search, including AI Overviews.
Don't just take our word for it: Google's own documentation says the same.
We’ve left this option in your hands, but we strongly recommend checking this box to block the Google-Extended user agent.
- Google documentation and direct confirmation from the Google Search team say that the Google-Extended user agent doesn’t affect your site’s search rankings or inclusion in AI Overviews (formerly SGE).
- We did our own research, too. We analyzed search traffic for sites that have already blocked Google-Extended and found no negative impact.
Visit your Raptive Ads plugin settings to block Google-Extended.
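If you'd like to see why a Google-Extended rule leaves Googlebot untouched, the sketch below runs a sample robots.txt through Python's standard `urllib.robotparser`. The sample rules, the helper name `is_allowed`, and the example.com URL are assumptions for illustration. It only demonstrates how robots.txt rules are matched per user agent; it says nothing about how Google ranks pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content with a single rule for Google-Extended.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url="https://example.com/recipes/"):
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent))
```

Because the rule group names only Google-Extended, the parser treats Googlebot as unrestricted.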
What are the other user agents?
- anthropic-ai: The user agent for AI company Anthropic.
- Claude-Web: Anthropic’s general-purpose web crawler.
- CCBot: The user agent for Common Crawl, which maintains an open repository of web crawl data.
- FacebookBot: Meta’s user agent that crawls public web pages to improve language models for Facebook’s speech recognition technology.
- GPTBot: The user agent for OpenAI’s web crawler, which crawls web pages that may be used to improve future models.
  - Note: GPTBot also includes ChatGPT-User, which is necessary for a site to appear in OpenAI’s web-enabled searches. Traffic from this source is currently so minimal that we still recommend blocking this user agent.
- PiplBot: A user agent that collects documents from the Web to build a searchable index.
What does it mean to "block" AI crawlers?
Technically, we're helping you add an entry to your site’s robots.txt file that tells these crawlers not to crawl your site going forward. A robots.txt entry doesn’t physically prevent a crawler from accessing your site, since honoring it is up to the crawler. It is, however, the method each company has documented for a site to “opt out,” and it is the industry-standard way to declare which crawlers you permit to access your site.
Many large publishers like The New York Times, Wall Street Journal, Vox, and Reuters have already blocked most or all of these AI crawlers using this method.
What if I don't see these options in my plugin settings?
- Make sure you’re running the latest version of the Raptive Ads WordPress plugin.
- Does your site have a physical robots.txt file? If so, our plugin won’t be able to update it for you. Follow these instructions to add new entries to your robots.txt file to block AI crawlers.
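If you maintain a physical robots.txt file yourself, the following Python sketch shows one way to append block rules without duplicating crawlers that are already listed. The function names are hypothetical, and the check is deliberately simple: it treats any crawler that already has a `User-agent` line as handled, even if that existing group doesn't disallow the whole site, so review the result before uploading it.

```python
import re

# Matches the crawler name on each "User-agent:" line, case-insensitively.
USER_AGENT_LINE = re.compile(r"(?im)^[ \t]*user-agent[ \t]*:[ \t]*([^\s#]+)")


def missing_agents(robots_txt, agents):
    """Return the agents that have no User-agent line in robots_txt."""
    present = {name.lower() for name in USER_AGENT_LINE.findall(robots_txt)}
    return [agent for agent in agents if agent.lower() not in present]


def add_block_rules(robots_txt, agents):
    """Append a 'Disallow: /' group for every agent not already listed."""
    to_add = missing_agents(robots_txt, agents)
    if not to_add:
        return robots_txt
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in to_add]
    base = robots_txt.rstrip("\n")
    separator = "\n\n" if base else ""
    return base + separator + "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    existing = "User-agent: *\nDisallow: /wp-admin/\n"
    print(add_block_rules(existing, ["GPTBot", "CCBot", "Google-Extended"]))
```

Running the function a second time on its own output leaves the file unchanged, so it is safe to re-run after adding new crawler names to the list.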