How to block common AI crawlers with the Raptive Ads WordPress plugin

The Raptive Ads WordPress plugin (version 3.6.0 and later) makes it easy to block AI crawlers from companies such as OpenAI, Facebook, Anthropic, and Google, which scrape your content and train on it without consent or compensation.

By blocking these crawlers, you can:

  • Help Raptive advocate for you. By refusing to allow AI companies to continue to crawl and scrape your content, you’re lending strength to our fight for content creators and giving us more data points to use in our advocacy efforts.
  • Make your position clear. Join other creators and publishers like The New York Times in drawing a clear line in the sand for AI companies, dictating how they may or may not access your content library.

This guide provides step-by-step instructions on how to enable these settings to protect your content. Not using WordPress or the Raptive Ads plugin? Check out our guide on how to manually block common AI crawlers.

How it works

In the Raptive Ads plugin settings, you’ll find a new section that blocks AI crawlers by adding them to your site’s robots.txt file as disallowed user agents, following each AI company’s published instructions. This blocking goes into effect on June 3, 2024.

[Image: Block AI Crawlers plugin settings]

Most of these settings are turned on by default. You can control which AI crawlers are blocked at any time through the WordPress plugin settings.

Blocking automatically enabled:

  • anthropic-ai
  • CCBot
  • Claude-Web
  • FacebookBot
  • GPTBot
  • PiplBot

Blocking not automatically enabled:

  • Google-Extended
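
To make these defaults concrete, here is a minimal Python sketch of the kind of robots.txt entries they correspond to. It assumes one `User-agent` group with `Disallow: /` per crawler, which is the pattern the AI companies document. The exact text the plugin writes may differ, and the names `DEFAULT_BLOCKED` and `build_robots_rules` are hypothetical.

```python
# Hypothetical sketch: build robots.txt groups for the crawlers that the
# Raptive Ads plugin blocks by default. The plugin's actual output may differ.

DEFAULT_BLOCKED = [
    "anthropic-ai",
    "CCBot",
    "Claude-Web",
    "FacebookBot",
    "GPTBot",
    "PiplBot",
]


def build_robots_rules(user_agents, block_google_extended=False):
    """Return robots.txt text with one 'Disallow: /' group per user agent."""
    # Copy the input so the caller's list is never modified.
    agents = list(user_agents)
    # Google-Extended is opt-in, so it is only added when explicitly requested.
    if block_google_extended:
        agents.append("Google-Extended")
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # A blank line separates each group, as is conventional in robots.txt.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_rules(DEFAULT_BLOCKED, block_google_extended=True))
```

Because Google-Extended is opt-in, the sketch only adds it when `block_google_extended=True`, mirroring the plugin's default of leaving that option unchecked.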


An important note about Google-Extended

Google-Extended is the user agent Google uses to access content to train Gemini and other AI products. Google-Extended doesn’t have anything to do with Google Search, including AI Overviews.

Don't just take our word for it. Here's what Google's documentation says:

[Image: Google's documentation for Google-Extended]

We’ve left this option in your hands, but we strongly recommend checking this box to block the Google-Extended user agent.

  • Google documentation and direct confirmation from the Google Search team say that the Google-Extended user agent doesn’t affect your site’s search rankings or inclusion in AI Overviews (formerly SGE).
  • We did our own research, too. We analyzed search traffic for sites that have already blocked Google-Extended and found no negative impact.

Visit your Raptive Ads plugin settings to block Google-Extended.

[Image: Block AI Crawlers plugin settings]

What are the other user agents?

  • anthropic-ai: The user agent for AI company Anthropic.
  • Claude-Web: Anthropic’s general-purpose web crawler.
  • CCBot: The user agent for Common Crawl, which maintains an open repository of web crawl data.
  • FacebookBot: Meta’s user agent that crawls public web pages to improve language models for Facebook’s speech recognition technology.
  • GPTBot: The user agent for OpenAI’s web crawler, which crawls web pages that may be used to improve future models.
    • Note: GPTBot also includes ChatGPT-User, which is necessary for a site to appear in OpenAI’s web-enabled searches, but the traffic from this source is currently so minimal that we still recommend blocking this user agent.
  • PiplBot: A user agent that collects documents from the Web to build a searchable index.

What does it mean to "block" AI crawlers?

Technically, we're helping you add entries to your site’s robots.txt file that tell these crawlers not to crawl your site going forward. Robots.txt doesn’t physically prevent a crawler from accessing your site; it relies on crawlers choosing to honor it. Still, it’s the method each company has documented for sites to “opt out,” and it’s the industry-standard way to declare which crawlers may access your site.
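
A compliant crawler checks these rules before it fetches a page. The sketch below uses Python's standard `urllib.robotparser` module to parse a sample robots.txt and ask whether a given user agent may fetch a URL. The sample file contents, the `example.com` URL, and the helper name `is_allowed` are assumptions for illustration. Crawlers listed with `Disallow: /` are refused, while a crawler with no matching group, such as Googlebot, is still allowed.

```python
from urllib.robotparser import RobotFileParser

# Assumed sample robots.txt content, similar to entries that block AI crawlers.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/recipes/banana-bread/"
    for agent in ("GPTBot", "CCBot", "Googlebot"):
        print(agent, "allowed:", is_allowed(SAMPLE_ROBOTS_TXT, agent, page))
```

Running it reports GPTBot and CCBot as not allowed and Googlebot as allowed. This check happens on the crawler's side, which is why robots.txt only works with crawlers that choose to honor it.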

[Image: robots.txt entries for blocking AI crawlers]

Many large publishers, including The New York Times, The Wall Street Journal, Vox, and Reuters, have already blocked most or all of these AI crawlers using this method.

What if I don't see these options in my plugin settings?

  • Make sure you’re running the latest version of the Raptive Ads WordPress plugin.
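  • Does your site have a physical robots.txt file (an actual file in your site’s root directory, rather than the virtual one WordPress generates automatically)? If so, our plugin won’t be able to update it for you. Follow these instructions to add new entries to your robots.txt file to block AI crawlers.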
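
If you maintain a physical robots.txt file, adding entries by hand comes down to appending a `Disallow: /` group for each crawler that isn't listed yet. The Python sketch below automates that under some assumptions: the names `AI_CRAWLERS`, `existing_user_agents`, and `add_disallow_groups` are hypothetical, you pass in the path to your actual robots.txt file, and any crawler that already has a `User-agent` line is skipped so its existing rules stay untouched.

```python
from pathlib import Path

# Crawlers this guide lists as blocked by default in the Raptive Ads plugin.
AI_CRAWLERS = ["anthropic-ai", "CCBot", "Claude-Web", "FacebookBot", "GPTBot", "PiplBot"]


def existing_user_agents(text: str) -> set:
    """Collect user-agent names already declared in robots.txt (lowercased)."""
    agents = set()
    for line in text.splitlines():
        # Drop comments and surrounding whitespace before matching the field.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents


def add_disallow_groups(robots_path: Path, user_agents) -> list:
    """Append a 'Disallow: /' group for each agent not already in the file.

    Returns the list of agents that were added.
    """
    text = robots_path.read_text(encoding="utf-8") if robots_path.exists() else ""
    present = existing_user_agents(text)
    missing = [ua for ua in user_agents if ua.lower() not in present]
    if not missing:
        return []
    new_groups = "\n\n".join(f"User-agent: {ua}\nDisallow: /" for ua in missing)
    # Make sure the existing rules end with a newline before appending.
    if text and not text.endswith("\n"):
        text += "\n"
    # Keep a blank line between the existing rules and the new groups.
    separator = "\n" if text else ""
    robots_path.write_text(text + separator + new_groups + "\n", encoding="utf-8")
    return missing
```

For example, `add_disallow_groups(Path("/path/to/robots.txt"), AI_CRAWLERS)` appends the missing groups and returns the names it added, where `/path/to/` stands in for your site's root directory. Skipping agents that are already listed means running the script twice won't create duplicate groups. Google-Extended is left out of the list to match the plugin's defaults; add it if you choose to block it.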
