How to block common AI crawlers with the Raptive Ads WordPress plugin

The Raptive Ads WordPress plugin (version 3.6.0 and above) makes it easy to block AI crawlers from companies like OpenAI, Facebook, Anthropic, and Google that scrape and train on your content without consent or compensation.

  • Help Raptive advocate for you. By refusing to allow AI companies to continue to crawl and scrape your content, you’re lending strength to our fight for content creators and giving us more data points to use in our advocacy efforts.
  • Make your position clear. Join other creators and publishers like The New York Times in drawing a clear line in the sand for AI companies, dictating how they may or may not access your content library.

This guide provides step-by-step instructions on how to enable these settings to protect your content.

Not using WordPress or the Raptive Ads plugin? Check out our guide on how to manually block common AI crawlers.

Why block AI crawlers?

  1. Consent and Control: Blocking ensures your content isn’t used without your explicit permission.
  2. Compensation: Prevents unauthorized use of your content, helping you negotiate fair licensing or compensation.
  3. Copyright Protection: Protects your intellectual property from potential copyright violations.
  4. Traffic and Revenue Preservation: Mitigates risks of traffic loss and revenue decline due to AI products using your content.
  5. Fair Competition: Stops your content from being exploited to create low-quality outputs that compete with your original work.
  6. Advocating Responsible AI: Sends a clear message supporting ethical and fair AI practices.

You can watch our webinar on “Understanding AI’s Impact on the Creator Economy” here.

Does blocking AI crawlers affect Google Search?

According to Google, blocking Google-Extended does not affect your Google search rankings.

Google-Extended is a standalone product token that web publishers can use to manage whether their sites help improve Gemini Apps and Vertex AI generative APIs, including future generations of models that power those products. Grounding with Google Search on Vertex AI does not use web pages for grounding that have disallowed Google-Extended. Google-Extended does not impact a site's inclusion or ranking in Google Search.

How to block AI crawlers

In the Raptive Ads plugin settings, you’ll find a new section that helps you block AI crawlers. The plugin adds them to your site’s robots.txt file as disallowed user agents, following each AI company’s instructions. The blocking goes into effect on June 3, 2024.

[Image: Block AI Crawlers plugin settings]

Most of these settings are turned on by default. You can control which AI crawlers are blocked at any time through the WordPress plugin settings.

Blocking automatically enabled:

  • anthropic-ai
  • CCBot
  • Claude-Web
  • FacebookBot
  • GPTBot
  • PiplBot

Blocking not automatically enabled:

  • Google-Extended

 

An important note about Google-Extended

Google-Extended is the user agent Google uses to access content to train Gemini and other AI products. Google-Extended doesn’t have anything to do with Google Search, including AI Overviews.

Don’t just take it from us. Here it is in Google’s documentation:

[Image: Google-Extended documentation]

We’ve left this option in your hands, but we strongly recommend checking this box to block the Google-Extended user agent.

  • Google documentation and direct confirmation from the Google Search team say that the Google-Extended user agent doesn’t affect your site’s search rankings or inclusion in AI Overviews (formerly SGE).
  • We did our own research, too. We analyzed search traffic for sites that have already blocked Google-Extended and found no negative impact.
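
The reason a Google-Extended rule can stay separate from Search is that robots.txt rules are scoped to the user agent named in each group. The sketch below illustrates only that scoping, using Python's standard-library robots.txt parser. It says nothing about how Google ranks pages, and Python's matching logic is not necessarily identical to Google's. The example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that disallows only the Google-Extended token.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Placeholder URL; any page on the site would behave the same way here.
url = "https://example.com/some-post/"

# The rule group names Google-Extended, so only that token is disallowed.
google_extended_allowed = parser.can_fetch("Google-Extended", url)
# No group names Googlebot (and there is no "*" group), so it stays allowed.
googlebot_allowed = parser.can_fetch("Googlebot", url)

print("Google-Extended allowed:", google_extended_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

Under these assumptions, the Google-Extended check comes back disallowed while the Googlebot check comes back allowed, because no rule in the file names Googlebot.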

Visit your Raptive Ads plugin settings to block Google-Extended.

[Image: Block AI Crawlers plugin settings]

What are the other user agents?

  • anthropic-ai: The user agent for AI company Anthropic.
  • Claude-Web: Anthropic’s general-purpose web crawler.
  • CCBot: The user agent for Common Crawl, which maintains an open repository of web crawl data.
  • FacebookBot: Meta’s user agent that crawls public web pages to improve language models for Facebook’s speech recognition technology.
  • GPTBot: The user agent for OpenAI’s web crawler which crawls web pages to potentially use to improve future models.
    • Note: The GPTBot setting also covers ChatGPT-User, which is necessary for a site to appear in OpenAI’s web-enabled searches, but the traffic from this source is currently so minimal that we still recommend blocking this user agent.
  • PiplBot: A user agent that collects documents from the Web to build a searchable index.

What does it mean to "block" AI crawlers?

Technically, we're helping you add an entry to your site’s robots.txt file that tells these crawlers not to crawl your site going forward. It doesn’t prevent them from accessing your site, but it’s the way each company has documented for a site to “opt out” and the industry-standard method for declaring which crawlers you permit to access your site.

[Image: Blocking AI crawlers, robots.txt entries]
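
As a rough illustration of what such entries look like, here is a minimal Python sketch that builds one robots.txt group per crawler named in this article. The `User-agent` / `Disallow: /` form is the standard robots.txt convention. The exact text the plugin writes may differ, and the function name `build_robots_entries` is hypothetical.

```python
# User agents named in this article. The output format is the standard
# robots.txt convention and is an assumption, not the plugin's exact output.
AI_CRAWLERS = [
    "anthropic-ai",
    "CCBot",
    "Claude-Web",
    "FacebookBot",
    "GPTBot",
    "PiplBot",
    "Google-Extended",
]


def build_robots_entries(user_agents):
    """Return robots.txt text with one site-wide Disallow group per user agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    # Groups are separated by a blank line, as is conventional in robots.txt.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_entries(AI_CRAWLERS))
```

Each group names one crawler and disallows the whole site for it. Crawlers that are not named are unaffected.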

Many large publishers like The New York Times, Wall Street Journal, Vox, and Reuters have already blocked most or all of these AI crawlers using this method.

What if I don't see these options in my plugin settings?

  • Make sure you’re running the latest version of the Raptive Ads WordPress plugin.
  • Does your site have a physical robots.txt file? If so, our plugin won’t be able to update it for you. Follow these instructions to add new entries to your robots.txt file to block AI crawlers.
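
If you maintain a physical robots.txt file yourself, a small check like the one below can show which of the crawlers listed in this article it does not yet block. This is a sketch using Python's standard-library parser. The function name `unblocked_crawlers` and the sample file contents are hypothetical, and the check only tests the site root ("/"), so a crawler that is disallowed from just part of the site is still reported as unblocked.

```python
from urllib.robotparser import RobotFileParser

# User agents named in this article.
AI_CRAWLERS = [
    "anthropic-ai",
    "CCBot",
    "Claude-Web",
    "FacebookBot",
    "GPTBot",
    "PiplBot",
    "Google-Extended",
]


def unblocked_crawlers(robots_txt, user_agents=AI_CRAWLERS):
    """Return the user agents that the given robots.txt text still allows at '/'."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in user_agents if parser.can_fetch(agent, "/")]


# Hypothetical contents of an existing robots.txt that blocks two crawlers.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""

if __name__ == "__main__":
    for agent in unblocked_crawlers(sample):
        print("Not blocked yet:", agent)
```

To check your own file, paste its contents in place of `sample`. Any crawler the script reports is one you would still need to add an entry for.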