How to manually block common AI crawlers

It’s easy to block AI crawlers from companies like OpenAI, Meta, Anthropic, and Google that scrape and train on your content without consent or compensation.

  • Help Raptive advocate for you. By refusing to let AI companies continue to crawl and scrape your content, you’re lending strength to our fight for content creators and giving us more data points to use in our advocacy efforts.
  • Make your position clear. Join other creators and publishers like The New York Times in drawing a clear line in the sand for AI companies, dictating how they may or may not access your content library.

This guide provides step-by-step instructions on how to manually edit your robots.txt file to protect your content. Using the Raptive Ads plugin? It’s even easier—check out our guide on how to block common AI crawlers with our WordPress plugin.

How it works

You can block AI crawlers by adding them to your site’s robots.txt file as disallowed user agents (according to each AI company’s instructions).

User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: CCbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: PiplBot
Disallow: /
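
If you manage your own server, you can also script this edit. Below is a minimal Python sketch that appends any missing entries to an existing robots.txt file; the file path is a hypothetical example, so adjust it for your hosting environment.

# Minimal sketch: append any missing AI-crawler entries to a robots.txt file.
AI_AGENTS = [
    "anthropic-ai", "Claude-Web", "CCbot", "FacebookBot",
    "Google-Extended", "GPTBot", "PiplBot",
]

ROBOTS_PATH = "/var/www/html/robots.txt"  # hypothetical path; adjust for your server

with open(ROBOTS_PATH, encoding="utf-8") as f:
    existing = f.read()

with open(ROBOTS_PATH, "a", encoding="utf-8") as f:
    for agent in AI_AGENTS:
        # Skip agents that already have an entry so we don't write duplicates.
        if f"User-agent: {agent}" not in existing:
            f.write(f"\nUser-agent: {agent}\nDisallow: /\n")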

After you’ve saved the changes to your robots.txt file, view the updated file by clearing any caches and adding /robots.txt to the end of your site's URL in a browser window. You can also use Google’s robots.txt checker for extra verification.
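
You can script this verification, too. The sketch below fetches the live file and reports whether each user agent has an entry, assuming Python 3 and your own domain in place of example.com.

# Minimal sketch: fetch the live robots.txt and report which AI-crawler
# entries are present. Replace example.com with your own domain.
from urllib.request import urlopen

robots = urlopen("https://example.com/robots.txt").read().decode("utf-8")

for agent in ["anthropic-ai", "Claude-Web", "CCbot", "FacebookBot",
              "Google-Extended", "GPTBot", "PiplBot"]:
    status = "blocked" if f"User-agent: {agent}" in robots else "MISSING"
    print(f"{agent}: {status}")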

If you don’t feel comfortable editing your site’s robots.txt file or aren’t sure how to do this, reach out to your web developer for a helping hand.

What are these user agents?

  • anthropic-ai: The user agent for AI company Anthropic.
  • Claude-Web: Anthropic’s general-purpose web crawler.
  • CCbot: The user agent for Common Crawl, which maintains an open repository of web crawl data.
  • FacebookBot: Meta’s user agent that crawls public web pages to improve language models for Facebook’s speech recognition technology.
  • GPTBot: The user agent for OpenAI’s web crawler, which collects web pages that may be used to improve future models.
    • Note: GPTBot also covers ChatGPT-User, which is required for a site to appear in OpenAI’s web-enabled searches. Traffic from this source is currently so minimal that we still recommend blocking this user agent.
  • Google-Extended: The user agent Google uses to access content to train Gemini and other AI products.
    • Google documentation and direct confirmation from the Google Search team say that the Google-Extended user agent doesn’t affect your site’s search rankings or inclusion in AI Overviews (formerly SGE).
    • We did our own research, too. We analyzed search traffic for sites that have already blocked Google-Extended and found no negative impact.
  • PiplBot: A user agent that collects documents from the Web to build a searchable index.

What does it mean to “block” AI crawlers?

Adding an entry to your site’s robots.txt file tells these crawlers not to crawl your site going forward. It doesn’t prevent them from accessing your site, but it’s the opt-out method each company has documented and the industry-standard way to declare which crawlers you permit to access your site.
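
To see how a compliant crawler interprets your file, you can use Python’s standard-library robots.txt parser, which follows the same rules a well-behaved bot should. This is an illustrative sketch with example.com standing in for your domain; nothing in robots.txt technically stops a non-compliant bot from skipping this check.

# Minimal sketch: how a compliant crawler reads your robots.txt.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

# With the entries above in place, a compliant GPTBot may not fetch any page.
print(rp.can_fetch("GPTBot", "https://example.com/some-post/"))    # expect False
print(rp.can_fetch("Googlebot", "https://example.com/some-post/"))  # likely True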

[Screenshot: robots.txt entries blocking AI crawlers]

Many large publishers like The New York Times, Wall Street Journal, Vox, and Reuters have already blocked most or all of these AI crawlers using this method.

I’m using the Raptive Ads WordPress plugin—why do I need to manually update my robots.txt file?

The Raptive Ads WordPress plugin can automatically update robots.txt entries for sites with a virtual robots.txt file, which WordPress generates on request when no physical file exists. If your site has a physical robots.txt file, you’ll need to add the entries manually.
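
If you’re not sure which case applies to your site, check whether a robots.txt file physically exists in your WordPress root directory. A minimal sketch, assuming server access and a typical install path:

# Minimal sketch: check for a physical robots.txt in the WordPress root.
import os

WP_ROOT = "/var/www/html"  # hypothetical install directory; adjust for your host
path = os.path.join(WP_ROOT, "robots.txt")

if os.path.exists(path):
    print("Physical robots.txt found: add the AI-crawler entries manually.")
else:
    print("No physical file: WordPress serves a virtual robots.txt, which the plugin can update automatically.")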

How to add robots.txt entries with Yoast

Many sites use the Yoast SEO plugin to manage robots.txt entries. Just copy and paste the entries into Yoast’s robots.txt editor (SEO → Tools → File editor in your WordPress dashboard).

[Screenshot: Yoast robots.txt file editor]

How to add robots.txt entries using the File Manager plugin

The File Manager plugin is another easy way to update your robots.txt file.

Step 1: Install and activate the WP File Manager plugin from the WordPress plugin repository.

[Screenshot: installing the WP File Manager plugin]

Step 2: Navigate to the WP File Manager plugin in your WordPress dashboard, right-click on the robots.txt file name, and select “Code Editor.”

[Screenshot: robots.txt file in WP File Manager]

[Screenshot: WP File Manager code editor]

Step 3: Add the necessary AI crawler lines to the file and save your changes.

[Screenshot: AI-crawler entries added in the robots.txt editor]

[Optional] Step 4: If you no longer need the File Manager plugin, you can deactivate and delete it from your website.
