How to manually block common AI crawlers

It’s easy to block AI crawlers from companies like OpenAI, Meta, Anthropic, and Google that scrape and train on your content without consent or compensation.

  • Help Raptive advocate for you. By refusing to let AI companies continue to crawl and scrape your content, you’re lending strength to our fight for content creators and giving us more data points to use in our advocacy efforts.
  • Make your position clear. Join other creators and publishers like The New York Times in drawing a clear line in the sand for AI companies, dictating how they may or may not access your content library.

This guide provides step-by-step instructions on how to manually edit your robots.txt file to protect your content. Using the Raptive Ads plugin? It’s even easier—check out our guide on how to block common AI crawlers with our WordPress plugin.

How it works

You can block AI crawlers by adding them to your site’s robots.txt file as disallowed user agents (according to each AI company’s instructions).

User-agent: anthropic-ai
Disallow: /
User-agent: Claude-Web
Disallow: /
User-agent: CCbot
Disallow: /
User-agent: FacebookBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: PiplBot
Disallow: /
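
If you manage your own server, you can also script this edit. Below is a minimal Python sketch that appends any missing entries to an existing robots.txt file; the file path is a hypothetical example, so adjust it for your hosting environment.

# Minimal sketch: append any missing AI-crawler entries to a robots.txt file.
AI_AGENTS = [
    "anthropic-ai", "Claude-Web", "CCbot", "FacebookBot",
    "Google-Extended", "GPTBot", "PiplBot",
]

ROBOTS_PATH = "/var/www/html/robots.txt"  # hypothetical path; adjust for your server

with open(ROBOTS_PATH, encoding="utf-8") as f:
    existing = f.read()

with open(ROBOTS_PATH, "a", encoding="utf-8") as f:
    for agent in AI_AGENTS:
        # Skip agents that already have an entry so we don't write duplicates.
        if f"User-agent: {agent}" not in existing:
            f.write(f"\nUser-agent: {agent}\nDisallow: /\n")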

After you’ve saved the changes to your robots.txt file, view the updated file by clearing any caches and adding /robots.txt to the end of your site's URL in a browser window. You can also use Google’s robots.txt checker for extra verification.
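
You can script this verification, too. The sketch below fetches the live file and reports whether each user agent has an entry, assuming Python 3 and your own domain in place of example.com.

# Minimal sketch: fetch the live robots.txt and report which AI-crawler
# entries are present. Replace example.com with your own domain.
from urllib.request import urlopen

robots = urlopen("https://example.com/robots.txt").read().decode("utf-8")

for agent in ["anthropic-ai", "Claude-Web", "CCbot", "FacebookBot",
              "Google-Extended", "GPTBot", "PiplBot"]:
    status = "blocked" if f"User-agent: {agent}" in robots else "MISSING"
    print(f"{agent}: {status}")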

If you don’t feel comfortable editing your site’s robots.txt file or aren’t sure how to do this, reach out to your web developer for a helping hand.

What are these user agents?

  • anthropic-ai: The user agent for AI company Anthropic.
  • Claude-Web: Anthropic’s general-purpose web crawler.
  • CCbot: The user agent for Common Crawl, which maintains an open repository of web crawl data.
  • FacebookBot: Meta’s user agent that crawls public web pages to improve language models for Facebook’s speech recognition technology.
  • GPTBot: The user agent for OpenAI’s web crawler, which collects web pages that may be used to improve future models.
    • Note: GPTBot also covers ChatGPT-User, which is required for a site to appear in OpenAI’s web-enabled searches. Traffic from this source is currently so minimal that we still recommend blocking this user agent.
  • Google-Extended: The user agent Google uses to access content to train Gemini and other AI products.
    • Google documentation and direct confirmation from the Google Search team say that the Google-Extended user agent doesn’t affect your site’s search rankings or inclusion in AI Overviews (formerly SGE).
    • We did our own research, too. We analyzed search traffic for sites that have already blocked Google-Extended and found no negative impact.
  • PiplBot: A user agent that collects documents from the Web to build a searchable index.

What does it mean to “block” AI crawlers?

Adding an entry to your site’s robots.txt file tells these crawlers not to crawl your site going forward. It doesn’t prevent them from accessing your site, but it’s the opt-out method each company has documented and the industry-standard way to declare which crawlers you permit to access your site.
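
To see how a compliant crawler interprets your file, you can use Python’s standard-library robots.txt parser, which follows the same rules a well-behaved bot should. This is an illustrative sketch with example.com standing in for your domain; nothing in robots.txt technically stops a non-compliant bot from skipping this check.

# Minimal sketch: how a compliant crawler reads your robots.txt.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://example.com/robots.txt")
rp.read()

# With the entries above in place, a compliant GPTBot may not fetch any page.
print(rp.can_fetch("GPTBot", "https://example.com/some-post/"))    # expect False
print(rp.can_fetch("Googlebot", "https://example.com/some-post/"))  # likely True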

[Screenshot: robots.txt entries blocking AI crawlers]

Many large publishers like The New York Times, Wall Street Journal, Vox, and Reuters have already blocked most or all of these AI crawlers using this method.

I’m using the Raptive Ads WordPress plugin—why do I need to manually update my robots.txt file?

The Raptive Ads WordPress plugin can automatically update robots.txt entries for sites with a virtual robots.txt file, which WordPress generates on request when no physical file exists. If your site has a physical robots.txt file, you’ll need to add the entries manually.
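
If you’re not sure which case applies to your site, check whether a robots.txt file physically exists in your WordPress root directory. A minimal sketch, assuming server access and a typical install path:

# Minimal sketch: check for a physical robots.txt in the WordPress root.
import os

WP_ROOT = "/var/www/html"  # hypothetical install directory; adjust for your host
path = os.path.join(WP_ROOT, "robots.txt")

if os.path.exists(path):
    print("Physical robots.txt found: add the AI-crawler entries manually.")
else:
    print("No physical file: WordPress serves a virtual robots.txt, which the plugin can update automatically.")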

How to add robots.txt entries with Yoast

Many sites use the Yoast SEO plugin to manage robots.txt entries. Just copy and paste the entries into Yoast’s robots.txt editor (SEO → Tools → File editor in your WordPress dashboard).

[Screenshot: Yoast robots.txt file editor]

How to add robots.txt entries using the File Manager plugin

The File Manager plugin is another easy way to update your robots.txt file.

Step 1: Install and activate the WP File Manager plugin from the WordPress plugin repository.

[Screenshot: installing the WP File Manager plugin]

Step 2: Navigate to the WP File Manager plugin in your WordPress dashboard, right-click on the robots.txt file name, and select “Code Editor.”

[Screenshot: robots.txt file in WP File Manager]

[Screenshot: WP File Manager code editor]

Step 3: Add the necessary AI crawler lines to the file and save your changes.

[Screenshot: AI-crawler entries added in the robots.txt editor]

[Optional] Step 4: If you no longer need the File Manager plugin, you can deactivate and delete it from your website.
