Anthropic’s crawler is ignoring websites’ anti-AI scraping policies


The ClaudeBot web crawler that Anthropic uses to scrape training data for AI models like Claude has hammered iFixit’s website almost a million times in a 24-hour period, seemingly violating the repair company’s Terms of Use in the process. 

“If any of those requests accessed our terms of service, they would have told you that use of our content expressly forbidden. But don’t ask me, ask Claude!” said iFixit CEO Kyle Wiens on X, posting images that show Anthropic’s chatbot acknowledging that iFixit’s content was off limits. “You’re not only taking our content without paying, you’re tying up our devops resources. If you want to have a conversation about licensing our content for commercial use, we’re right here.”

iFixit’s Terms of Use state that “reproducing, copying or distributing” any content from the website is “strictly prohibited without the express prior written permission” from the company, and they specifically include “training a machine learning or AI model.” When 404 Media questioned Anthropic about this, however, the AI company pointed to an FAQ page that says its crawler can only be blocked via a site’s robots.txt file.

Wiens says iFixit has since added the crawl-delay extension to its robots.txt. We have asked Wiens and Anthropic for comment and will update this story if we hear back.
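
For readers unfamiliar with robots.txt, the sketch below shows roughly how a crawl-delay rule reads and how a well-behaved crawler might interpret it. It uses Python's standard urllib.robotparser module. The file contents, the 10-second delay, the /private/ path and the example.com URLs are all hypothetical and are not taken from iFixit's actual robots.txt. Crawl-delay is a widely used but non-standard directive, so whether any particular crawler honors it is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for illustration only;
# this is NOT iFixit's real file.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow:
"""


def load_rules(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Seconds the named crawler is asked to wait between requests.
delay = rules.crawl_delay("ClaudeBot")

# Whether the named crawler is permitted to fetch each (hypothetical) URL.
allowed_guide = rules.can_fetch("ClaudeBot", "https://example.com/Guide/1")
allowed_private = rules.can_fetch("ClaudeBot", "https://example.com/private/page")

print(delay, allowed_guide, allowed_private)
```

A robots.txt file is only a request. It works when the crawler chooses to read and obey it, which is why a site's Terms of Use and its robots.txt can end up saying different things.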

iFixit doesn’t seem to be alone: Read the Docs co-founder Eric Holscher and Freelancer.com CEO Matt Barrie said in Wiens’ thread that their sites had also been aggressively scraped by Anthropic’s crawler. This also doesn’t seem to be new behavior for ClaudeBot, with several months-old Reddit threads reporting a dramatic increase in Anthropic’s web scraping. In April this year, the Linux Mint web forum attributed a site outage to strain caused by ClaudeBot’s scraping activities.


