Major Sites Are Saying No to Apple’s AI Scraping

In a separate analysis conducted this week, data journalist Ben Welsh found that just over a quarter of the news websites he surveyed (294 of 1,167 primarily English-language, US-based publications) are blocking Applebot-Extended. In comparison, Welsh found that 53 percent of the news websites in his sample block OpenAI’s bot. Google introduced its own AI-specific bot, Google-Extended, last September; it’s blocked by nearly 43 percent of those sites, a sign that Applebot-Extended may still be under the radar. As Welsh tells WIRED, though, the number has been “gradually moving” upward since he started looking.
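For readers curious how a check like this can be run, here is a minimal sketch in Python using the standard library's robots.txt parser. It is not Welsh's actual method, which the article does not describe. The sample robots.txt content and the `blocked_agents` helper are hypothetical, made up for illustration; the agent names are the ones discussed above, plus GPTBot, OpenAI's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Agents to test; the list is an assumption, not a complete inventory.
AGENTS = ["Applebot-Extended", "GPTBot", "Google-Extended"]


def blocked_agents(robots_txt, agents, path="/"):
    """Return the agents that the given robots.txt text bars from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    print(blocked_agents(SAMPLE_ROBOTS_TXT, AGENTS))
    # Share of surveyed sites blocking Applebot-Extended, from the figures above.
    print(f"{294 / 1167:.1%}")
```

Note that robots.txt only states a site's wishes; a check like this shows what a publisher has asked for, not whether a crawler actually complies.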

Welsh has an ongoing project monitoring how news outlets approach major AI agents. “A bit of a divide has emerged among news publishers about whether or not they want to block these bots,” he says. “I don’t have the answer to why every news organization made its decision. Obviously, we can read about many of them making licensing deals, where they’re being paid in exchange for letting the bots in; maybe that’s a factor.”

Last year, The New York Times reported that Apple was attempting to strike AI deals with publishers. Since then, competitors like OpenAI and Perplexity have announced partnerships with a variety of news outlets, social platforms, and other popular websites. “A lot of the largest publishers in the world are clearly taking a strategic approach,” says Originality AI founder Jon Gillham. “I think in some cases, there’s a business strategy involved, like withholding the data until a partnership agreement is in place.”

There is some evidence supporting Gillham’s theory. For example, Condé Nast websites used to block OpenAI’s web crawlers. After the company announced a partnership with OpenAI last week, it unblocked the company’s bots. (Condé Nast declined to comment on the record for this story.) Meanwhile, BuzzFeed, which also owns the Huffington Post, currently blocks Applebot-Extended. Spokesperson Juliana Clifton told WIRED that the company puts every AI web-crawling bot it can identify on its block list unless the bot’s owner has entered into a partnership (typically paid) with the company.

Because robots.txt needs to be edited manually, and there are so many new AI agents debuting, it can be difficult to keep an up-to-date block list. “People just don’t know what to block,” says Dark Visitors founder Gavin King. Dark Visitors offers a freemium service that automatically updates a client site’s robots.txt, and King says publishers make up a big portion of his clients because of copyright concerns.
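To give a sense of what maintaining such a block list involves, the following is a rough sketch in Python. It does not reflect how Dark Visitors or any publisher actually does it. The function names `merge_block_list` and `build_block_rules` are hypothetical, and the agent names are examples only.

```python
def merge_block_list(existing, new_agents):
    """Append newly spotted agents, skipping case-insensitive duplicates."""
    seen = {agent.lower() for agent in existing}
    merged = list(existing)
    for agent in new_agents:
        if agent.lower() not in seen:
            merged.append(agent)
            seen.add(agent.lower())
    return merged


def build_block_rules(agents):
    """Return robots.txt text that disallows every path for each agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Example agent names; a real list would need regular review.
    current = ["GPTBot", "Google-Extended"]
    updated = merge_block_list(current, ["Applebot-Extended", "gptbot"])
    print(build_block_rules(updated))
```

The generated text would still have to be merged into the site's existing robots.txt by hand or by a deployment script, which is the manual step the paragraph above describes.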

Robots.txt might seem like the arcane territory of webmasters, but given its outsize importance to digital publishers in the AI age, it is now the domain of media executives. WIRED has learned that two CEOs from major media companies directly decide which bots to block.

Some outlets have explicitly noted that they block AI scraping tools because they do not currently have partnerships with their owners. “We’re blocking Applebot-Extended across all of Vox Media’s properties, as we have done with many other AI scraping tools when we don’t have a commercial agreement with the other party,” says Lauren Starke, Vox Media’s senior vice president of communications. “We believe in protecting the value of our published work.”
