Hundreds of websites trying to block the AI company Anthropic from scraping their content are blocking the wrong bots, seemingly because they are copy/pasting outdated instructions to their robots.txt files, and because companies are constantly launching new AI crawler bots with different names that will only be blocked if website owners update their robots.txt.
In particular, these sites are blocking two bots no longer used by the company, while unknowingly leaving Anthropic’s real (and new) scraper bot unblocked.
This is an example of “how much of a mess the robots.txt landscape is right now,” the anonymous operator of Dark Visitors told 404 Media. Dark Visitors is a website that tracks the constantly-shifting landscape of web crawlers and scrapers—many of them operated by AI companies—and which helps website owners regularly update their robots.txt files to prevent specific types of scraping. The site has seen a huge increase in popularity as more people try to block AI from scraping their work.
“The ecosystem of agents is changing quickly, so it’s basically impossible for website owners to manually keep up. For example, Apple (Applebot-Extended) and Meta (Meta-ExternalAgent) just added new ones last month and last week, respectively,” they added.
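The failure mode the article describes can be sketched with Python's standard-library `urllib.robotparser`. This is only an illustration under stated assumptions: the agent names `OldBot-A`, `OldBot-B`, and `NewBot` are hypothetical placeholders (the excerpt above does not name the actual bots), and `example.com` is a stand-in URL. It shows that rules written for retired crawler names say nothing about a crawler that announces itself under a new name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only two retired crawler names.
# "OldBot-A", "OldBot-B", and "NewBot" are placeholders, not real agents.
ROBOTS_TXT = """\
User-agent: OldBot-A
Disallow: /

User-agent: OldBot-B
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str,
               url: str = "https://example.com/post") -> bool:
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def unblocked_agents(robots_txt: str, agents: list[str]) -> list[str]:
    """List the agents that the given rules do not block."""
    return [agent for agent in agents if is_allowed(robots_txt, agent)]


if __name__ == "__main__":
    agents = ["OldBot-A", "OldBot-B", "NewBot"]
    # The new agent slips through because no rule mentions its name.
    print(unblocked_agents(ROBOTS_TXT, agents))
```

Appending a `User-agent: NewBot` / `Disallow: /` group to the file closes the gap, which is why the rules need updating whenever a crawler is renamed. Note that this check runs on the crawler's side; nothing here makes a server enforce it.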
robots.txt doesn’t prevent scraping.
It is just a suggestion. Bots that care will read it; bots that don’t, won’t.
404 Media: Media Bias Fact Check Credibility: [High]
404 Media is rated with High Credibility by Media Bias Fact Check.
Bias: Left-Center
Factual Reporting: Mostly Factual
Country: United States of America
Full Report: https://mediabiasfactcheck.com/404-media-bias/
Check the bias and credibility of this article on Ground.News:
- https://ground.news/find?url=https%3A%2F%2Fwww.404media.co%2Femail%2F87e0e07a-7d24-4788-b417-1821b40c8c1d%2F
Thanks to Media Bias Fact Check for their access to the API.
Please consider supporting them by donating.
Media Bias Fact Check is a fact-checking website that rates the bias and credibility of news sources. They are known for their comprehensive and detailed reports.
Beep boop. This action was performed automatically. If you don’t like me then please block me.💔
If you have any questions or comments about me, you can make a post to LW Support lemmy community.
I fucking hate this thing. Could you make it any more wordy and annoyingly formatted? Yeah, I know I can block it, and I will; it’s just super annoying.
I wish it was reduced down to two or three lines, tops. It’s longer than some of the useless Reddit automod comments.
It would be useful as a client-side plugin, imo.