Many Large Websites Begin Blocking Apple’s AI Web Crawler After Increased Warnings

Many large-scale websites are getting tired of Apple and its AI web crawler.

Despite repeated warnings, the iPhone maker’s web crawler continues to visit websites and extract data for Apple’s AI training efforts. Many organizations have confirmed this, and some of them are among the biggest names in tech.

Apple denies this and says it respects these companies’ wishes, yet its crawler appears to be doing otherwise.

According to recently published research, Apple’s crawler has been visiting sites belonging to Instagram, Facebook, Tumblr, Craigslist, The New York Times, the Financial Times, The Atlantic, and Vox Media. Many of these organizations now feel they have no choice but to block the company, as Apple continues to give them the cold shoulder.

Web crawlers are nothing new; they have been around for a long time. What has changed is that these bots now collect other people’s data to train AI models, and many feel that is wrong because it takes the fruits of their hard work without permission.

The news is alarming because Apple Intelligence is set to launch soon, which may explain why Apple is using these tactics. According to reports from The New York Times, companies are blocking web crawlers at a record rate, and most of the crawlers being blocked belong to AI firms.

The study behind that finding examined 14,000 web domains included in AI training datasets. Publishers see blocking as a necessary step to stop their data from being harvested.

The researchers estimate that about 5% of all data in these datasets, and 25% of the data from the highest-quality sources, has been restricted. Most of these restrictions rely on the Robots Exclusion Protocol, a decades-old method that lets website owners tell bots not to crawl their pages. It is implemented through a plain-text file called robots.txt.
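To make the robots.txt mechanism more concrete, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. It parses an example robots.txt that blocks one crawler from the entire site while allowing all others. The file contents, the helper `is_allowed`, the URL, and the token `SomeOtherBot` are hypothetical; `Applebot-Extended` is used only as an assumed example token (Apple’s documentation describes it as the opt-out for AI training). Keep in mind that robots.txt is advisory: it only works if a crawler chooses to read and obey it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one (assumed) AI-training token site-wide
# and allows every other crawler.
ROBOTS_TXT = """\
User-agent: Applebot-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/some-story"  # placeholder URL
    for agent in ("Applebot-Extended", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Run as a script, this prints one line per crawler: the blocked token reports False and the other reports True. Nothing in the file itself stops a crawler that ignores it.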

It is striking to see Apple disregard so many companies, while those companies weigh the tradeoffs of letting AI crawlers scrape their websites and decide that it is not worth it.

We should not be surprised if more large companies start blocking AI crawlers after learning about Apple’s actions. Notably, most of the firms complaining are big tech giants. Smaller businesses don’t seem to mind, as blocking makes little to no difference to them.

Image: DIW-Aigen