Perplexity Accused of Scraping Websites with AI Access Restrictions

As reported by internet infrastructure provider Cloudflare, AI startup Perplexity has been crawling and scraping content from sites that have explicitly opted out of such practices.

On Monday, Cloudflare published research showing that the AI startup ignored blocks meant to prevent these activities, concealing its identity while scraping web pages. “This was an attempt to circumvent the website’s preferences,” the researchers from Cloudflare remarked.

AI products from companies like Perplexity depend largely on vast datasets gathered from the internet. Historically, AI startups have often scraped text, images, and videos from websites without asking permission. More recently, sites have begun to push back using robots.txt, a file defined by the Robots Exclusion Protocol that tells search engines and AI crawlers which pages they may and may not access. Compliance is voluntary, though, and the effectiveness of these measures has varied.
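To illustrate how a well-behaved crawler is expected to consult robots.txt before fetching a page, here is a minimal sketch using Python's standard `urllib.robotparser` module. The bot name `ExampleAIBot`, the rules, and the URL are hypothetical and chosen only for illustration; they are not taken from Cloudflare's report or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print(is_allowed(ROBOTS_TXT, "ExampleAIBot", page))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

The check runs on the crawler's side, which is why robots.txt works only for crawlers that choose to honor it: nothing in the file itself stops a client that skips this step.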

Cloudflare alleges that Perplexity has deliberately been circumventing these blocks by altering its bots’ “user agent,” a string that identifies the software making a request, such as the browser and the device type. Additionally, Perplexity reportedly changed the autonomous system numbers (ASNs) its traffic came from. An ASN is a numerical identifier for a large network on the internet.

“This behavior was observed across tens of thousands of domains, with millions of requests each day. We utilized machine learning alongside network signals to fingerprint this crawler,” Cloudflare’s report stated.

Jesse Dwyer, a representative from Perplexity, dismissed Cloudflare’s blog post as a “sales pitch” and claimed in an email to TechCrunch that the screenshots provided “demonstrate that no content was accessed.” In a follow-up email, Dwyer asserted that the bot mentioned in the Cloudflare article “isn’t even ours.”

Cloudflare first detected this behavior after customers complained that Perplexity was crawling and scraping their sites even though they had added rules to their robots.txt files to block known Perplexity bots. Cloudflare then ran its own tests and confirmed that Perplexity was evading these restrictions.

Cloudflare noted, “We found that Perplexity not only used its declared user-agent, but also a generic browser mimicking Google Chrome on macOS when their designated crawler was blocked.”
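The reason a generic browser string slips past a name-based block can be sketched in a few lines. The snippet below is an illustration under stated assumptions, not Cloudflare's detection method: the blocklist token `exampleaibot` and both user-agent strings are hypothetical, and the filter simply checks whether the request's User-Agent header contains a blocked name.

```python
# Hypothetical blocklist of crawler name tokens (lowercase).
BLOCKED_AGENT_TOKENS = ("exampleaibot",)

# A crawler that announces itself honestly (hypothetical string).
DECLARED_UA = "Mozilla/5.0 (compatible; ExampleAIBot/1.0; +https://example.com/bot)"

# A generic string resembling Chrome on macOS (illustrative version numbers).
GENERIC_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def is_blocked(user_agent_header: str) -> bool:
    """Return True if the User-Agent header contains a blocked crawler name."""
    ua = user_agent_header.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


if __name__ == "__main__":
    print(is_blocked(DECLARED_UA))
    print(is_blocked(GENERIC_UA))
```

Because the header is supplied by the client, a filter like this catches only crawlers that identify themselves. That limitation is consistent with Cloudflare's statement that it relied on machine learning and network signals, rather than the user agent alone, to fingerprint the crawler.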

The company has since removed Perplexity’s bots from its verified list and established new methods to block them.

Cloudflare has recently taken a public stand against AI crawlers. Last month, it launched a marketplace that lets website owners and publishers charge AI scrapers for visiting their sites. Cloudflare’s CEO, Matthew Prince, has cautioned that AI is disrupting the internet’s traditional business models, particularly for publishers. Last year, the company also introduced a free tool designed to prevent bots from scraping websites for AI training purposes.

This is not the first time Perplexity has faced allegations of unauthorized scraping.

Over the past year, several news outlets, including Wired, have accused Perplexity of plagiarizing their content. Shortly after, Perplexity’s CEO Aravind Srinivas struggled to define plagiarism during an interview with TechCrunch’s Devin Coldewey at the Disrupt 2024 conference.