Reddit Sues AI Firm Perplexity Over "Industrial-Scale" Content Theft
The "arms race for quality human content" ignites a legal firestorm between Reddit and AI startup Perplexity over data scraping.
October 23, 2025

In a legal move that escalates the growing tensions between content creators and artificial intelligence firms, social media platform Reddit has filed a lawsuit against AI startup Perplexity, alleging a scheme of "industrial-scale" illegal data scraping. The lawsuit, filed in New York federal court, accuses Perplexity and three other companies (data-scraping firm Oxylabs UAB; AWMProxy, a web domain described as a "former Russian botnet"; and Texas-based startup SerpApi) of unlawfully harvesting vast quantities of user-generated content from Reddit's forums to train Perplexity's AI-powered "answer engine."[1][2][3][4] Reddit's complaint alleges that the defendants engaged in a coordinated effort to bypass the platform's protective measures and steal millions of user comments.[1] This legal battle highlights the contentious issue of data usage in the age of generative AI and could set significant precedents for how online information is accessed and utilized for commercial AI development.
Reddit's legal filings detail a multi-pronged strategy allegedly employed by Perplexity and its co-defendants to circumvent the platform's anti-scraping technologies. The social media giant claims that after being denied direct access, the companies resorted to scraping Reddit content indirectly through Google search results, effectively laundering the data to fuel Perplexity's AI models.[5][3][4] This circumvention, Reddit argues, constitutes a violation of its terms of service and infringes on the intellectual property rights of its users.[2] To substantiate its claims, Reddit reportedly set up a "trap" by creating a hidden post that was only visible to Google's search engine.[2][6] Within hours, the content of this hidden post appeared in Perplexity's search results, which Reddit presents as proof of the indirect data harvesting.[2][6] The lawsuit further alleges that after Reddit sent Perplexity a cease-and-desist letter, the AI company not only continued its activities but increased its citation of Reddit content forty-fold, a move that Reddit portrays as a blatant disregard for its intellectual property.[7][8] Reddit is seeking unspecified monetary damages and a court order to block Perplexity from using its data.[9]
In its defense, Perplexity has vehemently denied the allegations, framing the lawsuit as an attempt by Reddit to stifle the open internet and gain leverage in its own data licensing negotiations with larger tech companies like Google and OpenAI.[10][11] In a public statement, Perplexity asserted that its approach is "principled and responsible" and that it will "fight vigorously for users' rights to freely and fairly access public knowledge."[7][9] The AI startup contends that it does not train its own foundation models on content and therefore cannot enter into the type of licensing agreements Reddit has with other firms.[10][12] Perplexity claims that its function is to summarize and cite discussions from platforms like Reddit, similar to how a user might share a link, thereby driving traffic back to the original source.[10][12] The company characterized Reddit's lawsuit as a strong-arm tactic and a "sad example of what happens when public data becomes a big part of a public company's business model."[10][11]
The lawsuit between Reddit and Perplexity is emblematic of a larger, industry-wide conflict over the value and ownership of online data. With AI companies locked in what Reddit's chief legal officer Ben Lee calls an "arms race for quality human content," the methods by which this data is acquired have come under intense scrutiny.[7][13] Reddit has actively sought to monetize its vast repository of human conversations through licensing deals with companies like Google and OpenAI.[1][6] The platform argues that unauthorized scraping undermines these legitimate partnerships and devalues the user-generated content that is the lifeblood of its community.[5][3] This case, along with similar lawsuits filed by content creators against AI developers, is expected to have far-reaching implications for the future of AI. The outcome could establish critical legal precedents regarding data scraping, fair use, and the responsibilities of AI companies in sourcing their training data. For now, the burgeoning AI industry and the content platforms it relies upon are on a collision course, with the courts left to navigate the complex legal and ethical questions at the heart of this digital-age dispute.
Sources
[1]
[3]
[4]
[5]
[6]
[7]
[10]
[11]
[13]