Reddit's 'Trap' Exposes Perplexity's Data Scraping, Sparks Landmark Lawsuit

Reddit says a test post caught Perplexity pulling its content through Google search results. The resulting lawsuit alleges a "data laundering" scheme and could set a precedent for AI data access.

October 23, 2025

In a calculated move that has sent ripples through the artificial intelligence industry, social media platform Reddit has sued AI search company Perplexity, alleging a scheme to scrape its vast repository of user-generated content without authorization.[1][2][3] The lawsuit, filed in a New York federal court, goes beyond Perplexity, naming three data-scraping firms (Oxylabs, SerpApi, and an entity identified as AWMProxy) as co-conspirators in what Reddit's Chief Legal Officer Ben Lee has termed an "industrial-scale 'data laundering' economy."[2][3] The legal battle highlights a growing conflict between content platforms, which are increasingly protective of their data, and AI developers, whose products depend on the kind of authentic human conversation that flourishes on sites like Reddit. The case could set a significant precedent for how AI companies may source their data and for the legal consequences of circumventing platform protections.[4]
The centerpiece of Reddit's complaint is a carefully planned "trap" meant to confirm its suspicion that Perplexity was obtaining Reddit content without authorization by scraping Google search results.[5] According to the lawsuit, Reddit created a "test post" on its platform that was configured to be discoverable only by Google's search engine crawler and was otherwise hidden from direct view on the site.[5][6] Within hours of the post being indexed by Google, its content appeared in answers generated by Perplexity's AI search engine.[5][6] For Reddit, this was the smoking gun: in its view, the result showed that Perplexity was not respecting the platform's access protocols and was instead using Google as an intermediary to obtain its data.[5] The lawsuit alleges that this indirect scraping was a deliberate tactic to bypass the technical safeguards Reddit has implemented, which the company says cost "tens of millions of dollars" to maintain.[5][6] The complaint further claims that after Reddit sent Perplexity a cease-and-desist letter, the AI company's citations of Reddit content increased forty-fold rather than stopping, reinforcing Reddit's belief that the scraping was intentional and ongoing.[4][5]
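The general idea behind such a test post is often called a canary or honeytoken: publish a unique marker through one controlled channel, then check whether it surfaces somewhere it should not. The sketch below illustrates only that general idea in Python. It is not taken from the complaint, which does not describe Reddit's implementation, and the function names and sample strings are hypothetical.

```python
import secrets


def make_canary(prefix: str = "canary") -> str:
    """Return a unique marker string that is unlikely to occur anywhere else."""
    # 8 random bytes rendered as 16 hex characters.
    return f"{prefix}-{secrets.token_hex(8)}"


def canary_leaked(canary: str, third_party_output: str) -> bool:
    """Return True if the marker appears verbatim (case-insensitive) in the text."""
    return canary.lower() in third_party_output.lower()


# Hypothetical illustration: the marker is planted in a post exposed through
# a single channel, and a third party's output is later checked for it.
canary = make_canary()
test_post = f"Test post containing marker {canary}"
answer_with_leak = f"According to a forum post: ... {canary} ..."
answer_without_leak = "An unrelated answer."

print(canary_leaked(canary, answer_with_leak))
print(canary_leaked(canary, answer_without_leak))
```

Because the marker is random and was exposed through only one channel, its appearance elsewhere indicates which channel the content travelled through. It does not by itself establish who copied it or how.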
In its defense, Perplexity has vehemently denied the allegations, framing the lawsuit as a strong-arm tactic by Reddit to force it into a licensing agreement.[7] Perplexity contends that it lawfully accesses public data and that its use of Reddit content is limited to summarizing discussions and providing citations, which it argues is essential for users to verify the accuracy of the AI-generated answers.[7] The company has stated that as an "application-layer company," it does not train its foundational models on the content, making a data-licensing agreement for training purposes irrelevant to its operations.[7] Perplexity has characterized Reddit's legal action as a "show of force" intended to strengthen its negotiating position with other major AI players like Google and OpenAI, with whom Reddit has already secured lucrative data-licensing deals.[7] The AI firm has vowed to "fight vigorously for users' rights to freely and fairly access public knowledge," positioning the legal battle as a fight for an open internet against a company trying to monetize publicly available information.[8]
The lawsuit extends beyond Perplexity to implicate an ecosystem of data brokers that allegedly facilitate the scraping.[2] Named in the suit are Oxylabs, a Lithuanian data-scraping company; SerpApi, a Texas-based startup that provides real-time access to scraped Google search results; and AWMProxy, which Reddit's legal filings describe as a "former Russian botnet."[9][10] Reddit's complaint compares these firms to "would-be bank robbers" who, unable to breach the vault directly, instead target the armored truck.[9] The analogy captures Reddit's core argument: that these companies sidestep its direct defenses by scraping its content from Google's search index, then package and sell that data to AI companies like Perplexity.[5] This growing market for scraped data is what Reddit's legal chief calls a "'data laundering' economy," in which content is harvested without permission, stripped of its original context, and resold to train and power AI systems.[2] Both Oxylabs and SerpApi have said they intend to defend themselves against the allegations, with Oxylabs saying it was "shocked and disappointed" by the lawsuit.[1]
The legal confrontation between Reddit and Perplexity could prove a watershed moment for the AI industry, bringing to the forefront critical questions about copyright, fair use, and the ethical responsibilities of AI developers.[2] Legal experts note that the case could have far-reaching implications, potentially redefining the legal boundaries of data scraping and setting new standards for how AI models can be trained.[11] If Reddit's arguments prevail, the ruling could embolden other content creators to pursue similar legal action, potentially disrupting the data supply chain for many AI companies and forcing a shift toward more transparent, consent-based data acquisition through licensing agreements.[4] Conversely, a victory for Perplexity could reinforce the argument that publicly available data is fair game for AI development, potentially weakening the negotiating power of content platforms.[4] Either way, the outcome is likely to shape the relationship between the creators of online content and the companies building artificial intelligence, at a time when data is one of the most valuable commodities.
