Authors Sue Apple for Training AI with Pirated Books

Authors claim Apple's AI learned from pirated books, pushing for new industry standards and creator compensation.

September 6, 2025

Apple is facing a proposed class-action lawsuit filed by authors who allege the tech giant used their copyrighted books without permission to train its artificial intelligence models.[1][2][3] The lawsuit, filed in a Northern California federal court by authors Grady Hendrix and Jennifer Roberson, claims that Apple utilized a dataset containing pirated books to develop its OpenELM large language models.[4][5][6] This legal action places Apple alongside other major technology firms like Microsoft, Meta, and OpenAI, all of which are embroiled in a growing battle over the use of intellectual property in the age of generative AI.[1][2] The authors allege that Apple copied their protected works without consent, credit, or compensation for what they describe as a "potentially lucrative venture."[1][2]
The core of the plaintiffs' argument is that their works were included in a dataset known as "Books3," which is described in the lawsuit as a collection of pirated books sourced from a "shadow library" website.[7] The complaint further alleges that Apple accessed this dataset through a larger collection of data called "RedPajama," which was used to train the OpenELM models.[8][7] The authors are seeking statutory and compensatory damages, restitution for the use of their work, and a court order for the destruction of any Apple AI models that were trained on the infringing dataset.[8] Apple has not yet publicly responded to the specific allegations in the lawsuit.[8][1]
This lawsuit against Apple is unfolding against the backdrop of a rapidly evolving legal landscape concerning AI and copyright law. A central issue in these cases is the "fair use" doctrine, a legal principle that permits the limited use of copyrighted material without permission from the rights holder for purposes such as criticism, commentary, and research.[9][10] AI companies frequently argue that training their models on vast amounts of data, including copyrighted works, constitutes a "transformative" use that is protected by this doctrine.[9][10] They contend that the AI is not simply reproducing the original works but is learning from them to create something new.[10] However, recent court rulings have begun to draw a critical distinction between the use of legally acquired copyrighted material and the use of pirated content.
A significant precedent was recently set in a similar case involving AI startup Anthropic. The company agreed to a landmark $1.5 billion settlement with a group of authors who accused it of using their books without permission to train its chatbot, Claude.[6][11] While Anthropic did not admit liability, the settlement, described as the largest publicly reported copyright recovery in history, has sent a powerful message to the AI industry.[1][2] Legal experts note that the Anthropic case highlighted a crucial nuance: while a judge ruled that the act of training an AI model on copyrighted books could be considered a transformative and therefore fair use, the company's use of pirated books to build its internal data library was not protected.[12] This distinction suggests that the origin of the training data is a critical factor for the courts, a point that will likely be central to the case against Apple.[9]
The implications of these lawsuits for the future of the artificial intelligence industry are profound. The high cost of litigation and the potential for substantial damages, as evidenced by the Anthropic settlement, may compel AI developers to be more transparent about their training data and to seek licensing agreements with content creators.[9] Some legal experts suggest that the industry is at a turning point, where the practice of indiscriminately scraping the internet for training data will face increasing legal challenges.[4] The outcome of the lawsuit against Apple and similar cases will likely shape the development of AI, potentially leading to new industry standards for data acquisition and greater compensation for the creators whose work powers these technologies.[9]
