AI Training Declared Fair Use; Anthropic Faces Trial for Piracy

A landmark ruling gives AI training fair use protection, yet cracks down on companies using pirated content.

June 25, 2025

In a landmark decision with significant ramifications for the artificial intelligence industry, a federal judge has ruled that AI company Anthropic's use of copyrighted books to train its large language models constitutes fair use under U.S. copyright law. However, the court also found that Anthropic's methods for acquiring a portion of that content, specifically the downloading and storing of millions of pirated books, infringed on the authors' copyrights, setting the stage for a trial to determine damages. This dual ruling offers a nuanced, and potentially precedent-setting, perspective on the burgeoning legal battles between AI developers and content creators.
The case was brought against Anthropic, the developer of the AI model Claude, by a group of authors including Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, who alleged their works were used without permission or compensation in a manner they described as "large-scale theft."[1][2] U.S. District Judge William Alsup of the Northern District of California, in a ruling issued in late June 2025, sided with Anthropic on the core question of AI training.[1][3][4] He determined that the process of using copyrighted works to train an AI model was "quintessentially transformative" and "spectacularly so."[1][5][6] The judge reasoned that, much like a human writer learns from reading a multitude of works to develop their own style and understanding, Anthropic's AI did not aim to replicate or replace the original books but to learn from them and create something new and different.[1][5] This finding represents a major victory for AI companies, which have argued that their training methods are a form of transformative use essential for innovation and protected by the fair use doctrine.[4][7]
A central pillar of Anthropic's defense was that its AI systems study written works to extract uncopyrightable elements like facts, themes, and writing styles, thereby promoting creativity and technological progress as encouraged by U.S. copyright law.[4][7] Judge Alsup's decision supported this view, drawing a clear distinction between training a model to create a new tool and producing content that directly competes with or supplants the original copyrighted work.[1][8] The court noted that the plaintiffs did not demonstrate that Anthropic's AI, Claude, could generate outputs that would serve as substitutes for their books.[8] This portion of the ruling covers books that Anthropic purchased legally, including print copies it scanned and digitized for its internal library, destroying the physical originals in the process.[6][8] The court found this digitization of legally acquired material to be a permissible fair use because it facilitated internal search and storage without creating additional distributed copies.[8][9]
Despite the favorable ruling on the principle of fair use in training, the court drew a firm line regarding the source of the training data. The judge found that Anthropic had engaged in copyright infringement by downloading millions of books from known pirate sites.[1][2] Court documents revealed that Anthropic employees had expressed concerns about the legality of this practice.[1] Judge Alsup rejected the argument that the eventual use of these pirated materials for a transformative purpose could excuse the initial act of infringement.[2] He stated that building a "central library" of pirated books was not fair use, and the company could not justify downloading from pirate sites when it could have legally purchased or accessed the works.[2][6] The ruling made it clear that "There is no carveout...from the Copyright Act for AI companies."[8] Consequently, Anthropic will face a trial in December to determine the extent of its liability and potential damages for this infringement, which could be substantial, with statutory damages reaching up to $150,000 per infringed work.[2][3]
The implications of this bifurcated ruling are profound for the rapidly evolving AI landscape. It provides the first significant judicial endorsement for the AI industry's "fair use" argument regarding the training of models on copyrighted content, a position long advocated by tech giants like Google, Meta, and OpenAI, who face similar lawsuits.[4][10] This could embolden AI developers and potentially streamline innovation by providing a legal framework that supports the use of vast datasets necessary for building advanced AI systems. However, the decision also serves as a stark warning about the importance of sourcing data legally. The ruling against the use of pirated materials underscores that the methods of data acquisition are as legally significant as the subsequent use of that data. This may push the industry towards more transparent data acquisition practices and increase the pressure to negotiate licensing agreements with copyright holders.[1][11] While Anthropic celebrated the "transformative use" portion of the ruling, the upcoming trial on damages for piracy demonstrates that the legal and financial risks for AI companies that have relied on unauthorized sources remain very real.[5][12] The case highlights the complex and ongoing struggle to balance the encouragement of technological innovation with the fundamental principles of copyright protection for creators.
