A group of authors is suing artificial intelligence startup Anthropic, claiming that the company committed “grand theft” by training its popular chatbot, Claude, to pirate copies of copyrighted books.
A similar lawsuit has been filed for over a year against rival OpenAI, the developer of ChatGPT, but this is the first lawsuit by an author to target Anthropic and its Claude chatbot.
The small San Francisco-based company, founded by former OpenAI leaders, markets itself as a more responsible, safety-focused developer of generative AI models that can compose emails, summarize documents, and interact with people in natural ways.
But the lawsuit, filed Monday in federal court in San Francisco, argues that Anthropic’s use of a repository of pirated documents to build its AI product “makes a mockery of the company’s lofty goals.”
“It is no exaggeration to say that Anthropic’s business model seeks to profit from the human expression and ingenuity behind each work of art,” the lawsuit states.
Anthropic did not immediately respond to a request for comment Monday.
The lawsuit was filed by three authors — Andrea Bartz, Charles Graeber and Kirk Wallace Johnson — who seek to represent similarly situated fiction and non-fiction authors.
While this is the first lawsuit against Anthropic by a book author, the company is also fighting a lawsuit brought by a major music publisher that claims Claude has copied lyrics from copyrighted songs.
The authors’ lawsuit joins a growing number of lawsuits filed in San Francisco and New York against developers of AI large-scale language models.
OpenAI and its business partner Microsoft are already dealing with a series of copyright infringement lawsuits led by celebrities such as John Grisham, Jodi Picoult and “Game of Thrones” novelist George R.R. Martin, as well as from media outlets including The New York Times, Chicago Tribune and Mother Jones.
A common thread among these lawsuits is the allegation that tech companies have trained AI chatbots to generate human-like text by training them on vast amounts of human-written text without obtaining permission or compensation from the authors of the original works. The lawsuits have come from authors, as well as visual artists, music labels, and other creators, who claim that the profits from generative AI are based on misappropriation.
Anthropik and other tech companies argue that training their AI models falls within the “fair use” doctrine of U.S. law, which allows for limited uses of copyrighted material, such as for teaching, research or transforming the work into something else.
But the lawsuit against Anthropic accuses the company of using a dataset it calls “The Pile,” which contains a mountain of pirated books, and also disputes the idea that its AI systems learn in the same way that humans do.
“People who learn from books pay at least some compensation to authors and creators by purchasing legal copies or borrowing them from libraries that buy books,” the lawsuit states.
———
The Associated Press and OpenAI have a licensing and technology agreement that gives OpenAI access to portions of the AP’s text archive.