Mark Zuckerberg authorized Meta to use “pirated copies” of copyrighted books to train the company’s artificial intelligence models, a group of authors has alleged in a US court filing.
Citing internal Meta communications, the filing claims that the social networking company’s chief executive backed the use of the LibGen dataset, a vast online archive of books, despite warnings within the company’s AI executive team that the dataset was “known to be pirated.”
According to the filing, an internal message says that using a database containing pirated content could weaken the negotiating position of the Facebook and Instagram owner with regulators. “Media reports suggesting that we have used datasets known to be pirated, such as LibGen, could undermine our bargaining position with regulators,” the message states.
The authors suing Meta for copyright infringement, including American author Ta-Nehisi Coates and comedian Sarah Silverman, made the accusation in a filing made public in California federal court on Wednesday.
The authors sued Meta in 2023, claiming that the social media company misused their books to train Llama, a large language model that powers chatbots.
The Library Genesis (LibGen) dataset is a “shadow library” from Russia that claims to contain millions of novels, nonfiction books, and scientific journal articles. Last year, a New York federal court ordered the anonymous operator of LibGen to pay a group of publishers $30m (£24m) in damages for copyright infringement.
The use of copyrighted content in training AI models has become a legal battleground in the development of generative AI tools, such as the ChatGPT chatbot, with creative professionals and publishers warning that the use of their work without permission puts their livelihoods and business models at risk.
The filing cites a memo that refers to Mark Zuckerberg by his initials and states that, “following escalation to MZ,” Meta’s AI team was “approved to use LibGen.”
Citing internal communications, the filing also says that Meta engineers discussed accessing and reviewing LibGen data but were hesitant to begin the process because they were uneasy about “torrenting,” meaning peer-to-peer sharing of files.
U.S. District Judge Vince Chhabria last year rejected the claims that the text generated by Meta’s AI model infringed the authors’ copyrights and that Meta illegally stripped the books’ Copyright Management Information (CMI), which refers to information about a work, including its title and the names of the author and copyright holder. However, the plaintiffs were granted permission to amend their claims.
The authors argued this week that the evidence supports their infringement claims and justifies reviving their CMI claim and adding new computer fraud claims.
At Thursday’s hearing, Chhabria said he would allow the writers to file an amended complaint, but expressed skepticism about the merits of the fraud and CMI claims.
We have reached out to Meta for comment.
Reuters contributed to this article.