A group of authors alleges in a recent court filing that Mark Zuckerberg approved the use of pirated ebooks to train Meta’s AI, even after his own staff warned that the material had been obtained illegally.

The authors, including Richard Kadrey, Christopher Golden, and Sarah Silverman, originally filed their copyright infringement lawsuit in a California federal court in July 2023. The group claimed Meta misused their books to help train its LLMs, and they are seeking damages and an injunction to stop Meta from using their works. In November of that same year, the judge in the case dismissed most of the authors’ claims, but these new allegations may give new life to the legal dispute.

"Meta’s CEO, Mark Zuckerberg, approved Meta’s use of the LibGen dataset notwithstanding concerns within Meta’s AI executive team (and others at Meta) that LibGen is 'a dataset we know to be pirated,'" attorneys for the plaintiffs said in a Wednesday filing. Despite these red flags, the lawsuit alleges that, "after escalation," Zuckerberg gave the green light for Meta’s AI team to proceed with using the controversial dataset.

Representatives for Meta did not immediately respond to a request for comment.

LibGen, short for Library Genesis, is an online platform that provides free access to books, academic papers, articles, and other written publications without properly abiding by copyright laws. It operates as a "shadow library," offering these materials without authorization from publishers or rights holders. It currently hosts over 33 million ebooks and over 85 million articles.

The complaint alleges that Meta tried to keep this secret until the last possible moment. The company produced what the plaintiffs call "some of the most incriminating internal documents it has produced to date" only two hours before the fact discovery deadline on December 13, 2024.

Meta’s own engineers seemed uneasy with the plan, according to comments quoted in court papers. The authors allege that internal messages show Meta engineers were hesitant to download the allegedly pirated content, with one stating that "torrenting from a [Meta-owned] corporate laptop doesn’t feel right (smile emoji)." However, according to the lawsuit, they then proceeded not only to download the books but also to systematically remove copyright information to prepare them for AI training.

The latest court filings portray a company that was aware of the risks: one internal memo warned that "media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, may undermine our negotiating position with regulators." Yet Meta went ahead anyway, both downloading and distributing (or "seeding") the pirated content through torrenting networks by January 2024, according to the lawsuit.

When asked about these activities in a deposition, Zuckerberg appeared to distance himself from the decision, saying that such piracy would "raise a lot of red flags" and "seems like a bad thing."

The court documents also suggest that Meta’s handling of copyrighted material put model training ahead of copyright compliance. One engineer "filtered [...] copyright lines and other data out of LibGen to prepare a CMI-stripped version of it to train Llama," according to the filing; CMI refers to copyright management information. This systematic removal of copyright information may support the authors’ claims that Meta purposefully attempted to conceal its use of pirated materials.

The revelations come at a crucial time for Meta’s AI ambitions. The company has been making strides to compete with OpenAI and Google in the AI space, with Meta AI being a strong free alternative to ChatGPT and Llama 3.2 being the best-known open-source LLM.

Most of these AI companies are facing legal battles over the questionable methods they used to train their large language models. Meta has already been sued by another group of authors, OpenAI is currently facing lawsuits over training its LLMs on copyrighted material, and Anthropic is also facing separate legal action over books and song lyrics.

More broadly, tech entrepreneurs and creators have been at odds ever since generative AI became popular. AI companies now face serious lawsuits alleging that they knowingly used copyrighted material to train their models. As with most things on the bleeding edge, we’ll have to wait and see what the courts have to say about it all.
