Authors Sue AI Companies for Training Models on Pirated Books

The copious troves of mostly pirated written material used to train the current generation of large language models were always destined to face legal challenges. Perhaps the most interesting so far came from comedian Sarah Silverman and authors Christopher Golden and Richard Kadrey, who joined together in lawsuits accusing OpenAI's ChatGPT and Meta's LLaMA of being trained on copyrighted material.

The authors claim the tech companies used text scraped from Library Genesis, Z-Library, Sci-Hub, and other online repositories that host content in violation of copyright law. The Atlantic recently detailed more than 190,000 books included in the Books3 dataset, which was reportedly used to train Meta's LLaMA model.