Like most major players in the global technology arena, Adobe has spent the last several years making an aggressive, deliberate shift toward integrating artificial intelligence across its software ecosystem. The pivot, though hardly unique among tech companies, has been especially visible in Adobe's extensive rollout of AI-driven products and services since 2023. The most recognizable of these is Firefly, the company's suite of AI-powered media generation tools, designed to enhance and automate creative workflows across its product line. Yet despite the commercial success and attention Adobe's AI ambitions have generated, its embrace of machine learning has now drawn serious controversy: a recently filed lawsuit alleges that Adobe overstepped legal and ethical boundaries by unlawfully using pirated literary works to train one of its language models.

The proposed class-action suit, filed on behalf of Oregon-based author Elizabeth Lyon, accuses Adobe of incorporating unauthorized copies of numerous books—including several of Lyon's own—into the training data for its SlimLM program. Adobe describes SlimLM as a family of compact yet high-performance language models engineered for document-assistance tasks on mobile devices, where computational efficiency is paramount. According to the company's official statements, SlimLM was initially trained on SlimPajama-627B, a "deduplicated, multi-corpora, open-source dataset" publicly released by Cerebras in June 2023. Lyon contends, however, that portions of her written work appeared in the pretraining corpus derived from that dataset, which she claims was built from copied and modified data that incorporated pirated content.

Details in the legal complaint—first brought to public attention through reporting by Reuters—lay out the alleged data lineage connecting Adobe's model to unlawfully obtained texts. The complaint asserts that the SlimPajama dataset was not built from wholly original sources but was instead derived from another dataset, RedPajama, which in turn included a vast subcollection known as Books3. That dataset, comprising roughly 191,000 digital books, has been a notorious flashpoint in ongoing legal disputes over generative AI. Because SlimPajama is effectively a derivative of RedPajama, the complaint argues, it inherently contains portions of Books3 and therefore encompasses copyrighted works belonging to Lyon and other affected authors.

The inclusion of Books3 in multiple machine-learning training corpora has already precipitated a series of legal confrontations across the tech landscape. RedPajama itself has figured as key evidence in similar lawsuits accusing leading technology firms of copyright infringement during AI model training. In September, for example, Apple was accused in court of having used protected works in the training data for its Apple Intelligence models—purportedly without the creators' consent, attribution, or compensation. One month later, Salesforce became the target of a parallel complaint asserting that its generative AI systems relied on RedPajama in much the same way.

Such cases are emblematic of a broader pattern now permeating the technology sector. Lawsuits challenging the legality of AI training practices have become increasingly common as developers rely on massive, publicly scraped datasets to construct advanced machine-learning systems. Critics argue that these datasets often blend open-source materials with pirated or otherwise copyrighted works, creating an ethical gray zone that existing intellectual property laws have yet to clarify. In one of the most prominent examples of this trend, Anthropic agreed in September to a $1.5 billion settlement with a group of authors who accused the company of illegally using their books and other written content to train its conversational AI assistant, Claude. That case was widely regarded as a potential legal milestone, signaling a new phase in the struggle to reconcile rapid AI innovation with the longstanding principles of copyright protection. The Adobe lawsuit now joins this growing body of disputes, raising further questions about how creators’ rights will coexist with the ever-expanding capabilities of artificial intelligence in the years ahead.

Source: https://techcrunch.com/2025/12/17/adobe-hit-with-proposed-class-action-accused-of-misusing-authors-work-in-ai-training/