As we’ve previously written, the rise of generative AI has led to a spate of copyright suits across the country. One major target of these suits has been OpenAI. Actor/comedian Sarah Silverman and author Paul Tremblay are among the plaintiffs to bring suit in California, while authors George R.R. Martin, John Grisham, and others have filed in New York. The lawsuits allege that OpenAI used the plaintiffs’ creative content without permission to train OpenAI’s generative AI tool in violation of the U.S. Copyright Act. OpenAI moved to dismiss the majority of claims in the Silverman and Tremblay cases on several bases: (1) the Copyright Act does not protect ideas, facts, or language; (2) the plaintiffs cannot show that outputs from OpenAI’s large language model (“LLM”) tool are substantially similar to the original content used to train the tool; and (3) any use of copyright-protected content by OpenAI’s tool constitutes fair use, and thus is immune from liability under the Act. Yesterday, Plaintiffs hit back, noting that OpenAI hasn’t moved to dismiss the “core claim” in the lawsuits: direct infringement.

Interestingly, although OpenAI moved to dismiss most of the Silverman and Tremblay claims, it did not seek dismissal of Plaintiffs’ claims for direct copyright infringement, which OpenAI said it would “seek to resolve as a matter of law at a later stage of the case.” OpenAI’s arguments instead focused on vicarious liability and failure to state a claim under Section 1202(b) of the Digital Millennium Copyright Act (the “DMCA”). On vicarious liability, OpenAI argued that Plaintiffs have not explained how ChatGPT’s outputs are substantially similar to Plaintiffs’ works, and that it is not enough for the output to merely be “based upon” another work. According to OpenAI, Plaintiffs’ conclusory statements that OpenAI “has the right and ability to control the output of the OpenAI Language Models” and “benefitted financially” are likewise insufficient to establish the remaining elements of the vicarious infringement claim. OpenAI also made passing reference to the fair use defense, arguing that courts are empowered to adapt the defense “to account for ‘rapid technological change.’” Although it did not develop the argument, OpenAI stated briefly that creating copies of a work in order “to develop a new, non-infringing product” would be protected as fair use.

As for the DMCA claim, OpenAI asserted that Plaintiffs failed to plausibly allege that copyright management information was removed during the LLM tool’s training process, and that Plaintiffs’ allegations did not support a reasonable inference that OpenAI designed its process with the requisite intent to conceal infringement. OpenAI further contended that Plaintiffs’ state statutory and common law claims for unfair competition, negligence, and unjust enrichment failed, in part because they were predicated on the other claims or preempted by the Copyright Act.

In response, Plaintiffs argued that substantial similarity is a “red herring” in this context because it is not an element of a direct copyright infringement claim, and applies only where there is no evidence of direct copying of the at-issue work. Because they alleged direct copying of their works by OpenAI, the substantial similarity test is irrelevant, they contended. Plaintiffs also argued that the fair use defense cannot be resolved at this stage of the case, and that the Copyright Act is aimed primarily at granting rights to authors, not at protecting users of the authors’ works. They also questioned whether training an AI model would constitute fair use, but acknowledged that “no U.S. court has squarely ruled on the question.”

Plaintiffs further argued that they had properly articulated three separate theories of direct infringement to support their vicarious infringement claim: (1) “training” or “input” infringement, based on copying Plaintiffs’ books in their entirety to train the AI model; (2) “model” infringement, because the LLMs are dependent on “expressive information extracted from Plaintiffs’ works (and others) retained inside them,” rendering the “LLMs . . . themselves infringing derivative works”; and (3) “output” infringement, on the theory that, because the LLMs are themselves infringing derivative works, their “textual output” is infringing as well. Further, they alleged that OpenAI had the right to stop the infringing conduct because it controlled the training data for the LLMs, the LLMs themselves, and ChatGPT, and that OpenAI profited from ChatGPT.

With respect to the DMCA claim, Plaintiffs argued, among other things, that their works contained copyright management information (“CMI”) and that “OpenAI intentionally removed CMI from” their protected works. Plaintiffs further argued that their state-law statutory claims are not preempted because those claims are based upon unauthorized use of Plaintiffs’ works, rather than the copying and reproduction of those works, and that Plaintiffs’ DMCA claim is properly pled and thus serves as an appropriate predicate. Finally, Plaintiffs asserted that their common-law claims for negligence and unjust enrichment were properly pled and that other courts have allowed similar claims to proceed alongside copyright infringement claims.

OpenAI will have the opportunity to file a reply in support of its motion before the court rules on any of these issues. It will be interesting to see what additional arguments are raised and how the court ultimately rules, given that this is likely to be the first of many court decisions on whether LLMs will be allowed to continue to draw on others’ creative content for training purposes without compensating rights holders.