Westlaw and AI: Headnotes as Sculpture?

Now that the media has moved on to other things, such as Elon Musk’s 13th child, it may be possible to calmly review the February 11th opinion in Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., 2025 WL 458520 (D. Del.). That opinion follows on from, and changes some of the conclusions in, the court’s 2023 summary judgment opinion, 694 F. Supp. 3d 467. Upon issuance of the February 11 opinion, we were presented with headlines like “A Legal Win Could Change the way AI Models Get Built” and “A Win at Last: Big Blow to AI world in training data copyright scrap.”

Whether or not those headlines were knowing clickbait, I hope readers do not get their understanding of the opinion's implications from such sources. Even a cursory analysis shows the decision is modest in scope and rests on facts that cannot be applied beyond the unique context of that dispute. Whatever the results will be in other AI cases, those results are highly unlikely to be influenced by this one, although I imagine Thomson Reuters is still happy.

The opinion is by a Third Circuit judge, Stephanos Bibas, sitting by designation. I found three other opinions in copyright cases where he also sat by designation. He had a brilliant academic career and clerked for Justice Anthony Kennedy. Before joining the bench in 2017, he was a law professor at the University of Pennsylvania. Since being appointed to the bench, he has been a prolific author of opinions for the court of appeals and for various district courts within the circuit. He is often praised for his clear writing style. Nevertheless, there are a number of things about the opinion I have yet to fully grasp.

Plaintiff is the publisher of my treatises, both of which are on Westlaw. Westlaw’s headnotes and key numbering system are the subject of the litigation. I don’t use either tool, preferring to read the real thing, but apparently others do, or at least competitors to Westlaw think they do, including defendant Ross Intelligence. Judge Bibas describes Ross’s conduct this way:

Ross, a new competitor to Westlaw, made a legal-research search engine that uses artificial intelligence. To train its AI search tool, Ross needed a database of legal questions and answers. So Ross asked to license Westlaw's content. But because Ross was its competitor, Thomson Reuters refused. So to train its AI, Ross made a deal with LegalEase to get training data in the form of “Bulk Memos.” Bulk Memos are lawyers’ compilations of legal questions with good and bad answers. LegalEase gave those lawyers a guide explaining how to create those questions using Westlaw headnotes, while clarifying that the lawyers should not just copy and paste headnotes directly into the questions. LegalEase sold Ross roughly 25,000 Bulk Memos, which Ross used to train its AI search tool. In other words, Ross built its competing product using Bulk Memos, which in turn were built from Westlaw headnotes. When Thomson Reuters found out, it sued Ross for copyright infringement.

That chain of events is only one of many reasons why this case can’t predict how the other lawsuits against AI companies will be decided. The unauthorized copying of the headnotes was done by hired lawyers, who put them into memos for a third-party company, which sold the memos to Ross, which then used the memos to train its system. Another reason this case is unlike any other is the headnotes themselves, which can’t be compared to the novels, works of art, or Sarah Silverman’s “Bedwetter” involved in other AI training cases. The headnotes are, critically, derived from uncopyrightable judicial opinions, and their protectability is a very important part of the dispute.

Westlaw employs hundreds of skilled attorney editors, adding 250 in 2022 alone for its Westlaw Precision product, an expanded version of its Key Numbering system. I have known some of these attorney editors. The editors of my treatises have been knowledgeable, thorough, and conscientious. My treatises have benefitted greatly from their efforts. But as it pertains to headnotes, are they original for copyright purposes? Compilations of them are, unquestionably, but individual ones?  Even if some headnotes are original, are all of them? 

Judge Bibas thought so, and for me, this is the most interesting part of the opinion:

[E]ach headnote is an individual, copyrightable work. That became clear to me once I analogized the lawyer's editorial judgment to that of a sculptor. A block of raw marble, like a judicial opinion, is not copyrightable. Yet a sculptor creates a sculpture by choosing what to cut away and what to leave in place. That sculpture is copyrightable. 17 U.S.C. § 102(a)(5). So too, even a headnote taken verbatim from an opinion is a carefully chosen fraction of the whole. Identifying which words matter and chiseling away the surrounding mass expresses the editor's idea about what the important point of law from the opinion is. That editorial expression has enough “creative spark” to be original. Feist, 499 U.S. at 345. So all headnotes, even any that quote judicial opinions verbatim, have original value as individual works.

Analogies sure can be powerful. I have looked at many headnotes in writing this blog. Never once did I think of Michelangelo, Donatello, Rodin, Henry Moore, or even Alberto Giacometti. Judge Posner criticized reasoning by analogy:

One always requires a general understanding of some sort in order to determine relevant similarities. In a legal case it is an understanding of rules, principles, doctrines, and policies. It is they that do the work in reasoning by analogy.

Reasoning by analogy as a mode of judicial expression is a surface phenomenon. It belongs not to legal thought, but to legal rhetoric. Reasoning by analogy tends to obscure the policy grounds that determine the outcome of a case, because it directs the reader's attention to the cases that are being compared with each other rather than to the policy considerations that connect or separate the cases.

Richard Posner, Reasoning by Analogy, 91 Cornell L. Rev. 761, 765 (2006). 

The same is true when the analogy is between two very different types of works. Creating a work of sculpture is not analogous to writing headnotes about judicial opinions. Not only is a sculpture the sculptor's own work (made without any of the constraints involved with judicial opinions), but headnotes also involve digesting someone else's uncopyrighted material in a short, standard format. That is hardly the firmest foundation for originality, as Judge Bibas recognized in his first opinion denying summary judgment.

What changed his mind was his sculpture analogy, in which subtracting becomes the act of originality. Subtractive sculpture is the most ancient form of sculpture, and was used by Michelangelo, among others. It is also the most difficult, because if you remove too much you have to start all over again. But the analogy to subtractive sculpture is inapt: when Michelangelo chiseled away at a block of marble, he was forming a work that wasn't there before. David was not already embedded in the block of marble Michelangelo bought from the quarry in Carrara. Michelangelo wholly created that image. One can't call what he did subtracting in the sense Judge Bibas was using the analogy. The judicial opinions did already exist, and the headnotes were created by eliminating words from them, unlike with Michelangelo, who "added" the figure of David to a blank block of marble. What matters is what you add, what your expression is. It is hard to see how that occurred with the headnotes, since the goal was to convey a relevant legal idea from uncopyrighted material. There are times when pure subtracting could be enough, as with, for example, extensive editing of a Bruckner symphony, where the more you take out the more grateful the audience will be; but there you are starting with a mass of protected material, whereas with headnotes you are always starting with unprotectible material. Where what is left is also quite brief and is an idea about the law itself, originality at the individual headnote level is a different story. But at least Judge Bibas did not grant summary judgment for verbatim paring.

One may also wonder how many of the hundreds of thousands of headnotes are touched by human hands versus being machine generated. There are a lot of attorney editors, but so many opinions come out every day. Far be it from me, though, to question my own publisher's originality.

Then there is the key numbering system. Here is the entire discussion about it:

There is no genuine issue of material fact about the Key Number System's originality. Recall that Westlaw uses this taxonomy to organize its materials. Even if “most of the organization decisions are made by a rote computer program and the high-level topics largely track common doctrinal topics taught as law school courses,” it still has the minimum “spark” of originality. The question is whether the system is original, not how hard Thomson Reuters worked to create it. So whether a rote computer program did the work is not dispositive. And it does not matter if the Key Number System categorizes opinions into legal buckets that any first-year law student would recognize. To be original, a compilation need not be “novel,” just “independently created by” Thomson Reuters. There are many possible, logical ways to organize legal topics by level of granularity. It is enough that Thomson Reuters chose a particular one.

Section 102(b) of the Copyright Act, 17 U.S.C. § 102(b), bars protection for "any idea, procedure, process, system, method of operation, concept, principle, or discovery, regardless of the form in which it is described, explained, illustrated, or embodied in [an original] work [of authorship]." And a system generated by a rote computer program also lacks human authorship. Judge Bibas may be correct that there were many ways to organize the legal topics. Perhaps Westlaw's human authors made many choices, and those choices reflected enough discretion to warrant protection, but if so, it would have been helpful to have those choices explained. The Stanford Law Library has an excellent explanation of the system. The massive scale of the system, and its evolution and refinement since its introduction over 100 years ago, long before computers, may well justify its originality. However, the case law on parts numbering systems, beginning with then-Judge Alito's en banc decision in Southco, Inc. v. Kanebridge Corp., 390 F.3d 276 (3d Cir. 2004), warranted a more detailed analysis.

The point I am making with the above discussion on originality is that the unique facts of the works at issue render the decision of very little value in the other AI cases which involve traditional copyrighted works. 

On infringement, the February 11 opinion is exceedingly narrow. Thomson Reuters alleged infringement of 21,787 headnotes, of the editorial decisions in 500 judicial opinions, and of the key numbering system. Judge Bibas declined to resolve all of those claims on summary judgment, ruling, “I see no proof sufficient to take all these items away from a jury.” What he did decide was possible infringement of 3,384 headnotes, and of those 3,384 headnotes, he found infringement of 2,243. Judge Bibas was careful:

“I grant summary judgment only on the headnotes for which substantial similarity is so obvious that no reasonable jury could find otherwise. In practice, this means that I am granting summary judgment only on the headnotes whose language very closely tracks the language of the Bulk Memo question but not the language of the case opinion. The rest of the headnotes must go to trial.”

What about fair use, the subject of some of the breathless media articles?  Alas, the particular questions and headnotes in the case are sealed, for reasons that aren’t explained, rendering any discussion of fair use speculative. The use was deemed not transformative since it was for the same purpose as the original, and in competition with it. The discussion of AI was limited to this:

Ross was using Thomson Reuters's headnotes as AI data to create a legal research tool to compete with Westlaw. It is undisputed that Ross's AI is not generative AI (AI that writes new content itself). Rather, when a user enters a legal question, Ross spits back relevant judicial opinions that have already been written. That process resembles how Westlaw uses headnotes and key numbers to return a list of cases with fitting headnotes. …

But, as Ross argues, the headnotes do not appear as part of the final product that Ross put forward to consumers. The copying occurred at an intermediate step: Ross turned the headnotes into numerical data about the relationships among legal words to feed into its AI.

To justify this as fair use, Ross cited the intermediate-copying cases, Sony v. Connectix and Sega v. Accolade. Judge Bibas rejected these precedents for reasons that are not readily apparent. The first reason he gave is that those cases involved computer programs. Judge Bibas did not explain why this made a difference. One might think the difference was in Ross's favor, since Westlaw headnotes are meant to succinctly distill ideas from the unprotected judicial opinions they cover. The scope of their copyright, if any, is narrower than that for the computer programs at issue in Connectix and Sega.

The second reason given was Judge Bibas's conclusion that the copying in those cases was necessary because the unprotectible ideas in the computer programs could only be accessed by reverse engineering. Fair enough, but might one also consider that the purpose of the use in those cases, copying unprotectible material to create an end product that was not substantially similar, was also Ross's purpose?

The direct competition and perceived market substitution seem to have decisively influenced Judge Bibas, as can also be seen in his discussion of the fourth fair use factor:

My prior opinion left this factor for the jury. I thought that “Ross's use might be transformative, creating a brand-new research platform that serves a different purpose than Westlaw.” If that were true, then Ross would not be a market substitute for Westlaw. Plus, I worried whether there was a relevant, genuine issue of material fact about whether Thomson Reuters would use its data to train AI tools or sell its headnotes as training data.  And I thought a jury ought to sort out “whether the public's interest is better served by protecting a creator or a copier.” 

 In hindsight, those concerns are unpersuasive. Even taking all facts in favor of Ross, it meant to compete with Westlaw by developing a market substitute.  And it does not matter whether Thomson Reuters has used the data to train its own legal search tools; the effect on a potential market for AI training data is enough. Ross bears the burden of proof. It has not put forward enough facts to show that these markets do not exist and would not be affected.

In the end, then, for Judge Bibas the fair use issue was one of non-transformative copying for directly competitive purposes. The facts were unique, and the opinion only partially resolved the dispute. We will have to wait much longer, and for other cases, if we want to start talking about whether LLM training is or is not fair use as a general matter.
