An article on natural language searches raises troubling questions

By Mark Ritchie

“We become what we behold. We shape our tools and then our tools shape us.”
– Marshall McLuhan

We have seen some truly remarkable technological advances over the past quarter-century. The cellular phone has evolved from basic communication device to pocket-sized personal computer, and the hefty PC running MS-DOS has been supplanted by sleek tablets and improbably thin laptops. Gone are screeching modems and stand-alone fax machines, curious relics of our technological past bearing little resemblance to the modern high-speed internet connections and inexpensive all-in-one printers available everywhere. The capabilities of both software and hardware have increased exponentially since the 1990s, and anyone who has not attempted to perform natural language legal research since then could be forgiven for expecting a similar geometric progression in the quality of results returned for a given search . . . .

When Lexis and Westlaw first rolled out their natural language search engines roughly two decades ago, their representatives touted the technology as a grand step forward. Those who took legal research more seriously greeted those claims with skepticism, and rightly so, given that the results returned by these searches left much to be desired. While some cases returned by a search would be relevant, just as many if not more bore little or no discernible relationship to the issues actually being researched, and frequently the results included cases that obviously were no longer good law.

Nearly 20 years later, users still describe natural language legal research as a consistently disappointing experience, whether the research is done through WestlawNext, Lexis Advance, or another provider.1 But while natural language searches are easily derided based on anecdotal experience, until recently no concerted effort had been made to meaningfully benchmark the reliability of their underlying algorithms. That has begun to change, thanks to the dedication of Susan Nevelow Mart, an associate professor and director of the law library at the University of Colorado Law School.

In October 2017, the Law Library Journal published Professor Nevelow Mart’s article, “The Algorithm as a Human Artifact: Implications for Legal [Re]Search,”2 in which she evaluates results returned by the natural language search algorithms of Westlaw, Lexis Advance, Fastcase, Google Scholar, Ravel, and Casetext. While the article does praise (perhaps faintly) the variation in results between the different providers studied as “a remarkable testament to the variability of human problem solving,” the results themselves are nothing short of disturbing:

There is hardly any overlap in the cases that appear in the top ten results returned by each database. An average of 40 percent of the cases were unique to one database, and only about seven percent of the cases were returned in search results in all six databases. . . . The oldest database providers, Westlaw and Lexis, were at the top in terms of relevance, with 67 percent and 57 percent relevant results, respectively. The newer legal database providers, Fastcase, Google Scholar, Casetext, and Ravel, were clustered together at a lower relevance rate, each returning about 40 percent relevant results.3
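The overlap arithmetic behind the quoted figures is straightforward to reproduce. The sketch below uses invented case identifiers rather than Professor Nevelow Mart’s actual data (so the percentages it prints will not match hers); it simply shows how a “unique to one database” rate and a “returned by all six databases” rate are computed from six top-ten result sets:

```python
from collections import Counter

# Hypothetical top-10 result sets for six databases.
# Case identifiers are invented for illustration only.
results = {
    "Westlaw":        {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "Lexis Advance":  {1, 2, 3, 11, 12, 13, 14, 15, 16, 17},
    "Fastcase":       {1, 4, 11, 18, 19, 20, 21, 22, 23, 24},
    "Google Scholar": {1, 5, 12, 18, 25, 26, 27, 28, 29, 30},
    "Casetext":       {1, 6, 13, 19, 25, 31, 32, 33, 34, 35},
    "Ravel":          {1, 7, 14, 20, 26, 31, 36, 37, 38, 39},
}

# Count how many databases returned each case.
counts = Counter(case for hits in results.values() for case in hits)

total = len(counts)                                   # distinct cases overall
unique = sum(1 for n in counts.values() if n == 1)    # returned by only one database
in_all = sum(1 for n in counts.values() if n == len(results))

print(f"{unique / total:.0%} of cases were unique to one database")
print(f"{in_all / total:.0%} of cases were returned by all six databases")
```

With real data, each set would hold the actual top-ten cases returned for the same query, and the two printed percentages would correspond to the 40 percent and roughly seven percent figures reported in the study.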

We as appellate practitioners may be tempted to dismiss these results as unimportant (at least so long as we have the option of resorting to more reliable research tools), but bear in mind that natural language searches are marketed preferentially by their providers, presumably based on the belief that natural language is what the average customer wants.4 Professor Nevelow Mart makes the case in her article that we must demand “algorithmic accountability” from those responsible for crafting these natural language algorithms, positing that only then will educators and researchers be in a position to outmaneuver each algorithm’s inherent biases.5 I certainly agree with this proposition, but I also think the lack of algorithmic accountability is disturbing because these algorithms stand to exert at least a subtle influence on how the law develops over time. The demonstrable biases embodied in each database’s search algorithms make particular cases, articles, and other authorities relatively easier or more difficult to locate. Given the limited time and resources most lawyers can devote to researching any given issue, the influence of some authorities will be elevated, and that of others diminished, based on the biases and assumptions of those who programmed the algorithms in the first place.6 Without necessarily meaning to do so, the designers of the algorithms we use to find the law become active, uninvited, and presumptively unwelcome participants in the process of shaping the law itself.

What, then, is to be done? Professor Nevelow Mart makes an eloquent and convincing case for demanding that the database providers allow at least a peek inside the “black box” of their search algorithms, but nothing will change until there is significant awareness of the problem. I would say that reading Professor Nevelow Mart’s article,7 then encouraging friends and colleagues to do the same, is a good start. We, as appellate lawyers, certainly have a selfish interest in demanding better tools for legal research, but more importantly we also have an obligation to look after the law itself. The law should be protected from degradation and inappropriate influence, and we can advance this cause by joining the movement to demand algorithmic accountability.

1. As observed by one individual shortly before Westlaw Classic was retired, WestlawNext’s “search results were just bizarre to me. It was more akin to the anti-Google – I’d type in search terms or even a case name, and I’d get everything other than the case or article I was looking for.” Matt Bodie, I want my Westlaw Classic, PRAWFSBLAWG (Apr. 16, 2014, 12:09 P.M.). Others commenting in response to this thread pointed out that Lexis Advance was every bit as unpleasant an experience. The consensus among those who devote a substantial portion of their time to legal research tends to be that natural language searches are useful as a tool for achieving rapid, if superficial, familiarity with an unfamiliar area of law. See, e.g., Dorie Bertram, Searching Bloomberg Law, Lexis Advance and Westlaw: Natural Language v. Terms & Connectors Searching, WASH. U. LAW LIBRARY RESEARCH GUIDES (last updated Aug. 28, 2017) (noting that a natural language search is best used “as a starting point for finding a few highly relevant documents,” but that “[i]t doesn’t always find all results or even the best results”); see also Bodie, supra (noting that natural language searches were useful “to skim the surface of a topic”).

2. 109 Law Libr. J. 387 (2017).

3. Id. at 390.

4. See Did Lexis Squander $US700,000,000 On Lexis Advance, PRACTICE SOURCE (reprinting a document from an anonymous source critical of those responsible for designing and marketing Lexis Advance based on customer input rather than advice from legal research experts).

5. “Algorithmic accountability in legal databases will help assure researchers of the reliability of their search results . . . . If researchers know generally what a search algorithm is privileging in its results, they will be better researchers.” Nevelow Mart, supra note 2, at 4.

6. Indeed, this problem is only magnified by the marketing of natural language searches as a ready solution to the pressures of practice. Advertising pieces frequently imply that natural language searches are a better approach than traditional Boolean searches, delivering equivalent quality results with greater ease and efficiency. See, e.g., Making Boolean Researchers Even More Effective (favorably comparing results from a WestlawNext search to a traditional Boolean search in Westlaw, while also noting that “studies show researchers who use WestlawNext are 64% more efficient than researchers who use”).

7. I’m confident that everyone with the patience to have read my article this far is also entirely capable of tackling Professor Nevelow Mart’s article, but she has also published a much shorter article on the same subject for those preferring a less-ambitious reading assignment. Susan Nevelow Mart, Every Algorithm Has a POV, AALL SPECTRUM (Sept./Oct. 2017).