Beyond Stare Decisis: An AI Preparedness Framework For The Judiciary - I
Hemanth Bharatha Chakravarthy*
Abstract
To stand by and not unsettle the established is a fundamental tradition of common law jurisprudence. However, our heterodox judicial systems are ill-prepared to handle the interventions of rapidly improving technologies, especially artificial intelligence. We lack transhumanist imagination or preparedness. Alarmingly, over 82% of confidential legal data is processed by AI without institutional sanction, often covertly through personal tools used by employees. But abstinence is not a viable strategy: judiciaries must develop an etiquette for AI rather than seek prohibition, so that they may participate in and influence change rather than be swept along by it.
The judiciary has much to gain from radical technological advancements. Early adoption should be a major focus. This involves redirecting institutional policy, resources, and technical competence toward technological efforts. From filing systems to case docket briefing tools for judges, there are valuable interventions to be made across the many facets of the litigation process.
But this institutional optimism must be steady and cautious. To frame this transition, we need an ontology of judicial AI that places tools on a spectrum from simple administrative bookkeeping to producing judgments. Courts perform various roles: record keeping, cause listing, managing case journeys, publishing judgments, communicating with the public, training, material production, and maintaining infrastructure. Where and when in the court does the critical application of legal mind, emotional intuition, and sociopolitical context occur? What are the inputs to a judge, and what are the outputs? We must move beyond outdated concerns about bias and instead focus on how to preserve agency within the judicial process. AI is a double-edged sword of standardisation—it yields clarity, predictability, and speed in the rule of law. But it may also flatten thought toward Western jurisprudence, erode local nuance, or discourage subversive disruptions.
Preparedness involves being ready for these questions and participating in creating etiquettes and safety mechanisms for new technologies. So, we must be cautious about how we choose AI systems and how we train human users to develop the skills needed to verify AI outputs and audit logs. Judicial procurement must urgently develop the technical know-how for comparing and procuring AI systems. Proactive procurement agendas must focus on technical competence, robust benchmarking, and demonstrable real-world performance over cost-cutting or legacy vendor inertia. First-mover adoption, backed by rigorous human-in-the-loop testing and transparent AI audit logs, will ensure that AI enters the judiciary in a purposeful, observable, and technically fit manner. Courts must define where they must preserve the irreplaceable human elements of judgment—empathy, subversion, contextual interpretation. Everywhere else, they must lead by example and by experiment.
This piece will be published in two parts. Part I lays out the conceptual groundwork—mapping AI’s roles in courts and proposing a three-tiered ontology of Judicial AI. Part II builds on this framework to introduce an updated risk analysis framework and lays out a comprehensive preparedness agenda for enterprise-scale AI adoption in judiciaries.
PART I
I enjoy retelling this somewhat crass anecdote from a Scott Alexander blog:
“A machine learning researcher writes me in response to yesterday’s post, saying:
“I still think GPT-2 is a brute-force statistical pattern matcher which blends up the internet and gives you back a slightly unappetizing slurry of it when asked.”
I resisted the urge to answer “Yeah, well, your mom is a brute-force statistical pattern matcher which blends up the internet and gives you back a slightly unappetizing slurry of it when asked.”
But I think it would have been true.”
Posted on February 19, 2019, in response to the release of GPT-2, this comment appears prescient. His underlying claim is that humans develop general intelligence by observing lots of raw stimuli and data, without being trained via some expert framework or being born with sophisticated knowledge. This metaphor has found great success in deep learning, where increasing the size of the model leads to improved performance. It's been ten years since large-N pattern detectors outclassed chess grandmasters, Go champions, and earlier AIs that played according to human opening theory and midgame tactics.[1] I found Scott reiterating a similar idea recently in response to a question from Dwarkesh Patel as to why AIs do not find valuable connections across the span of human knowledge that humans have yet to uncover[2]—
“Humans also aren't logically omniscient.
My favorite example of this is etymology. Did you know that "vacation" comes from literally vacating the cities? Or that a celebrity is a person who is celebrated? Or that "dream" and "trauma" come from the same root? These are all kind of obvious when you think about them, but I never noticed before reading etymology sites.
I think you don't make these connections until you have both concepts in attention at the same time, and the combinatorial explosion there means you've got to go at the same slow rate as all previous progress.”
Earlier popular imaginations assumed that computer scientists and artists would be the last remaining professions, as the former would program the AIs that replace human jobs, while the latter would retain the unreachable vitality of human experience. When this fantasy imploded, with programming and the arts becoming precisely the first fields to be most convincingly modeled, the myopic backlash has been to ask why poetry is automated before household chores. Our early fantasy of slave robots was destined to fail, and I often point to Gödel, Escher, Bach for why this is obvious in retrospect. The book discusses how the same axiomatic logic of self-reference, isomorphism, and organisation underpins Gödel’s mathematics, Escher’s drawings, and Bach’s fugues. The linear algebra of our models yields an organisation of information from which beauty, knowledge, and meaning emerge. Communities that have internalised this fascination—such as transhumanist and rationalist communities in California since the 1990s—are often where significant forecasts and advancements in AI have emerged. And conversely, lacking this humility and imagination makes us vulnerable to being subjects of change.
Degrowth, safetyist, romantic, and luddite movements are among the few other groups with sufficiently large imaginations about the impact of general intelligence. These communities have called for autocratic pauses in AI development and use. But AI is a technology that disseminates immense amounts of information to diverse audiences in various locations, much like the internet or the printing press. And the challenge with democratic technologies is that they tend to be robust to repression.
We already see this robustness to repression: employees smuggle AI into workplaces that prohibit it, and young law graduates are making career choices based on firms’ AI policies. A 2024 study involving 6,000 knowledge workers revealed that 75% covertly used AI at work and 46% would “not give it up” even if it were banned. A large 2023 Salesforce survey found that more than 64% of Indian professionals used unauthorised personal AI of unknown provenance. Meanwhile, in the UK, a LexisNexis survey found that 19% of young lawyers in “big law” are considering leaving their firm because of its “failure to embrace AI.”
This is only the beginning—a recent Anthropic index reports that programmers and computer engineers are more than ten times overrepresented among Claude users vis-à-vis their share of the workforce. This will change with generational turnover in the legal field: programmers are early adopters of new technologies, but others eventually get there. The adoption of AI in knowledge workplaces seems inevitable when we inspect the factors driving employee productivity, which managers seek, and satisfaction, which employees themselves seek. It is already fairly well documented that AI makes work easier and improves quality (among others, see this McKinsey note, the early BCG study, and this MIT working paper). But most prescient to me is this early survey of GitHub Copilot users. Copilot is a coding assistant that automatically suggests the next lines of a program, and users press the tab key to accept. 60-75% of surveyed users reported being less frustrated and more satisfied, which squares with the fact that around half of the lines of code these users write are now generated by AI.
Courts cannot afford to be taken by surprise by emergent general intelligence. For me, this is the beginning of a reckoning of the human role in institutions and society. I think it is not a relegation but a promotion for humans to focus more on areas of subjective interpretation, intuition, and communication. These are areas where we might care about the inventiveness, subversion, and empathy of the human psyche. AI systems will have an impact that is vastly larger than that of the internet, the industrial revolution, or any other discontinuity in technological progress. Current model failure modes like hallucination, long-context failures, and snowballing are already proving somewhat “treatable.” So what do we do then: do we simply wait for the inevitable end of history to engulf us? Rather than being pessimistic or alarmist, judiciaries should consider how to prepare for events before they occur, how to influence them to occur in a manner aligned with our values, and what it means to have a transhumanist court. I propose an ontology of law, justice, and AI; an updated risk framework; and a set of etiquettes to prepare for artificial legal intelligence.
A Roadmap
First, we will explore the legal process and its constituents. I will map this taxonomy of legal process onto AI modalities, establishing why autoregressive language models do so well at legal work and envisioning human-AI collaboration in the legal field. With these cartoon-strip imaginations of medium-term futures, we will attempt to disentangle the many things that AI in the judiciary means. A courthouse is many things besides its court halls—it is also the registry and the filing room, and it records inputs and enforces outcomes. AI in the judiciary means AI for all of the functions of the courthouse, including its bookkeeping and procedural functions, and its public access and communication functions. And even when it comes to the application of legal mind inside the court hall, the application of the lawyer, the judge, and the judge’s clerk is different. Counsels submit on behalf of parties, clerks produce a neutral record and impartial scrutiny, and judges rule on submissions. So, by delineating legal functions, we can identify which of them must be reserved for humans—which must be self-determined.
For the legal functions where we agree courts should pursue thinking machines, it is important to develop a know-how for calibrating risk and regulating etiquette. This risk assessment must go beyond simplistic concerns about prediction bias. When predictive machine learning models have attempted to predict outcomes like sentencing, they have been noted to reproduce ascriptive biases like racial bias—see Sendhil Mullainathan and his lab’s work, for example, in Kleinberg et al., finding racial bias in criminal justice models. But Sendhil himself notes that biased algorithms are easier to detect and fix than the biases present in humans.[3] Further, it is easy to exclude predictive systems from subjectively determined areas of the judiciary, like sentencing, just as it is easy to benchmark, test, and reduce explicit bias. We need a risk framework for judicial AI for the post-RLHF LLM era[4] (see Smith et al. 2018, this report of real-world deceit by GPT-4, and Greenblatt et al. 2024 and Liu et al. 2023 on alignment faking in models). And the judiciary and the state will be able to influence the economic and social regulation of models only if they have stakes in doing so—if they devote themselves as institutions to developing, procuring, and adopting novel technologies in open court.
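To make Sendhil's point concrete, here is a minimal sketch in Python of the kind of statistical audit he describes: probing a scoring model many times with only the protected attribute flipped and mapping out the difference. The scoring function below is a toy stand-in with a deliberately planted disparity, not any real system; an actual audit would run against the model under review and a large historical sample.

```python
# A toy counterfactual audit: probe a scoring model repeatedly with only a
# protected attribute flipped, and measure the difference in its outputs.
# `toy_score_model` is a placeholder for the real system under review.
from statistics import mean

def toy_score_model(features: dict) -> float:
    # Illustrative only: a made-up linear risk score in [0, 1].
    score = 0.3 + 0.05 * features["prior_cases"] - 0.02 * features["age_over_40"]
    if features["group"] == "B":   # a deliberately planted disparity
        score += 0.10
    return max(0.0, min(1.0, score))

def counterfactual_gap(cases: list[dict], attribute: str, value_a: str, value_b: str) -> float:
    """Average change in score when only `attribute` is switched from value_a to value_b."""
    return mean(
        toy_score_model({**case, attribute: value_b})
        - toy_score_model({**case, attribute: value_a})
        for case in cases
    )

cases = [{"prior_cases": p, "age_over_40": a, "group": "A"} for p in range(5) for a in (0, 1)]
# A gap far from zero is direct, quantifiable evidence of disparate treatment.
print(counterfactual_gap(cases, "group", "A", "B"))
```

This is exactly the sense in which an algorithm is more scrutable than a human decision-maker: the same question can be asked of it hundreds of thousands of times with one variable changed.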
Anxious judiciaries should vest their anxiety in encouraging AI usage that is fit-for-purpose, intentional, and observable. OpenAI’s count of weekly active users has already surpassed 400 million. And this is just the beginning. Judiciaries and other government agencies will not be able to influence AI, nor will they have the etiquettes ready for it unless they become early adopters of state-of-the-art technology.
A Thesis Enumerating “Legal Practice”
“Legal practice” is an application of a larger corpus to a smaller corpus, a reading and writing comprehension exercise in stylised notation with axiomatic logic. This makes me very excited as an AI engineer because models that use text as a path to knowledge should naturally align with legal processes. For example, in litigation, one composes the history of common law precedent against the facts and issues of one’s matter. In contract review, one adapts the templates and prior contracts of a company for a new deal. In my view, legal tasks often fall into two categories. Needle-in-a-haystack retrieval tasks involve finding specific information in a large body of text. This could mean finding red flags in a large data room during due diligence or identifying precedents to argue in a hearing. The second category of legal tasks is patterned inference—generating responses or negotiating positions within a familiar and repeated structure. For instance, an in-house counsel at a software product company selling three different products operates within a well-understood search space when negotiating contracts. The same issues—liability, payment terms, or indemnity clauses—arise consistently, and deviations from the template require executive approval for fallbacks.
In my experience, language models are excellent at both of these classes of tasks. The first involves ingesting fifteen million case laws or thousands of prior contracts to produce the right citation. This class of tasks involves semantic search, retrieval, extraction, and information compression—these also happen to be the buzzwords of the day in recent AI developments. Similarly, language models train and develop reasoning abilities by identifying patterns in text and learning to reproduce these patterns in their predictions of new tokens. These might be simple patterns like syntax and grammar, or complex patterns like statutory interpretation and due diligence.
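As a concrete illustration of the first class of tasks, here is a minimal retrieval sketch in Python. It uses TF-IDF similarity as a stand-in for the dense, semantic embeddings a production system would use, and the corpus and query are invented placeholders rather than real citations.

```python
# A minimal "needle in a haystack" retrieval sketch. TF-IDF stands in for the
# semantic embeddings a production legal-research pipeline would use.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Compensation in motor accident claims is computed using a multiplier method.",
    "Liability caps and indemnity clauses in software licensing agreements.",
    "Anticipatory bail considerations where custodial interrogation is unnecessary.",
]
query = "how is compensation calculated in motor accident cases"

vectoriser = TfidfVectorizer().fit(corpus)
scores = cosine_similarity(vectoriser.transform([query]), vectoriser.transform(corpus))[0]

# Rank passages by similarity to the query; a production pipeline would then
# pass the top hits to a language model for citation and synthesis.
for score, passage in sorted(zip(scores, corpus), reverse=True):
    print(f"{score:.2f}  {passage}")
```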
My outside view on legal process is that some processes, like executing a contract, filing compliance documents, or trivialities like witness signatures, are a performance. When words are said in court, language functions as an action and brings about change. In more procedural settings, the legal process is an end and a means by itself—recording is an act of insuring against trust failures that cause disputes over fact. But, in the drawing up and execution of a contract, the commercially valuable activity occurs and concludes before the execution of the contract. It is the terms of the handshake that matter, and the contract is a record. The legal action does not contribute to the value of the economic activity being conducted, and the contract is fulfilled by the conduct of a subsequent economic activity. Yet, in writing the terms of such activity down, the performative achieves something significant. The performance is an end in and of itself, rather than a means to an end.
This split between the actual and the performative yields a native dichotomy for human-AI collaboration in legal practice. Tasks that involve making predictions might be performed by humans (still using AI as research tools), whereas performatives like writing down the handshake could increasingly be augmented by AI systems. Humans can perform legal work as a means to an end, while computer systems can perform tasks such as documenting agreements and filing compliance reports.
An Ontology Of Judicial AI
AI in the judiciary can mean very different things, and function is a useful lens for grappling with each tool's role in the human-AI alliance. I propose a three-task taxonomy: first—judicial AI that is involved in the application of legal mind towards judicial outcomes, our protagonist; second—access technologies for communicating and documenting the law; and third—administrative technologies for registrar and courthouse work.
Technologies Involving The Application Of Legal Mind To Resolve Outcomes
When asked to imagine judicial AI, most people think about AIs that predict litigation outcomes or mediate settlements via ODR. This is one extreme of the spectrum: tools that think and analyse legalese rather than help administer. When we strip this agency away from AI systems but retain their knowledge and processing power, we get paralegal or clerk job aids that help judges quickly parse and navigate their dockets and hearings. While frontier products are emerging in this assistant category (for example, jhana’s AI Paralegal, see Fig. 1), tools at the other extreme of outcome prediction can be and have been regulated away. One example is France, which was early to ban analytics and predictions produced by modeling individual judges’ behavior in 2019. Similarly, we can imagine regulating away models that predict the ratio decidendi of cases.
[Fig. 1: jhana’s AI Paralegal]
Imagine further that the model could tabulate various facts for motor accident cases like age, income, insurance status, whether drunken driving was involved, and so on. Now, based on the judge’s findings, such a model could look up the relevant reference for a formula. For instance, in Sarla Verma, the Supreme Court laid down a framework for computing the compensation to be awarded. This computation could be returned to the judge alongside an audit trail and footnotes. This is an example of AI performing an administrative task—a lookup exercise and a spreadsheet calculation—but the terrain of this performance is the delivery of justice.
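To make the point tangible, here is a simplified sketch of that lookup-and-calculate exercise in Python. The multiplier table and the deduction rule are illustrative simplifications, not a faithful restatement of the Sarla Verma framework; a real tool would cite the governing precedent for every number it applies and surface that trail to the judge.

```python
# A simplified sketch of the lookup-and-calculate step described above.
# The multiplier table and deduction rule are illustrative placeholders,
# not a faithful restatement of the Sarla Verma framework.

ILLUSTRATIVE_MULTIPLIERS = {   # age band -> multiplier (placeholder values)
    (15, 25): 18, (26, 30): 17, (31, 35): 16, (36, 40): 15,
    (41, 45): 14, (46, 50): 13, (51, 55): 11, (56, 60): 9,
}

def multiplier_for_age(age: int) -> int:
    for (lo, hi), m in ILLUSTRATIVE_MULTIPLIERS.items():
        if lo <= age <= hi:
            return m
    return 7  # placeholder for ages outside the table

def illustrative_compensation(annual_income: float, age: int, dependants: int) -> dict:
    personal_expense_fraction = 1 / 3 if dependants >= 2 else 1 / 2  # placeholder rule
    loss_of_dependency = annual_income * (1 - personal_expense_fraction)
    multiplier = multiplier_for_age(age)
    return {
        "loss_of_dependency": loss_of_dependency,
        "multiplier": multiplier,
        "compensation": loss_of_dependency * multiplier,
        "audit_trail": [
            "multiplier table: Sarla Verma (illustrative copy)",
            f"deduction for personal expenses: {personal_expense_fraction:.2f} of income",
        ],
    }

print(illustrative_compensation(annual_income=300_000, age=34, dependants=3))
```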
Similarly, there are other ways in which systems that process legal analysis can be helpful whilst preserving human autonomy, where relevant. Imagine a pre-litigation guidance AI that computes analytics like the expected process time and cost of pursuing litigation. This addresses the significant issue of court congestion: in 2021, 40 million cases were pending in India, with Bloomberg commenting audaciously that “if the nation’s judges took no breaks for eating or sleeping and closed 100 cases per hour, it would take more than 35 years to catch up”. Delay creates a self-perpetuating strain on the judiciary. Among other numbers, the average pendency for a land dispute stands at 20 years, and around Rs. 1.75 lakh crore of infrastructure projects remain stalled in litigation. Moreover, court delays cost Rs. 5 of every Rs. 100 of income earned in India.
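As a sketch of what such pre-litigation analytics could look like: given a hypothetical store of historical matters of the same type with observed durations and costs, a triage tool could surface median and tail estimates to parties before they file. A real system would condition on forum, case type, claim value, and more; the figures below are invented.

```python
# A minimal pre-litigation triage estimate over a hypothetical sample of
# historical matters of one case type. All numbers are invented for illustration.
from statistics import median, quantiles

def litigation_estimate(similar_matters: list[dict]) -> dict:
    durations = [m["months_to_disposal"] for m in similar_matters]
    costs = [m["total_cost"] for m in similar_matters]
    return {
        "median_months": median(durations),
        "p90_months": quantiles(durations, n=10)[-1],   # 90th-percentile duration
        "median_cost": median(costs),
        "p90_cost": quantiles(costs, n=10)[-1],         # 90th-percentile cost
    }

sample = [
    {"months_to_disposal": 30, "total_cost": 250_000},
    {"months_to_disposal": 54, "total_cost": 410_000},
    {"months_to_disposal": 84, "total_cost": 700_000},
    {"months_to_disposal": 41, "total_cost": 320_000},
    {"months_to_disposal": 120, "total_cost": 1_100_000},
]
print(litigation_estimate(sample))
```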
Generally, adversarial justice systems are performant because those who are truly aggrieved are more likely to litigate, and thus court loads are typically made up of matters brought for good reason. However, the issue is that even those who do not expect victory do not choose to stay out of court. This is because Indian courts rarely pass orders as to costs, despite the system being originally based on the English rule. This misplaced leniency has a large potential to backfire. The Law Commission of India, in its 240th Report, demonstrates how courts trivially ignore cost calculations and, even when passing orders for costs, do so at outlandishly low amounts. This is despite a sizeable 31% of the claim value in commercial cases typically being spent on costs. When legal costs, timelines, and expected settlements are opaque and unavailable a priori to parties, there is little incentive to pursue alternative dispute resolution or settlement. So, a softer judicial AI could focus on providing pre-litigation screening, counselling, and triaging—enabling courts to be more confident in passing orders as to costs when litigation does arise through high-information entry points.
If we are clear about what we take to be the unique and irreplaceable contributions of humans, we can keep those realms of subjective evaluation, contextual interpretation, and emotional empathy reserved for the human psyche. Besides, while AI capabilities continue to grow, we do not yet have general intelligence with independence, memory, agency, and wakefulness. Instead, our AIs work ad hoc upon call and invoke chains of functions with limited parameters and schemas. Until new capabilities in long-form thinking and multi-step reasoning arise, current models are better suited as copilots rather than solo pilots anyway. And they are ready for deployment as clerks and paralegals.
Technologies That Record And Communicate
For the second category of tasks, “access,” consider a digital reporter of a High Court. To truly democratise information, we must go beyond merely digitising or archiving public records. Tools like indexing, searchability, and summarisation are some of the things that make public information more realised—more public. So, returning to our analogy of an HC digital reporter, a starting point is to ensure that case law is well tabulated (in terms of metadata), made searchable, and kept up to date. While paid proprietary journals offer headnotes and citations, a neutral citation regime offers a freely available index to reference. Current AI systems are already good at tasks like compressing information, interpreting hypotheses, and stylised writing. Consequently, AI products already produce high-quality headnotes and other tags or summaries for the eCourts reporters that High Courts are mandated and working to launch.
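As an illustration, an entry in such a neutral-citation index might look like the sketch below; the field names and the citation format are hypothetical, not a proposed standard.

```python
# A sketch of the metadata record a neutral-citation index might keep per judgment.
# Field names and the citation format are invented for illustration.
from dataclasses import dataclass, field

@dataclass
class CaseRecord:
    neutral_citation: str                  # e.g. "2024:XHC:01234" (format invented here)
    court: str
    bench: list[str]
    decision_date: str                     # ISO 8601, e.g. "2024-08-19"
    parties: str
    statutes_cited: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)
    headnote: str = ""                     # AI-drafted, ideally human-reviewed

record = CaseRecord(
    neutral_citation="2024:XHC:01234",
    court="Hypothetical High Court",
    bench=["Justice A", "Justice B"],
    decision_date="2024-08-19",
    parties="X v. Y",
)
record.keywords.append("motor accident compensation")
print(record.neutral_citation, record.keywords)
```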
AI headnote models have several competitive advantages. Compute is parallelizable—so, bulk historic case law can be analysed simultaneously in minutes to hours. New judgments can be reported at a daily or weekly pace, outperforming the frequency of proprietary reporters. Given the implicit sunk cost of inference from the input tokens of a judgment being processed to produce any given set of N thematic abstractions (“themes” like a subsection of a headnote for a particular statute, a list of questions of law, or a natural language summary of facts), adding the N+1th theme is marginally inexpensive. The logic for this is trivially obvious: if we are producing 40-50 phrases of summary from a document of 100,000 words, adding another 5 phrases to our output is inexpensive as a total token cost. So, jurisdictions can customise their journals, produce them at a higher frequency, and make them searchable and indexed for lawyers and clients to access.
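A back-of-the-envelope sketch of this marginal-cost argument follows; the token counts and per-token prices are assumptions chosen for illustration rather than any vendor's actual rates. The fixed cost of reading the judgment dominates, so additional output themes add very little.

```python
# Back-of-the-envelope illustration of the marginal cost of the N+1th theme.
# Token counts and prices are illustrative assumptions, not any vendor's rates.
INPUT_TOKENS = 130_000          # roughly a 100,000-word judgment
OUTPUT_TOKENS_PER_THEME = 120   # one headnote phrase, tag, or summary line
PRICE_PER_MTOK_IN = 3.0         # assumed $ per million input tokens
PRICE_PER_MTOK_OUT = 15.0       # assumed $ per million output tokens

def cost_usd(themes: int) -> float:
    input_cost = INPUT_TOKENS * PRICE_PER_MTOK_IN
    output_cost = themes * OUTPUT_TOKENS_PER_THEME * PRICE_PER_MTOK_OUT
    return (input_cost + output_cost) / 1_000_000

print(f"45 themes: ${cost_usd(45):.4f} per judgment")
print(f"50 themes: ${cost_usd(50):.4f} per judgment")
print(f"marginal cost of 5 more themes: ${cost_usd(50) - cost_usd(45):.4f}")
```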
In my experience, practitioners typically forgo concerns about AI use for disseminating judicial information when high-quality outputs are shown. It brings up the self-driving car dilemma: computer-driven cars will make mistakes, but at what threshold of outperforming average human performance do we embrace them? I believe that state-of-the-art models will outperform human-written journals at a fraction of the comparative cost. And access technologies like these carry a tolerably low risk, given that the AI intervention occurs post hoc, after the disposal of the dispute or administrative action. Public intelligibility and transparency of judicial activity is a substantial intervention in the rule of law and is probably a worthwhile cause for which to give AI a chance.
Technologies That Administrate Outside The Court Hall
The third category belongs to “administrative” technologies—one way to imagine this is AI interventions in activities that happen outside the courtroom, for instance in the roster or in the registry. Administrative AI focuses primarily on improving judicial efficiency through workflow automation, docket management, resource allocation, and logistical planning. Imagine an AI model allowed to read, categorise, and tag the typesets of pending matters. Now, this model could bunch cases by issue and produce a listing such that, say, all motor accident cases of a certain kind could be heard on a certain day, and so on. This would allow judges to hear and dispose of similar matters in bulk. There is already a demand for this feature, with human editors annotating case categories for both regular listing and group disposal efforts.
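A minimal sketch of that bunching step follows, assuming a hypothetical upstream tagging model has already labelled each pending matter with an issue; the case numbers and issues are invented.

```python
# A minimal sketch of "bunching" pending matters for group listing, assuming a
# hypothetical upstream model has already tagged each matter with an issue.
from collections import defaultdict

pending = [
    {"case_no": "MAC 101/2024", "issue": "motor accident: multiplier dispute"},
    {"case_no": "MAC 214/2024", "issue": "motor accident: multiplier dispute"},
    {"case_no": "WP 87/2024",   "issue": "service: pension arrears"},
    {"case_no": "MAC 330/2024", "issue": "motor accident: multiplier dispute"},
]

listing = defaultdict(list)
for matter in pending:
    listing[matter["issue"]].append(matter["case_no"])

# Matters sharing an issue can be listed together for bulk hearing and disposal.
for issue, cases in listing.items():
    print(f"{issue}: {', '.join(cases)}")
```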
Another example includes technologies that automate clerical tasks like document filing, format validation, paperwork management, stamp duty calculation, copy maintenance, and managing court calendars. By integrating administrative AI, courts can streamline filings and other document-related operations, reduce human error, and reallocate human resources to higher-order judicial tasks. This would save hundreds of thousands of human hours in courts. These administrative AI uses are typically non-controversial due to their non-judicial nature and clear, measurable efficiency gains. They free judicial personnel from rote tasks, enabling judges and court officers to focus more profoundly on delivering justice rather than administrative overhead.
With this, we now have a taxonomy of AI in the judiciary. We have also built imaginations and case studies for what it might look like in action. We can proceed to think about the legal process as a performative, why AI is a native solution for its conduct, and how to safely bridge this transition.
*Hemanth Bharatha Chakravarthy is the Co-Founder & CEO of jhana.ai, which produces datasets, agents & interfaces that make legal research, drafting & doc review faster. He trained in applied mathematics and economics at Harvard College and has a background in quantitative research, policy, and government.
[1] See the IBM Deep Blue victory over Kasparov in May 1997 and the DeepMind AlphaGo victory over Lee Sedol in March 2016.
[3] “By contrast, uncovering algorithmic discrimination was far more straightforward. This was a statistical exercise—the equivalent of asking the algorithm “what would you do with this patient?” hundreds of thousands of times, and mapping out the racial differences…Humans are inscrutable in a way that algorithms are not…”
[4] Reinforcement Learning from Human Feedback, see Lambert et al. (HuggingFace blog, 2022).