Can AI Think Like a Judge?

For over ten years, computer scientist Randy Goebel and his colleagues in Japan have been quietly conducting one of the most revealing experiments in artificial intelligence: a legal reasoning competition based on the Japanese bar exam.  The challenge is for AI systems to retrieve the relevant laws and then answer the question at the heart of every legal case: was the law broken or not?  That yes/no decision, it turns out, is where AI stumbles hardest.  This struggle has profound implications for how, and whether, AI can be ethically and effectively deployed in courtrooms, law offices, and judicial systems, which are under pressure to deliver justice quickly and fairly.[1]
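
To make the shape of that challenge concrete, here is a minimal Python sketch of the two-stage task: retrieve candidate statutes, then make the yes/no call.  The keyword-overlap retrieval and the `decide_entailment` stub are illustrative assumptions, not the competition's actual systems; the stub marks exactly the step where current AI stumbles.

```python
# Minimal sketch of a retrieve-then-decide pipeline for a bar-exam style question.
# Retrieval here is naive keyword overlap; decide_entailment is a hypothetical
# placeholder for whatever model would make the final yes/no judgment.

def retrieve_statutes(question: str, statutes: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank statute articles by keyword overlap with the question."""
    q_terms = set(question.lower().split())
    scored = sorted(
        statutes.items(),
        key=lambda item: len(q_terms & set(item[1].lower().split())),
        reverse=True,
    )
    return [article_id for article_id, _ in scored[:top_k]]

def decide_entailment(question: str, retrieved_text: str) -> bool:
    """Placeholder for the hard part: does the retrieved law support a 'yes' answer?"""
    raise NotImplementedError("yes/no legal entailment is the open problem")

statutes = {
    "Art. 709": "A person who intentionally or negligently infringes the rights "
                "of another is liable to compensate for damage arising from it.",
    "Art. 415": "If an obligor fails to perform an obligation, the obligee may "
                "claim compensation for damage arising from the failure.",
}
question = "Is a person who negligently damages another's rights liable for compensation?"
print(retrieve_statutes(question, statutes))   # stage 1: candidate articles
# stage 2 (decide_entailment) is where current systems fall down.
```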

A new paper, “LLMs for legal reasoning: A unified framework and future perspectives”, builds on this competition and outlines the types of reasoning AI must master to “think” like legal professionals.  The accompanying article, “Is AI ready for the courtroom?”, explores the stakes and shortcomings of current AI tools, especially large language models (LLMs), in legal contexts.  Together, they offer a roadmap for how AI might one day support, not replace, human judgment in law.

The Problem: Why Legal Reasoning Is So Hard for AI - Legal reasoning isn’t just about reading laws.  It’s about interpreting them in context, weighing competing facts, and constructing plausible narratives.  The authors of the paper tackle the fundamental problem of equipping AI systems with the ability to reason like lawyers and judges.  That means moving beyond pattern recognition and text prediction to something deeper: logical inference, contextual understanding, and ethical judgment.  Current LLMs can summarize documents and mimic legal language, but they often “hallucinate” facts or fail to connect legal principles to real-world scenarios.  In high-stakes environments, such as courtrooms, these errors are particularly dangerous.

The Three Types of Reasoning AI Must Learn - To function effectively in legal settings, AI must master three distinct types of reasoning:

  1. Case-Based Reasoning
  2. Rule-Based Reasoning
  3. Abductive Reasoning

Each plays a distinct role in how legal professionals approach problems.

Case-Based Reasoning: Learning from Past Decisions - This is stare decisis, the legal world’s version of “precedent.”  Lawyers and judges often look at previous cases to see how similar situations were handled.  AI systems using case-based reasoning compare the facts of a new case to past ones, identifying patterns and outcomes that might apply.  It’s like saying, “In a similar case last year, the court ruled X, so that might apply here too.”  LLMs are relatively good at this because they’ve been trained on massive datasets that include legal texts.  However, they still struggle to determine which cases are most relevant or how subtle differences might impact the outcome.
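
As a rough illustration of that matching idea, here is a minimal Python sketch of case-based reasoning under stated assumptions: a new case's facts are compared against a tiny library of past cases using Jaccard similarity over fact keywords.  The cases and the similarity measure are hypothetical stand-ins, not how a production legal retrieval system works.

```python
# Minimal sketch of case-based reasoning: find the most similar past case
# and report its outcome. Similarity is plain Jaccard overlap of fact keywords.

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

past_cases = [
    {"facts": {"employee", "took", "laptop", "returned", "next_day"},
     "outcome": "not theft: no intent to permanently deprive"},
    {"facts": {"employee", "took", "laptop", "sold", "online"},
     "outcome": "theft: intent to permanently deprive"},
]

new_case = {"employee", "took", "laptop", "pawned", "it"}

best = max(past_cases, key=lambda c: jaccard(new_case, c["facts"]))
print(f"Closest precedent (similarity {jaccard(new_case, best['facts']):.2f}): "
      f"{best['outcome']}")
# The article's point in miniature: is "pawned" closer to "sold" or to
# "returned next day"? Surface similarity alone cannot tell.
```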

Rule-Based Reasoning: Applying the Law to the Facts - This is the bread and butter of legal analysis.  Rule-based reasoning involves applying written laws, such as statutes, regulations, and codes, to the specific facts of a case.  For example, if the law says theft requires “intent to permanently deprive,” the AI must determine whether that intent existed in the case at hand.  AI can handle this to a degree, especially when the rules are clear and the facts are straightforward.  But real-world cases often involve ambiguity, conflicting rules, or exceptions that require human judgment.
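
To see how mechanically this can work when the rule and the facts are clean, here is a minimal sketch that encodes the theft elements from the example above as explicit checks.  The element names and the facts are illustrative assumptions; real statutes rarely decompose this neatly, which is exactly where human judgment re-enters.

```python
# Minimal sketch of rule-based reasoning: statutory elements become explicit
# conditions that are checked against the established facts of a case.

THEFT_ELEMENTS = {
    "took_property": "the accused took property belonging to another",
    "without_consent": "the taking was without the owner's consent",
    "intent_to_permanently_deprive": "the accused intended to permanently deprive the owner",
}

def apply_rule(facts: dict[str, bool]) -> tuple[bool, list[str]]:
    """Return whether every element is satisfied and which ones are missing."""
    missing = [name for name in THEFT_ELEMENTS if not facts.get(name, False)]
    return (not missing, missing)

# Clear-cut facts: the rule applies mechanically.
facts = {"took_property": True, "without_consent": True,
         "intent_to_permanently_deprive": False}
satisfied, missing = apply_rule(facts)
print(satisfied, missing)   # False ['intent_to_permanently_deprive']
# The hard part is upstream: deciding whether ambiguous real-world evidence
# actually establishes "intent_to_permanently_deprive" is a judgment call.
```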

Abductive Reasoning: Building Plausible Narratives - This is where AI tends to falter most.  Abductive reasoning is about constructing the most plausible explanation for a set of facts.  It’s the kind of thinking that asks, “What could have happened here?” and then builds a narrative that fits the evidence.  In legal terms, it’s the difference between saying, “The man had a knife,” and asking, “Did he stab the victim, or did something else happen?”  This kind of reasoning requires imagination, context, and a sense of plausibility: qualities that current LLMs lack.
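
As a rough illustration of what abduction asks of a system, here is a minimal sketch that scores a few canned explanations for the knife example above by how much of the evidence they cover and how much they must assume.  The hypotheses and weights are assumptions invented for this sketch; the genuinely hard part, generating coherent candidate narratives in the first place, is left out because that is the open problem.

```python
# Minimal sketch of abductive scoring: prefer the explanation that covers the
# most evidence while assuming the least.

evidence = {"man_had_knife", "victim_wounded", "man_fled_scene"}

hypotheses = [
    {"name": "man stabbed the victim",
     "explains": {"man_had_knife", "victim_wounded", "man_fled_scene"},
     "assumptions": 1},
    {"name": "victim was injured by someone else; man fled out of fear",
     "explains": {"victim_wounded", "man_fled_scene"},
     "assumptions": 2},
    {"name": "accident with the knife, no attack",
     "explains": {"man_had_knife", "victim_wounded"},
     "assumptions": 2},
]

def plausibility(h: dict) -> float:
    """Coverage of the evidence, penalized by how much the story must assume."""
    coverage = len(h["explains"] & evidence) / len(evidence)
    return coverage - 0.1 * h["assumptions"]

best = max(hypotheses, key=plausibility)
print(best["name"], round(plausibility(best), 2))
# Ranking canned hypotheses is easy; what LLMs struggle with is generating
# coherent candidate narratives and judging their real-world plausibility.
```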

What the Researchers Did - Goebel and his team designed a framework that categorizes legal reasoning into three types and tested AI systems against legal problems drawn from the Japanese bar exam.  These problems are complex, realistic, and require nuanced judgment.  The researchers didn’t just look at whether the AI could retrieve laws; they wanted to know whether it could reason through them.

The results were sobering.  While AI systems could handle case-based and rule-based reasoning to some extent, they consistently failed at abductive reasoning.  They couldn’t build coherent narratives or explain why a particular outcome made sense.  Worse, they often invented facts or misapplied laws, making them unreliable in legal contexts.

This finding underscores a crucial point: AI is not yet capable of making independent legal decisions.  But it also points to a path forward.  By developing specialized reasoning frameworks and combining them with LLMs, researchers may be able to build tools that support human decision-making rather than replace it.

Advancing AI in Legal and Judicial Systems - The framework proposed by Goebel’s team could help create modular AI tools tailored to specific legal tasks like retrieving statutes, summarizing cases, and identifying relevant precedents, without pretending to offer perfect judgment.  This approach respects the complexity of law and the ethical stakes involved.  Rather than chasing a “godlike” AI that can do everything, the researchers advocate for a toolbox of specialized systems, each designed to assist with a particular aspect of legal work.  That’s a more realistic and responsible vision for AI in law.
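
One way to picture that toolbox is a registry of narrow, single-purpose tools behind a common interface, with no tool for the final judgment.  The sketch below is a hypothetical Python illustration; the tool names and stub bodies are assumptions, not components described in the paper.

```python
# Minimal sketch of a "toolbox of specialized systems": each tool does one
# narrow legal task, and there is deliberately no tool that decides a case.

from typing import Callable

TOOLBOX: dict[str, Callable[[str], str]] = {}

def register(name: str):
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLBOX[name] = fn
        return fn
    return wrap

@register("retrieve_statutes")
def retrieve_statutes(query: str) -> str:
    return f"[stub] statutes relevant to: {query}"

@register("summarize_case")
def summarize_case(case_text: str) -> str:
    return f"[stub] summary of a {len(case_text.split())}-word filing"

@register("find_precedents")
def find_precedents(facts: str) -> str:
    return f"[stub] precedents with facts similar to: {facts}"

def assist(task: str, payload: str) -> str:
    """Dispatch to one narrow tool; the final judgment stays with a human."""
    if task not in TOOLBOX:
        raise ValueError(f"No tool for '{task}'; that decision is not automated.")
    return TOOLBOX[task](payload)

print(assist("retrieve_statutes", "negligent damage to property"))
```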

If developed and deployed carefully, these tools could help legal professionals manage heavy caseloads, reduce delays, and improve access to justice.  In countries like Canada, where the Supreme Court’s Jordan decision set strict ceilings on how long criminal cases may take, such tools could help prevent serious cases from being thrown out because of procedural delay.  But the risks are real.  Misapplied AI could lead to wrongful convictions, biased outcomes, or erosion of public trust.  That’s why transparency, oversight, and human judgment must remain central.

The Way Ahead - The next phase of research will likely focus on integrating these reasoning frameworks into real-world legal workflows.  That means testing AI tools in live environments, gathering feedback from judges and lawyers, and refining the systems to ensure accuracy and reliability.  It also means confronting ethical questions head-on.  Who is responsible when AI gets it wrong?  How do we ensure fairness and accountability?  And how do we balance efficiency with justice?

The path forward isn’t about replacing lawyers; it’s about augmenting their capabilities with tools that respect the complexity of legal reasoning.  Goebel’s work offers a blueprint for how to do that, one careful step at a time.

 

This article is shared with permission at no charge for educational and informational purposes only.

Red Sky Alliance is a Cyber Threat Analysis and Intelligence Service organization.  We provide indicators of compromise information via a notification service (RedXray) or an analysis service (CTAC).  For questions, comments, or assistance, please contact the office directly at 1-844-492-7225 or feedback@redskyalliance.com    

Weekly Cyber Intelligence Briefings:
REDSHORTS - Weekly Cyber Intelligence Briefings
https://register.gotowebinar.com/register/5207428251321676122

 

[1] https://six3ro.substack.com/p/can-ai-think-like-a-judge-a-new-framework
