Industry-first evaluation of legal AI reveals tools excel at routine tasks but fall short in reasoning-intensive scenarios, highlighting promise and limitations
San Francisco, CA – February 27, 2025 – Vals AI, a leading San Francisco-based AI evaluation platform, today released the results of the first independent benchmarking study of legal AI tools, revealing groundbreaking insights that redefine the frontier of AI’s role in the legal industry. The study, conducted in collaboration with Legaltech Hub, evaluated leading legal AI tools (Harvey, Thomson Reuters’ CoCounsel, vLex, LexisNexis, and VecFlow) both to analyze how AI tools deliver value in legal contexts and to set a roadmap for the industry’s future.
“Generative AI has reshaped the legal landscape, but not all tools are created equal,” explained Rayan Krishnan, co-founder of Vals AI. “Our study not only measures performance but also establishes first-ever standards that legal professionals and developers can rely on to understand the technology’s impact, but most importantly, its limitations.”
Beyond technical performance, the findings underscore the implications of AI for operational efficiency, vendor decision-making, and the evolving role of lawyers. What the report makes clear is that AI is outperforming lawyers, but only in the easy cases. AI tools consistently outperformed human lawyers in basic tasks like document extraction, summarization, and transcript analysis, but fell short in complex, reasoning-intensive tasks. This points to a clear gap between current capabilities and the ongoing discourse around generative AI making lawyers obsolete. Additional key findings from the report include:
- Transforming the Legal Job Market: The study illustrates how firms can augment – not replace – their workforce by leveraging AI for repetitive, resource-intensive tasks, enabling legal teams to focus on strategic, high-value work.
- The Frontier Is Clear—But We’re Not There Yet: The study dispels the myth of AI omnipotence in legal contexts. While AI tools deliver substantial value in high-volume, repetitive tasks, they are not yet reliable stand-ins for lawyers in nuanced legal reasoning.
- Not All Intelligence Is Created Equal: The benchmarking results revealed significant variability among AI applications, challenging the notion of AI as a monolithic solution. Firms cannot afford to rely on surface-level claims; rigorous evaluation is essential.
- A Need for Rigorous Evaluation: Variability among platforms highlights the need for firms to move beyond marketing claims and adopt a data-driven approach to AI investments.
The study also highlights a critical truth: AI isn’t a panacea—it’s a tool with measurable strengths and limitations. Companies that invest strategically, rather than chasing hype or avoiding adoption altogether, will be the ones to reap the most significant rewards.
“These results offer a balanced perspective for the legal community,” said Langston, Co-Founder of Vals AI. “For developers, it’s a roadmap to prioritize innovation in underperforming areas. For law firms, it’s a guide to making strategic investments in AI that enhance both client service and operational ROI.”
The study assessed five leading AI platforms (Thomson Reuters’ CoCounsel, LexisNexis, Harvey, vLex, and VecFlow) on tasks including document summarization, EDGAR research, and question-answering, as well as other key functions of these tools that can have a profound impact on legal outcomes.
Some key takeaways include:
- Harvey opted into six out of seven tasks. It received the top score among AI tools on five tasks, placed second on one task, and outperformed the lawyer baseline on four tasks.
- CoCounsel is the only other vendor whose AI tool received a top score. It was consistently one of the top-performing tools for the four tasks it was evaluated on (with scores ranging from 73.2% to 89.6%).
- The Lawyer Baseline performed better than all AI tools on two tasks and matched the best-performing tool in one. At least one AI tool surpassed the Lawyer Baseline in the four remaining tasks.
The report also signals a broader opportunity to apply benchmarking standards to other professional-grade AI tools, setting the stage for future research into industries like finance and insurance. Vals AI aims to lead the charge in fostering transparency and trust as AI adoption accelerates and permeates these industries.
A full copy of the report can be viewed here: https://vals.ai/vlair
About Vals AI
Vals AI is the leader in benchmarking large language models (LLMs) for enterprise tasks. Its proprietary auto-evaluation framework enables organizations to objectively assess AI performance on complex, real-world challenges. By fostering transparency and trust, Vals AI helps companies in the legal, finance, and insurance domains adopt AI solutions confidently.
Media Contact:
Rayan Krishnan
CEO
rayan@vals.ai
650-521-6064