Evaluating GPT-4’s Impact on Legal Document Review

The emergence of generative artificial intelligence (AI), particularly OpenAI’s ChatGPT and the more capable GPT-4 model, has sparked considerable interest in the legal community. Because AI could transform legal work such as e-discovery and document review, it is important to test whether GPT-4 lives up to the high expectations placed on it in a legal context. This article examines an experiment by Sidley, conducted in collaboration with Relativity, that quantified GPT-4’s performance in legal document review.

ChatGPT and GPT-4: A Legal Tech Revolution?

Generative AI systems such as ChatGPT and GPT-4 can process large volumes of text and generate original content, which makes them attractive to legal practitioners, particularly in e-discovery, where review teams must work through large document collections. There have been setbacks, such as sanctions against attorneys who filed a brief citing nonexistent cases that ChatGPT had invented, but these misuses do not define the technology’s potential in the legal field.

The Sidley-Relativity Experiment: Testing GPT-4 in E-Discovery

Sidley’s experiment tested how well GPT-4 could code documents for responsiveness in e-discovery. The study used a sample of 1,500 documents from a closed case, including both responsive and non-responsive materials. GPT-4 was asked to evaluate each document against review instructions similar to those given to human reviewers.
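The source does not publish the prompts Sidley used. As a rough illustration of what evaluating each document against review instructions can look like in code, the sketch below assembles one prompt per document and parses a one-line answer. The `LABEL|CONFIDENCE` reply format, the 1 to 5 confidence scale, and the names `build_prompt`, `parse_reply`, and `Coding` are all hypothetical, and the call that actually sends the prompt to GPT-4 is omitted.

```python
from dataclasses import dataclass

# Hypothetical label set; the experiment's actual output format is not described.
LABELS = ("RESPONSIVE", "NOT_RESPONSIVE")


@dataclass(frozen=True)
class Coding:
    label: str       # one of LABELS
    confidence: int  # hypothetical self-reported score, 1 (low) to 5 (high)


def build_prompt(review_instructions: str, document_text: str) -> str:
    """Combine the review instructions and a single document into one prompt."""
    return (
        "You are coding documents for responsiveness in a litigation review.\n\n"
        f"Review instructions:\n{review_instructions}\n\n"
        f"Document:\n{document_text}\n\n"
        "Reply on one line as LABEL|CONFIDENCE, where LABEL is RESPONSIVE or "
        "NOT_RESPONSIVE and CONFIDENCE is an integer from 1 to 5."
    )


def parse_reply(reply: str) -> Coding:
    """Parse a 'LABEL|CONFIDENCE' reply; raise ValueError if it is malformed."""
    label, sep, score = reply.strip().partition("|")
    label = label.strip().upper()
    if not sep or label not in LABELS:
        raise ValueError(f"unexpected reply: {reply!r}")
    confidence = int(score)  # raises ValueError if the score is not an integer
    if not 1 <= confidence <= 5:
        raise ValueError(f"confidence out of range: {confidence}")
    return Coding(label, confidence)
```

For example, `parse_reply("RESPONSIVE|4")` returns `Coding(label='RESPONSIVE', confidence=4)`. Parsing strictly and rejecting malformed replies keeps unparseable output from being silently coded one way or the other.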

Stage One: Initial Assessment

In the first stage, GPT-4 analyzed the documents using the original review instructions given to the attorneys. Its coding decisions were then compared with the human reviewers’ decisions to establish a baseline.

Stage Two: Refined Prompts and Quality Control

Based on the initial results, the prompts given to GPT-4 were refined to resolve ambiguities, much like the quality control (QC) feedback loop in a human review, where reviewers receive clarified guidance after their early decisions are checked. This adjustment significantly improved GPT-4’s performance, indicating that its accuracy depends heavily on precise and detailed instructions.
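A minimal sketch of a feedback loop like the one described above, assuming the model’s and the human reviewers’ coding decisions are stored as dictionaries keyed by document ID. The helper names `disagreements` and `agreement_rate` and the `"R"`/`"N"` labels are hypothetical. The disagreements are the documents a reviewer would examine to find ambiguities in the instructions before the prompt is revised and the review is rerun.

```python
def disagreements(model_codes: dict[str, str], human_codes: dict[str, str]) -> list[str]:
    """IDs of documents coded by both, where the model's label differs from the human's."""
    return sorted(
        doc_id
        for doc_id, label in model_codes.items()
        if doc_id in human_codes and human_codes[doc_id] != label
    )


def agreement_rate(model_codes: dict[str, str], human_codes: dict[str, str]) -> float:
    """Share of documents coded by both where the labels match."""
    shared = [doc_id for doc_id in model_codes if doc_id in human_codes]
    if not shared:
        raise ValueError("no documents were coded by both the model and the humans")
    matches = sum(model_codes[doc_id] == human_codes[doc_id] for doc_id in shared)
    return matches / len(shared)


if __name__ == "__main__":
    # "R" = responsive, "N" = non-responsive (hypothetical labels).
    model = {"D1": "R", "D2": "N", "D3": "R", "D4": "N"}
    human = {"D1": "R", "D2": "R", "D3": "R", "D4": "N"}
    print(disagreements(model, human))   # prints ['D2']
    print(agreement_rate(model, human))  # prints 0.75
```

Rerunning the same comparison after each prompt revision gives a simple measure of whether the clarified instructions actually moved the model closer to the human coding.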

GPT-4’s Performance: Strengths and Limitations

GPT-4 performed strongly on documents whose relevance it was confident about, reaching about 85% accuracy on non-responsive documents and 84% on responsive ones. Its accuracy fell to 64.5% on documents where it was less confident, which highlights the need for clear and comprehensive review instructions. GPT-4 also struggled with documents that were part of a responsive family, with short messages, and with attachments, because it analyzed each document in isolation, without the context provided by related materials.
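One way to run a confidence-split analysis like the one above is to bucket each document by the model’s confidence and measure agreement with the human coding in each bucket. This is a sketch under stated assumptions: the source does not say how GPT-4’s confidence was measured or where the cut-off lay, so the 1 to 5 score, the threshold of 4, and the `Result` and `accuracy_by_confidence` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model_label: str  # label assigned by the model
    human_label: str  # label assigned in the original human review
    confidence: int   # hypothetical self-reported score, 1 (low) to 5 (high)


def accuracy_by_confidence(results: list[Result], threshold: int = 4) -> dict[str, float]:
    """Agreement with the human coding, split at an assumed confidence threshold."""
    buckets: dict[str, list[bool]] = {"high": [], "low": []}
    for r in results:
        bucket = "high" if r.confidence >= threshold else "low"
        buckets[bucket].append(r.model_label == r.human_label)
    # Skip empty buckets rather than divide by zero.
    return {name: sum(hits) / len(hits) for name, hits in buckets.items() if hits}


if __name__ == "__main__":
    sample = [
        Result("R", "R", 5),
        Result("N", "N", 4),
        Result("R", "N", 5),
        Result("N", "R", 2),
        Result("R", "R", 1),
    ]
    print(accuracy_by_confidence(sample))
```

With real review data, the "high" and "low" values would correspond to the confident and less-confident groups discussed above; the threshold is a tuning choice that trades the size of the high-confidence group against its accuracy.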

The Future of GPT-4 in Legal Practice

GPT-4’s prospects in legal practice look promising. The experiment suggests that GPT-4 brings consistency and efficiency but works differently from traditional Technology-Assisted Review (TAR) tools. TAR learns from human coding decisions on a set of training documents, whereas GPT-4 evaluates each document independently against the given prompt, which yields more consistent coding and a more streamlined QC process.

However, GPT-4’s current processing speed of roughly one document per second is slower than TAR and would need to improve for GPT-4 to handle large document sets efficiently.
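A back-of-the-envelope calculation shows why throughput matters. Assuming the rate of roughly one document per second holds, the 1,500-document sample takes about 25 minutes, while a hypothetical one-million-document collection would take about 278 hours (roughly 11.6 days) on a single sequential stream. The `review_hours` helper and its `parallel_workers` parameter are illustrative assumptions; whether requests can actually run in parallel depends on the API’s rate limits.

```python
def review_hours(num_documents: int,
                 docs_per_second: float = 1.0,
                 parallel_workers: int = 1) -> float:
    """Wall-clock hours to code num_documents at a fixed per-worker rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    if parallel_workers < 1:
        raise ValueError("parallel_workers must be at least 1")
    # Idealized assumption: total throughput scales linearly with workers.
    seconds = num_documents / (docs_per_second * parallel_workers)
    return seconds / 3600


if __name__ == "__main__":
    print(f"1,500 docs: {review_hours(1_500) * 60:.0f} minutes")
    print(f"1,000,000 docs: {review_hours(1_000_000):.1f} hours")
    print(f"1,000,000 docs, 10 workers: {review_hours(1_000_000, parallel_workers=10):.1f} hours")
```

Linear scaling is the best case; in practice retries, rate limiting, and very long documents would push the real figures higher.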

Conclusion: GPT-4’s Role in Legal E-Discovery

GPT-4’s experimental use in document review points to a potential shift in traditional review workflows toward greater consistency and efficiency in e-discovery. Its performance hinges on the quality and clarity of its prompts, underscoring the need for precise review instructions. While it may not completely replace human review, GPT-4 can significantly aid the process, particularly for documents it codes with high confidence. As the technology evolves, its role in e-discovery and other areas of legal practice will likely expand.