Understanding Precision and Recall

Technology assisted review is a powerful tool for controlling review costs and workflows. But, to maximize the benefits of TAR, we must be able to understand the results.

Predictive coding has, for years, promised to reduce the time and expense of increasingly large-scale litigation reviews. For attorneys and project managers assessing different methodologies, it has been challenging to understand which evaluative metrics are relevant. When evaluating predictive coding results, F-scores are often inappropriately interpreted as measures of review quality. To get a better understanding of how an application of predictive coding has performed, and to manage the defensibility of your review, the component elements of the F-score – precision and recall – should be reviewed. How do precision and recall scores relate? And, more importantly, what do these results tell you about your production?

In the context of TAR and predictive coding, precision is a measure of how often an algorithm accurately predicts a document to be responsive. In other words, it is the percentage of the produced documents that are actually responsive. A low precision score tells us that many of the documents produced were not actually responsive, potentially an indication of over-delivery. A high precision score on its own doesn’t mean much, either. One could deliver just 10 documents to opposing counsel, and if all 10 were responsive, we would have 100% precision, but we would almost certainly have failed to deliver a very significant percentage of the responsive documents in the collection.

To give our precision score any context relative to the overriding goal of predictive coding (to quickly and defensibly deliver responsive documents to opposing counsel), we need to look at recall. Recall is a measure of what percentage of the responsive documents in a data set have been classified correctly by the TAR/predictive coding algorithm. When recall is 100%, the algorithm has correctly identified all of the responsive documents in a collection. A low recall score indicates that the algorithm has incorrectly marked responsive documents as non-responsive.
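The two definitions can be made concrete with a small sketch. It reuses the 10-document production described earlier; the size of the collection (1,000 responsive documents in total) and the function names are illustrative assumptions, not figures from any real review.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Share of the documents predicted responsive that really are responsive."""
    predicted = true_pos + false_pos
    return true_pos / predicted if predicted else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Share of all responsive documents that the algorithm actually found."""
    responsive = true_pos + false_neg
    return true_pos / responsive if responsive else 0.0


# Hypothetical collection: 1,000 responsive documents exist (assumed),
# but only 10 are produced, and all 10 are responsive.
tp, fp, fn = 10, 0, 990

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision = {p:.0%}, recall = {r:.0%}")
```

Under these assumed counts, precision is perfect while recall is about 1%, which is why neither number means much without the other.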

To get an idea of how a predictive coding application has performed, we need to look at precision and recall relative to each other. Due to the fundamental limitations of predictive coding technology, it would be very difficult to ever achieve perfect precision and recall on a collection. There is ultimately going to be a trade-off between optimizing the two measures. To improve precision, that is, to reduce the proportion of false positives, we are likely going to reduce true positives, and with them recall, as well. Similarly, to improve recall, or reduce the proportion of false negatives, we are likely going to increase the percentage of false positives and negatively affect precision. Because of this interrelation, much of what can be understood about TAR results is obscured by just looking at the F-score and accepting the result if it exceeds some arbitrary measure. Evaluating precision and recall in relation to each other tells a much more detailed story about TAR results.
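A short sketch shows how a single F-score can hide the trade-off. It assumes that "F-score" means the balanced F1 measure (the harmonic mean of precision and recall); the two sets of precision and recall values are made up for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two hypothetical reviews with opposite strengths (values are assumptions).
high_precision_review = f1_score(precision=0.90, recall=0.50)
high_recall_review = f1_score(precision=0.50, recall=0.90)

print(round(high_precision_review, 3), round(high_recall_review, 3))
```

Both hypothetical reviews score about 0.643, yet the first leaves half of the responsive documents unproduced while the second over-delivers. Only the underlying precision and recall figures reveal which situation you are in.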

Given what we know about recall scores, it may occur to you that predictive coding actually gives us an explicit measure of how many responsive documents we didn’t deliver. How can we look at predictive coding results that indicate 80% recall and not be entirely focused on the 20% of responsive documents that haven’t been produced? The answer is that 80% recall may be a far better result than a massively more expensive manual review of the same documents would have achieved. Though this seems controversial, it is a notion shared by The Sedona Conference, the TREC Legal Track, and the judges who have been approving TAR use.

New TechnoLawyer Report – Group “Like” Discovery Documents to Expedite Your Review

A newly released TechnoLawyer Report, Group “Like” Discovery Documents to Expedite Your Review, discusses how Lexbe’s innovative NearDup Groupings+ technology can greatly enhance the quality of eDiscovery productions. NearDup Groupings+ from Lexbe gives litigators a powerful tool that can group similar documents, identify unfound “key” or “hot” documents, enable email threading, and prevent the inadvertent release of privileged case information, contributing to a fast, precise, and cost-efficient discovery review. TechnoLawyer states, “NearDup Groupings+ takes advantage of specialized servers in Lexbe’s datacenter, enabling it to scale to handle cases of any size,” and goes on to say that “NearDup Groupings+ also speeds up the review process while minimizing risks.” Read the full report to find out more about how NearDup Groupings+ services from Lexbe can help you win your case!

Controlling eDiscovery Costs

With eDiscovery becoming increasingly common and financially burdensome, every litigation professional is looking to keep costs down while still delivering high-quality document reviews. This search for low costs, at least, has remained constant. What has changed rapidly is the amount of Electronically Stored Information (ESI) subject to discovery. As the amount of ESI created through normal business activity grows, the need to keep eDiscovery costs down and leverage best-of-breed technologies grows correspondingly. Let’s take a look at this explosion in ESI volume and how it affects eDiscovery costs.

The amount of ESI collected from employees for commercial litigation has grown by 35% annually. A recent report by Microsoft Corporation found that the average collection of data per individual custodian involved in litigation increased from 7 GB (~0.5 million pages) in 2008 to 17.5 GB (~0.9 million pages) in 2011. This is an astounding 150% increase in just three years (about 35% a year, compounded).

This ESI explosion has a direct effect on the costs associated with eDiscovery. The industry standard prices for processing services are falling, but not nearly fast enough to keep up with the exponential growth of ESI collected. The cost to process one GB of raw ESI (~50,000 pages) in 2006 was $1,800. This cost declined to $500 by 2011, a 72% decrease in 5 years (22.6% a year, compounded).

The data demonstrates an annual compound growth of collected ESI of 35% and an annual decrease in processing costs of only 22.6%. With discoverable data growth outpacing cost decreases by 12.4 percentage points annually, controlling eDiscovery costs is increasingly crucial. Finding and selecting a quality eDiscovery provider that develops scalable, technology-driven solutions that push back against typical cost drivers should be the focus of every litigation professional faced with an eDiscovery challenge.
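The compound annual rates quoted above can be reproduced from the start and end figures. The sketch below is a rough check, not a cost model: the function name is hypothetical, and the combined figure assumes that per-custodian processing cost is simply volume times price per GB and that both trends hold over the same years, even though the two source ranges (2008 to 2011 and 2006 to 2011) differ.

```python
def annual_rate(start: float, end: float, years: int) -> float:
    """Compound annual rate of change between a start and an end value."""
    return (end / start) ** (1 / years) - 1


# Figures taken from the text above.
esi_growth = annual_rate(7.0, 17.5, years=3)         # GB per custodian, 2008 to 2011
cost_change = annual_rate(1800.0, 500.0, years=5)    # USD per GB, 2006 to 2011

# Assumed simplification: cost per custodian = volume * price per GB.
net_change = (1 + esi_growth) * (1 + cost_change) - 1

print(f"ESI growth per year:         {esi_growth:+.1%}")
print(f"Processing price per year:   {cost_change:+.1%}")
print(f"Net cost per custodian/year: {net_change:+.1%}")
```

Under these assumptions the per-custodian processing bill still rises by roughly 5% a year, because the two rates multiply rather than subtract; the 12.4-point gap describes the difference between the rates, not the net cost increase.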

TechnoLawyer Newswire Release – eDiscovery Processing That Scales With Your Needs

A new TechnoLawyer Newswire Report, eDiscovery Processing That Scales With Your Needs, on Lexbe’s unprecedented processing speed has just been released! “Lexbe’s fast processing enables you to start your document reviews sooner, and meet tight discovery deadlines even as the volume of ESI continues to grow.” The report makes reference to our recently published White Paper, Redefining High Speed eDiscovery Processing & Production, which details a study of our processing capacity. The study focused on processing the standardized 53 GB EDRM Enron data set to TIFF images. The Lexbe eDiscovery Processing System (LEPS) was able to complete TIFF processing on the 53 GB data set in only 5.3 hours! That is an industry-leading TIFFing rate of approximately 10 GB an hour, or 240 GB each day! Long turnaround times and astronomical processing costs are no longer necessary evils of TIFF processing with Lexbe.

eDSG Poll Suggests Current eDiscovery Software is Too Expensive — We Agree!

An eDSG poll conducted back in April suggests that litigation professionals find current eDiscovery software too expensive and too slow, and that they are disappointed that the software doesn’t run in the cloud. We couldn’t agree more. Lexbe has focused on addressing these issues through an innovative operations architecture that takes advantage of the latest, most scalable, and most secure computing technologies. The result is fast, affordable, cloud-based eDiscovery that gets the job done. Lexbe is easy to use, all features are included at no additional cost, and you can even get free native processing services when you host for 6 months. If you are also tired of slow, expensive, inefficient, and complicated eDiscovery software and services that are living in the past, learn more about how Lexbe eDiscovery Platform is changing the game by responding to these concerns.