We answer key questions about the benefits of predictive coding in technology assisted review.
There is an ongoing question about which methodology to use when reviewing documents in e-disclosure matters. One methodology identifies relevant documents using keywords. Another uses Predictive Coding. Both combine the use of technology with an element of manual review. This note explains the benefits of using a Predictive Coding (PC) approach to a large-scale electronic disclosure exercise compared with the use of keywords.
Why should we use Predictive Coding?
Parties to a complex electronic disclosure exercise are encouraged by the Court’s rules to use technology assisted review in order to undertake a proportionate review.
Paragraph 9.6 of PD51U confirms that the parties should seek to reduce the burden and cost of the disclosure exercise, and that ways of doing this can include the use of:
“(3)(a) software or analytical tools, including technology assisted review software and techniques; and
(b) coding strategies, including to reduce duplication.”1
The explanatory note to Section 2 of the Disclosure Review Document also provides that parties should use technology assisted review (TAR) to conduct a proportionate review of the data set (particularly if the review data set is likely to be in excess of 50,000 documents).2
The majority of case law and secondary sources on the use of PC originate from the United States. However, the use of PC has been considered and approved by the English Court in Pyrrho v MWB3 (which relied on the U.S. precedent of Moore v Publicis Groupe4) and Brown v BCA Trading.5
Keywords (by which I mean Boolean search expressions), in contrast to PC, only find documents that exactly match the search term being used. Whether a document hits a keyword is a binary decision, and the corpus of keywords does not automatically update as the understanding of the relevance criteria becomes better defined. Keywords will miss documents that are conceptually or contextually similar to the disclosure issues but do not contain the exact keywords. The set of documents returned by search terms is not only literal but also, in effect, over-inclusive. PC, by comparison, continues to revise, refine and improve its internal model, and takes those contextually and conceptually similar documents into account.
In practice, PC enables the review of significantly fewer documents than a set retrieved by keywords alone. PC is not the machines taking over and making decisions on relevance for lawyers. PC learns from the decisions of reviewers, scores documents for prospective relevance and prioritises the documents that are most likely to be relevant for first-level review by lawyers. The more documents are reviewed, the more the PC model improves and the more accurate the scores become, which in turn allows the total set of documents that eventually need reviewing to be refined down to a smaller number.
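The learn, score and prioritise cycle described above can be sketched in a few lines of Python. This is a toy illustration only, not how any particular PC product works: the simple word-counting model, the function names (`train`, `score`, `prioritise`) and the sample documents are all assumptions made for the purpose of the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case a document and split it into words."""
    return text.lower().split()


def train(reviewed):
    """Count word occurrences separately for relevant and non-relevant documents."""
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in reviewed:
        counts[is_relevant].update(tokenize(text))
    return counts


def score(text, counts):
    """Sum the smoothed log-odds of each word; a higher score suggests relevance."""
    vocabulary_size = len(set(counts[True]) | set(counts[False]))
    total_relevant = sum(counts[True].values())
    total_other = sum(counts[False].values())
    total = 0.0
    for word in tokenize(text):
        # Add-one smoothing keeps unseen words from producing log(0).
        p_relevant = (counts[True][word] + 1) / (total_relevant + vocabulary_size + 1)
        p_other = (counts[False][word] + 1) / (total_other + vocabulary_size + 1)
        total += math.log(p_relevant / p_other)
    return total


def prioritise(unreviewed, counts):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(unreviewed, key=lambda text: score(text, counts), reverse=True)


# Hypothetical reviewer decisions: (document text, coded as relevant?)
reviewed = [
    ("invoice payment overdue contract breach", True),
    ("contract termination notice dispute", True),
    ("lunch menu friday canteen", False),
    ("office party friday invitation", False),
]
# Hypothetical documents that nobody has reviewed yet.
unreviewed = [
    "canteen menu update",
    "notice of breach and overdue payment",
    "friday party photos",
]

# Round 1: learn from the decisions so far and build the review queue.
model = train(reviewed)
first_queue = prioritise(unreviewed, model)
print("Review first:", first_queue[0])

# Round 2: the lawyer codes the top document, and that decision is fed back in.
reviewed.append((first_queue[0], True))
model = train(reviewed)
next_queue = prioritise(first_queue[1:], model)
print("Remaining queue:", next_queue)
```

The point of the sketch is the loop rather than the model: every coding decision a lawyer makes becomes training data, and the queue of unreviewed documents is re-ranked after each round.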
Keywords should not be used first to cull data volumes, with PC then applied only to the documents that hit on a keyword. Doing so deprives the PC methodology of the chance to assess the entire pool of documents, because keywords alone do not pick up contextually or conceptually similar documents, as I have already noted.
A simple analogy is that PC will automatically use all synonyms for a word, whereas keyword searching is strictly limited to the word or phrase itself.
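The analogy can be made concrete with a short Python sketch. It is an illustration of the analogy only: a real PC model infers related language from reviewer decisions and does not rely on a hand-written synonym list. The `SYNONYMS` table, the function names and the sample documents are hypothetical.

```python
def keyword_hits(documents, keyword):
    """Return only the documents that contain the exact keyword as a word."""
    return [doc for doc in documents if keyword in doc.lower().split()]


# Hypothetical synonym table, standing in for the related terms
# that a PC model would pick up from reviewer decisions.
SYNONYMS = {"bribe": {"bribe", "kickback", "inducement"}}


def expanded_hits(documents, keyword):
    """Return documents containing the keyword or any listed synonym."""
    terms = SYNONYMS.get(keyword, {keyword})
    return [doc for doc in documents if terms & set(doc.lower().split())]


documents = [
    "he offered a bribe to the official",
    "the kickback was paid in cash",
    "quarterly sales report attached",
]

print(keyword_hits(documents, "bribe"))   # finds the first document only
print(expanded_hits(documents, "bribe"))  # also finds the "kickback" document
```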
What are the advantages of using Predictive Coding?
A review of recent matters,6 in which I have applied both PC and keyword methods, shows that when PC was applied over the entirety of a review pool, the number of documents requiring review was approximately 40% of the number that keyword searching identified as requiring review. In addition, PC identified a significant number of relevant documents that had not been picked up by any keyword searches.7
A landmark study in 1985 revealed that attorneys supervising skilled paralegals, using search terms and iterative search, believed they had found at least 75% of the relevant documents in a document collection when they had in fact found only 20%.8 To my knowledge, no subsequent study has distinguished or overturned the results of this study. Of that 20%, only 60% would be accurately tagged using keyword searching, so by extrapolation only 12% of relevant documents are found by keyword searching.
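The extrapolation is a simple multiplication of the two percentages quoted above, shown here as a short Python check. The variable names are mine; the figures are those given in the text.

```python
share_found = 0.20             # share of relevant documents actually found
share_tagged_correctly = 0.60  # share of those found that are accurately tagged

# Share of all relevant documents that are both found and accurately tagged.
share_found_and_tagged = share_found * share_tagged_correctly
print(f"{share_found_and_tagged:.0%}")  # prints "12%"
```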
Are there any financial advantages to using Predictive Coding?
Through the use of PC, parties can realise a significant decrease in disclosure cost. For example, in several case studies, legal teams have reported reviewing 64 to 93 percent fewer documents with PC due to the defensible identification and exclusion of non-responsive materials, which is confirmed through quality control sampling.9 Reviewing fewer documents means legal teams can expect to save roughly that percentage in review cost. In a survey of eleven PC vendors, four reported an average cost reduction of 45 percent, while seven reported savings as high as 70 percent.10 Similarly, another study found that the time and cost it takes a legal team to conduct document review could be cut by 80 percent.11 Legal teams have also reported that the use of PC has allowed them to meet difficult deadlines.12
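To see what the reported 64 to 93 percent range means in practice, the following Python sketch applies it to an invented matter. The pool of 100,000 documents and the cost of 1 unit per document are hypothetical figures chosen purely to make the arithmetic easy to follow, and the sketch assumes that review cost scales directly with the number of documents reviewed.

```python
def review_estimate(total_documents, percent_fewer, cost_per_document):
    """Return (documents still to review, review cost saved) for a given reduction."""
    documents_avoided = total_documents * percent_fewer // 100
    documents_to_review = total_documents - documents_avoided
    return documents_to_review, documents_avoided * cost_per_document


TOTAL_DOCUMENTS = 100_000  # hypothetical review pool
COST_PER_DOCUMENT = 1      # hypothetical cost per document, in any currency unit

# The two ends of the range reported in the case studies cited above.
for percent_fewer in (64, 93):
    to_review, saved = review_estimate(TOTAL_DOCUMENTS, percent_fewer, COST_PER_DOCUMENT)
    print(f"{percent_fewer}% fewer: review {to_review:,} documents, save {saved:,} units")
```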
What are the challenges and objections to using Predictive Coding?
The main objection to the use of PC is that it is a “radical” approach. The reluctance stems from fear or lack of understanding of TAR and a comfort with old approaches.13
Another challenge associated with PC is that the training that takes place as part of the PC process can be time consuming.14 However, this objection is a fallacy: all time spent in a well-prepared PC workflow is spent by the solicitors who have immediate conduct of the matter. The time they spend on the initial training and set-up of the PC workflow is highly valuable and adds significantly to their knowledge and ongoing understanding of the matter.
There are other explanations for solicitors’ reluctance to rely on algorithms, but none of them are convincing. One explanation is that use of an algorithm will invite opposing counsel to demand more transparency into the algorithm than they would demand of traditional human document review. As e-evidence expert Judge Peck noted:
“Part of the problem remains requesting parties that seek such extensive involvement in the process and overly complex verification that responding parties are discouraged from using TAR.”15
6. XBundle Statistics drawn from client work product within the last twelve months, the subject matter of which is privileged and confidential.
7. David C. Blair and M. E. Maron, ‘An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System’ (March, 1985) Communications of the ACM 28, 3
8. Scott M. Cohen, Elizabeth T. Timkovich, and John J. Rosenthal, ‘The Tested Effectiveness of Equivio > Relevance in Technology Assisted Review’ (Dec, 2011) Metro. Corp. Couns. 17, 8
9. Anne Kershaw and Joseph Howie, ‘eDiscovery Institute Survey on Predictive Coding’ (October, 2010) eDiscovery Institute
10. Jason R. Baron, Ralph C. Losey and Michael D. Berman, ‘Perspectives on Predictive Coding’ (2016) ABA
11. Ibid, 208
13. Jason R. Baron, Ralph C. Losey and Michael D. Berman, ‘Perspectives on Predictive Coding’ (2016) ABA
14. N 7
15. Doug Austin, ‘Learning to Trust TAR as Much as Keyword Search: eDiscovery Best Practices, eDiscovery Today’ (June 28, 2021), https://ediscoverytoday.com/2021/06/28/learning-to-trust-tar-as-much-as-keyword-search-ediscovery-best-practices/