Data Spark: The next step in legal analytics


Data-driven decisions are becoming the standard as companies increasingly recognise the additional value of quantitative methods. Leveraging data to optimise business practices benefits not only shareholders but consumers as well. Personalised recommendation systems, such as Netflix’s “Suggested for You”, demonstrate the benefits of embracing a data-driven business approach. Netflix’s algorithm combines a user’s historical viewing data with data from users with similar tastes to recommend content, with the aim of maximising the time users spend on the service.

This approach usually relies on having a structured dataset (i.e. one in a standardised format). Structured data is stored and accessed in databases using methods such as Structured Query Language (SQL) and can easily be cleaned, processed and visualised. Unstructured data is far more complex. It has no defined organisation and is common across industries, for example in procedures such as MRI scans or X-rays.
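To make the idea of structured data concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and figures are hypothetical and purely illustrative; they are not taken from the project's data.

```python
import sqlite3


def average_value_by_sector(rows):
    """Load (sector, country, value) rows into an in-memory table and
    return the average value per sector, sorted by sector name."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: every record has the same named, typed fields.
        conn.execute(
            "CREATE TABLE contracts (sector TEXT, country TEXT, value REAL)"
        )
        conn.executemany("INSERT INTO contracts VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT sector, AVG(value) FROM contracts "
            "GROUP BY sector ORDER BY sector"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up example records, for illustration only.
rows = [
    ("Energy", "UK", 100.0),
    ("Energy", "DE", 300.0),
    ("Retail", "UK", 50.0),
]
print(average_value_by_sector(rows))
```

Because every record shares the same fields, a single query is enough to summarise the data. No equivalent one-line query exists for a scanned image or a free-text document.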

The legal sector is another example where unstructured data is the norm, because it operates through language. This makes quantifying the attributes of legal contracts and procedures inherently complex. Quantification is currently done using one of two techniques: automatic detection using natural language processing (NLP), an emerging AI technique for detecting and interpreting language information, or the traditional approach of manually evaluating highly technical documents and collecting the pertinent information. The problem with the former is that the technology is still emerging and has not yet been refined to the point where it can analyse such documents effectively. The latter is a work-intensive, costly approach.

In this Data Spark project, a team of Imperial College students partnered with DLA Piper to explore the feasibility of applying advanced data science techniques, including NLP, to anonymised legal contracts written within a practice area. Applying novel technologies such as these has the potential to create an enhanced understanding of operations and structures within the wider legal sector and provide industry-leading insights into legal proceedings.

Using a structured dataset, the team sought to understand how contracts vary across different sectors or geographies by identifying patterns and correlations. Building on this, the team then looked at unstructured data and developed proof-of-concept code to enable the automatic extraction of metadata from contracts. The team were able to demonstrate that, given certain conditions, automatic extraction would be feasible and that the potential for insights could be transformative, both in terms of sector working practices and legal insights.
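As a rough illustration of what extracting metadata from contract text can look like, the sketch below uses simple rule-based pattern matching. This is not the team's method, which the article does not describe. The patterns, field names and sample sentence are all assumptions made for the example, and real contracts would need far more robust handling.

```python
import re

# Hypothetical patterns for two common pieces of contract metadata.
DATE_PATTERN = re.compile(r"dated\s+(\d{1,2}\s+[A-Z][a-z]+\s+\d{4})")
LAW_PATTERN = re.compile(r"governed by the laws? of ([A-Z][A-Za-z ]+?)[.,;]")


def extract_metadata(text):
    """Return a dict of metadata fields found in the text.
    A field is None when its pattern does not match."""
    date_match = DATE_PATTERN.search(text)
    law_match = LAW_PATTERN.search(text)
    return {
        "date": date_match.group(1) if date_match else None,
        "governing_law": law_match.group(1).strip() if law_match else None,
    }


# Invented sample sentence, not taken from any real contract.
sample = (
    "This Agreement is dated 12 March 2019 and is governed by "
    "the laws of England and Wales."
)
print(extract_metadata(sample))
```

Rules like these only work when contracts follow predictable wording, which is one way to read the "given certain conditions" caveat above.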

Otto Godwin, Data Scientist at DLA Piper, says: “The Imperial team were exceptional and delivered a thoroughly comprehensive and incisive analysis. Data science in industry is filled with examples of really smart people failing to capture their audience and to grasp their industry domain. In this case only the opposite was true and what most impressed our Partners was Imperial’s speed of acclimatisation to the domain, the jargon, and the problem statement. Within a mere 6-weeks, Imperial had taken a highly specialised legal domain and deployed some street-smart data science and cutting-edge NLP algorithms to deliver outstanding and completely novel insights.”

Gareth Stokes, a Partner in DLA Piper’s Intellectual Property and Technology Group, commented: “We’re well aware of the many possibilities that better data analysis can yield for the firm and our clients. During this project, the Imperial team worked seamlessly with DLA Piper colleagues to deliver some shrewd insights and helped point the direction for the next phase of our data-driven client offerings. The pace and enthusiasm with which the team progressed the project was excellent, and I’m very much looking forward to where else this type of analysis can take us.”

Written by: Alvaro Bejar, MSc Business Analytics 2019/20

Team members: Alvaro Bejar, Emily Dothe, Alonso Lanzagorta, Pak Ho Shen, Maria Tan, Marios Zoulias.

Academic Mentors: Sadia Haider, Miguel Molina Solna