How machine learning can drive new material discovery

Dr. Jacqui Cole, from the University of Cambridge, explained to a packed lecture theatre the many steps needed to mine data to design the optimum material for a given application.


Dr Jacqui Cole of the University of Cambridge gave an insightful IMSE Highlight Seminar on data-driven molecular engineering of functional materials.

The world needs new materials to stimulate industry in key sectors of our economy, from environment and sustainability to information storage and the efficiency of chemical processes. Yet nearly all functional materials (materials that possess particular native properties and functions of their own) are still discovered by trial and error. This lack of predictability creates a bottleneck for technological innovation. That is according to Dr Jacqui Cole, Head of the Molecular Engineering group at the Cavendish Laboratory, who gave her IMSE Highlight Seminar last Thursday. Dr Cole's Molecular Engineering group is a joint initiative between the Cavendish Laboratory and the Department of Chemical Engineering and Biotechnology at Cambridge, together with the ISIS Facility at the Rutherford Appleton Laboratory (RAL).

"The world needs new materials to stimulate industry in key sectors of our economy."
Dr Jacqui Cole, Head of the Molecular Engineering group, Cavendish Laboratory, University of Cambridge

Dr Cole's seminar gave a detailed introduction to data-driven molecular engineering, along with examples of how the emerging field offers prospective solutions. Such approaches to materials discovery are only now becoming possible thanks to recent advances in artificial intelligence, the rapid rise in high-performance computing capacity, and changes in government legislation regulating open access to scientific data.

Dr. Cole's Molecular Engineering research group have succeeded in encoding a given molecular design and engineering strategy into algorithms that search through massive chemical-property datasets to discover a material that suits a given application.
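As a rough illustration of what encoding a design strategy into a search over a chemical-property dataset can look like, here is a minimal Python sketch. It is not the group's algorithm: the `Candidate` fields, the `screen` function, the thresholds, and the placeholder entries are all hypothetical, chosen only to show the idea of filtering and ranking candidates against application criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One row of a hypothetical chemical-property dataset."""
    name: str
    absorption_peak_nm: float  # wavelength of maximum absorption
    molar_extinction: float    # absorption strength, M^-1 cm^-1


def screen(candidates, min_peak_nm, max_peak_nm, min_extinction):
    """Keep candidates that meet every criterion, strongest absorbers first."""
    hits = [
        c for c in candidates
        if min_peak_nm <= c.absorption_peak_nm <= max_peak_nm
        and c.molar_extinction >= min_extinction
    ]
    return sorted(hits, key=lambda c: c.molar_extinction, reverse=True)


# Placeholder entries with invented values, not real measurements.
dataset = [
    Candidate("placeholder_1", 520.0, 45000.0),
    Candidate("placeholder_2", 380.0, 60000.0),  # peak outside target window
    Candidate("placeholder_3", 610.0, 12000.0),  # absorbs too weakly
    Candidate("placeholder_4", 550.0, 52000.0),
]

# Assumed design rule: absorb in the visible range, and absorb strongly.
shortlist = screen(dataset, min_peak_nm=450.0, max_peak_nm=700.0,
                   min_extinction=20000.0)
for candidate in shortlist:
    print(candidate.name, candidate.molar_extinction)
```

A real screening pipeline would apply many more criteria to far larger datasets; the sketch shows only the filter-then-rank pattern.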

The machine learning approach

The materials discovery approach uses machine learning to comb the available scientific literature and automatically extract chemical information, using a tool called ChemDataExtractor. ChemDataExtractor can extract chemical names, properties, and spectra from a journal article so that they can be imported into a database or spreadsheet.

Figure: How the ChemDataExtractor tool works in three steps.

ChemDataExtractor uses state-of-the-art natural language processing algorithms to interpret the English-language text that makes up the majority of scientific documents, and machine-learning methods to extract valuable information from each sentence. As a result, the tool produces a full record containing identifiers, properties, and spectra for each unique chemical entity in the document.
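To make the idea of one record per unique chemical entity concrete, here is a toy Python sketch that uses only the standard library. It is not how ChemDataExtractor works: a single regular expression stands in for the natural language processing, and the pattern, the `extract_records` function, the record layout, and the sample sentences (with invented names and values) are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Toy pattern for sentences like "<name> has a melting point of <number> °C".
# A hypothetical stand-in for real chemistry-aware parsing.
MELTING_POINT = re.compile(
    r"(?P<name>[A-Za-z][\w\-]*)\s+(?:has|shows)\s+a\s+melting\s+point\s+of\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*°\s*(?P<units>[CK])"
)


def extract_records(text):
    """Return one merged record per unique chemical name found in the text."""
    merged = defaultdict(list)
    for match in MELTING_POINT.finditer(text):
        merged[match.group("name")].append(
            {"value": float(match.group("value")),
             "units": "°" + match.group("units")}
        )
    return [{"name": name, "melting_points": points}
            for name, points in merged.items()]


# Invented sentences with placeholder compound names and values.
sample = (
    "DyeA has a melting point of 150 °C. "
    "DyeB shows a melting point of 201.5 °C. "
    "DyeA has a melting point of 151 °C."
)

records = extract_records(sample)
for record in records:
    print(record)
```

Both mentions of the placeholder compound `DyeA` end up in a single record, which is the merging behaviour the paragraph above describes; handling real chemical names, synonyms, and spectra requires far more than a regular expression.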

The result means that data extracted by ChemDataExtractor can be used to predict new functional materials, which can then be experimentally validated using a range of advanced materials characterisation and device testing methods. One example of the potential of this novel approach is the discovery of new light-harvesting materials for dye-sensitized solar cells, a technology that has been included in several buildings as a source of renewably generated electricity.

ChemDataExtractor is available as an open-source Python package that you can download and use for free at http://chemdataextractor.org.
