Do you want to learn more about digital molecular design?

When is the datathon?

Wednesday 24 March, 9.30-17.00. Places are limited, so please complete the registration form to secure your place. Registration will close on Tuesday 9 March.

What is this datathon about?

The Digital Molecular Design and Fabrication (DigiFAB) Institute and the FoNS Data Science Theme invite third- and fourth-year undergraduate students, postgraduate taught and research students, postdocs and early career researchers interested in Data Science to participate in this free datathon. The entry requirement is low, no existing knowledge of chemistry is needed, and fun is encouraged! Teams can be formed prior to the event; otherwise, the organisers will facilitate the formation of multidisciplinary teams.

Datathon challenge: 'How do you predict how a molecule packs in the solid-state?'

This is a challenge of great scientific importance, with research groups and industry worldwide working on solutions to this problem, including digital chemistry-based solutions. Current approaches to Crystal Structure Prediction (CSP), as it is usually known, involve physical modelling: sampling the possible solid-state arrangements of a molecule (hundreds of thousands of hypothetical packings are typically considered, with different symmetries, relative orientations, etc.) and then reliably ranking their energies. Generally, the experimentally observed packing will be the global energy minimum or one of the most thermodynamically stable structures. This overall process is very computationally expensive, and digital solutions offer a chance to lower this cost.
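To make the "sample, then rank" idea concrete, here is a minimal Python sketch of the ranking step only. It assumes the energies of the candidate packings have already been computed; the packing names, the energy values and the 5 kJ/mol window are all invented for illustration and are not taken from the datathon materials.

```python
def low_energy_candidates(energies, window):
    """Return (name, relative_energy) pairs within `window` of the
    lowest energy found, sorted from most to least stable."""
    e_min = min(energies.values())
    ranked = sorted(energies.items(), key=lambda item: item[1])
    return [(name, e - e_min) for name, e in ranked if e - e_min <= window]


# Hypothetical lattice energies in kJ/mol (illustrative values only).
candidates = {
    "packing_A": -101.2,
    "packing_B": -98.7,
    "packing_C": -104.5,
    "packing_D": -90.1,
}

# Keep the global minimum and anything within an assumed 5 kJ/mol of it.
shortlist = low_energy_candidates(candidates, window=5.0)
for name, relative_energy in shortlist:
    print(f"{name}: +{relative_energy:.1f} kJ/mol above the minimum")
```

In a real CSP workflow the expensive part is generating the candidates and computing their energies reliably; the ranking itself is cheap once those numbers exist.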

Simple targets will be chosen for the day of the datathon, for instance predicting the density of the solid-state packing of a chosen small organic molecule with no solvent present. The dataset will come from the Cambridge Structural Database, which includes thousands of single-crystal X-ray diffraction structures reported in the literature. Participants will work together in teams of four to five people and will be provided with this dataset. The data given to the teams will be semi-cleaned, so the entry requirement is low and the challenge is doable within a day.
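For readers new to crystallography, the density of a crystal follows from its unit cell as density = Z × M / (N_A × V), where Z is the number of molecules in the cell, M the molar mass, N_A the Avogadro constant and V the cell volume. The Python sketch below shows that calculation. The function names and the example cell are hypothetical, and the format of the datathon dataset is not assumed here.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1 (exact SI value)


def cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in cubic angstroms from cell lengths (angstroms)
    and cell angles (degrees), using the general triclinic formula."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def crystal_density(molar_mass, z, volume):
    """Density in g/cm^3 from molar mass (g/mol), molecules per cell (Z)
    and cell volume (cubic angstroms). 1 cubic angstrom = 1e-24 cm^3."""
    return z * molar_mass / (AVOGADRO * volume * 1e-24)


# Hypothetical example: a 10 x 10 x 10 angstrom cubic cell holding
# four molecules of molar mass 150 g/mol (values invented for illustration).
volume = cell_volume(10.0, 10.0, 10.0, 90.0, 90.0, 90.0)
density = crystal_density(150.0, 4, volume)
print(f"V = {volume:.1f} cubic angstroms, density = {density:.3f} g/cm^3")
```

A data-driven approach would aim to predict this density from the molecule alone, without knowing the unit cell in advance.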

Excitingly, this datathon will be run using technology. Participants will receive instructions on how to join on the day, and will have the chance to meet their teams before the event (please read further details below) as well as the scientific committee: S. Bennett, H. Hagan, A. Tarzia, F. Song, G. Nason, S. Yaliraki and K. Jelfs.

What will happen on the day?

9.30-10.00 Welcome and introductions (full details about the challenge will be given)

10.00-12.30 Datathon

12.30-13.00 Lunch break

13.00-13.40 Keynote speaker: Dr Edward O. Pyzer-Knapp (IBM Research Global Lead, AI Enriched Modelling and Simulation; Visiting Professor in Industrially Applied AI, University of Liverpool)

TITLE OF THE TALK: Project Photoresist; Accelerating Materials Discovery with AI, Hybrid Cloud and Robotics 

ABSTRACT: Discovering new materials is essential to creating products that address global sustainability challenges. Semiconductors are core to much of the technology we use today, and in 2020, they became the subject of regulatory scrutiny.  Photoacid generators (PAGs), a critical class of materials used in semiconductor manufacturing, were evaluated more closely for environmental risks. So at IBM Research, we set out to discover more sustainable, viable materials.  Discovering a new molecule usually takes up to 10 years and $10–100 million. We were able to identify and synthesize three novel PAG candidates by the end of 2020—meeting an environmental challenge in record time. In this talk, I will show how, through the use of our end-to-end AI-powered workflow, we were able to scale and handle problems in a way human scientists simply cannot, dramatically accelerating the discovery process.

13.45-16.00 Datathon

16.00-17.00 Team presentations and prizes awarded

How do I participate?

  1. Please register by 9 March by filling in the online form. Participants who wish to be in the same team must each register individually, using the last question of the form to indicate the team they would like to join and the names of the other participants.
  2. Participants will be contacted by the committee, who will send further instructions and facilitate the formation of multidisciplinary teams. Teams will have up to 5 people.
  3. Teams will be able to meet each other before the actual Datathon, and become familiar with on 16 March (13.00-13.30).
  4. The Datathon will take place on 24 March, 9.30-17.00 via

What can I win?

Brilliant prizes will be awarded:

  • First place team: £200 per person
  • Runner-up team: £100 per person
  • Third place team: £50 per person

Check this webpage for further updates! If you have any questions, please email Ester Buchaca-Domingo or Feng Li.


Do I need to prepare before the Datathon?

All the detailed information will be released to all participants at the same time, at the beginning of the datathon (24 March). No preparation is required or expected, but we would encourage you to join the 'meet your team' event on 16 March (13.00-13.30). Broadly speaking, you will be given a set of experimental data and targets, but you will be free to choose the data-driven methods you use to approach them, whether drawn from unsupervised or supervised machine learning, or both.

I am a second-year undergraduate. Can I participate in the Datathon?

If you are familiar with Data Science principles and methods, we are happy for you to register, and the scientific committee will confirm nearer the time whether you can participate.