A virtual datathon brought together 75 staff and students from across the Faculty of Natural Sciences to predict molecular packing in the solid state.
In March, fifteen multidisciplinary teams of undergraduates, postgraduates and early career researchers with an interest in data science took part in the first FoNS-DigiFAB Datathon. Their task? To solve a challenge on predicting molecular packing in the solid state.
But what went on behind the scenes to make the event so successful on the day? As DigiFAB prepares for its launch event, we met the scientific committee behind its first Datathon challenge – Professor Sophia Yaliraki, Dr Kim Jelfs, Dr Andrew Tarzia, and Research Postgraduates Steven Bennett, Harry Hagan and Florian Song – to find out more…
Why did you choose a challenge on predicting how a molecule packs in the solid state?
Kim: Trying to predict crystal structure is one of the really interesting grand challenges in chemistry, and central to quite a lot of research in the field. For example, it’s relevant to how pharmaceuticals behave, and what their properties are. If you’re trying to get a material with the properties you want, you can’t just look at the molecule alone – how it packs in the solid state also affects its properties, so this has remained a massive challenge over the past few decades.
Another reason was that when you’re doing prediction you need large data sets. The Cambridge Structural Database archives more than a million crystal structures, giving you the potential to set up a data challenge like this.
Sophia: We also thought it would be an opportunity to bring together aspects from different disciplines. We didn’t want a challenge that would only appeal to traditional chemistry students – we wanted to bring in physicists, mathematicians and even biologists. We were really pleased that the event appealed across the board, and the benefit of pooling expertise was realised in the prizes that were awarded – winners came from across FoNS departments.
Florian: From a technical perspective we needed a problem that could easily be converted to data files and formats that everyone’s familiar with. If you stay too technical with chemistry you just end up with data sets that no one outside of chemistry can actually work with. Harry and Steven worked hard to convert the data – even though it’s still in a chemistry context – into spreadsheet data that someone from the Maths Department, for example, could work with as effectively as someone from Chemistry or Life Sciences. So making the data really accessible to everyone was one of the challenges for us.
Steven: The data we ended up with allowed interaction in the datathon at different levels of difficulty – some data were easier to make predictions from than others. Some of the data sets had limitations and were imbalanced, and teams had to take that into consideration. The problems we posed weren’t just chemistry-oriented – statistical and mathematical problems had to be overcome during the event too.
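To illustrate the kind of statistical pitfall Steven describes, here is a minimal sketch of why plain accuracy can mislead on an imbalanced data set. The labels and function names are hypothetical and are not taken from the Datathon materials; balanced accuracy is one common remedy, not necessarily the metric the committee used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical labels: 9 of 10 examples belong to class "A".
y_true = ["A"] * 9 + ["B"]
# A model that always predicts the majority class.
y_pred = ["A"] * 10

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A model that ignores the rare class entirely still scores 90% on plain accuracy, while balanced accuracy shows it does no better than chance across the two classes.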
So the whole event was designed to be as accessible to researchers from different fields as possible?
Sophia: Yes, that was absolutely key – the fact that undergraduates were able to join in alongside more experienced researchers on the same very high-level challenge was a reason why the event was so successful. The excitement was palpable! And we were very pleased with the number of people who attended – 75 staff and students with a diversity of expertise at different levels. This was what we hoped would happen, and it made for a really interesting and successful event.
Florian: Not only did we want a challenge that was open to different disciplines, but also one that was open to researchers at different career levels – whether they were undergraduates or postdocs – so we had to provide data that could be used to create challenges of varying difficulty.
Activities like this provide an opportunity for students and staff to work together outside of regular curricular academic study. How did you enable the communicative and social aspects of the event?
Florian: Steven’s brilliant gather.town skills really contributed to its success – he created some really cool virtual conference rooms! Gather.town is a platform that allows everyone to have their own avatar, and you can talk within teams, or with others nearby. It really mimics a real-life hackathon where you have the flexibility not only to chat with your own team while you’re working on a challenge, but also to bump into people in the corridor and strike up a conversation. Kudos to Steven for setting that up! In face-to-face hackathons organisers have to dash between different tables – even different rooms – to make announcements, but in gather.town you can make announcements from ‘broadcast tiles’, which is really helpful in terms of managing events.
Kim: The virtual nature of the event allowed us to have larger numbers too – I think we’d have had to cap it if we were doing it in person, because we’d have had to choose a room size in advance of registration.
Florian: Gather.town is super flexible, so you can have two people or two hundred people in the same room and it still works. This allowed us to open the registration up and just see how many people took an interest. The challenge for us then was to cater to everyone, but as helpers at the event we were able to easily move between tables and help teams out.
Kim: Normally it might be hard to identify who the helpers are at a real-life hackathon, but in gather.town our helpers changed their avatars to snow-people, so it was really easy for everyone to identify who was helping!
Sophia: That’s another reason why gather.town is great, because participants could literally walk up to the snow-helpers with questions rather than having to raise their hands and wait – it really helped with the dynamics of interaction.
The Datathon was completely rooted in a real-life grand challenge in chemistry. You obviously wanted it to be engaging for participants on that level?
Andrew: Kim came to me with the idea for a hackathon, and I thought it sounded interesting. At the start of the process I had to get to grips with the techniques that we wanted participants to be able to explore on the day, so for me there was something to learn in putting together the data myself, and then watching how Steven, Harry and Florian approached it. Whilst we were putting together the data and the problems, we engaged with people at Imperial who are actually working in the field, specifically asking things like: “if we can predict this thing, how helpful is that to you as a person who does this on a day-to-day basis?” We made sure that everything we did was rooted in usefulness, not just made up arbitrarily.
Harry: I was keen to get involved because it’s something that’s very related to what I’m doing in my PhD. When I started my PhD last October this was the kind of challenge I was trying to get my head around, and preparing the Datathon was almost like a datathon in itself – working in real time in a team and putting something together in advance of a deadline is really satisfying and good fun.
Sophia: It was a very organic way of working – our scientific committee essentially worked towards the same challenge that was presented to the participants at the datathon. There was a point when everyone on the committee felt a little apprehensive about the responsibility of putting the event together in such a short time, but it was a success partly, I think, because each person on the committee brought a particular set of skills. We worked together to achieve something that ended up being even more valuable than we had hoped – I think everybody who took part, whether on the organising committee or as a participant, got something out of it. It was great teamwork.
So putting the Datathon together was, in itself, a genuinely useful process for you all to go through together?
Sophia: Yes, absolutely.
Florian: It worked so well because we all had our own little tasks. Putting the technical side of a hackathon together has a bunch of different aspects to it – from initial data extraction to designing how the challenge is presented to participants on the day – and everyone had their own little part in the game. One thing that worked really well was the auto-grading system that we devised to determine the winner at the end. Teams could upload their predictions and the system would not only score them automatically, but also give them instant feedback, which saved us so much time. Everyone on the committee was proactive, so all these little pieces of the puzzle were able to come together successfully on the day.
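The committee's auto-grading system is not described in detail, so the following is only a rough sketch of how such a grader might work, assuming each submission maps a sample identifier to a predicted label. The function names, data layout and feedback messages are all hypothetical.

```python
def grade_submission(predictions, answers):
    """Score a submission and return (score, feedback) for instant display.

    Both arguments are dicts mapping a sample id to a label (assumed layout).
    """
    feedback = []
    missing = sorted(set(answers) - set(predictions))
    extra = sorted(set(predictions) - set(answers))
    if missing:
        feedback.append(f"{len(missing)} entries missing, e.g. {missing[0]}")
    if extra:
        feedback.append(f"{len(extra)} unknown entries ignored, e.g. {extra[0]}")
    # A missing prediction counts as wrong, since .get() returns None.
    correct = sum(predictions.get(k) == v for k, v in answers.items())
    score = correct / len(answers)
    feedback.append(f"{correct}/{len(answers)} correct")
    return score, feedback


def leaderboard(scores):
    """Rank teams by score, highest first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical held-back answers and one team's upload.
answers = {"mol1": "A", "mol2": "A", "mol3": "B"}
upload = {"mol1": "A", "mol2": "B"}

score, feedback = grade_submission(upload, answers)
print(feedback)  # ['1 entries missing, e.g. mol3', '1/3 correct']
print(leaderboard({"team_a": 0.5, "team_b": 0.8}))
```

Returning the feedback messages together with the score is what lets teams correct a malformed upload straight away, without waiting for an organiser to check it by hand.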
The top six teams, as determined by the auto-grading system, had the chance to present their predictions, before the committee decided on the ultimate winners – what stood out about the winning teams?
Harry: It was really interesting to hear how teams reflected on their own working process during the presentations. I liked how the team with the winning presentation referred to the existing literature and found ways to improve it. They actually asked if the GitHub was remaining open so that they could try and get better predictions in their own time, which I thought was great. Teams were genuinely engaged in the challenge.
Sophia: What was nice was that each team took a different approach. The winning team tried to pick out the best approaches from the literature. The second team focused on the data, which of course is an important aspect of data science – the better I understand my data the better predictions I’ll make – and so that too was a very noble approach. In fact one of our colleagues asked me if there was a prediction that any of the teams made that we can test experimentally, and I think that’s perhaps something for us to think about – to go back to the results of the Datathon to see if there was something there that would be interesting to take forwards.
One of DigiFAB’s aims is to enable a digital chemistry community to flourish at Imperial – tell us a bit more about that.
Sophia: I hope the Datathon is a snapshot of what DigiFAB wants to achieve, which is to bring together skills from different disciplines to realise this ambition of designing molecules, via both theoretical and experimental processes. The idea is that the community we hope to build will comprise people from different disciplines and that people will shed assumptions that “chemistry is not for me” or “I can’t use machine learning because I’m not a mathematician or a statistician”. Shedding those assumptions by bringing people together to work on challenges is very much part of the spirit of DigiFAB. Next time we organise a hackathon we hope to be more ambitious in the tasks that we give participants, and we hope that they will work on real-life problems and innovate in ways we haven’t thought of.
Kim: The nice thing was that we brought together people from lots of departments who wouldn’t have otherwise heard of DigiFAB or digital chemistry, or thought about connecting chemistry with artificial intelligence or machine learning. The thing that I heard most often from participants at the end of the Datathon was “thank you, I’ve learnt so much” – and that aspect of learning was different for different people – some were learning more about chemistry, others were exploring machine learning for the first time. The majority of the people who registered hadn’t done anything like a datathon or hackathon before – in fact I think overall the majority hadn’t done any machine learning before – so to come out at the end of the day having had the opportunity to apply lots of models and so on was a big leap. Lots of people said that they’re looking forward to the next one, so it’s definitely created interest in this area across the Faculty!
Find out more
- Join us to celebrate the launch of DigiFAB on 18 May 2021 – register here
- Sign up to the DigiFAB mailing list to stay up-to-date on future news and events, including future datathons
- Find out more about DigiFAB
Article text (excluding photos or graphics) © Imperial College London.
Photos and graphics subject to third party copyright used with permission or © Imperial College London.