Creating a FoNS ECR data science network: meet our new Ambassadors

Steven Bennett, Ioanna Papatsouma, Elizaveta Semenova

Three early career researchers from across FoNS have been appointed Data Science Ambassadors.

Steven Bennett (Chemistry Research Postgraduate), Ioanna Papatsouma (Teaching Fellow in Statistics) and Elizaveta Semenova (Research Associate in Mathematics) have been selected by the Faculty Research Strategy team and our theme Champions to foster connections and organise activities across Faculty of Natural Sciences (FoNS) departments in the area of data science, aimed at early career researchers (ECRs).

We caught up with them to find out why data science is such a hot topic and what the grant application landscape looks like from their perspective.

Hello Ambassadors – tell us why you're so interested in data science

“What excites me is that data science is an alternative approach, a different way to tackle a problem that we never had access to before.” Steven Bennett, Research Postgraduate, Chemistry

Steven: Data science answers questions that would be impossible to answer using conventional computation. This is significant in the field of chemistry, which has turned towards data science methods in recent years. What excites me is that it's an alternative approach, a different way to tackle a problem that we never had access to before.

Ioanna: There are so many different sources of data, and the modern world produces such a huge volume that conventional computations just aren’t enough to deal with it. Data science is the answer.

Liza: In the fields where we have a lot of mechanistic understanding about phenomena, at least we can simulate a process – a chemical reaction, for example. In fields that are purely observational, there's no other way aside from analysing the data. In reality “data science” isn’t a standalone field, it’s a tool. In the past there was a saying that “mathematics is the queen of all sciences”, because it underpins everything. Maybe nowadays we can say that data science is the platform on which many other observational sciences rely.

There are lots of examples that highlight how important it is that we keep refining and improving our ability to analyse complex data sets. Policymakers and the WHO have had to rely on data during the COVID-19 pandemic, to answer critical questions. A big issue is bias in the surveys and estimates that we get. To get the data that will help us answer important questions, how should we run surveys in the most efficient and accurate ways? We're having to question how best to identify new strains of infectious diseases, for example, because the rate of genomic surveillance screening systems differs vastly between countries, so data sets can be misleading. Another important example of this is racial bias in the field of machine learning for facial recognition.

As Ambassadors, what’s your aim?

Liza: We’re aiming to bring early career researchers together, allowing them to build a network, acquire skills, meet other people, get exposed to new problems, explore job options and, perhaps most importantly, gain confidence. When you’re doing a PhD, you’re working on a challenging, long-term project, which might feel frustrating at times. It’s easy to overlook that your competences keep growing on a daily basis, and that there’s so much more to the data science world beyond a single research question. The activities we want to put on as Ambassadors will hopefully give ECRs a positive, potentially useful break from their research focus, and the chance to gain confidence and build a social network.

Ioanna: We’ll start with some social events, with icebreaker activities so that people can get to know each other. We’re thinking of having a seminar series, with invited speakers from industry and academia, and some educational and skills events as well, like tutorials or labs. We’re planning datathons and an industry fair too.

What appealed to you about the role?

Steven: The thing that excites me most is the opportunity to develop an interdisciplinary community. I helped with the DigiFAB Datathon, which attracted many students from different subject areas. It was interesting to see how they worked together to tackle the problem, using expertise from their respective disciplines and translating that across to data science, which is itself very interdisciplinary. It was a great experience to see people who may never have otherwise met discussing ideas and approaching a problem from different angles.

“It’s so important as a scientist not to be involved only with your own field – the social aspect of science is invaluable.” Elizaveta Semenova, Research Associate, Mathematics

Liza: It’s so important as a scientist not to be involved only with your own field – the social aspect of science is invaluable. Since I started working at Imperial I've been working from home because of the pandemic, and I haven't met anyone outside of my group! This Ambassador role is a chance to meet people. Research is becoming more and more interdisciplinary, and it’s a great opportunity to strengthen interdisciplinary networks within FoNS, from both a Faculty and a personal perspective.

Ioanna: It’s good to take a break from our routines too. Being a Faculty Ambassador is something completely different from our individual research, and from our teaching and admin responsibilities. You get to work on something broader, to meet researchers across the Faculty as part of a team that’s working to build a community. The role is a great chance for us not only to expand our professional networks but also to explore data science avenues that we weren’t previously aware of.

Steven: I have my own little chemistry bubble over at the MSRH in White City – there are lots of people, especially in chemistry, who are interested in data science, but it might not directly relate to their research, so it'd be nice to give them opportunities to get involved by providing educational, career and networking opportunities. We’re interested to know more about ways of transitioning into a job in data science.

Liza: So many projects that I’ve been involved with over the years started socially – because of someone I know, or someone I met by chance who happens to be as excited about a topic as I am. In the job market, opportunities might not get officially posted, and it can be about your network and who you know. All my jobs post-PhD I learned about on Twitter, and Twitter is a digital social network.

The FoNS themes aim to bring academics together to apply for big grants. As an ECR, how do you feel about the grant landscape?

“Part of bringing a data science community of ECRs together could, in the long run, help us to support each other with the skills needed to put together exciting and successful grant applications.” Ioanna Papatsouma, Teaching Fellow, Mathematics

Ioanna: Part of bringing a data science community of ECRs together could, in the long run, help us to support each other with the skills needed to put together exciting and successful grant applications. ECRs might never have applied for funding before, and perhaps pooling different expertise around particular issues, and knowing who to collaborate with, increases the probability of a successful application.

Steven: I have no experience of applying for grants – I’m in the final year of my PhD – but I know in chemistry there are a lot of people applying for grants related to data science, and there’s increasingly more funding related to this field. Automation is becoming more and more popular in chemistry, and so people are getting very interested in how to analyse data, because they’ve got access to so much of it. It would be nice to advertise our workshops and activities as opportunities for ECRs to get used to working with this volume of data. Once they gain experience in handling data, they could then potentially direct their grant applications towards these types of areas that are appealing to funders.

Some FoNS research doesn’t map neatly underneath our four themes. Is it useful for ECRs focused on fundamental research to get involved?

Liza: Absolutely. Confidence, and variety of experiences, play such an important role in our careers. There was a period when I was doing only theoretical research in mathematics – just pen and paper, not touching computers and not knowing how to write a single line of code. That kind of work is exciting, but it can be very slow-paced. Over time, through a series of extra-curricular events, followed by a PhD, I found my way into applied mathematics and computational data science.

As an ECR it’s important to gain exposure to different types of research or business questions, as well as modes of work. This way, one creates a choice for themselves, ensuring that the current job they are taking is something that they really want to do, rather than the only thing they know how to do.

How does your research relate to data science?

Ioanna: I’m a statistician, and my research interests lie in clustering. I’m particularly interested in clustering mixed-type data – that is, data consisting of both continuous and categorical variables, and applications in healthcare. Clustering is the task of grouping a set of objects in such a way that objects in the same cluster are, in some sense, more similar to each other than to those in other clusters. It’s closely related to data science, as it’s an unsupervised machine learning task used in diverse areas, such as in healthcare, cyber security, pattern recognition, finance and marketing.

I’m investigating robust clustering methods and carrying out benchmarking studies that compare different clustering methods while considering various aspects simultaneously. My aim is to derive new insights, and determine under what conditions each method works best and why. Those inferences are based on simulated and real data, and it’s data science that helps us to make fast and accurate data-driven decisions.
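To make the idea of clustering mixed-type data a little more concrete, here is a minimal Python sketch. It is an illustration only, not one of the methods Ioanna benchmarks in her research. It assumes a Gower-style dissimilarity (range-scaled differences for continuous variables, simple mismatch for categorical ones) followed by a basic k-medoids loop. The function names and the tiny patient-style records are made up for the example.

```python
def variable_ranges(records, categorical):
    """Range of each continuous column; None marks a categorical column."""
    ranges = []
    for i in range(len(records[0])):
        if i in categorical:
            ranges.append(None)
        else:
            column = [rec[i] for rec in records]
            # Fall back to 1.0 so a constant column does not divide by zero.
            ranges.append((max(column) - min(column)) or 1.0)
    return ranges


def gower_distance(a, b, ranges):
    """Average per-variable dissimilarity between two mixed-type records."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0   # categorical: mismatch or not
        else:
            total += abs(x - y) / r           # continuous: scaled difference
    return total / len(ranges)


def k_medoids(dist, k, max_iter=100):
    """Basic alternating k-medoids on a precomputed distance matrix."""
    n = len(dist)
    medoids = list(range(k))  # naive start (an assumption): first k records
    labels = [0] * n
    for _ in range(max_iter):
        # Assign every record to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Move each medoid to the member with the smallest total distance.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:                   # keep old medoid if cluster empties
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Made-up records: (age, weight in kg, smoker, ward). Columns 2 and 3 are categorical.
records = [
    (25, 60.0, "no", "A"),
    (30, 65.0, "no", "A"),
    (28, 62.0, "no", "A"),
    (70, 90.0, "yes", "B"),
    (75, 95.0, "yes", "B"),
    (68, 88.0, "yes", "B"),
]
ranges = variable_ranges(records, categorical={2, 3})
dist = [[gower_distance(a, b, ranges) for b in records] for a in records]
labels, medoids = k_medoids(dist, k=2)
print(labels, medoids)
```

On this toy data the first three records and the last three should end up in separate clusters. In practice, the choice of dissimilarity, the starting medoids and the number of clusters all affect the result, which is part of why benchmarking studies are needed.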

Steven: My background is entirely in chemistry, with no formal maths or statistics, but my doctoral research has transitioned into a project that sits at the interface between the computational and experimental. I try to predict which molecules we’d be able to synthesise in a lab using data-driven methods and machine learning, prior to doing any experimental work. This would be difficult to achieve using traditional computational chemistry techniques, and could take many years to achieve experimentally, so the research group I am in has turned towards more cheminformatics approaches. It’s incredibly useful to have predictive models that allow us to determine what to focus lab time on. We also gain insight into the functionalities that would make a molecule hard to synthesise, and which we should therefore avoid trying to replicate in the lab. This approach allows us to then go into the lab and try to experimentally realise our predictions, so my work spans both areas.

The molecules we're looking at are for materials design. Typically, they have to be made through reactions that result in high yields, from precursors that are cheap to access, because we need materials like this on a very large scale. The materials we work on are porous, so we’re exploring applications like carbon capture and storage. Deciding which materials are best for these applications is only possible with data science because it allows us to screen thousands of different possible combinations of precursors. It really accelerates the rate at which we can discover new materials, which has potential to impact huge global issues like the climate crisis.

Liza: I work at the intersections between statistics, machine learning, spatiotemporal modelling, epidemiology and public health, on a variety of data-heavy projects. Data science can help us to estimate where vaccination rates are the lowest, for example, or which environments are prone to becoming hotspots of certain diseases. It recently helped my colleagues to estimate how many children in each country were orphaned as a result of COVID-19, informing measures taken to look after them.

A separate branch of my research focuses on predicting the toxicity of compounds. When we develop new drugs, we need to screen them to ensure that they not only treat a disease but also do not damage human organs. Such models can help us assess which compounds are good potential candidates for future drugs.