Building a career in and around data science

by Claudia Cannon, Ester Buchaca-Domingo

4 June 2021

Our panel (left to right): Dr Seth Flaxman, Dr Kim Jelfs, Dr Will Pearse, Dr Nick Wardle

On 28 April 2021 the Faculty of Natural Sciences hosted a panel discussion on careers in and around data science in a natural sciences context.

The FoNS Data Science theme champions, Professors Sophia Yaliraki and Guy Nason, hosted the panel to discuss what it takes to become a research leader in this area. Chaired by Guy and aimed at early-career researchers in the Faculty of Natural Sciences, the event gave our panellists the chance to tell the audience about their journeys in data science, take part in a Q&A and share tips for navigating this kind of career path.

We've summarised key discussion points below, including some of the panel's tips and resource recommendations. Imperial staff and students can also follow a link to watch the event again.

The panel

Dr Seth Flaxman
Department of Mathematics, Senior Lecturer in Statistical Machine Learning

Seth took an undergraduate degree in Computer Science and Mathematics at Harvard. He has always liked mathematics, with an increasing focus on how it affects the social sciences. After graduating he studied computer science at EPFL in Switzerland, and also worked as a data scientist at the World Health Organisation in Geneva. Before coming to Imperial he obtained a PhD in Machine Learning and Public Policy at Carnegie Mellon University.

Seth enjoys working with statistics because it allows him to take his work into many different fields, and agrees with the quote from John Tukey: ‘The best thing about being a statistician is that you get to play in everyone’s backyard’.

Dr Kim Jelfs
Department of Chemistry, Reader in Computational Materials Chemistry

Kim's PhD is in Chemistry; in fact, she has had no formal training in data science. Kim uses computational approaches to enable the discovery of functional molecular materials, and is a lead member of Imperial's newly launched Digital Molecular Design and Fabrication (DigiFAB) initiative.

She says: "These are exciting times for the chemistry field, which is bringing together automation and data-driven approaches to speed up processes that normally take ages. Gathering large datasets will not only help to accelerate materials discovery, but will make it a much faster process in the future."

Dr Will Pearse
Department of Life Sciences at Silwood Park, Senior Lecturer in Applied Ecology

Will has a PhD in Biology and held a faculty position in the USA before moving to the UK. His research applies machine learning, statistics and data science to answer fundamental questions about the origins and future of biodiversity.

He says: "There is a real need in biology for a broader perspective; taking existing theories and applying them in new contexts. There's a need for better integration and understanding of how approaches might be made simpler."

Dr Nick Wardle
Department of Physics, STFC Ernest Rutherford Fellow

Nick did his PhD at Imperial. His research in high energy particle physics focuses on searching for physics beyond the Standard Model through precision measurements of the recently discovered Higgs boson. Nick uses sophisticated statistical methods, including machine-learning-based methods, to (re-)interpret Higgs boson measurements and searches, and to preserve the results for the future.

Nick did not have any training in statistics when he started his PhD, and has found it useful to draw on the expertise of his collaborators to build up his own knowledge along the way.

Audience questions answered

I am an experimental physicist, currently taking Coursera courses. How can I enter the area of data science? Can you recommend good sectors (and maybe companies) to look out for as an entry-level analyst?

Nick: Coursera courses are a very good resource for learning techniques and gaining experience quickly. Common to all big programmes in high energy physics are data acquisition, data processing and data handling. Physics currently faces the challenge of processing the vast amounts of data needed for research quickly and effectively, and the skills gained from Coursera courses (and others) will be useful in tackling it.

Many students have gone on to work at Marcella, BCG GAMMA, Google and Facebook. Companies may also approach universities to hire students to work on research projects.

Will: I advise reframing the narrative when applying for jobs: you are not at a complete entry level; you have a PhD! Don't assume that you lack experience in real data science. Think about interdisciplinary challenges and how to come up with solutions to them. Your PhD gives you unique expertise, so lean on your strengths.

In terms of jobs, look at Civil Service departments: each one has its own data science team. Check consulting companies too, as you will be extremely valuable to them.

And apart from Coursera courses, pick up some good books on the fundamentals of statistics. Here are some recommendations:

Bulmer, M. G. (2003) Principles of Statistics (first published 1967), New York: Dover Publications
Edwards, A. W. F. (1992) Likelihood (Expanded Edition), Baltimore: Johns Hopkins University Press
Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., Rubin, D. (2021) Bayesian Data Analysis (Third Edition), Boca Raton: Chapman & Hall/CRC
Hastie, T., Tibshirani, R., Friedman, J. (2009) The Elements of Statistical Learning (Second Edition), New York: Springer

Kim: Many companies, such as IBM, are doing interesting work in chemistry similar to what is being done in academia, particularly on problems related to fundamental science.

Nick: If you do go to work in industry, please don't forget us! There are so many opportunities to collaborate and work together between academia and industry.

Seth: Yes, I recently hired a postdoc from industry, and both sides have really benefited from working together. If you're looking for opportunities outside academia, you should position yourself somewhere people are doing something new, on a project that has the potential to last. Make sure you ask the company how it has evolved, how it approaches making decisions and changes, and also things like which publications its staff follow. In both academia and industry, you want to work with people who are open to change and willing to learn, and who ask whether the gap between the kind of work being done in industry and the kind being done in academia is growing or shrinking.

I'd also recommend this book: Spiegelhalter, D. (2019) The Art of Statistics, Pelican Books

Will: Ask the question: what problem does the company want to solve? What needs to be solved will determine what you, as a data scientist, will do. For example, Google is focusing on how to digitise the entire planet, so it needs to be at the cutting edge, but not all companies or consulting firms need to apply cutting-edge techniques. In fact, many need people with depth and breadth who can apply fundamental insights.

What skillsets do we need to have in order to be a data scientist? Should we attempt to follow a more mathematical and statistical track, or rather machine learning and AI, or is some kind of mixture better?

Will: Aim for a fundamental knowledge of statistics, because you'll still be able to apply it in ten years' time. It will give you the skills to be flexible enough to tackle any problem.

Nick: The tools that you use in research may change. Focus on learning how to approach solving a problem: for example, how to design an experiment that tests your hypothesis.

Kim: Take part in online competitions, where you can get your hands on a dataset and test yourself.

Seth: Remind yourself to be humble. Data science is about exploration and hypothesis generation. We want it to lead to science, and to support scientific discoveries. Being humble about your discipline helps you identify its weaknesses, which ultimately gives you strength. I like this paradigm: "It is not only about the things you know you do not know, but also about the things you do not know you do not know."

Stay connected to the FoNS Data Science theme

At the end of the event our Data Science theme champions reflected on the discussion points and thanked everyone involved. Sophia Yaliraki said: "We have a responsibility, and there are consequences to our work. We need to be humble about the data we have and also about the data we do not have. If we do not have good data, we will not have good results, regardless of any machine learning we're able to employ. A strong ethical component is so important, especially if our work has implications for the social sciences."

Guy Nason thanked the panel members for their time and insights. He also thanked the audience and reminded everyone that the FoNS Data Science champions are keen to stimulate research projects and research grant funding, and to connect people across FoNS. The Faculty has some funds to facilitate activities in this area, and the champions encourage everybody to contact them if they have an idea they'd like to talk through.