Dr Francesco Sanna Passino

Faculty of Natural Sciences, Department of Mathematics

Lecturer in Statistics

Contact

f.sannapassino Website

Location

552, Huxley Building, South Kensington Campus

Summary

Teaching

MATH70099 - Big Data: Statistical Scalability with PySpark / MATH70072 - Big Data

This specialisation consists of three components: statistical analysis at scale, distributed programming using MapReduce, and Big Data analysis using PySpark. The first component covers the theory of statistical scalability, including Markov chain Monte Carlo methods for tall data, stochastic optimisation, and the statistical analysis of streaming data. Students will learn statistical concepts such as Bayesian parameter estimation with large-scale data, and will explore data sampling strategies for Big Data settings. The second and third components cover the practical aspects of handling Big Data, introducing two frameworks for the statistical analysis of large datasets: Hadoop and Spark. Students will learn how to write programs that fit statistical models to large datasets using MapReduce and PySpark.

The module runs in the summer term in the online MSc Machine Learning and Data Science (MLDS), and in the second part of the spring term for the MSc Statistics.
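
As a rough illustration of the MapReduce pattern mentioned in the module description, the sketch below fits a simple linear regression by summarising each data partition with its sufficient statistics and then combining the summaries. It is not taken from the module materials: it uses plain Python rather than Hadoop or PySpark, and the function names (`map_partition`, `combine`, `fit_line`) and the toy data are assumptions made for illustration only.

```python
from functools import reduce


def map_partition(rows):
    """Map step: summarise one partition of (x, y) pairs by its sufficient statistics."""
    n = sx = sy = sxx = sxy = 0.0
    for x, y in rows:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        sxy += x * y
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Reduce step: sufficient statistics from two partitions simply add up."""
    return tuple(u + v for u, v in zip(a, b))


def fit_line(partitions):
    """Least-squares slope and intercept computed from the combined statistics."""
    n, sx, sy, sxx, sxy = reduce(combine, map(map_partition, partitions))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical toy data split across three "workers"; every point lies on y = 2x + 1.
partitions = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
]
slope, intercept = fit_line(partitions)
```

Because each partition is reduced to five numbers before anything is combined, the full dataset never has to sit on a single machine. In a distributed framework the same two functions would play the roles of the per-partition map and the associative reduce.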