Key information
Tutor: Dr Liam Gao
Duration: 2 x 2-hour sessions
Delivery: Live (In-Person)
Course Credit (PGR only): 1 credit
Audience: Research Degree Students, Postdocs, Research Fellows
Dates
- 15 & 18 December 2025
11:00-12:30, South Kensington
Course Resources
Current scientific problems often involve processing big data. R, one of the most popular statistical tools, is widely used by students and researchers as a platform for such data processing. However, as your datasets grow, you may run into scalability problems. The standard data-processing tools available in R often struggle with such datasets on a local desktop or laptop computer, owing to software and hardware limitations: on the software side, the default data structures are not optimised for big data; on the hardware side, main-memory requirements can far exceed the capacity of conventional hardware.
Fortunately, you can greatly reduce these problems by making use of some specific R packages. Three R packages - data.table, dtplyr, and sparklyr - expand your capability to work with big data. These packages also provide new methods to boost the capacity and computational performance of local computers when dealing with huge datasets in CSV or relational-database format. You will learn the basic skills of manipulating big datasets in R/RStudio for prototyping data analysis.
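As a taste of what the course covers, here is a minimal sketch of loading and summarising a large CSV with data.table (the file name and column names below are hypothetical):

```r
library(data.table)

# fread() is data.table's fast, multi-threaded CSV reader; it is
# typically much quicker than base R's read.csv() on large files
# and returns a memory-efficient data.table rather than a data.frame.
dt <- fread("measurements.csv")

# Aggregation uses data.table's dt[i, j, by] syntax:
# mean of a 'value' column grouped by a 'site' column.
summary_dt <- dt[, .(mean_value = mean(value)), by = site]
```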
The workshop will be delivered through a combination of slides, live demonstrations and hands-on practice.
Syllabus
- What is big data?
- The data.table package and performance comparison with the widely used data.frame
- Dataframe manipulation with dtplyr
- Interfacing with Apache Spark in R (sparklyr)
- Leveraging the pipe operator in data analysis
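The last two syllabus items can be sketched together: dtplyr lets you chain familiar dplyr verbs with the pipe operator while translating the work to fast data.table code under the hood (the toy data and column names here are hypothetical):

```r
library(dplyr)
library(dtplyr)
library(data.table)

# Wrap a data.table in a "lazy" object so dplyr verbs are
# translated into data.table operations rather than run directly.
dt <- lazy_dt(data.table(site  = c("A", "A", "B"),
                         value = c(1.0, 2.0, 3.0)))

# Chain transformations with the pipe operator, then collect
# the result back into an ordinary in-memory tibble.
result <- dt |>
  filter(value > 1) |>
  group_by(site) |>
  summarise(mean_value = mean(value)) |>
  as_tibble()
```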
Learning Outcomes
On completion of this workshop, you will be better able to:
- Use the data.table, dtplyr and sparklyr packages in RStudio or R to load big datasets more efficiently
- Manage data types by choosing the appropriate functionality from big data processing packages
- Compose code for cleaning and analysing big data using the packages
- Integrate external tools such as a relational database system or Apache Spark into your analysis
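For the last outcome, a minimal sketch of the sparklyr workflow looks like this (the file name and column names are hypothetical; a local Spark installation is assumed):

```r
library(sparklyr)
library(dplyr)

# Connect to a local Spark instance, suitable for prototyping.
sc <- spark_connect(master = "local")

# Read a CSV directly into Spark rather than into R's memory.
tbl <- spark_read_csv(sc, name = "measurements",
                      path = "measurements.csv")

# dplyr verbs are translated to Spark SQL and executed remotely;
# collect() brings only the (small) result back into R.
result <- tbl |>
  group_by(site) |>
  summarise(mean_value = mean(value)) |>
  collect()

spark_disconnect(sc)
```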
Pre-requisites
You will not be able to follow the course without these essential prerequisites:
- Basic knowledge of the R programming language
- Familiarity with the pipe operator (learning materials are available at https://github.com/ImperialCollegeLondon/RCDS-data-processing-with-r/blob/master/data_processing_with_R_2.Rmd)
- Basic SQL (learning materials are available at http://swcarpentry.github.io/sql-novice-survey/)
How to book
- Early Career Researchers (Research Degree Students, Postdocs, Research Fellows) should book via Inkpath using your Imperial Single-Sign-On.
- All other members of the Imperial community should book here.
Please ensure you have read and understood ECRI’s cancellation policy before booking.