70022

Scalable Systems and Data

Module aims

In this module you will have the opportunity to:

get an overview of data-centre technologies, the infrastructure needed to run a variety of workloads, and the design decisions involved in engineering scalable distributed applications
analyze the full system stack for managing and scheduling data-centre resources
discuss the design principles for scalable systems
investigate concepts and techniques for building large-scale systems, with a focus on distributed storage, coordination, computation and resource allocation
get an overview of NewSQL and NoSQL technologies
understand new data models and their associated query languages and systems
discuss new storage technology and its impact on query execution and on data management systems in general

Learning outcomes

Define the requirements and challenges of architecting, building and managing a large-scale data-centre infrastructure with distributed systems
Know the design principles of modern technologies, encompassing the software stack that manages data-centre resources, and be able to critically assess them
Use the theoretical foundations from distributed algorithms to define the building blocks for scalable system design in relation to distributed storage, coordination and computation
Critically assess the trade-offs between different requirements when designing scalable distributed systems
Discuss, compare and criticise the approaches presented in state-of-the-art research papers targeting distributed systems
Gain a basic understanding of NewSQL and NoSQL technologies driven by new data models
Understand the trade-offs in converting between data models and database tools
Understand the implications of new hardware (storage class memory, SSD, main memory and multicores) on database management systems

Module syllabus

The course consists of mandatory reading of key (recently published) research papers related to aspects of scalable distributed systems design.

Overview of scalable distributed system design (goals and example systems, deployment environments, and challenges)
Overview of data centres, their technologies and their link to cloud computing, and an introduction to the latest technologies, including rack-scale computing
Hardware virtualization (full virtualization and paravirtualization, virtualization of different resources, security) / OS virtualization (containers and serverless computing)
Design of scalable services and applications (reference architectures, requirements and design principles for highly performant and fault-tolerant systems)
Distributed Storage (CAP theorem, consistency types (strong, weak, eventual, etc.))
Distributed Coordination (consensus protocols, and use-cases)
Distributed Computation (data-flow and graph processing models)
Managing distributed resources (resource management and allocation, cluster schedulers)
NoSQL and NewSQL; overview of the new data management landscape
Key/value stores and their successors, extensible record stores
Document stores; schema-free design: shifting the complexity from ETL to querying
New storage media; relative (random) access cost on disk compared to main memory/SSD
Main memory databases; transactions in main memory; random access in main memory
Flash/SSD databases; reading and writing (block erase); relative cost; wear levelling in write access; possibly an outlook on phase-change memory
Transactions on multicores; CPU cache hierarchy; multi-socket architecture; (workload-driven) data placement

Teaching methods

The material will be taught through a mix of traditional lectures, tutorials and in-class discussions of state-of-the-art technologies, systems and publications.

An online service will be used as a discussion forum for the module.

Assessments

There will be two courseworks that together contribute 20% of the marks for the module.
There will be a final written exam, which counts for the remaining 80% of the marks.

Written feedback will be given on the assessed courseworks, approximately 14 days after submission.

Reading list

Supplementary

Kleppmann, Martin. Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. First edition, O'Reilly.

Module leaders

Professor Thomas Heinis
Professor Peter Pietzuch
