Large Scale Data Management

Module aims

This course will teach students about the evolution of database systems in the face of new requirements (different access patterns, scalability, relaxation of transactional guarantees) and new hardware (storage class memory, SSDs, main memory and multicore CPUs). In doing so, it critically discusses new technologies such as NewSQL (for new hardware) and NoSQL (for new requirements and use cases). More specifically, the course will study:

1)    Scaling Traditional/Relational Databases

2)    CAP Theorem and ACID properties

3)    NoSQL (new access patterns)

a.     Key/Value Stores

b.     Document Stores

c.     Extensible Record Stores

4)    NewSQL (modern hardware)

a.     Main Memory Databases (VoltDB, SAP Hana, Hyper)

b.     Transactions on Multicore CPUs (ShoreMT)

c.     Databases on Storage Class Memory/Flash/SSD (Aerospike)

Learning outcomes

Module syllabus

The structure of the module follows the outline given above:

1)    Traditional databases; parallel approaches; difficulty of scaling while maintaining ACID properties; CAP Theorem

2)    Introduction to NoSQL and NewSQL; Overview of the new data management landscape

3)    Key/value stores and their successors, extensible record stores

4)    Document stores; schema-free design: shifting the complexity from ETL to querying

5)    New storage media; relative (random) access cost on disk compared to main memory and SSD

a.     Main memory databases; transactions in main memory; random access in main memory

b.     Flash/SSD databases; reading and writing (block erase); relative cost; wear levelling for write access; possibly an outlook on phase-change memory

c.     Transactions on multicores; CPU cache hierarchy; multi-socket architecture; (workload-driven) data placement

Pre-requisites

Advanced Databases (or equivalent experience)

Teaching methods

Assessments


Module leaders

Dr Thomas Heinis