Course Calendar

Date	Lecture	Instructor	Topic
Week 2
Oct 8, 9:00-11:00	Lectures 1 + 2	Peter Pietzuch	Scalable distributed systems design (slides)
Oct 11, 14:00-16:00	Lectures 3 + 4	Peter Pietzuch	Data centre and cloud computing (slides)
Week 3
Oct 15, 9:00-11:00	Lectures 5 + 6	Peter Pietzuch	BigTable (slides) Required reading: "Bigtable: A Distributed Storage System for Structured Data"
Oct 18, 14:00-16:00	Lectures 7 + 8	Peter Pietzuch	Dynamo (slides) Required reading: "Dynamo: Amazon's Highly Available Key-Value Store"
Week 4
Oct 22, 9:00-11:00	Lectures 9 + 10	Peter Pietzuch	Spanner (slides) Required reading: "Spanner: Google's Globally-Distributed Database"
Oct 25, 14:00-16:00	Lectures 11 + 12	Thomas Heinis	Introduction & Main Memory Databases (slides, slides)
Week 5
Oct 29, 9:00-11:00	Lectures 13 + 14	Thomas Heinis	Solid State Disk and Databases (slides, slides)
Nov 1, 14:00-16:00	Lectures 15 + 16	Thomas Heinis	Graph Databases (slides)
Week 6
Nov 5, 9:00-11:00	Lectures 17 + 18	Peter Pietzuch	MapReduce (slides) Required reading: "MapReduce: Simplified Data Processing on Large Clusters"
Nov 8, 14:00-16:00	Lectures 19 + 20	Peter Pietzuch	Spark (slides) Required reading: "Resilient Distributed Datasets"
Week 7
Nov 12, 9:00-11:00	Lectures 21 + 22	Thomas Heinis	Document Databases (slides, slides) & XQuery/XPath slides (not examinable)
Nov 15, 14:00-16:00	Lectures 23 + 24	Thomas Heinis	Graph & Document Database Tutorial
Week 8
Nov 19, 9:00-11:00	Lectures 25 + 26	Thomas Heinis	Transactions on Multicores (slides)
Nov 22, 14:00-16:00	Lectures 27 + 28	Thomas Heinis	Cold Storage (slides, slides)
Week 9
Nov 26, 9:00-11:00	No lecture
Nov 29, 14:00-16:00	No lecture

Reading and discussion materials:

Week 3:

"Bigtable: A Distributed Storage System for Structured Data", Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, Seventh Symposium on Operating System Design and Implementation (OSDI), Seattle, WA, November, 2006
1. What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
2. What is good about the paper? What is not good about the paper?
3. How does the design of BigTable compare to that of a parallel relational database management system (RDBMS)?
4. What limits the scalability of the BigTable design?

"Dynamo: Amazon's Highly Available Key-Value Store", Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall, and Werner Vogels, ACM Symposium on Operating Systems Principles (SOSP), Stevenson, WA, October 2007
1. What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
2. What is good about the paper? What is not good about the paper?
3. To what extent is the design of Dynamo inspired by Distributed Hash Tables (DHTs)? What are the advantages and disadvantages of such a design?
4. How does the design of Dynamo compare to that of BigTable?

Week 4:

"Spanner: Google's Globally-Distributed Database", James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford, Tenth Symposium on Operating System Design and Implementation (OSDI), Hollywood, CA, October, 2012
1. What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
2. What is good about the paper? What is not good about the paper?
3. How does the performance of Spanner depend on the workload?
4. What other applications could TrueTime have?

Week 6:

"MapReduce: Simplified Data Processing on Large Clusters", Jeffrey Dean and Sanjay Ghemawat, Sixth Symposium on Operating System Design and Implementation (OSDI), San Francisco, CA, December, 2004
1. What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
2. What is good about the paper? What is not good about the paper?
3. What algorithms cannot be easily expressed in the MapReduce model?
4. Can you think of other techniques for handling stragglers?

"Resilient Distributed Datasets", Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica, 9th USENIX conference on Networked Systems Design and Implementation (NSDI), San Jose, CA, April 2012.
1. What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
2. What is good about the paper? What is not good about the paper?
3. Is the comparison with Hadoop fair?
4. How well can Spark be used to process graph data?

Coursework 1:

"ZooKeeper: Wait-Free Coordination for Internet-Scale Systems", Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed, USENIX Annual Technical Conference (ATC), Boston, MA, 2010

Optional:

"Large-scale cluster management at Google with Borg" (link)
"Kubernetes - Scheduling the Future at Cloud Scale" (link)

Other information:

If you print the slides, we encourage you to print them with 4 slides per page. You can do this either by selecting "Multiple pages per sheet" in the "Print" dialog box of Acrobat Reader, or by simply typing the following command in Linux:

$ pdfnup --nup 2x2 file.pdf

which generates "file-nup.pdf" with 4 slides per page.
