Course Calendar

Date | Lecture | Instructor | Topic

Week 2
Oct 8, 9:00-11:00 | Lectures 1 + 2 | Peter Pietzuch | Scalable distributed systems design (slides)
Oct 11, 14:00-16:00 | Lectures 3 + 4 | Peter Pietzuch | Data centre and cloud computing (slides)

Week 3
Oct 15, 9:00-11:00 | Lectures 5 + 6 | Peter Pietzuch | BigTable (slides). Required reading: "Bigtable: A Distributed Storage System for Structured Data"
Oct 18, 14:00-16:00 | Lectures 7 + 8 | Peter Pietzuch | Dynamo (slides). Required reading: "Dynamo: Amazon's Highly Available Key-Value Store"

Week 4
Oct 22, 9:00-11:00 | Lectures 9 + 10 | Peter Pietzuch | Spanner (slides). Required reading: "Spanner: Google's Globally-Distributed Database"
Oct 25, 14:00-16:00 | Lectures 11 + 12 | Thomas Heinis | Introduction & Main Memory Databases (slides, slides)

Week 5
Oct 29, 9:00-11:00 | Lectures 13 + 14 | Thomas Heinis | Solid State Disk and Databases (slides, slides)
Nov 1, 14:00-16:00 | Lectures 15 + 16 | Thomas Heinis | Graph Databases (slides)

Week 6
Nov 5, 9:00-11:00 | Lectures 17 + 18 | Peter Pietzuch | MapReduce (slides). Required reading: "MapReduce: Simplified Data Processing on Large Clusters"
Nov 8, 14:00-16:00 | Lectures 19 + 20 | Peter Pietzuch | Spark (slides). Required reading: "Resilient Distributed Datasets"

Week 7
Nov 12, 9:00-11:00 | Lectures 21 + 22 | Thomas Heinis | Document Databases (slides, slides) & XQuery/XPath slides (not examinable)
Nov 15, 14:00-16:00 | Lectures 23 + 24 | Thomas Heinis | Graph & Document Database Tutorial

Week 8
Nov 19, 9:00-11:00 | Lectures 25 + 26 | Thomas Heinis | Transactions on Multicores (slides)
Nov 22, 14:00-16:00 | Lectures 27 + 28 | Thomas Heinis | Cold Storage (slides, slides)

Week 9
Nov 26, 9:00-11:00 | No lecture
Nov 29, 14:00-16:00 | No lecture

Reading and discussion materials:

Week 3:

  • "Bigtable: A Distributed Storage System for Structured Data", Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, Seventh Symposium on Operating Systems Design and Implementation (OSDI), Seattle, WA, November 2006
    1. What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
    2. What is good about the paper? What is not good about the paper?
    3. How does the design of BigTable compare to that of a parallel relational database management system (RDBMS)?
    4. What limits the scalability of the BigTable design?
  • "Dynamo: Amazon's Highly Available Key-Value Store", Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall, and Werner Vogels, ACM Symposium on Operating Systems Principles (SOSP), Stevenson, WA, October 2007
    1. What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
    2. What is good about the paper? What is not good about the paper?
    3. To what extent is the design of Dynamo inspired by Distributed Hash Tables (DHTs)? What are the advantages and disadvantages of such a design?
    4. How does the design of Dynamo compare to that of BigTable?

Week 4:

  • "Spanner: Google's Globally-Distributed Database", James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford, Tenth Symposium on Operating Systems Design and Implementation (OSDI), Hollywood, CA, October 2012
    1. What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
    2. What is good about the paper? What is not good about the paper?
    3. How does the performance of Spanner depend on the workload?
    4. What other applications could TrueTime have?

Week 6:

  • "MapReduce: Simplified Data Processing on Large Clusters", Jeffrey Dean and Sanjay Ghemawat, Sixth Symposium on Operating Systems Design and Implementation (OSDI), San Francisco, CA, December 2004
    1. What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
    2. What is good about the paper? What is not good about the paper?
    3. What algorithms cannot be easily expressed in the MapReduce model?
    4. Can you think of other techniques for handling stragglers?
  • "Resilient Distributed Datasets", Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica, 9th USENIX conference on Networked Systems Design and Implementation (NSDI), San Jose, CA, April 2012.
    1. What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
    2. What is good about the paper? What is not good about the paper?
    3. Is the comparison with Hadoop fair?
    4. How well can Spark be used to process graph data?

Coursework 1:

Optional:

  • "Large-scale cluster management at Google with Borg" (link)
  • "Kubernetes - Scheduling the Future at Cloud Scale" (link)

Other information:

If you print the slides, we encourage you to print 4 slides per page. You can do this either by selecting "Multiple pages per sheet" in the "Print" dialog box of Acrobat Reader, or by running the following command on Linux:

$ pdfnup --nup 2x2 file.pdf

which generates "file-nup.pdf" with 4 slides per page.