Advanced Databases

Module aims

  • To provide detailed theoretical and practical knowledge of how database management systems (DBMS) are programmed in SQL, how SQL queries are optimised, and how DBMSs may be linked to form distributed databases, and how DBMSs operate and are tuned to improve performance.
  • To provide exposure to how core concepts in databases may be applied and developed to solve research problems such as handling Big Data and Temporal Data.

Learning outcomes

1.To understand the logical semantics of SQL queries, and how SQL is a implementation of the relational algebra, and how SQL queries may be rewritten and executed in different operational ways whilst preserving their logic semantics.
2.To understand the fundamental tradeoffs in data  management applications (bandwidth, CPU efficiency, flexibility,  throughput, latency, etc) as well as the degrees of freedom that can  be exploited to implement different points in the tradeoff space.
3.To understand how distributed databases are implemented, and how applications can be designed for those distributed databases, scaling up to Big Data sized databases.
4.To understand how concurrency is controlled in single server and distributed databases, and the tradeoff between full concurrency control and query throughput.
5.To understand how to query data using US logic (the logic of until and since), and hence how the relational algebra can be extended to handle new data modelling requirements.

Module syllabus

  • Object-Relational mapping
  • Data Storage
    • tables
    • primary indices
    • pages
    • decomposed vs. n-ary storage
  • Querying data: a deep discussion of SQL in the context of analytical and transactional processing
  • Join formulation and evaluation
  • Query planing and optimization
    • logical
    • physical
  • Processing models
  • Secondary storage
    • secondary indices
    • materialized views
  • Distributed and Scalable Databases
    • Horizontal/vertical fragmentation, data replication
    • Basic distributed query processing
    • Semi-join query processing
    • Big Data: concepts and alternative technologies
      • Map Reduce
      • Pig Latin
  • Transactions and Concurrency Control
    • Recoverability and serialisability of transactions
    • Anomalies in transactions
    • SQL isolation levels
    • Two-phased locking, and requirements to avoid all types of anomalies
    • Distributed concurrency control
  • Temporal Databases
    • Valid time and transaction time
    • Discrete bounded linear flow of time and the temporal structure
    • US logic and derived modal operators
    • Representation of US logic as a temporal relational algebra

Pre-requisites

The contents of CO526 Databases or CO130 Databases I

Teaching methods

Four hours per week of combined lectures and tutorials

Assessments

2 courseworks and one 2 hour exam

Reading list

Module leaders

Dr Holger Pirk
Dr Peter McBrien