Advanced Computer Architecture

Module aims

This is a third-level course that aims to develop a thorough understanding of high-performance and energy-efficient computer architecture, as a basis for informed software performance engineering and as a foundation for advanced work in computer architecture, compiler design, operating systems and parallel processing.

Learning outcomes

Students should be able to:

* Explain the performance behaviour of programs in terms of architectural features of the computer system, including processor microarchitecture, instruction scheduling, branch prediction, instruction- and thread-level parallelism, the memory hierarchy and the interconnection network.

* Discuss computer architecture design alternatives with respect to performance, energy efficiency and benchmarking methodology.

* Show understanding of the role of compilers, parallelisation and parallel programming models in matching application requirements to the capabilities of the architecture.

Module syllabus

Topics include:

* Pipelined CPU architecture. Instruction set design and pipeline structure. Dynamic scheduling using scoreboarding and Tomasulo's algorithm; register renaming. Software instruction scheduling and software pipelining. Superscalar and very-long-instruction-word architectures (VLIW, EPIC and Itanium). Branch prediction and speculative execution. Simultaneous multithreading ('hyperthreading'), and vector instruction sets (such as SSE and AVX).

* Caches: associativity, allocation and replacement policies, sub-block placement. Multilevel caches, multilevel inclusion. Cache performance issues. Uniprocessor cache coherency issues: self-modifying code, peripherals, address translation.

* Dependence in loop-based programs: dependence analysis and iteration-space transformations to enable automatic parallelisation, vectorisation (e.g. for AVX) and memory-hierarchy optimisations such as tiling.

* Implementations of shared memory: the cache coherency problem. Update vs. invalidation. The design space of bus-based 'snooping' protocols. Scalable shared memory using directory-based cache coherency. How shared memory supports programmability: shared-memory programming with OpenMP, contrasted with message passing in MPI.

* Graphics processors and 'manycore' architectures: SIMT ('single instruction, multiple thread'), and the CUDA and OpenCL programming models. Decoupling, latency tolerance and throughput-oriented memory system architecture. The relationships between SIMT graphics processor architecture and more conventional multicore SIMD designs.

Further details are available from the course website, http://www.doc.ic.ac.uk/~phjk/AdvancedCompArchitecture.html

Pre-requisites

The content of (210) Architecture II. Students lacking the formal prerequisites should discuss the suitability of the course with the lecturer and review the online course materials.

Teaching methods

Lectures, classroom discussion/debate, question-spotting sessions and hands-on experimental work.

Assessments

* This is a level 6/H course.

The exam for this course is partly based on a recent article describing a new processor architecture; see the course website and past papers for details. The course also includes laboratory-based practical work on optimising processor microarchitecture for energy efficiency, and on optimising a real-world application kernel using appropriate tools and techniques.

Module leaders

Professor Paul Kelly