Advanced Computer Architecture
This is a third-level course that aims to develop a thorough understanding of high-performance and energy-efficient computer architecture, as a basis for informed software performance engineering and as a foundation for advanced work in computer architecture, compiler design, operating systems and parallel processing.
Students should be able to:
* Explain performance behaviour of programs in terms of architectural features of the computer system, including processor microarchitecture, instruction scheduling, branch prediction, instruction- and thread-level parallelism, the memory hierarchy and interconnection network.
* Discuss computer architecture design alternatives with respect to performance, energy efficiency and benchmarking methodology.
* Show understanding of the role of compilers, parallelisation and parallel programming models in matching application requirements to the capabilities of the architecture.
* Pipelined CPU architecture. Instruction set design and pipeline structure. Dynamic scheduling using scoreboarding and Tomasulo's algorithm; register renaming. Software instruction scheduling and software pipelining. Superscalar and long-instruction-word architectures (VLIW, EPIC and Itanium). Branch prediction and speculative execution. Simultaneous multithreading ('hyperthreading'), and vector instruction sets (such as SSE and AVX).
* Caches: associativity, allocation and replacement policies, sub-block placement. Multilevel caches, multilevel inclusion. Cache performance issues. Uniprocessor cache coherency issues: self-modifying code, peripherals, address translation.
* Dependence in loop-based programs: dependence analysis and iteration-space transformations, used to enable automatic parallelisation and vectorisation (e.g. for AVX) and memory hierarchy optimisations such as tiling.
* Implementations of shared memory: the cache coherency problem. Update vs invalidation. The bus-based 'snooping' protocol design space. Scalable shared memory using directory-based cache coherency. How shared memory supports programmability; OpenMP and MPI.
* Graphics processors and 'manycore' architectures: SIMT ('single instruction multiple thread'), and the CUDA and OpenCL programming models. Decoupling, latency tolerance, throughput-intensive memory system architecture. The relationships between SIMT graphics processor architecture and more conventional multicore SIMD designs.
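As an illustration of the iteration-space transformations listed above, the following is a minimal sketch of loop tiling applied to matrix multiplication. It is not taken from the course materials: the function names `matmul_naive` and `matmul_tiled`, the row-major layout and the `tile` parameter are all assumptions made for this example. The only loop-carried dependence is the accumulation into each `c[i][j]` along `k`, and the tiled version still visits `k` in ascending order for every `(i, j)`, so the reordering is legal and computes the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative names only; matrices are n x n, row-major, and c is
   accumulated into (c += a * b). */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

static size_t min_size(size_t x, size_t y)
{
    return x < y ? x : y;
}

/* Tiled version: the i, k and j loops are each split into an outer loop
   over tile origins (ii, kk, jj) and an inner loop within the tile, so
   that a tile-sized block of b is reused while it is still in cache.
   The min_size bounds handle n not being a multiple of tile. */
void matmul_tiled(size_t n, size_t tile,
                  const double *a, const double *b, double *c)
{
    assert(tile > 0);
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t jj = 0; jj < n; jj += tile)
                for (size_t i = ii; i < min_size(ii + tile, n); i++)
                    for (size_t k = kk; k < min_size(kk + tile, n); k++) {
                        double aik = a[i * n + k];
                        /* Innermost loop is unit-stride in b and c,
                           which also suits vectorisation. */
                        for (size_t j = jj; j < min_size(jj + tile, n); j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
}
```

A suitable tile size depends on the cache capacity and associativity of the target machine and would normally be found by experiment; the sketch makes no claim about which value performs best.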
Further details are available from the course web site, http://www.doc.ic.ac.uk/~phjk/AdvancedCompArchitecture.html
Prerequisite: the contents of (210) Architecture II. Students lacking formal prerequisites should discuss the suitability of the course with the lecturer, and review the online course materials.
Lectures, classroom discussion/debate, question-spotting sessions, and hands-on experimental work.
This is a level 6/H course.
The exam for this course is based to some extent on a recent article describing a new processor architecture - see the course website and past papers for details. The course also includes laboratory-based practical work on optimising processor microarchitecture for energy efficiency, and on optimising a real-world application kernel using appropriate tools and techniques.