Module aims

To develop an understanding of

* how a compiler for a high-level programming language works,

* how programming language design is influenced by compiler structure,

* how computer architecture is influenced by the needs of compiled programs.

The course provides the specific technical skills needed for constructing parsers, interpreters and translators as well as introducing topics in code optimisation and semantic analysis.

Learning outcomes

Students will have developed by the end of the course sufficient technical skills for constructing parsers, interpreters and translators to be able to build their own compiler from scratch using suitable tools. They will be able to describe how a compiler for a high level programming language works and will have an understanding of how programming language design is influenced by compiler structure and how computer architecture is influenced by the needs of compiled programs.

Module syllabus

Language processors, compilers, interpreters and their relatives. The structure of a compiler. Its context: editors, and loaders. Phases and passes. A complete example in two or three lectures.

Code generation for arithmetic expressions, using a stack and using registers. The importance of effective use of registers. Sethi-Ullman numbers and the graph colouring approach. Code generation for statements, control structures and logical expressions. Short-circuiting logical operators. Dynamic and static instruction counts.

Lexical Analysis: Regular Expressions, Finite Automata, NFAs, DFAs, Regular Expressions to DFAs.

Syntax Analysis: Top Down Parsing, Bottom Up Parsing, Chomsky Hierarchy, Context Free Grammars, Ambiguous Grammars

LR Parsing: Pushdown Automata, LR Parsing Tables, LR(0), SLR, LR(1), LALR(1), First and Follow Sets, Shift-Reduce and Reduce-Reduce Errors, Error Recovery, Abstract Syntax Trees.

LL Parsing: Definition, EBNF, Recursive Descent Parsing, Left Factorisation, Substitution, Left Recursion Removal, Error Recovery, Table- Driven LL(1)Parsing

Semantic Analysis: Classes of semantic checks, Symbol Tables, Check methods for Imperative statements, Type Checking, Single Inheritance, Attribute Grammars

Runtime Organisation: Memory layout, Objects, Method Calls, Inheritance, Overriding, Dynamic Binding, Global/Stack/Heap Data, Explicit Heap Allocation, Garbage Allocation, Reference Counting GC, Mark-Sweep GC, Two- Space GC, Generational GC.

Optimisation: peephole, basic block, inter-block ('global') and interprocedural. Strength reduction. Side-effects and aliasing.

Concrete versus abstract syntax. Interpreters. Name scope: environments; static versus dynamic scoping. Semantic checking: type checking. Translation: decompilation; pretty-printing. Code generation: stack machines; assembly language; run-time stacks and block structure; static and dynamic chains; global variables; register machines. Optimisation: register allocation; constant folding; Boolean expressions; assignment; statements; loops; invariant code. Data structures: arrays; generators; element access; bounds checking; common subexpressions; base and index register use; dope vectors. Syntax analysis: BNF; context-free grammars; recognisers; parse trees. Warshall's algorithm; context clashes; producing abstract syntax trees. Bottom-up parsing; operator precedence; operator grammars; constructing precedence matrices. Lexical analysis: finite-state automata.

Teaching methods

 Lectures and tutorials are used for teaching.


 In addition to the formal exam, student learning is supported through unassessed and assessed tutorial exercises and a term-long lab project.

Reading list



Module leaders

Dr Naranker Dulay
Professor Paul Kelly