Imperial College London

Professor Alastair Donaldson

Faculty of Engineering, Department of Computing

Professor of Programming Languages
 
 
 

Contact

 

+44 (0)20 7594 8266 | alastair.donaldson | Website

 
 

Location

 

422 Huxley Building, South Kensington Campus



 

Publications


99 results found

Donaldson A, Thomson P, Teliman V, Milizia S, Perez Maselco A, Karpiński A et al., 2021, Test-case reduction and deduplication almost for free with transformation-based compiler testing, PLDI 2021, Publisher: Association for Computing Machinery (ACM)

Recent transformation-based approaches to compiler testing look for mismatches between the results of pairs of equivalent programs, where one program is derived from the other by randomly applying semantics-preserving transformations. We present a formulation of transformation-based compiler testing that provides effective test-case reduction almost for free: if transformations are designed to be as small and independent as possible, standard delta debugging can be used to shrink a bug-inducing transformation sequence to a smaller subsequence that still triggers the bug. The bug can then be reported as a delta between an original and minimally-transformed program. Minimized transformation sequences can also be used to heuristically deduplicate a set of bug-inducing tests, recommending manual investigation of those that involve disparate types of transformations and thus may have different root causes. We demonstrate the effectiveness of our approach via a new tool, spirv-fuzz, the first compiler-testing tool for the SPIR-V intermediate representation that underpins the Vulkan GPU programming model.
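
The reduction scheme this abstract describes is essentially delta debugging over the transformation sequence rather than over program text. The following minimal sketch illustrates the idea, assuming a hypothetical `bug_triggers` oracle that applies a candidate subsequence of transformations to the original program and checks whether the compiler bug still manifests; it is not the spirv-fuzz implementation.

```python
# Sketch of shrinking a bug-inducing transformation sequence, in the spirit
# of the paper's "reduction almost for free" idea. `bug_triggers` is a
# hypothetical oracle: it applies the given subsequence of transformations
# to the original program and reports whether the miscompilation persists.

def shrink(transformations, bug_triggers):
    """Greedy delta-debugging-style reduction of a transformation list."""
    chunk = len(transformations) // 2
    while chunk >= 1:
        i = 0
        while i < len(transformations):
            candidate = transformations[:i] + transformations[i + chunk:]
            if candidate and bug_triggers(candidate):
                transformations = candidate  # chunk was irrelevant: drop it
            else:
                i += chunk                   # chunk needed: keep and advance
        chunk //= 2
    return transformations

# Example with a toy oracle: the bug triggers whenever the transformations
# "donate_live_code" and "add_dead_break" are both present.
if __name__ == "__main__":
    seq = ["add_dead_break", "obfuscate_constant", "donate_live_code",
           "split_block", "permute_args"]
    needed = {"add_dead_break", "donate_live_code"}
    print(shrink(seq, lambda s: needed <= set(s)))
    # -> ['add_dead_break', 'donate_live_code']
```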

Conference paper

Lascu A, Windsor M, Donaldson A, Grosser T, Wickerson J et al., 2021, Dreaming up metamorphic relations: experiences from three fuzzer tools, 2021 IEEE/ACM 6th International Workshop on Metamorphic Testing (MET), Publisher: IEEE

Metamorphic testing requires the availability of a suitable set of metamorphic relations (MRs) for the application domain of interest. A software testing practitioner interested in using metamorphic testing is thus blocked unless they can devise a suitable set of MRs. In this paper we offer some practical advice on sources of inspiration for MRs, based on our experience building three fuzzing tools based on metamorphic testing: MF++, which supports automated testing of C++11 libraries, C4, which tests concurrency support in C11 compilers, and spirv-fuzz, which aims to find bugs in compilers for the SPIR-V programming language (mainly used in computer graphics). The MRs we have devised have taken inspiration from three main sources: (1) careful study of specification documents related to the libraries and programming languages that these tools target, (2) consultation of prior work and discussion with domain experts, and (3) manual inspection of the results of automated code coverage analysis on the systems under test. We describe these sources of inspiration in detail, giving a range of concrete examples for each. We hope that this experience report will help to inform developers of future metamorphic testing tools as to the steps they can take to discover MRs in their domains of interest.
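
To make the notion of an MR concrete, here is a minimal metamorphic-testing harness in the style this abstract presupposes. The system under test and the relation (permuting the input of a sort must not change its output) are illustrative placeholders, not taken from MF++, C4, or spirv-fuzz.

```python
# A minimal metamorphic-relation check. The (hypothetical) system under
# test is a sort routine; the MR says that permuting the input must not
# change the output.
import random

def system_under_test(xs):
    return sorted(xs)  # stand-in for the real implementation

def check_permutation_mr(xs):
    shuffled = xs[:]
    random.shuffle(shuffled)
    return system_under_test(xs) == system_under_test(shuffled)

for trial in range(100):
    inp = [random.randint(-50, 50) for _ in range(random.randint(0, 20))]
    assert check_permutation_mr(inp), f"MR violated on {inp}"
print("no MR violations in 100 trials")
```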

Conference paper

Even-Mendoza K, Cadar C, Donaldson A, 2020, Closer to the edge: testing compilers more thoroughly by being less conservative about undefined behaviour, IEEE/ACM International Conference on Automated Software Engineering, New Ideas and Emerging Results Track (ASE-NIER 2020), Publisher: IEEE / ACM, Pages: 1219-1223

Randomised compiler testing techniques require a means of generating programs that are free from undefined behaviour (UB) in order to reliably reveal miscompilation bugs. Existing program generators such as Csmith heavily restrict the form of generated programs in order to achieve UB-freedom. We hypothesise that the idiomatic nature of such programs limits the test coverage they can offer. Our idea is to generate less restricted programs that are still UB-free—programs that get closer to the edge of UB, but that do not quite cross the edge. We present preliminary support for our idea via a prototype tool, CsmithEdge, which uses simple dynamic analysis to determine where Csmith has been too conservative in its use of safe math wrappers that guarantee UB-freedom for arithmetic operations. By eliminating redundant wrappers, CsmithEdge was able to discover two new miscompilation bugs in GCC that could not be found via intensive testing using regular Csmith, and to achieve substantial differences in code coverage on GCC compared with regular Csmith.
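
To illustrate what a "safe math wrapper" guards against, here is a Python paraphrase of the shape such C wrappers take. Csmith's actual wrappers are C functions/macros; the exact fallback behaviour shown here is an assumption made for illustration.

```python
# Illustration of the "safe math wrapper" idea the abstract refers to:
# if the operation would overflow (undefined behaviour for signed ints
# in C), a well-defined fallback value is returned instead, so the
# generated program never executes UB.
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def safe_add_int32(a, b):
    would_overflow = ((b > 0 and a > INT32_MAX - b) or
                      (b < 0 and a < INT32_MIN - b))
    return a if would_overflow else a + b  # fallback: first operand

# CsmithEdge's dynamic analysis (per the abstract) looks for call sites
# where the guard never fires on observed executions; such wrappers are
# redundant and can be replaced by plain `a + b`, moving the program
# closer to the "edge" of UB without crossing it.
print(safe_add_int32(INT32_MAX, 1))   # guard fires -> INT32_MAX
print(safe_add_int32(1, 2))           # guard idle  -> 3
```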

Conference paper

Donaldson AF, Torlak E, 2020, Message from the Chairs, Pages: III-IV

Conference paper

Iorga D, Sorensen T, Wickerson J, Donaldson A et al., 2020, Slow and steady: measuring and tuning multicore interference, IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), Publisher: IEEE

Now ubiquitous, multicore processors provide replicated compute cores that allow independent programs to run in parallel. However, shared resources, such as last-level caches, can cause otherwise-independent programs to interfere with one another, leading to significant and unpredictable effects on their execution time. Indeed, prior work has shown that specially crafted enemy programs can cause software systems of interest to experience orders-of-magnitude slowdowns when both are run in parallel on a multicore processor. This undermines the suitability of these processors for tasks that have real-time constraints. In this work, we explore the design and evaluation of techniques for empirically testing interference using enemy programs, with an eye towards reliability (how reproducible the interference results are) and portability (how interference testing can be effective across chips). We first show that different methods of measurement yield significantly different magnitudes of, and variation in, observed interference effects when applied to an enemy process that was shown to be particularly effective in prior work. We propose a method of measurement based on percentiles and confidence intervals, and show that it provides both competitive and reproducible observations. The reliability of our measurements allows us to explore auto-tuning, where enemy programs are further specialised per architecture. We evaluate three different tuning approaches (random search, simulated annealing, and Bayesian optimisation) on five different multicore chips, spanning x86 and ARM architectures. To show that our tuned enemy programs generalise to applications, we evaluate the slowdowns caused by our approach on the AutoBench and CoreMark benchmark suites. Our method achieves a statistically larger slowdown compared to prior work in 35 out of 105 benchmark/chip combinations.
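
As a concrete illustration of percentile-plus-confidence-interval measurement, the sketch below computes a 90th-percentile statistic over victim execution times together with a bootstrap confidence interval. The data and parameter choices are hypothetical; the paper's exact statistical procedure may differ.

```python
# Sketch of percentile-based interference measurement with a bootstrap
# confidence interval, in the spirit of the measurement method the
# abstract proposes (details here are illustrative, not the paper's).
import random

def percentile(samples, p):
    return sorted(samples)[int(p / 100 * (len(samples) - 1))]

def bootstrap_ci(samples, p, reps=1000, alpha=0.05):
    stats = sorted(
        percentile(random.choices(samples, k=len(samples)), p)
        for _ in range(reps)
    )
    lo = stats[int(alpha / 2 * reps)]
    hi = stats[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

# Hypothetical execution times (ms) of a victim program while an enemy
# process runs on a sibling core.
times = [random.gauss(120, 15) for _ in range(200)]
p90 = percentile(times, 90)
print("90th percentile:", round(p90, 1), "ms; 95% CI:",
      tuple(round(x, 1) for x in bootstrap_ci(times, 90)))
```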

Conference paper

Marcozzi M, Tang Q, Donaldson A, Cadar C et al., 2019, Compiler fuzzing: how much does it matter?, Proceedings of the ACM on Programming Languages, Vol: 3, Pages: 155:1-155:29, ISSN: 2475-1421

Despite much recent interest in randomised testing (fuzzing) of compilers, the practical impact of fuzzer-found compiler bugs on real-world applications has barely been assessed. We present the first quantitative and qualitative study of the tangible impact of miscompilation bugs in a mature compiler. We follow a rigorous methodology where the bug impact over the compiled application is evaluated based on (1) whether the bug appears to trigger during compilation; (2) the extent to which generated assembly code changes syntactically due to triggering of the bug; and (3) whether such changes cause regression test suite failures, or whether we can manually find application inputs that trigger execution divergence due to such changes. The study is conducted with respect to the compilation of more than 10 million lines of C/C++ code from 309 Debian packages, using 12% of the historical and now fixed miscompilation bugs found by four state-of-the-art fuzzers in the Clang/LLVM compiler, as well as 18 bugs found by human users compiling real code or as a by-product of formal verification efforts. The results show that almost half of the fuzzer-found bugs propagate to the generated binaries for at least one package, in which case only a very small part of the binary is typically affected, yet causing two failures when running the test suites of all the impacted packages. User-reported and formal verification bugs do not exhibit a higher impact, with a lower rate of triggered bugs and one test failure. The manual analysis of a selection of the syntactic changes caused by some of our bugs (fuzzer-found and non-fuzzer-found) in package assembly code, shows that either these changes have no semantic impact or that they would require very specific runtime circumstances to trigger execution divergence.

Journal article

Liew D, Cadar C, Donaldson A, Stinnett JR et al., 2019, Just fuzz it: solving floating-point constraints using coverage-guided fuzzing, ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’19), Publisher: ACM, Pages: 521-532

We investigate the use of coverage-guided fuzzing as a means of proving satisfiability of SMT formulas over finite variable domains, with specific application to floating-point constraints. We show how an SMT formula can be encoded as a program containing a location that is reachable if and only if the program’s input corresponds to a satisfying assignment to the formula. A coverage-guided fuzzer can then be used to search for an input that reaches the location, yielding a satisfying assignment. We have implemented this idea in a tool, Just Fuzz-it Solver (JFS), and we present a large experimental evaluation showing that JFS is both competitive with and complementary to state-of-the-art SMT solvers with respect to solving floating-point constraints, and that the coverage-guided approach of JFS provides significant benefit over naive fuzzing in the floating-point domain. Applied in a portfolio manner, the JFS approach thus has the potential to complement traditional SMT solvers for program analysis tasks that involve reasoning about floating-point constraints.
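
The encoding at the heart of this approach can be made concrete with a small sketch: a constraint becomes a program location that is reached exactly by satisfying assignments, and a fuzzer searches for an input reaching it. The formula, the tolerance, and the blind random search below are illustrative stand-ins for JFS's coverage-guided search over bit-level inputs.

```python
# The core JFS idea, sketched: encode a constraint as a program containing
# a location that is reachable iff the constraint is satisfiable, then
# fuzz for an input reaching that location.
import random

def encoded_formula(x, y):
    # Illustrative constraint: x > 0 /\ y > 0 /\ x^2 + y^2 ~= 25
    if x > 0.0 and y > 0.0:
        if abs(x * x + y * y - 25.0) < 1e-2:
            return True  # reaching this point <=> satisfying assignment
    return False

rng = random.Random(0)
for _ in range(1_000_000):
    x, y = rng.uniform(-10, 10), rng.uniform(-10, 10)
    if encoded_formula(x, y):
        print(f"satisfying assignment: x={x:.4f}, y={y:.4f}")
        break
else:
    print("budget exhausted without reaching the target location")
```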

Conference paper

Lidbury C, Donaldson AF, 2019, Sparse record and replay with controlled scheduling, 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Publisher: ACM Press, Pages: 576-593

Modern applications include many sources of nondeterminism, e.g. due to concurrency, signals, and system calls that interact with the external environment. Finding and reproducing bugs in the presence of this nondeterminism has been the subject of much prior work in three main areas: (1) controlled concurrency-testing, where a custom scheduler replaces the OS scheduler to find subtle bugs; (2) record and replay, where sources of nondeterminism are captured and logged so that a failing execution can be replayed for debugging purposes; and (3) dynamic analysis for the detection of data races. We present a dynamic analysis tool for C++ applications, tsan11rec, which brings these strands of work together by integrating controlled concurrency testing and record and replay into the tsan11 framework for C++11 data race detection. Our novel twist on record and replay is a sparse approach, where the sources of nondeterminism to record can be configured per application. We show that our approach is effective at finding subtle concurrency bugs in small applications; is competitive in terms of performance with the state-of-the-art record and replay tool rr on larger applications; succeeds (due to our sparse approach) in replaying the I/O-intensive Zandronum and QuakeSpasm video games, which are out of scope for rr; but (due to limitations of our sparse approach) cannot faithfully replay applications where memory layout nondeterminism significantly affects application behaviour.
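
A minimal sketch of the "sparse" idea, assuming the only configured source of nondeterminism is the clock: in record mode, values from the nondeterministic source are logged; in replay mode, they are served back from the log. Class and file names here are hypothetical, not taken from tsan11rec.

```python
# Sketch of sparse record-and-replay: only configured sources of
# nondeterminism are intercepted and logged.
import json, time

class SparseRecorder:
    def __init__(self, mode, log_path="nondet.log"):
        self.mode, self.log_path = mode, log_path
        self.log = [] if mode == "record" else json.load(open(log_path))

    def now(self):
        if self.mode == "record":
            value = time.time()
            self.log.append(value)   # record the observed value
        else:
            value = self.log.pop(0)  # replay the recorded value
        return value

    def finish(self):
        if self.mode == "record":
            json.dump(self.log, open(self.log_path, "w"))

rec = SparseRecorder("record")
stamps = [rec.now() for _ in range(3)]
rec.finish()
replay = SparseRecorder("replay")
assert [replay.now() for _ in range(3)] == stamps  # deterministic rerun
```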

Conference paper

Sorensen T, Pai S, Donaldson AF, 2019, One Size Doesn't Fit All: Quantifying Performance Portability of Graph Applications on GPUs, 15th IEEE International Symposium on Workload Characterization (IISWC), Publisher: IEEE, Pages: 155-166

Conference paper

Sorensen T, Pai S, Donaldson AF, 2019, Performance Evaluation of OpenCL Standard Support (and Beyond), 8th International Workshop on OpenCL (IWOCL), Publisher: ACM

Conference paper

Sorensen T, Evrard H, Donaldson AF, 2018, GPU schedulers: How fair is fair enough?, 29th International Conference on Concurrency Theory (CONCUR 2018), Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik, ISSN: 1868-8969

Blocking synchronisation idioms, e.g. mutexes and barriers, play an important role in concurrent programming. However, systems with semi-fair schedulers, e.g. graphics processing units (GPUs), are becoming increasingly common. Such schedulers provide varying degrees of fairness, guaranteeing enough to allow some, but not all, blocking idioms. While a number of applications that use blocking idioms do run on today’s GPUs, reasoning about liveness properties of such applications is difficult as documentation is scarce and scattered. In this work, we aim to clarify fairness properties of semi-fair schedulers. To do this, we define a general temporal logic formula, based on weak fairness, parameterised by a predicate that enables fairness per-thread at certain points of an execution. We then define fairness properties for three GPU schedulers: HSA, OpenCL, and occupancy-bound execution. We examine existing GPU applications and show that none of the above schedulers are strong enough to provide the fairness properties required by these applications. It hence appears that existing GPU scheduler descriptions do not entirely capture the fairness properties that are provided on current GPUs. Thus, we present two new schedulers that aim to support existing GPU applications. We analyse the behaviour of common blocking idioms under each scheduler and show that one of our new schedulers allows a more natural implementation of a GPU protocol.
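
For reference, the standard weak-fairness property that such scheduler definitions build on can be written in temporal logic as below. This is the textbook formula only; the paper's parameterised variant additionally gates the obligation with a per-thread predicate over execution points, as the abstract describes.

```latex
% Weak fairness for each thread t: if t is continuously enabled from
% some point onwards, then t is scheduled infinitely often.
\forall t \in \mathit{Threads}.\;
  \Diamond\,\square\,\mathit{enabled}(t) \;\Rightarrow\; \square\,\Diamond\,\mathit{scheduled}(t)
```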

Conference paper

Ketema J, Donaldson AF, 2017, Termination analysis for GPU kernels, Science of Computer Programming, Vol: 148, Pages: 107-122, ISSN: 0167-6423

We describe a thread-modular technique for proving termination of massively parallel GPU kernels. The technique reduces the termination problem for these kernels to a sequential termination problem by abstracting the shared state, and as such allows us to leverage termination analysis techniques for sequential programs. An implementation in KITTeL is able to show termination of 94% of 604 kernels collected from various sources.

Journal article

Donaldson AF, Evrard H, Lascu A, Thomson P et al., 2017, Automated testing of graphics shader compilers, Object-oriented Programming, Systems, Languages, and Applications (OOPSLA), Publisher: ACM, Pages: 1-27, ISSN: 2475-1421

We present an automated technique for finding defects in compilers for graphics shading languages. A key challenge in compiler testing is the lack of an oracle that classifies an output as correct or incorrect; this is particularly pertinent in graphics shader compilers where the output is a rendered image that is typically under-specified. Our method builds on recent successful techniques for compiler validation based on metamorphic testing, and leverages existing high-value graphics shaders to create sets of transformed shaders that should be semantically equivalent. Rendering mismatches are then indicative of shader compilation bugs. Deviant shaders are automatically minimized to identify, in each case, a minimal change to an original high-value shader that induces a shader compiler bug. We have implemented the approach as a tool, GLFuzz, targeting the OpenGL shading language, GLSL. Our experiments over a set of 17 GPU and driver configurations, spanning the main 7 GPU designers, have led to us finding and reporting more than 60 distinct bugs, covering all tested configurations. As well as defective rendering, these issues identify security-critical vulnerabilities that affect WebGL, including a significant remote information leak security bug where a malicious web page can capture the contents of other browser tabs, and a bug whereby visiting a malicious web page can lead to a “blue screen of death” under Windows 10. Our findings show that shader compiler defects are prevalent, and that metamorphic testing provides an effective means for detecting them automatically.
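
The comparison step can be sketched as follows; `render` and the image-similarity threshold are hypothetical placeholders rather than GLFuzz internals, since rendered images may legitimately differ by small amounts of floating-point noise.

```python
# Skeleton of the metamorphic comparison at the heart of the approach:
# render an original shader and a semantically-equivalent variant, then
# flag large image differences as likely compiler bugs.
def render(shader_source):
    """Hypothetical: compile and run the shader, returning the image as
    a flat list of RGB byte values."""
    raise NotImplementedError

def images_similar(img_a, img_b, max_mean_diff=4.0):
    diff = sum(abs(a - b) for a, b in zip(img_a, img_b)) / len(img_a)
    return diff <= max_mean_diff  # tolerate small rendering noise

def check_variant(original_src, variant_src):
    ref, out = render(original_src), render(variant_src)
    if not images_similar(ref, out):
        print("rendering mismatch: probable shader compiler bug")
```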

Conference paper

Donaldson AF, Ketema J, Sorensen T, Wickerson J et al., 2017, Forward progress on GPU concurrency, 28th International Conference on Concurrency Theory, Publisher: Schloss Dagstuhl - Leibniz-Zentrum für Informatik, ISSN: 1868-8969

The tutorial at CONCUR will provide a practical overview of work undertaken over the last six years in the Multicore Programming Group at Imperial College London, and with collaborators internationally, related to understanding and reasoning about concurrency in software designed for acceleration on GPUs. In this article we provide an overview of this work, which includes contributions to data race analysis, compiler testing, memory model understanding and formalisation, and most recently efforts to enable portable GPU implementations of algorithms that require forward progress guarantees.

Conference paper

Sorensen T, Evrard H, Donaldson AF, 2017, Cooperative kernels: GPU multitasking for blocking algorithms, 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, Publisher: ACM

There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today’s GPUs in a manner that does not allow the GPU to be shared with other workloads (such as graphics rendering tasks). We propose cooperative kernels, an extension to the traditional GPU programming model geared towards writing blocking algorithms. Workgroups of a cooperative kernel are fairly scheduled, and multitasking is supported via a small set of language extensions through which the kernel and scheduler cooperate. We describe a prototype implementation of a cooperative kernel framework implemented in OpenCL 2.0 and evaluate our approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking.

Conference paper

Liew D, Schemmel D, Cadar C, Donaldson A, Zähl R, Wehrle R et al., 2017, Floating-Point Symbolic Execution: A Case Study in N-version Programming, IEEE/ACM International Conference on Automated Software Engineering (ASE 2017), Publisher: IEEE

Conference paper

Betts A, Chong N, Deligiannis P, Donaldson AF, Ketema J et al., 2017, Implementing and evaluating candidate-based invariant generation, IEEE Transactions on Software Engineering, Vol: 44, Pages: 631-650, ISSN: 1939-3520

The discovery of inductive invariants lies at the heart of static program verification. Presently, many automatic solutions to inductive invariant generation are inflexible, only applicable to certain classes of programs, or unpredictable. An automatic technique that circumvents these deficiencies to some extent is candidate-based invariant generation, whereby a large number of candidate invariants are guessed and then proven to be inductive or rejected using a sound program analyser. This paper describes our efforts to apply candidate-based invariant generation in GPUVerify, a static checker of programs that run on GPUs. We study a set of 383 GPU programs that contain loops, drawn from a number of open source suites and vendor SDKs. Among this set, 253 benchmarks require provision of loop invariants for verification to succeed. We describe the methodology we used to incrementally improve the invariant generation capabilities of GPUVerify to handle these benchmarks, through candidate-based invariant generation, whereby potential program invariants are speculated using cheap static analysis and subsequently either refuted or proven. We also describe a set of experiments that we used to examine the effectiveness of our rules for candidate generation, assessing rules based on their generality (the extent to which they generate candidate invariants), hit rate (the extent to which the generated candidates hold), effectiveness (the extent to which provable candidates actually help in allowing verification to succeed), and influence (the extent to which the success of one generation rule depends on candidates generated by another rule). We believe that our methodology for devising and evaluating candidate generation rules may serve as a useful framework for other researchers interested in candidate-based invariant generation. The candidates produced by GPUVerify help to verify 231 of these 253 programs. An increase in precision, however, has created sluggishness in GPUVerify because more candidates are generated.
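
A common realisation of this guess-then-refute scheme is the Houdini algorithm: repeatedly discard any candidate that cannot be proven inductive relative to the surviving set, until a fixpoint is reached. The sketch below shows that loop with a stubbed checker; it illustrates the general technique rather than GPUVerify's implementation, and `find_refuted` is a hypothetical call into a sound checker.

```python
# Houdini-style candidate refutation loop. `find_refuted` returns some
# candidate not provable as inductive relative to the current set, or
# None when every remaining candidate is inductive.
def houdini(candidates, find_refuted):
    remaining = set(candidates)
    while True:
        refuted = find_refuted(remaining)
        if refuted is None:
            return remaining       # fixpoint: all survivors are inductive
        remaining.discard(refuted) # drop the refuted guess and retry

# Toy checker: candidates mentioning variable "tmp" are refutable.
cands = {"i >= 0", "i <= N", "tmp % 2 == 0"}
print(houdini(cands, lambda s: next((c for c in s if "tmp" in c), None)))
# -> {'i >= 0', 'i <= N'}
```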

Journal article

Magron V, Constantinides GA, Donaldson AF, 2017, Certified roundoff error bounds using semidefinite programming, ACM Transactions on Mathematical Software, Vol: 43, Pages: 1-31, ISSN: 0098-3500

Roundoff errors cannot be avoided when implementing numerical programs with finite precision. The ability to reason about rounding is especially important if one wants to explore a range of potential representations, for instance, for FPGAs or custom hardware implementations. This problem becomes challenging when the program does not employ solely linear operations, as non-linearities are inherent to many interesting computational problems in real-world applications. Existing solutions for reasoning about roundoff can lead to either inaccurate bounds or high analysis time in the presence of nonlinear correlations between variables. Furthermore, while it is easy to implement a straightforward method such as interval arithmetic, sophisticated techniques are less straightforward to implement in a formal setting. Thus there is a need for methods that output certificates that can be formally validated inside a proof assistant. We present a framework to provide upper bounds on absolute roundoff errors of floating-point nonlinear programs. This framework is based on optimization techniques employing semidefinite programming and sums of squares certificates, which can be checked inside the Coq theorem prover to provide formal roundoff error bounds for polynomial programs. Our tool covers a wide range of nonlinear programs, including polynomials and transcendental operations as well as conditional statements. We illustrate the efficiency and precision of this tool on non-trivial programs coming from biology, optimization, and space control. Our tool produces more accurate error bounds for 23% of all programs and yields better performance in 66% of all programs.
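
For context, such analyses start from the standard model of floating-point rounding, in which each elementary operation incurs a bounded relative error. The textbook model is shown below (this is background, not anything specific to the paper); bounding the worst case of the resulting error expression over all inputs and error terms is the optimization problem that this work relaxes to semidefinite programming with sums-of-squares certificates.

```latex
% Standard rounding model: each operation is exact up to a relative
% error e bounded by the unit roundoff \varepsilon (2^{-53} for IEEE
% double precision).
\mathrm{fl}(x \circ y) = (x \circ y)(1 + e),
  \qquad |e| \le \varepsilon, \qquad \circ \in \{+, -, \times, \div\}
```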

Journal article

Lidbury C, Donaldson AF, 2017, Dynamic race detection for C++11, ACM SIGPLAN Symposium on Principles of Programming Languages, Publisher: ACM

The intricate rules for memory ordering and synchronisation associated with the C/C++11 memory model mean that data races can be difficult to eliminate from concurrent programs. Dynamic data race analysis can pinpoint races in large and complex applications, but the state-of-the-art ThreadSanitizer (tsan) tool for C/C++ considers only sequentially consistent program executions, and does not correctly model synchronisation between C/C++11 atomic operations. We present a scalable dynamic data race analysis for C/C++11 that correctly captures C/C++11 synchronisation, and uses instrumentation to support exploration of a class of non sequentially consistent executions. We concisely define the memory model fragment captured by our instrumentation via a restricted axiomatic semantics, and show that the axiomatic semantics permits exactly those executions explored by our instrumentation. We have implemented our analysis in tsan, and evaluate its effectiveness on benchmark programs, enabling a comparison with the CDSChecker tool, and on two large and highly concurrent applications: the Firefox and Chromium web browsers. Our results show that our method can detect races that are beyond the scope of the original tsan tool, and that the overhead associated with applying our enhanced instrumentation to large applications is tolerable.
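The release/acquire synchronisation described here can be modelled with vector clocks, as in the generic sketch below (illustrative of the happens-before technique, not of the paper's actual instrumentation): a release store publishes the storing thread's clock to the atomic location, and an acquire load joins the location's clock into the loading thread's clock.

```python
# Vector-clock modelling of C/C++11 release/acquire synchronisation.
from collections import defaultdict

def join(a, b):
    return {t: max(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b)}

thread_clock = defaultdict(dict)   # thread id -> vector clock
atomic_clock = defaultdict(dict)   # atomic location -> vector clock

def tick(tid):
    thread_clock[tid][tid] = thread_clock[tid].get(tid, 0) + 1

def store_release(tid, loc):
    tick(tid)
    # Publish the storing thread's knowledge at the location.
    atomic_clock[loc] = join(atomic_clock[loc], thread_clock[tid])

def load_acquire(tid, loc):
    # Acquire the knowledge published at the location.
    thread_clock[tid] = join(thread_clock[tid], atomic_clock[loc])

# Thread 0 releases, thread 1 acquires: thread 1 now "knows" thread 0's
# prior events, so those events no longer race with thread 1's later
# accesses.
store_release(0, "flag")
load_acquire(1, "flag")
print(thread_clock[1])  # -> {0: 1}
```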

Conference paper

Donaldson AF, Gopalakrishnan G, Chong N, Ketema J, Li G, Li P, Lokhmotov A, Qadeer S et al., 2017, Formal analysis techniques for reliable GPU programming: Current solutions and call to action, Advances in GPU Research and Practice, Pages: 3-21, ISBN: 9780128037386

Graphics processing unit (GPU)-accelerated computing is being adopted increasingly in a number of areas, ranging from high-end scientific computing to mobile and embedded computing. While GPU programs routinely provide high computational throughput in a number of areas, they also prove to be notoriously difficult to write and optimize correctly, largely because of the subtleties of GPU concurrency. This chapter discusses several issues that make GPU programming difficult and examines recent progress on rigorous methods for formal analysis of GPU software. Our key observation is that given the fast-paced advances in GPU programming, the use of rigorous specification and verification methods must be an integral part of the culture of programming and training, and not an afterthought.

Book chapter

Sorensen T, Donaldson AF, Batty M, Gopalakrishnan G, Rakamaric Z et al., 2016, Portable Inter-workgroup Barrier Synchronisation for GPUs, The 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, Publisher: ACM, Pages: 39-58

Despite the growing popularity of GPGPU programming, there is not yet a portable and formally-specified barrier that one can use to synchronise across workgroups. Moreover, the occupancy-bound execution model of GPUs breaks assumptions inherent in traditional software execution barriers, exposing them to deadlock. We present an occupancy discovery protocol that dynamically discovers a safe estimate of the occupancy for a given GPU and kernel, allowing for a starvation-free (and hence, deadlock-free) inter-workgroup barrier by restricting the number of workgroups according to this estimate. We implement this idea by adapting an existing, previously non-portable, GPU inter-workgroup barrier to use OpenCL 2.0 atomic operations, and prove that the barrier meets its natural specification in terms of synchronisation. We assess the portability of our approach over eight GPUs spanning four vendors, comparing the performance of our method against alternative methods. Our key findings include: (1) the recall of our discovery protocol is nearly 100%; (2) runtime comparisons vary substantially across GPUs and applications; and (3) our method provides portable and safe inter-workgroup synchronisation across the applications we study.

Conference paper

Liew D, Cadar C, Donaldson A, 2016, Symbooglix: A symbolic execution engine for Boogie programs, IEEE International Conference on Software Testing, Verification, and Validation, Publisher: IEEE

We present the design and implementation of Symbooglix, a symbolic execution engine for the Boogie intermediate verification language. Symbooglix aims to find bugs in Boogie programs efficiently, providing bug-finding capabilities for any program analysis framework that uses Boogie as a target language. We discuss the technical challenges associated with handling Boogie, and describe how we optimised Symbooglix using a small training set of benchmarks. This empirically-driven optimisation approach avoids over-fitting Symbooglix to our benchmarks, enabling a fair comparison with other tools. We present an evaluation across 3749 Boogie programs generated from the SV-COMP suite of C programs using the SMACK front-end, and 579 Boogie programs originating from several OpenCL and CUDA GPU benchmark suites, translated by the GPUVerify front-end. Our results show that Symbooglix significantly outperforms Boogaloo, an existing symbolic execution tool for Boogie, and is competitive with GPUVerify on benchmarks for which GPUVerify is highly optimised. While generally less effective than the Corral and Duality tools on the SV-COMP suite, Symbooglix is complementary to them in terms of bug-finding ability.

Conference paper

Sorensen T, Donaldson AF, 2016, Exposing errors related to weak memory in GPU applications, 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, Publisher: ACM, Pages: 100-113

We present the systematic design of a testing environment that uses stressing and fuzzing to reveal errors in GPU applications that arise due to weak memory effects. We evaluate our approach on seven GPUs spanning three Nvidia architectures, across ten CUDA applications that use fine-grained concurrency. Our results show that applications that rarely or never exhibit errors related to weak memory when executed natively can readily exhibit these errors when executed in our testing environment. Our testing environment also provides a means to help identify the root causes of such errors, and automatically suggests how to insert fences that harden an application against weak memory bugs. To understand the cost of GPU fences, we benchmark applications with fences provided by the hardening strategy as well as a more conservative, sound fencing strategy.

Conference paper

Cadar C, Donaldson A, 2016, Analysing the Program Analyser, International Conference on Software Engineering, Visions of 2025 and Beyond Track, Publisher: ACM

It is the year 2025. FutureCorp, the private-sector defence contractor to whom the US government has outsourced its management of nuclear weapons, has just had its missile control software hijacked by terrorists. It is only a matter of hours before Armageddon. The CEO of FutureCorp, Dr F. Methods, is incredulous: “This is impossible”, he told one of our reporters. “We used program analysis to formally prove that the software was secure!”. “Ah, Dr Methods,” responded open-source developer Mr B. Door, “But did you check the analyser?”

Conference paper

Donaldson AF, Lascu A, 2016, Metamorphic testing for (graphics) compilers, 1st IEEE/ACM International Workshop on Metamorphic Testing (MET), Publisher: ACM, Pages: 44-47

We present strategies for metamorphic testing of compilers using opaque value injection, and experiences using the method to test compilers for the OpenGL shading language.
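
Opaque value injection replaces a literal with an expression that always evaluates to the same value but that the compiler cannot constant-fold, because it depends on data unknown at compile time (in GLSL, typically a uniform). The string-rewriting sketch below is purely illustrative; the variable names are hypothetical.

```python
# Opaque value injection, sketched: rewrite a literal into an expression
# that evaluates to the same value at runtime but resists compile-time
# folding. A correct compiler must produce an identically-behaving
# program for the variant.
def inject_opaque_zero(shader_src,
                       opaque_zero="(injected.x - injected.x)"):
    # Replace the first literal 0.0 with an opaque zero; `injected` is a
    # hypothetical uniform whose value is unknown at compile time.
    return shader_src.replace("0.0", opaque_zero, 1)

original = "float v = 0.0; gl_FragColor = vec4(v, c.g, c.b, 1.0);"
print(inject_opaque_zero(original))
```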

Conference paper

Sorensen T, Donaldson AF, 2016, The Hitchhiker's Guide to Cross-Platform OpenCL Application Development, 4th International Workshop on OpenCL (IWOCL '16), Publisher: ACM

One of the benefits of programming with OpenCL is platform portability. That is, an OpenCL program that follows the OpenCL specification should, in principle, execute reliably on any platform that supports OpenCL. To assess the current state of OpenCL portability, we provide an experience report examining two sets of open source benchmarks that we attempted to execute across a variety of GPU platforms, via OpenCL. We report on the portability issues we encountered, where applications would execute successfully on one platform but fail on another. We classify issues into three groups: (1) framework bugs, where the vendor-provided OpenCL framework fails; (2) specification limitations, where the OpenCL specification is unclear and where different GPU platforms exhibit different behaviours; and (3) programming bugs, where non-portability arises due to the program exercising behaviours that are incorrect or undefined according to the OpenCL specification. The issues we encountered slowed the development process associated with our sets of applications, but we view the issues as providing exciting motivation for future testing and verification efforts to improve the state of OpenCL portability; we conclude with a discussion of these.

Conference paper

Pflanzer M, Donaldson AF, Lascu A, 2016, Automatic Test Case Reduction for OpenCL, 4th International Workshop on OpenCL (IWOCL '16), Publisher: ACM

We report on an extension to the C-Reduce tool, for automatic reduction of C test cases, to handle OpenCL kernels. This enables an automated method for detecting bugs in OpenCL compilers, by generating large random kernels using the CLsmith generator, identifying kernels that yield result differences across OpenCL platforms and optimisation levels, and using our novel extension to C-Reduce to automatically reduce such kernels to minimal forms that can be filed as bug reports. A major part of our effort involved the design of ShadowKeeper, a new plugin for the Oclgrind simulator that provides accurate detection of accesses to uninitialised data. We present experimental results showing the effectiveness of our method for finding bugs in a number of OpenCL compilers.
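
The reduction loop hinges on an "interestingness test" that C-Reduce invokes on each candidate kernel. A sketch of such a test is below: it keeps a candidate only if a cross-platform result mismatch persists. The runner command and its flags are hypothetical placeholders, not part of the tools named above.

```python
#!/usr/bin/env python3
# Sketch of a C-Reduce interestingness test for the workflow the
# abstract describes: the reduction step keeps a candidate kernel only
# if this script still observes a result mismatch between two OpenCL
# platforms.
import subprocess, sys

def run_kernel(platform_index):
    out = subprocess.run(
        ["./run_opencl_kernel", "--platform", str(platform_index),
         "kernel.cl"],                      # hypothetical runner
        capture_output=True, timeout=60)
    return out.returncode, out.stdout

rc_a, result_a = run_kernel(0)
rc_b, result_b = run_kernel(1)
# Interesting (exit 0) iff both runs succeed but disagree on the result.
sys.exit(0 if rc_a == 0 and rc_b == 0 and result_a != result_b else 1)
```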

Conference paper

Batty M, Donaldson A, Wickerson J, 2016, Overhauling SC atomics in C11 and OpenCL, ACM SIGPLAN Notices, Vol: 51, Pages: 634-648, ISSN: 0362-1340

Journal article

