Imperial College London

Professor Peter Pietzuch

Faculty of Engineering, Department of Computing

Professor of Distributed Systems

Contact

 

+44 (0)20 7594 8314


Location

 

442 Huxley Building, South Kensington Campus



Publications


Mai L, Li G, Wagenlander M, Fertakis K, Brabete A-O, Pietzuch P et al., 2020, KungFu: Making Training in Distributed Machine Learning Adaptive, USENIX Symposium on Operating Systems Design and Implementation (OSDI)

Conference paper

Theodorakis G, Koliousis A, Pietzuch P, Pirk H et al., 2020, LightSaber: efficient window aggregation on multi-core processors, Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, Publisher: ACM, Pages: 1-17

Window aggregation queries are a core part of streaming applications. To support window aggregation efficiently, stream processing engines face a trade-off between exploiting parallelism (at the instruction/multi-core levels) and incremental computation (across overlapping windows and queries). Existing engines implement ad-hoc aggregation and parallelization strategies. As a result, they only achieve high performance for specific queries depending on the window definition and the type of aggregation function. We describe a general model for the design space of window aggregation strategies. Based on this, we introduce LightSaber, a new stream processing engine that balances parallelism and incremental processing when executing window aggregation queries on multi-core CPUs. Its design generalizes existing approaches: (i) for parallel processing, LightSaber constructs a parallel aggregation tree (PAT) that exploits the parallelism of modern processors. The PAT divides window aggregation into intermediate steps that enable the efficient use of both instruction-level (i.e., SIMD) and task-level (i.e., multi-core) parallelism; and (ii) to generate efficient incremental code from the PAT, LightSaber uses a generalized aggregation graph (GAG), which encodes the low-level data dependencies required to produce aggregates over the stream. A GAG thus generalizes state-of-the-art approaches for incremental window aggregation and supports work-sharing between overlapping windows. LightSaber achieves up to an order of magnitude higher throughput compared to existing systems: on a 16-core server, it processes 470 million records/s with 132 μs average latency. (A toy illustration of pane-style partial aggregation follows below.)

Conference paper
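
A toy illustration of the pane-style partial aggregation that a structure like the PAT generalises: pane aggregates are independent, so they can be computed per core or with SIMD, and each window result just combines a few pane partials. This Python sketch is our own simplification, not LightSaber's implementation; the gcd-based pane size and all names are assumptions.

```python
# Hypothetical simplification: two-step window aggregation via panes.
from math import gcd

def windowed_sums(stream, window, slide):
    """Sum over sliding windows by first aggregating independent panes."""
    pane = gcd(window, slide)                          # pane size shared by all windows
    partials = [sum(stream[i:i + pane])                # step 1: per-pane partials --
                for i in range(0, len(stream), pane)]  # independent, hence parallelisable
    w, s = window // pane, slide // pane               # window/slide measured in panes
    return [sum(partials[i:i + w])                     # step 2: combine partials per window
            for i in range(0, len(partials) - w + 1, s)]

print(windowed_sums(list(range(12)), window=4, slide=2))  # -> [6, 14, 22, 30, 38]
```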

Theodorakis G, Pietzuch P, Pirk H, 2020, SlideSide: a fast incremental stream processing algorithm for multiple queries, 23rd International Conference on Extending Database Technology (EDBT), Pages: 435-438, ISSN: 2367-2005

Aggregate window computations lie at the core of online analytics in both academic and industrial applications. To efficiently compute sliding windows, the state-of-the-art algorithms utilize incremental processing that avoids the recomputation of window results from scratch. In this paper, we propose a novel algorithm, called SlideSide, that extends TwoStacks for multiple concurrent aggregate queries over the same data stream. Our approach uses different yet similar processing schemes for invertible and non-invertible functions and exhibits up to 2× better throughput compared to the state-of-the-art incremental techniques in a multi-query environment. (A minimal sketch of the underlying TwoStacks algorithm follows below.)

Conference paper
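
For context, here is a minimal single-query version of the TwoStacks algorithm that SlideSide extends, in Python. This is the textbook formulation, not the paper's multi-query variant; the class and method names are ours.

```python
# TwoStacks sliding-window aggregation: two stacks of (value, running aggregate).
class TwoStacks:
    def __init__(self, op, identity):
        self.op, self.identity = op, identity
        self.front, self.back = [], []          # each entry: (value, running agg)

    def _agg(self, stack):
        return stack[-1][1] if stack else self.identity

    def insert(self, v):                        # O(1): extend the back aggregate
        self.back.append((v, self.op(self._agg(self.back), v)))

    def evict(self):                            # amortised O(1): flip back -> front
        if not self.front:
            while self.back:
                v, _ = self.back.pop()
                self.front.append((v, self.op(v, self._agg(self.front))))
        self.front.pop()                        # drop the oldest element

    def query(self):                            # aggregate over the whole window
        return self.op(self._agg(self.front), self._agg(self.back))

w = TwoStacks(max, float("-inf"))
for x in [3, 1, 4, 1, 5]:
    w.insert(x)
w.evict(); w.evict()                            # window now holds [4, 1, 5]
print(w.query())                                # -> 5
```

The back-to-front flip is linear but touches each element at most once, so all operations are amortised O(1); SlideSide's contribution is sharing this kind of work across many concurrent queries.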

Lind J, Naor O, Eyal I, Kelbert F, Sirer EG, Pietzuch P et al., 2019, Teechain: A secure payment network with asynchronous blockchain access, 27th ACM Symposium on Operating Systems Principles (SOSP), Publisher: ACM Press, Pages: 63-79

Blockchains such as Bitcoin and Ethereum execute payment transactions securely, but their performance is limited by the need for global consensus. Payment networks overcome this limitation through off-chain transactions. Instead of writing to the blockchain for each transaction, they only settle the final payment balances with the underlying blockchain. When executing off-chain transactions in current payment networks, parties must access the blockchain within bounded time to detect misbehaving parties that deviate from the protocol. This opens a window for attacks in which a malicious party can steal funds by deliberately delaying other parties' blockchain access and prevents parties from using payment networks when disconnected from the blockchain. We present Teechain, the first layer-two payment network that executes off-chain transactions asynchronously with respect to the underlying blockchain. To prevent parties from misbehaving, Teechain uses treasuries, protected by hardware trusted execution environments (TEEs), to establish off-chain payment channels between parties. Treasuries maintain collateral funds and can exchange transactions efficiently and securely, without interacting with the underlying blockchain. To mitigate against treasury failures and to avoid having to trust all TEEs, Teechain replicates the state of treasuries using committee chains, a new variant of chain replication with threshold secret sharing. Teechain achieves at least a 33× higher transaction throughput than the state-of-the-art Lightning payment network. A 30-machine Teechain deployment can handle over 1 million Bitcoin transactions per second.

Conference paper

Clegg R, Landa R, Griffin D, Rio M, Hughes P, Kegel I, Stevens T, Pietzuch P, Williams D et al., 2019, Faces in the clouds: long-duration, multi-user, cloud-assisted video conferencing, IEEE Transactions on Cloud Computing, Vol: 7, Pages: 756-769, ISSN: 2168-7161

Multi-user video conferencing is a ubiquitous technology. Increasingly end-hosts in a conference are assisted by cloud-based servers that improve the quality of experience for end users. This paper evaluates the impact of strategies for placement of such servers on user experience and deployment cost. We consider scenarios based upon the Amazon EC2 infrastructure as well as future scenarios in which cloud instances can be located at a larger number of possible sites across the planet. We compare a number of possible strategies for choosing which cloud locations should host services and how traffic should route through them. Our study is driven by real data to create demand scenarios with realistic geographical user distributions and diurnal behaviour. We conclude that on the EC2 infrastructure a well chosen static selection of servers performs well but as more cloud locations are available a dynamic choice of servers becomes important.

Journal article

Koliousis A, Watcharapichat P, Weidlich M, Mai L, Costa P, Pietzuch P et al., 2019, Crossbow: scaling deep learning with small batch sizes on multi-GPU servers, Proceedings of the VLDB Endowment, Vol: 12, ISSN: 2150-8097

Deep learning models are trained on servers with many GPUs, and training must scale with the number of GPUs. Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of training data at a time, partitioned across GPUs, and average the resulting partial gradients to obtain an updated global model. To fully utilise all GPUs, systems must increase the batch size, which hinders statistical efficiency. Users tune hyper-parameters such as the learning rate to compensate for this, which is complex and model-specific. We describe Crossbow, a new single-server multi-GPU system for training deep learning models that enables users to freely choose their preferred batch size, however small, while scaling to multiple GPUs. Crossbow uses many parallel model replicas and avoids reduced statistical efficiency through a new synchronous training method. We introduce SMA, a synchronous variant of model averaging in which replicas independently explore the solution space with gradient descent, but adjust their search synchronously based on the trajectory of a globally-consistent average model. Crossbow achieves high hardware efficiency with small batch sizes by potentially training multiple model replicas per GPU, automatically tuning the number of replicas to maximise throughput. Our experiments show that Crossbow improves the training time of deep learning models on an 8-GPU server by 1.3-4× compared to TensorFlow. (A toy sketch of synchronous model averaging follows below.)

Journal article
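
A toy, single-process sketch of what SMA-style training looks like, under our own simplifying assumptions (a quadratic loss, a fixed learning rate, and one scalar coefficient for the correction step); the real system runs replicas across GPUs and tunes their number automatically.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, replicas, lr, alpha = 4, 4, 0.1, 0.5
target = rng.normal(size=dim)                   # optimum of the toy loss
models = [rng.normal(size=dim) for _ in range(replicas)]

def grad(w):                                    # gradient of 0.5 * ||w - target||^2
    return w - target

for _ in range(100):
    avg = np.mean(models, axis=0)               # globally consistent average model
    for i in range(replicas):
        models[i] -= lr * grad(models[i])       # independent exploration (SGD step)
        models[i] -= alpha * lr * (models[i] - avg)   # synchronous pull to average

print(np.linalg.norm(np.mean(models, axis=0) - target))  # ~0: replicas converged
```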

Pirk H, Giceva J, Pietzuch P, 2019, Thriving in the No Man's Land between compilers and databases, Conference on Innovative Data Systems Research

When developing new data-intensive applications, one faces a build-or-buy decision: use an existing off-the-shelf data management system (DMS) or implement a custom solution. While off-the-shelf systems offer quick results, they lack the flexibility to accommodate the changing requirements of long-term projects. Building a solution from scratch in a general-purpose programming language, however, comes with long-term development costs that may not be justified. What is lacking is a middle ground or, more precisely, a clear migration path from off-the-shelf data management systems to customized applications in general-purpose programming languages. There is, in effect, a no man's land that neither compiler nor database researchers have claimed. We believe that this problem is an opportunity for the database community to claim a stake. We need to invest effort to transfer the outcomes of data management research into the fields of programming languages and compilers. The common complaint that other fields are re-inventing database techniques bears witness to the need for that knowledge transfer. In this paper, we motivate the necessity for data management techniques in general-purpose programming languages and outline a number of specific opportunities for knowledge transfer. This effort will not only cover the no man's land but also broaden the impact of data management research.

Conference paper

Theodorakis G, Koliousis A, Pietzuch P, Pirk H et al., 2018, Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation, International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures, ADMS@VLDB 2018, Rio de Janeiro, Brazil, August 27, 2018

The computation of sliding window aggregates is one of the core functionalities of stream processing systems. Presently, there are two classes of approaches to evaluating them. The first is non-incremental, i.e., every window is evaluated in isolation even if overlapping windows provide opportunities for work-sharing. While not algorithmically efficient, this class of algorithm is usually very CPU efficient. The other approach is incremental: to the amount possible, the result of one window evaluation is used to help with the evaluation of the next window. While algorithmically efficient, the inherent control-dependencies in the CPU instruction stream make this highly CPU inefficient. In this paper, we analyse the state of the art in efficient incremental window processing and extend the fastest known algorithm, the Two-Stacks approach, with known as well as novel optimisations. These include SIMD-parallel processing, internal data structure decomposition and data minimalism. We find that, thus optimised, our incremental window aggregation algorithm outperforms the state-of-the-art incremental algorithm by up to 11×. In addition, it is at least competitive and often significantly (up to 80%) faster than a non-incremental algorithm. Consequently, stream processing systems can use our proposed algorithm to cover incremental as well as non-incremental aggregation, resulting in systems that are simpler as well as faster.

Conference paper

O'Keeffe D, Salonidis T, Pietzuch PR, 2018, Frontier: Resilient Edge Processing for the Internet of Things, Very Large Databases (VLDB), Publisher: ACM, Pages: 1178-1191, ISSN: 2150-8097

In an edge deployment model, Internet-of-Things (IoT) applications, e.g. for building automation or video surveillance, must process data locally on IoT devices without relying on permanent connectivity to a cloud backend. The ability to harness the combined resources of multiple IoT devices for computation is influenced by the quality of wireless network connectivity. An open challenge is how practical edge-based IoT applications can be realised that are robust to changes in network bandwidth between IoT devices, due to interference and intermittent connectivity. We present Frontier, a distributed and resilient edge processing platform for IoT devices. The key idea is to express data-intensive IoT applications as continuous data-parallel streaming queries and to improve query throughput in an unreliable wireless network by exploiting network path diversity: a query includes operator replicas at different IoT nodes, which increases possible network paths for data. Frontier dynamically routes stream data to operator replicas based on network path conditions. Nodes probe path throughput and use backpressure stream routing to decide on transmission rates, while exploiting multiple operator replicas for data-parallelism. If a node loses network connectivity, a transient disconnection recovery mechanism reprocesses the lost data. Our experimental evaluation of Frontier shows that network path diversity improves throughput by 1.3×-2.8× for different IoT applications, while being resilient to intermittent network connectivity. (A sketch of throughput-proportional routing follows below.)

Conference paper
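
A hedged sketch of the routing flavour described above: stream items are spread over operator replicas in proportion to the measured throughput of the network path to each one. The function and node names are hypothetical; Frontier's backpressure mechanism also adapts rates dynamically.

```python
import random

def route(batch, throughput):
    """Spread a batch over replicas in proportion to probed path throughput."""
    replicas = list(throughput)
    weights = [throughput[r] for r in replicas]
    assignment = {r: [] for r in replicas}
    picks = random.choices(replicas, weights=weights, k=len(batch))
    for item, replica in zip(batch, picks):
        assignment[replica].append(item)
    return assignment

paths = {"node-a": 50.0, "node-b": 30.0, "node-c": 20.0}   # probed records/s
out = route(range(1000), paths)
print({r: len(items) for r, items in out.items()})         # roughly 500/300/200
```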

Aublin PRER, Kelbert FM, O'Keeffe D, Muthukumaran D, Priebe C, Lind J, Krahn R, Fetzer C, Eyers D, Pietzuch PR et al., 2018, LibSEAL: revealing service integrity violations using trusted execution, The European Conference on Computer Systems, Publisher: ACM

Conference paper

Garefalakis P, Karanasos K, Pietzuch P, Suresh A, Rao S et al., 2018, Medea: scheduling of long running applications in shared production clusters, EuroSys 2018, Publisher: ACM

The rise in popularity of machine learning, streaming, and latency-sensitive online applications in shared production clusters has raised new challenges for cluster schedulers. To optimize their performance and resilience, these applications require precise control of their placements, by means of complex constraints, e.g., to collocate or separate their long-running containers across groups of nodes. In the presence of these applications, the cluster scheduler must attain global optimization objectives, such as maximizing the number of deployed applications or minimizing the violated constraints and the resource fragmentation, but without affecting the scheduling latency of short-running containers. We present Medea, a new cluster scheduler designed for the placement of long- and short-running containers. Medea introduces powerful placement constraints with formal semantics to capture interactions among containers within and across applications. It follows a novel two-scheduler design: (i) for long-running containers, it applies an optimization-based approach that accounts for constraints and global objectives; (ii) for short-running containers, it uses a traditional task-based scheduler for low placement latency. Evaluated on a 400-node cluster, our implementation of Medea on Apache Hadoop YARN achieves placement of long-running applications with significant performance and resilience benefits compared to state-of-the-art schedulers. (A toy encoding of such placement constraints follows below.)

Conference paper
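
To make the constraint idea concrete, here is a toy encoding of collocate/separate constraints over node groups, with a checker that counts violations. The encoding is our own sketch; Medea expresses such constraints formally and solves placement as an optimisation problem.

```python
def violations(placement, constraints, node_group):
    """Count violated constraints. placement: container -> node;
    node_group: node -> group (e.g. rack); constraints: (kind, c1, c2) triples."""
    count = 0
    for kind, c1, c2 in constraints:
        same = node_group[placement[c1]] == node_group[placement[c2]]
        if (kind == "collocate" and not same) or (kind == "separate" and same):
            count += 1
    return count

groups = {"n1": "rack1", "n2": "rack1", "n3": "rack2"}
cons = [("collocate", "hbase-master", "zookeeper"),        # same rack wanted
        ("separate", "zookeeper", "zookeeper-backup")]     # different racks wanted
place = {"hbase-master": "n1", "zookeeper": "n2", "zookeeper-backup": "n3"}
print(violations(place, cons, groups))                     # -> 0: both satisfied
```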

Castro Fernandez R, Culhane W, Watcharapichat P, Weidlich M, Pietzuch PR et al., 2018, Meta-dataflows: efficient exploratory dataflow jobs, ACM Conference on Management of Data (SIGMOD), Publisher: Association for Computing Machinery (ACM), ISSN: 0730-8078

Distributed dataflow systems such as Apache Spark and Apache Flink are used to derive new insights from large datasets. While they efficiently execute concrete data processing workflows, expressed as dataflow graphs, they lack generic support for exploratory workflows: if a user is uncertain about the correct processing pipeline, e.g. in terms of data cleaning strategy or choice of model parameters, they must repeatedly submit modified jobs to the system. This, however, misses out on optimisation opportunities for exploratory workflows, both in terms of scheduling and memory allocation. We describe meta-dataflows (MDFs), a new model to effectively express exploratory workflows and efficiently execute them on compute clusters. With MDFs, users specify a family of dataflows using two primitives: (a) an explore operator automatically considers choices in a dataflow; and (b) a choose operator assesses the result quality of explored dataflow branches and selects a subset of the results. We propose optimisations to execute MDFs: a system can (i) avoid redundant computation when exploring branches by reusing intermediate results and discarding results from underperforming branches; and (ii) consider future data access patterns in the MDF when allocating cluster memory. Our evaluation shows that MDFs improve the runtime of exploratory workflows by up to 90% compared to sequential execution. (A sketch of the explore/choose pattern follows below.)

Conference paper
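
A compact local interpretation of the explore/choose primitives in Python: explore runs one branch per parameter choice, choose keeps the best-scoring branches, and memoisation stands in for intermediate-result reuse. All names are our assumptions; MDFs operate over distributed dataflow graphs, not local functions.

```python
from functools import lru_cache

DATA = [1, None, 3, 4, None]

@lru_cache(maxsize=None)                  # stands in for intermediate-result reuse
def clean(strategy):
    print(f"cleaning with strategy={strategy}")    # expensive step, runs once
    if strategy == "drop":
        return tuple(x for x in DATA if x is not None)
    return tuple(x if x is not None else 0 for x in DATA)

def explore(choices, pipeline):           # one dataflow branch per choice
    return {c: pipeline(c) for c in choices}

def choose(results, quality, k=1):        # keep the k best-scoring branches
    return sorted(results, key=lambda c: quality(results[c]), reverse=True)[:k]

branches = explore(["drop", "zero"],
                   lambda s: sum(clean(s)) / len(clean(s)))  # mean of cleaned data
print(choose(branches, quality=lambda mean: mean))           # -> ['drop']
```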

Lind J, Priebe C, Muthukumaran D, O'Keeffe D, Aublin P, Kelbert F, Reiher T, Goltzsche D, Eyers D, Kapitza R, Fetzer C, Pietzuch P et al., 2018, Glamdring: automatic application partitioning for Intel SGX, USENIX Annual Technical Conference 2017, Publisher: USENIX, Pages: 285-298

Trusted execution support in modern CPUs, as offered by Intel SGX enclaves, can protect applications in untrusted environments. While prior work has shown that legacy applications can run in their entirety inside enclaves, this results in a large trusted computing base (TCB). Instead, we explore an approach in which we partition an application and use an enclave to protect only security-sensitive data and functions, thus obtaining a smaller TCB. We describe Glamdring, the first source-level partitioning framework that secures applications written in C using Intel SGX. A developer first annotates security-sensitive application data. Glamdring then automatically partitions the application into untrusted and enclave parts: (i) to preserve data confidentiality, Glamdring uses dataflow analysis to identify functions that may be exposed to sensitive data; (ii) for data integrity, it uses backward slicing to identify functions that may affect sensitive data. Glamdring then places security-sensitive functions inside the enclave, and adds runtime checks and cryptographic operations at the enclave boundary to protect it from attack. Our evaluation of Glamdring with the Memcached store, the LibreSSL library, and the Digital Bitbox bitcoin wallet shows that it achieves small TCB sizes and has acceptable performance overheads.

Conference paper

Rupprecht L, Culhane WJ, Pietzuch P, 2017, SquirrelJoin: network-aware distributed join processing with lazy partitioning, Proceedings of the VLDB Endowment, Vol: 10, Pages: 1250-1261, ISSN: 2150-8097

To execute distributed joins in parallel on compute clusters, systems partition and exchange data records between workers. With large datasets, workers spend a considerable amount of time transferring data over the network. When compute clusters are shared among multiple applications, workers must compete for network bandwidth with other applications. These variances in the available network bandwidth lead to network skew, which causes straggling workers to prolong the join completion time. We describe SquirrelJoin, a distributed join processing technique that uses lazy partitioning to adapt to transient network skew in clusters. Workers maintain in-memory lazy partitions to withhold a subset of records, i.e. not sending them immediately to other workers for processing. Lazy partitions are then assigned dynamically to other workers based on network conditions: each worker takes periodic throughput measurements to estimate its completion time, and lazy partitions are allocated so as to minimise the join completion time. We implement SquirrelJoin as part of the Apache Flink distributed dataflow framework and show that, under transient network contention in a shared compute cluster, SquirrelJoin speeds up join completion times by up to 2.9× with only a small, fixed overhead. (A back-of-envelope sketch of lazy partition assignment follows below.)

Journal article
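
A back-of-envelope sketch of how withheld records might be assigned: each lazy partition goes to the worker whose estimated completion time stays lowest after taking it. The greedy rule and all names are our assumptions, not SquirrelJoin's exact allocation algorithm.

```python
def assign_lazy(lazy_sizes, backlog, throughput):
    """Greedily place lazy partitions to minimise the slowest worker's ETA."""
    for part, size in sorted(lazy_sizes.items(), key=lambda kv: -kv[1]):
        eta = {w: (backlog[w] + size) / throughput[w] for w in backlog}
        best = min(eta, key=eta.get)       # worker that finishes earliest with it
        backlog[best] += size
        print(f"lazy partition {part} ({size} records) -> {best}")
    return {w: backlog[w] / throughput[w] for w in backlog}

backlog = {"w1": 800, "w2": 500}           # records already queued per worker
throughput = {"w1": 100.0, "w2": 40.0}     # measured records/s (w2 is congested)
print(assign_lazy({"p1": 300, "p2": 200}, backlog, throughput))
```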

Goltzsche D, Wulf C, Muthukumaran D, Rieck K, Pietzuch PR, Kapitza R et al., 2017, TrustJS: Trusted Client-side Execution of JavaScript, EuroSec'17, Publisher: ACM

Client-side JavaScript has become ubiquitous in web applications to improve user experience and reduce server load. However, since clients are untrusted, servers cannot rely on the confidentiality or integrity of client-side JavaScript code and the data that it operates on. For example, client-side input validation must be repeated at server side, and confidential business logic cannot be offloaded. In this paper, we present TrustJS, a framework that enables trustworthy execution of security-sensitive JavaScript inside commodity browsers. TrustJS leverages trusted hardware support provided by Intel SGX to protect the client-side execution of JavaScript, enabling a flexible partitioning of web application code. We present the design of TrustJS and provide initial evaluation results, showing that trustworthy JavaScript offloading can further improve user experience and conserve more server resources.

Conference paper

Lind J, Eyal I, Pietzuch PR, Sirer EG et al., 2017, Teechan: payment channels using trusted execution environments, 4th Workshop on Bitcoin and Blockchain Research, Publisher: Springer, ISSN: 0302-9743

Blockchain protocols are inherently limited in transaction throughput and latency. Recent efforts to address performance and scale blockchains have focused on off-chain payment channels. While such channels can achieve low latency and high throughput, deploying them securely on top of the Bitcoin blockchain has been difficult, partly because building a secure implementation requires changes to the underlying protocol and the ecosystem. We present Teechan, a full-duplex payment channel framework that exploits trusted execution environments. Teechan can be deployed securely on the existing Bitcoin blockchain without having to modify the protocol. It: (i) achieves a higher transaction throughput and lower transaction latency than prior solutions; (ii) enables unlimited full-duplex payments as long as the balance does not exceed the channel's credit; (iii) requires only a single message to be sent per payment in any direction; and (iv) places at most two transactions on the blockchain under any execution scenario. We have built and deployed the Teechan framework using Intel SGX on the Bitcoin network. Our experiments show that, not counting network latencies, Teechan can achieve 2,480 transactions per second on a single channel, with sub-millisecond latencies.

Conference paper

Rupprecht L, Zhang R, Owen W, Pietzuch PR, Hildebrand D et al., 2017, SwiftAnalytics: optimizing object stores for big data analytics, IEEE International Conference on Cloud Engineering (IC2E), 2017, Publisher: IEEE

Due to their scalability and low cost, object-based storage systems are an attractive storage solution and widely deployed. To gain valuable insight from the data residing in object storage but avoid expensive copying to a distributed filesystem (e.g. HDFS), it would be natural to directly use them as a storage backend for data-parallel analytics frameworks such as Spark or MapReduce. Unfortunately, executing data-parallel frameworks on object storage exhibits severe performance problems, increasing average job completion times by up to 6.5×. We identify the two most severe performance problems when running data-parallel frameworks on the OpenStack Swift object storage system in comparison to the HDFS distributed filesystem: (i) the fixed mapping of object names to storage nodes prevents local writes and adds delay when objects are renamed; (ii) the coarser granularity of objects compared to blocks reduces data locality during reads. We propose the SwiftAnalytics object storage system to address them: (i) it uses locality-aware writes to control an object's location and eliminate unnecessary I/O related to renames during job completion, speeding up analytics jobs by up to 5.1×; (ii) it transparently chunks objects into smaller-sized parts to improve data locality, leading to up to 3.4× faster reads.

Conference paper

Pietzuch PR, Arnautov S, Trach B, Gregor F, Knauth T, Martin A, Priebe C, Lind J, Muthukumaran D, O'Keeffe D, Stillwell M, Goltzsche D, Eyers D, Kapitza R, Fetzer C et al., 2016, SCONE: Secure Linux Containers with Intel SGX, 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016, Publisher: USENIX

In multi-tenant environments, Linux containers managed by Docker or Kubernetes have a lower resource footprint, faster startup times, and higher I/O performance compared to virtual machines (VMs) on hypervisors. Yet their weaker isolation guarantees, enforced through software kernel mechanisms, make it easier for attackers to compromise the confidentiality and integrity of application data within containers. We describe SCONE, a secure container mechanism for Docker that uses the SGX trusted execution support of Intel CPUs to protect container processes from outside attacks. The design of SCONE leads to (i) a small trusted computing base (TCB) and (ii) a low performance overhead: SCONE offers a secure C standard library interface that transparently encrypts/decrypts I/O data; to reduce the performance impact of thread synchronization and system calls within SGX enclaves, SCONE supports user-level threading and asynchronous system calls. Our evaluation shows that it protects unmodified applications with SGX, achieving 0.6×-1.2× of native throughput.

Conference paper

Papagiannis I, Watcharapichat P, Muthukumaran D, Pietzuch PR et al., 2016, BrowserFlow: imprecise data flow tracking to prevent accidental data disclosure, Middleware '16: 17th International Middleware Conference, Publisher: ACM, Pages: 1-13

With the use of external cloud services such as Google Docs or Evernote in an enterprise setting, the loss of control over sensitive data becomes a major concern for organisations. It is typical for regular users to violate data disclosure policies accidentally, e.g. when sharing text between documents in browser tabs. Our goal is to help such users comply with data disclosure policies: we want to alert them about potentially unauthorised data disclosure from trusted to untrusted cloud services. This is particularly challenging when users can modify data in arbitrary ways, they employ multiple cloud services, and cloud services cannot be changed. To track the propagation of text data robustly across cloud services, we introduce imprecise data flow tracking, which identifies data flows implicitly by detecting and quantifying the similarity between text fragments. To reason about violations of data disclosure policies, we describe a new text disclosure model that, based on similarity, associates text fragments in web browsers with security tags and identifies unauthorised data flows to untrusted services. We demonstrate the applicability of imprecise data tracking through BrowserFlow, a browser-based middleware that alerts users when they expose potentially sensitive text to an untrusted cloud service. Our experiments show that BrowserFlow can robustly track data flows and manage security tags for documents with no noticeable performance impact. (A toy similarity check follows below.)

Conference paper
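
A toy example of similarity-based, rather than exact, flow detection: Jaccard similarity over character n-grams still flags text that was copied and then edited. The shingle size and threshold are invented; BrowserFlow's text disclosure model is richer than this.

```python
def shingles(text, n=5):
    text = " ".join(text.lower().split())          # normalise whitespace and case
    return {text[i:i + n] for i in range(max(1, len(text) - n + 1))}

def similarity(a, b):                              # Jaccard index over n-grams
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)

TAGGED = "Q3 revenue forecast: 4.2M, pending board approval"   # sensitive fragment
pasted = "fyi - Q3 revenue forecast: 4.2M pending approval!!"  # edited copy
if similarity(TAGGED, pasted) > 0.3:               # implicit flow detected
    print("warning: tagged text may be disclosed to an untrusted service")
```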

Brenner S, Wulf C, Lorenz M, Weichbrodt N, Goltzsche D, Fetzer C, Pietzuch PR, Kapitza R et al., 2016, SecureKeeper: confidential ZooKeeper using Intel SGX, ACM/IFIP/USENIX International Conference on Middleware (Middleware), 2016, Publisher: ACM, Pages: 1-13

Cloud computing, while ubiquitous, still suffers from trust issues, especially for applications managing sensitive data. Third-party coordination services such as ZooKeeper and Consul are fundamental building blocks for cloud applications, but are exposed to potentially sensitive application data. Recently, hardware trust mechanisms such as Intel's Software Guard Extensions (SGX) offer trusted execution environments to shield application data from untrusted software, including the privileged Operating System (OS) and hypervisors. Such hardware support suggests new options for securing third-party coordination services. We describe SecureKeeper, an enhanced version of the ZooKeeper coordination service that uses SGX to preserve the confidentiality and basic integrity of ZooKeeper-managed data. SecureKeeper uses multiple small enclaves to ensure that (i) user-provided data in ZooKeeper is always kept encrypted while not residing inside an enclave, and (ii) essential processing steps that demand plaintext access can still be performed securely. SecureKeeper limits the required changes to the ZooKeeper code base and relies on Java's native code support for accessing enclaves. With an overhead of 11%, the performance of SecureKeeper with SGX is comparable to ZooKeeper with secure communication, while providing much stronger security guarantees with a minimal trusted code base of a few thousand lines of code.

Conference paper

Pietzuch P, Watcharapichat P, Lopez Morales V, Castro Fernandez R et al., 2016, Ako: Decentralised Deep Learning with Partial Gradient Exchange, ACM Symposium on Cloud Computing 2016 (SoCC), Publisher: ACM, Pages: 84-97

Distributed systems for the training of deep neural networks (DNNs) with large amounts of data have vastly improved the accuracy of machine learning models for image and speech recognition. DNN systems scale to large cluster deployments by having worker nodes train many model replicas in parallel; to ensure model convergence, parameter servers periodically synchronise the replicas. This raises the challenge of how to split resources between workers and parameter servers so that the cluster CPU and network resources are fully utilised without introducing bottlenecks. In practice, this requires manual tuning for each model configuration or hardware type. We describe Ako, a decentralised dataflow-based DNN system without parameter servers that is designed to saturate cluster resources. All nodes execute workers that fully use the CPU resources to update model replicas. To synchronise replicas as often as possible subject to the available network bandwidth, workers exchange partitioned gradient updates directly with each other. The number of partitions is chosen so that the used network bandwidth remains constant, independently of cluster size. Since workers eventually receive all gradient partitions after several rounds, convergence is unaffected. For the ImageNet benchmark on a 64-node cluster, Ako does not require any resource allocation decisions, yet converges faster than deployments with parameter servers. (A schematic of partial gradient exchange follows below.)

Conference paper
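
A schematic of partial gradient exchange under our simplifying assumptions: the gradient is split into p partitions, and each round every worker sends one partition (rotating which) to each peer, so per-round bandwidth stays constant while full gradients still arrive within p rounds.

```python
import numpy as np

workers, p, dim = 4, 2, 8                 # p partitions => 1/p bandwidth per round
grads = [np.full(dim, w + 1.0) for w in range(workers)]   # dummy local gradients
recv = [np.zeros(dim) for _ in range(workers)]            # accumulated updates

for rnd in range(p):
    for src in range(workers):
        part = (rnd + src) % p            # which slice src broadcasts this round
        sl = slice(part * dim // p, (part + 1) * dim // p)
        for dst in range(workers):
            if dst != src:
                recv[dst][sl] += grads[src][sl]

# after p rounds each worker holds every peer's full gradient
print(recv[0])                            # worker 0: peers sum to 2 + 3 + 4 = 9
```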

Pietzuch PR, Weichbrodt N, Kurmus A, Kapitza R et al., 2016, AsyncShock: Exploiting Synchronisation Bugs in Intel SGX Enclaves, 21st European Symposium on Research in Computer Security (ESORICS), Publisher: Springer International Publishing, Pages: 440-457, ISSN: 0302-9743

Intel's Software Guard Extensions (SGX) provide a new hardware-based trusted execution environment on Intel CPUs using secure enclaves that are resilient to accesses by privileged code and physical attackers. Originally designed for securing small services, SGX bears promise to protect complex, possibly cloud-hosted, legacy applications. In this paper, we show that previously considered harmless synchronisation bugs can turn into severe security vulnerabilities when using SGX. By exploiting use-after-free and time-of-check-to-time-of-use (TOCTTOU) bugs in enclave code, an attacker can hijack its control flow or bypass access control. We present AsyncShock, a tool for exploiting synchronisation bugs of multithreaded code running under SGX. AsyncShock achieves this by only manipulating the scheduling of threads that are used to execute enclave code. It allows an attacker to interrupt threads by forcing segmentation faults on enclave pages. Our evaluation using two types of Intel Skylake CPUs shows that AsyncShock can reliably exploit use-after-free and TOCTTOU bugs.

Conference paper

Pietzuch PR, Koliousis A, Weidlich M, Costa P, Wolf A, Castro Fernandez R et al., 2016, Demo: The SABER system for window-based hybrid stream processing with GPGPUs, DEBS 2016, Publisher: Association for Computing Machinery, Pages: 354-357

Heterogeneous architectures that combine multi-core CPUs with many-core GPGPUs have the potential to improve the performance of data-intensive stream processing applications. Yet, a stream processing engine must execute streaming SQL queries with sufficient data-parallelism to fully utilise the available heterogeneous processors, and decide how to use each processor in the most effective way. Addressing these challenges, we demonstrate SABER, a hybrid high-performance relational stream processing engine for CPUs and GPGPUs. SABER executes window-based streaming SQL queries in a data-parallel fashion and employs an adaptive scheduling strategy to balance the load on the different types of processors. To hide data movement costs, SABER pipelines the transfer of stream data between CPU and GPGPU memory. In this paper, we review the design principles of SABER in terms of its hybrid stream processing model and its architecture for query execution. We also present a web front-end that monitors processing throughput.

Conference paper

Kalyvianaki E, Fiscato M, Salonidis T, Pietzuch PR et al., 2016, THEMIS: Fairness in Federated Stream Processing under Overload, 2016 International Conference on Management of Data (SIGMOD '16), Publisher: ACM, Pages: 541-553

Federated stream processing systems, which utilise nodes from multiple independent domains, can be found increasingly in multi-provider cloud deployments, internet-of-things systems, collaborative sensing applications and large-scale grid systems. To pool resources from several sites and take advantage of local processing, submitted queries are split into query fragments, which are executed collaboratively by different sites. When supporting many concurrent users, however, queries may exhaust available processing resources, thus requiring constant load shedding. Given that individual sites have autonomy over how they allocate query fragments on their nodes, it is an open challenge how to ensure global fairness on processing quality experienced by queries in a federated scenario. We describe THEMIS, a federated stream processing system for resource-starved, multi-site deployments. It executes queries in a globally fair fashion and provides users with constant feedback on the experienced processing quality for their queries. THEMIS associates stream data with its source information content (SIC), a metric that quantifies the contribution of that data towards the query result, based on the amount of source data used to generate it. We provide the BALANCE-SIC distributed load shedding algorithm that balances the SIC values of result data. Our evaluation shows that the BALANCE-SIC algorithm yields balanced SIC values across queries, as measured by Jain's Fairness Index. Our approach also incurs a low execution time overhead. (Jain's Fairness Index is sketched below.)

Conference paper
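
Jain's Fairness Index, used in the evaluation above, is J(x) = (Σx)² / (n · Σx²): it equals 1.0 when all queries see the same processing quality and approaches 1/n when one query dominates. A tiny Python version with invented SIC values:

```python
def jain_index(xs):
    """1.0 = perfectly fair; 1/n = one query gets everything."""
    return sum(xs) ** 2 / (len(xs) * sum(x * x for x in xs))

print(jain_index([0.9, 0.8, 1.0, 0.85]))   # balanced shedding -> ~0.99
print(jain_index([1.0, 0.1, 0.1, 0.1]))    # one favoured query -> ~0.41
```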

Koliousis A, Weidlich M, Fernandez R, Wolf A, Costa P, Pietzuch P et al., 2016, SABER: Window-based Hybrid Stream Processing for Heterogeneous Architectures, 2016 ACM SIGMOD/PODS Conference, Publisher: ACM

Modern servers have become heterogeneous, often combining multi-core CPUs with many-core GPGPUs. Such heterogeneous architectures have the potential to improve the performance of data-intensive stream processing applications, but they are not supported by current relational stream processing engines. For an engine to exploit a heterogeneous architecture, it must execute streaming SQL queries with sufficient data-parallelism to fully utilise all available heterogeneous processors, and decide how to use each in the most effective way. It must do this while respecting the semantics of streaming SQL queries, in particular with regard to window handling. We describe SABER, a hybrid high-performance relational stream processing engine for CPUs and GPGPUs. SABER executes window-based streaming SQL queries in a data-parallel fashion using all available CPU and GPGPU cores. Instead of statically assigning query operators to heterogeneous processors, SABER employs a new adaptive heterogeneous lookahead scheduling strategy, which increases the share of queries executing on the processor that yields the highest performance. To hide data movement costs, SABER pipelines the transfer of stream data between different memory types and the CPU/GPGPU. Our experimental comparison against state-of-the-art engines shows that SABER increases processing throughput while maintaining low latency for a wide range of streaming SQL queries with small and large window sizes.

Conference paper

Alim A, Clegg RG, Mai L, Rupprecht L, Seckler E, Costa P, Pietzuch P, Wolf AL, Sultana N, Crowcroft J, Madhavapeddy A, Moore A, Mortier R, Koleni M, Oviedo L, Migliavacca M, McAuley D et al., 2016, FLICK: developing and running application-specific network services, 2016 USENIX Annual Technical Conference (USENIX ATC '16), Publisher: USENIX Association

Data centre networks are increasingly programmable, with application-specific network services proliferating, from custom load-balancers to middleboxes providing caching and aggregation. Developers must currently implement these services using traditional low-level APIs, which neither support natural operations on application data nor provide efficient performance isolation. We describe FLICK, a framework for the programming and execution of application-specific network services on multi-core CPUs. Developers write network services in the FLICK language, which offers high-level processing constructs and application-relevant data types. FLICK programs are translated automatically to efficient, parallel task graphs, implemented in C++ on top of a user-space TCP stack. Task graphs have bounded resource usage at runtime, which means that the graphs of multiple services can execute concurrently without interference using cooperative scheduling. We evaluate FLICK with several services (an HTTP load-balancer, a Memcached router and a Hadoop data aggregator), showing that it achieves good performance while reducing development effort.

Conference paper

Cherkasova L, Pietzuch P, Wang CL, 2016, Message from the IC2E 2016 program committee co-chairs, IC2E, Pages: x-xi

Conference paper

Ogden P, Thomas D, Pietzuch P, 2016, AT-GIS: highly parallel spatial query processing with associative transducers, ACM SIGMOD International Conference on Management of Data 2016, Publisher: ACM, Pages: 1041-1054

Users in many domains, including urban planning, transportation, and environmental science want to execute analytical queries over continuously updated spatial datasets. Current solutions for large-scale spatial query processing either rely on extensions to RDBMS, which entails expensive loading and indexing phases when the data changes, or distributed map/reduce frameworks, running on resource-hungry compute clusters. Both solutions struggle with the sequential bottleneck of parsing complex, hierarchical spatial data formats, which frequently dominates query execution time. Our goal is to fully exploit the parallelism offered by modern multi-core CPUs for parsing and query execution, thus providing the performance of a cluster with the resources of a single machine. We describe AT-GIS, a highly-parallel spatial query processing system that scales linearly to a large number of CPU cores. AT-GIS integrates the parsing and querying of spatial data using a new computational abstraction called associative transducers (ATs). ATs can form a single data-parallel pipeline for computation without requiring the spatial input data to be split into logically independent blocks. Using ATs, AT-GIS can execute, in parallel, spatial query operators on the raw input data in multiple formats, without any pre-processing. On a single 64-core machine, AT-GIS provides 3× the performance of an 8-node Hadoop cluster with 192 cores for containment queries, and 10× for aggregation queries.

Conference paper

Fernandez RC, Garefalakis P, Pietzuch P, 2016, Java2SDG: Stateful big data processing for the masses, 32nd IEEE International Conference on Data Engineering (ICDE), Publisher: IEEE, Pages: 1390-1393

Big data processing is no longer restricted to specially-trained engineers. Instead, domain experts, data scientists and data users all want to benefit from applying data mining and machine learning algorithms at scale. A considerable obstacle towards this “democratisation of big data” are programming models: current scalable big data processing platforms such as Spark, Naiad and Flink require users to learn custom functional or declarative programming models, which differ fundamentally from popular languages such as Java, Matlab, Python or C++. An open challenge is how to provide a big data programming model for users that are not familiar with functional programming, while maintaining performance, scalability and fault tolerance. We describe JAVA2SDG, a compiler that translates annotated Java programs to stateful dataflow graphs (SDGs) that can execute on a compute cluster in a data-parallel and fault-tolerant fashion. Compared to existing distributed dataflow models, a distinguishing feature of SDGs is that their computational tasks can access distributed mutable state, thus allowing SDGs to capture the semantics of stateful Java programs. As part of the demonstration, we provide examples of machine learning programs in Java, including collaborative filtering and logistic regression, and we explain how they are translated to SDGs and executed on a large set of machines.

Conference paper

Cadar C, Keeton K, Pietzuch P, Rodrigues R et al., 2016, Proceedings of the 11th European Conference on Computer Systems, EuroSys 2016: Preface, European Conference on Computer Systems (EuroSys 2016)

Conference paper

