Koliousis A, Watcharapichat P, Weidlich M, et al., Crossbow: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers, Very Large Databases (VLDB) 2019, Publisher: VLDB Endowment, ISSN: 2150-8097
Deep learning models are trained on servers with many GPUs, and training must scale with the number of GPUs. Systems such as TensorFlow and Caffe2 train models with parallel synchronous stochastic gradient descent: they process a batch of training data at a time, partitioned across GPUs, and average the resulting partial gradients to obtain an updated global model. To fully utilise all GPUs, systems must increase the batch size, which hinders statistical efficiency. Users tune hyper-parameters such as the learning rate to compensate for this, which is complex and model-specific. We describe CROSSBOW, a new single-server multi-GPU system for training deep learning models that enables users to freely choose their preferred batch size, however small, while scaling to multiple GPUs. CROSSBOW uses many parallel model replicas and avoids reduced statistical efficiency through a new synchronous training method. We introduce SMA, a synchronous variant of model averaging in which replicas independently explore the solution space with gradient descent, but adjust their search synchronously based on the trajectory of a globally-consistent average model. CROSSBOW achieves high hardware efficiency with small batch sizes by potentially training multiple model replicas per GPU, automatically tuning the number of replicas to maximise throughput. Our experiments show that CROSSBOW improves the training time of deep learning models on an 8-GPU server by 1.3–4× compared to TensorFlow.
Pirk H, Giceva J, Pietzuch P, 2019, Thriving in the No Man's Land between compilers and databases, Conference on Innovative Data Systems Research
When developing new data-intensive applications, one faces a build-or-buy decision: use an existing off-the-shelf data management system (DMS) or implement a custom solution. While off-the-shelf systems offer quick results, they lack the flexibility to accommodate the changing requirements of long-term projects. Building a solution from scratch in a general-purpose programming language, however, comes with long-term development costs that may not be justified. What is lacking is a middle ground or, more precisely, a clear migration path from off-the-shelf Data Management Systems to customized applications in general-purpose programming languages. There is, in effect, a no man’s land that neither compiler nor database researchers have claimed. We believe that this problem is an opportunity for the database community to claim a stake. We need to invest effort to transfer the outcomes of data management research into fields of programming languages and compilers. The common complaint that other fields are re-inventing database techniques bears witness to the need for that knowledge transfer. In this paper, we motivate the necessity for data management techniques in general-purpose programming languages and outline a number of specific opportunities for knowledge transfer. This effort will not only cover the no man’s land but also broaden the impact of data management research.
Theodorakis G, Koliousis A, Pietzuch P, et al., 2018, Hammer Slide: Work- and CPU-efficient Streaming Window Aggregation, 9th International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures (ADMS'18)
The computation of sliding window aggregates is one of the core functionalities of stream processing systems. Presently, there are two classes of approaches to evaluating them. The first is non-incremental, i.e., every window is evaluated in isolation even if overlapping windows provide opportunities for work-sharing. While not algorithmically efficient, this class of algorithm is usually very CPU efficient. The other approach is incremental: to the amount possible, the result of one window evaluation is used to help with the evaluation of the next window. While algorithmically efficient, the inherent control-dependencies in the CPU instruction stream make this highly CPU inefficient. In this paper, we analyse the state of the art in efficient incremental window processing and extend the fastest known algorithm, the Two-Stacks approach, with known as well as novel optimisations. These include SIMD-parallel processing, internal data structure decomposition and data minimalism. We find that, thus optimised, our incremental window aggregation algorithm outperforms the state-of-the-art incremental algorithm by up to 11×. In addition, it is at least competitive and often significantly (up to 80%) faster than a non-incremental algorithm. Consequently, stream processing systems can use our proposed algorithm to cover incremental as well as non-incremental aggregation, resulting in systems that are simpler as well as faster.
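For readers unfamiliar with the Two-Stacks approach named in this abstract, the following is a minimal sketch of the textbook version only, without the SIMD, decomposition or data-minimalism optimisations the paper describes. The names `TwoStacksWindow` and `sliding_max` are hypothetical and chosen for illustration. The window is a FIFO queue built from two stacks, where every entry also stores the aggregate of itself and everything beneath it, so the operator only has to be associative, not invertible.

```python
from typing import Callable, List, Tuple


class TwoStacksWindow:
    """FIFO window with a running aggregate for an associative operator."""

    def __init__(self, combine: Callable[[float, float], float], identity: float):
        self.combine = combine
        self.identity = identity
        # Each entry is (value, aggregate of this entry and all entries below it).
        self.front: List[Tuple[float, float]] = []  # oldest element on top
        self.back: List[Tuple[float, float]] = []   # newest element on top

    def _top_agg(self, stack: List[Tuple[float, float]]) -> float:
        return stack[-1][1] if stack else self.identity

    def insert(self, value: float) -> None:
        # Older elements sit below, so they go on the left of the operator.
        self.back.append((value, self.combine(self._top_agg(self.back), value)))

    def evict(self) -> None:
        if not self.front:
            # Flip: move the back stack onto the front stack, recomputing
            # aggregates so that the oldest element ends up on top.
            while self.back:
                value, _ = self.back.pop()
                agg = self.combine(value, self._top_agg(self.front))
                self.front.append((value, agg))
        if not self.front:
            raise IndexError("evict from empty window")
        self.front.pop()

    def query(self) -> float:
        # Front holds the older part of the window, back the newer part.
        return self.combine(self._top_agg(self.front), self._top_agg(self.back))


def sliding_max(values: List[float], size: int) -> List[float]:
    """Maximum of every full window of `size` consecutive values."""
    window = TwoStacksWindow(max, float("-inf"))
    out = []
    for i, v in enumerate(values):
        window.insert(v)
        if i >= size:
            window.evict()
        if i >= size - 1:
            out.append(window.query())
    return out
```

Each element is pushed and popped at most twice, so insert and evict cost amortised constant time per tuple regardless of window size; the occasional flip is the source of the latency spikes that incremental variants of this scheme typically try to smooth out.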
O'Keeffe D, Salonidis T, Pietzuch PR, 2018, Frontier: Resilient Edge Processing for the Internet of Things, Very Large Databases (VLDB), Publisher: ACM, Pages: 1178-1191, ISSN: 2150-8097
In an edge deployment model, Internet-of-Things (IoT) applications, e.g. for building automation or video surveillance, must process data locally on IoT devices without relying on permanent connectivity to a cloud backend. The ability to harness the combined resources of multiple IoT devices for computation is influenced by the quality of wireless network connectivity. An open challenge is how practical edge-based IoT applications can be realised that are robust to changes in network bandwidth between IoT devices, due to interference and intermittent connectivity. We present Frontier, a distributed and resilient edge processing platform for IoT devices. The key idea is to express data-intensive IoT applications as continuous data-parallel streaming queries and to improve query throughput in an unreliable wireless network by exploiting network path diversity: a query includes operator replicas at different IoT nodes, which increases possible network paths for data. Frontier dynamically routes stream data to operator replicas based on network path conditions. Nodes probe path throughput and use backpressure stream routing to decide on transmission rates, while exploiting multiple operator replicas for data-parallelism. If a node loses network connectivity, a transient disconnection recovery mechanism reprocesses the lost data. Our experimental evaluation of Frontier shows that network path diversity improves throughput by 1.3×–2.8× for different IoT applications, while being resilient to intermittent network connectivity.
Garefalakis P, Karanasos K, Pietzuch P, et al., 2018, Medea: scheduling of long running applications in shared production clusters, EuroSys 2018, Publisher: ACM
The rise in popularity of machine learning, streaming, and latency-sensitive online applications in shared production clusters has raised new challenges for cluster schedulers. To optimize their performance and resilience, these applications require precise control of their placements, by means of complex constraints, e.g., to collocate or separate their long-running containers across groups of nodes. In the presence of these applications, the cluster scheduler must attain global optimization objectives, such as maximizing the number of deployed applications or minimizing the violated constraints and the resource fragmentation, but without affecting the scheduling latency of short-running containers. We present Medea, a new cluster scheduler designed for the placement of long- and short-running containers. Medea introduces powerful placement constraints with formal semantics to capture interactions among containers within and across applications. It follows a novel two-scheduler design: (i) for long-running containers, it applies an optimization-based approach that accounts for constraints and global objectives; (ii) for short-running containers, it uses a traditional task-based scheduler for low placement latency. Evaluated on a 400-node cluster, our implementation of Medea on Apache Hadoop YARN achieves placement of long-running applications with significant performance and resilience benefits compared to state-of-the-art schedulers.
Castro Fernandez R, Culhane W, Watcharapichat P, et al., 2018, Meta-dataflows: efficient exploratory dataflow jobs, ACM Conference on Management of Data (SIGMOD), Publisher: Association for Computing Machinery (ACM), ISSN: 0730-8078
Distributed dataflow systems such as Apache Spark and Apache Flink are used to derive new insights from large datasets. While they efficiently execute concrete data processing workflows, expressed as dataflow graphs, they lack generic support for exploratory workflows: if a user is uncertain about the correct processing pipeline, e.g. in terms of data cleaning strategy or choice of model parameters, they must repeatedly submit modified jobs to the system. This, however, misses out on optimisation opportunities for exploratory workflows, both in terms of scheduling and memory allocation. We describe meta-dataflows (MDFs), a new model to effectively express exploratory workflows and efficiently execute them on compute clusters. With MDFs, users specify a family of dataflows using two primitives: (a) an explore operator automatically considers choices in a dataflow; and (b) a choose operator assesses the result quality of explored dataflow branches and selects a subset of the results. We propose optimisations to execute MDFs: a system can (i) avoid redundant computation when exploring branches by reusing intermediate results and discarding results from underperforming branches; and (ii) consider future data access patterns in the MDF when allocating cluster memory. Our evaluation shows that MDFs improve the runtime of exploratory workflows by up to 90% compared to sequential execution.
Lind J, Priebe C, Muthukumaran D, et al., 2018, Glamdring: automatic application partitioning for Intel SGX, USENIX Annual Technical Conference 2017, Publisher: USENIX, Pages: 285-298
Trusted execution support in modern CPUs, as offered by Intel SGX enclaves, can protect applications in untrusted environments. While prior work has shown that legacy applications can run in their entirety inside enclaves, this results in a large trusted computing base (TCB). Instead, we explore an approach in which we partition an application and use an enclave to protect only security-sensitive data and functions, thus obtaining a smaller TCB. We describe Glamdring, the first source-level partitioning framework that secures applications written in C using Intel SGX. A developer first annotates security-sensitive application data. Glamdring then automatically partitions the application into untrusted and enclave parts: (i) to preserve data confidentiality, Glamdring uses dataflow analysis to identify functions that may be exposed to sensitive data; (ii) for data integrity, it uses backward slicing to identify functions that may affect sensitive data. Glamdring then places security-sensitive functions inside the enclave, and adds runtime checks and cryptographic operations at the enclave boundary to protect it from attack. Our evaluation of Glamdring with the Memcached store, the LibreSSL library, and the Digital Bitbox bitcoin wallet shows that it achieves small TCB sizes and has acceptable performance overheads.
Aublin PRER, Kelbert FM, O'Keeffe D, et al., LibSEAL: revealing service integrity violations using trusted execution, The European Conference on Computer Systems, Publisher: ACM
Rupprecht L, Culhane WJ, Pietzuch P, 2017, SquirrelJoin: network-aware distributed join processing with lazy partitioning, Proceedings of the VLDB Endowment, Vol: 10, Pages: 1250-1261, ISSN: 2150-8097
To execute distributed joins in parallel on compute clusters, systems partition and exchange data records between workers. With large datasets, workers spend a considerable amount of time transferring data over the network. When compute clusters are shared among multiple applications, workers must compete for network bandwidth with other applications. These variances in the available network bandwidth lead to network skew, which causes straggling workers to prolong the join completion time. We describe SquirrelJoin, a distributed join processing technique that uses lazy partitioning to adapt to transient network skew in clusters. Workers maintain in-memory lazy partitions to withhold a subset of records, i.e. not sending them immediately to other workers for processing. Lazy partitions are then assigned dynamically to other workers based on network conditions: each worker takes periodic throughput measurements to estimate its completion time, and lazy partitions are allocated as to minimise the join completion time. We implement SquirrelJoin as part of the Apache Flink distributed dataflow framework and show that, under transient network contention in a shared compute cluster, SquirrelJoin speeds up join completion times by up to 2.9× with only a small, fixed overhead.
Lind J, Eyal I, Pietzuch PR, et al., 2017, Teechan: payment channels using trusted execution environments, 4th Workshop on Bitcoin and Blockchain Research, Publisher: Springer, ISSN: 0302-9743
Blockchain protocols are inherently limited in transaction throughput and latency. Recent efforts to address performance and scale blockchains have focused on off-chain payment channels. While such channels can achieve low latency and high throughput, deploying them securely on top of the Bitcoin blockchain has been difficult, partly because building a secure implementation requires changes to the underlying protocol and the ecosystem. We present Teechan, a full-duplex payment channel framework that exploits trusted execution environments. Teechan can be deployed securely on the existing Bitcoin blockchain without having to modify the protocol. It: (i) achieves a higher transaction throughput and lower transaction latency than prior solutions; (ii) enables unlimited full-duplex payments as long as the balance does not exceed the channel’s credit; (iii) requires only a single message to be sent per payment in any direction; and (iv) places at most two transactions on the blockchain under any execution scenario. We have built and deployed the Teechan framework using Intel SGX on the Bitcoin network. Our experiments show that, not counting network latencies, Teechan can achieve 2,480 transactions per second on a single channel, with submillisecond latencies.
Clegg R, Landa R, Griffin D, et al., 2017, Faces in the clouds: long-duration, multi-user, cloud-assisted video conferencing, IEEE Transactions on Cloud Computing, Pages: 1-1, ISSN: 2168-7161
Multi-user video conferencing is a ubiquitous technology. Increasingly end-hosts in a conference are assisted by cloud-based servers that improve the quality of experience for end users. This paper evaluates the impact of strategies for placement of such servers on user experience and deployment cost. We consider scenarios based upon the Amazon EC2 infrastructure as well as future scenarios in which cloud instances can be located at a larger number of possible sites across the planet. We compare a number of possible strategies for choosing which cloud locations should host services and how traffic should route through them. Our study is driven by real data to create demand scenarios with realistic geographical user distributions and diurnal behaviour. We conclude that on the EC2 infrastructure a well chosen static selection of servers performs well but as more cloud locations are available a dynamic choice of servers becomes important.
Rupprecht L, Zhang R, Owen W, et al., 2017, SwiftAnalytics: optimizing object stores for big data analytics, IEEE International Conference on Cloud Engineering (IC2E), 2017, Publisher: IEEE
Due to their scalability and low cost, object-based storage systems are an attractive storage solution and widely deployed. To gain valuable insight from the data residing in object storage but avoid expensive copying to a distributed filesystem (e.g. HDFS), it would be natural to directly use them as a storage backend for data-parallel analytics frameworks such as Spark or MapReduce. Unfortunately, executing data-parallel frameworks on object storage exhibits severe performance problems, reducing average job completion times by up to 6.5×. We identify the two most severe performance problems when running data-parallel frameworks on the OpenStack Swift object storage system in comparison to the HDFS distributed filesystem: (i) the fixed mapping of object names to storage nodes prevents local writes and adds delay when objects are renamed; (ii) the coarser granularity of objects compared to blocks reduces data locality during reads. We propose the SwiftAnalytics object storage system to address them: (i) it uses locality-aware writes to control an object’s location and eliminate unnecessary I/O related to renames during job completion, speeding up analytics jobs by up to 5.1×; (ii) it transparently chunks objects into smaller sized parts to improve data-locality, leading to up to 3.4× faster reads.
Pietzuch PR, Arnautov S, Trach B, et al., 2016, SCONE: Secure Linux Containers with Intel SGX, 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2016, Publisher: USENIX
In multi-tenant environments, Linux containers managed by Docker or Kubernetes have a lower resource footprint, faster startup times, and higher I/O performance compared to virtual machines (VMs) on hypervisors. Yet their weaker isolation guarantees, enforced through software kernel mechanisms, make it easier for attackers to compromise the confidentiality and integrity of application data within containers. We describe SCONE, a secure container mechanism for Docker that uses the SGX trusted execution support of Intel CPUs to protect container processes from outside attacks. The design of SCONE leads to (i) a small trusted computing base (TCB) and (ii) a low performance overhead: SCONE offers a secure C standard library interface that transparently encrypts/decrypts I/O data; to reduce the performance impact of thread synchronization and system calls within SGX enclaves, SCONE supports user-level threading and asynchronous system calls. Our evaluation shows that it protects unmodified applications with SGX, achieving 0.6×–1.2× of native throughput.
Pietzuch P, Watcharapichat P, Lopez Morales V, et al., 2016, Ako: Decentralised Deep Learning with Partial Gradient Exchange, ACM Symposium on Cloud Computing 2016 (SoCC), Publisher: ACM, Pages: 84-97
Distributed systems for the training of deep neural networks (DNNs) with large amounts of data have vastly improved the accuracy of machine learning models for image and speech recognition. DNN systems scale to large cluster deployments by having worker nodes train many model replicas in parallel; to ensure model convergence, parameter servers periodically synchronise the replicas. This raises the challenge of how to split resources between workers and parameter servers so that the cluster CPU and network resources are fully utilised without introducing bottlenecks. In practice, this requires manual tuning for each model configuration or hardware type. We describe Ako, a decentralised dataflow-based DNN system without parameter servers that is designed to saturate cluster resources. All nodes execute workers that fully use the CPU resources to update model replicas. To synchronise replicas as often as possible subject to the available network bandwidth, workers exchange partitioned gradient updates directly with each other. The number of partitions is chosen so that the used network bandwidth remains constant, independently of cluster size. Since workers eventually receive all gradient partitions after several rounds, convergence is unaffected. For the ImageNet benchmark on a 64-node cluster, Ako does not require any resource allocation decisions, yet converges faster than deployments with parameter servers.
Pietzuch PR, Weichbrodt N, Kurmus A, et al., 2016, AsyncShock: Exploiting Synchronisation Bugs in Intel SGX Enclaves, 21st European Symposium on Research in Computer Security (ESORICS), Publisher: Springer International Publishing, Pages: 440-457, ISSN: 0302-9743
Intel’s Software Guard Extensions (SGX) provide a new hardware-based trusted execution environment on Intel CPUs using secure enclaves that are resilient to accesses by privileged code and physical attackers. Originally designed for securing small services, SGX bears promise to protect complex, possibly cloud-hosted, legacy applications. In this paper, we show that previously considered harmless synchronisation bugs can turn into severe security vulnerabilities when using SGX. By exploiting use-after-free and time-of-check-to-time-of-use (TOCTTOU) bugs in enclave code, an attacker can hijack its control flow or bypass access control. We present AsyncShock, a tool for exploiting synchronisation bugs of multithreaded code running under SGX. AsyncShock achieves this by only manipulating the scheduling of threads that are used to execute enclave code. It allows an attacker to interrupt threads by forcing segmentation faults on enclave pages. Our evaluation using two types of Intel Skylake CPUs shows that AsyncShock can reliably exploit use-after-free and TOCTTOU bugs.
Pietzuch PR, Papagiannis I, Watcharapichat P, et al., 2016, BrowserFlow: imprecise data flow tracking to prevent accidental data disclosure, ACM/IFIP/USENIX International Conference on Middleware (Middleware), 2016, Publisher: ACM
With the use of external cloud services such as Google Docs or Evernote in an enterprise setting, the loss of control over sensitive data becomes a major concern for organisations. It is typical for regular users to violate data disclosure policies accidentally, e.g. when sharing text between documents in browser tabs. Our goal is to help such users comply with data disclosure policies: we want to alert them about potentially unauthorised data disclosure from trusted to untrusted cloud services. This is particularly challenging when users can modify data in arbitrary ways, they employ multiple cloud services, and cloud services cannot be changed. To track the propagation of text data robustly across cloud services, we introduce imprecise data flow tracking, which identifies data flows implicitly by detecting and quantifying the similarity between text fragments. To reason about violations of data disclosure policies, we describe a new text disclosure model that, based on similarity, associates text fragments in web browsers with security tags and identifies unauthorised data flows to untrusted services. We demonstrate the applicability of imprecise data tracking through BrowserFlow, a browser-based middleware that alerts users when they expose potentially sensitive text to an untrusted cloud service. Our experiments show that BrowserFlow can robustly track data flows and manage security tags for many documents with no noticeable performance impact.
Pietzuch PR, Brenner S, Wulf C, et al., 2016, SecureKeeper: Confidential ZooKeeper using Intel SGX, ACM/IFIP/USENIX International Conference on Middleware (Middleware), 2016, Publisher: ACM
Cloud computing, while ubiquitous, still suffers from trust issues, especially for applications managing sensitive data. Third-party coordination services such as ZooKeeper and Consul are fundamental building blocks for cloud applications, but are exposed to potentially sensitive application data. Recently, hardware trust mechanisms such as Intel's Software Guard Extensions (SGX) offer trusted execution environments to shield application data from untrusted software, including the privileged Operating System (OS) and hypervisors. Such hardware support suggests new options for securing third-party coordination services. We describe SecureKeeper, an enhanced version of the ZooKeeper coordination service that uses SGX to preserve the confidentiality and basic integrity of ZooKeeper-managed data. SecureKeeper uses multiple small enclaves to ensure that (i) user-provided data in ZooKeeper is always kept encrypted while not residing inside an enclave, and (ii) essential processing steps that demand plaintext access can still be performed securely. SecureKeeper limits the required changes to the ZooKeeper code base and relies on Java's native code support for accessing enclaves. With an overhead of 11%, the performance of SecureKeeper with SGX is comparable to ZooKeeper with secure communication, while providing much stronger security guarantees with a minimal trusted code base of a few thousand lines of code.
Pietzuch PR, Koliousis A, Weidlich M, et al., 2016, Demo- The SABER system for window-based hybrid stream processing with GPGPUs, DEBS 2016, Publisher: Association for Computing Machinery, Pages: 354-357
Heterogeneous architectures that combine multi-core CPUs with many-core GPGPUs have the potential to improve the performance of data-intensive stream processing applications. Yet, a stream processing engine must execute streaming SQL queries with sufficient data-parallelism to fully utilise the available heterogeneous processors, and decide how to use each processor in the most effective way. Addressing these challenges, we demonstrate SABER, a hybrid high-performance relational stream processing engine for CPUs and GPGPUs. SABER executes window-based streaming SQL queries in a data-parallel fashion and employs an adaptive scheduling strategy to balance the load on the different types of processors. To hide data movement costs, SABER pipelines the transfer of stream data between CPU and GPGPU memory. In this paper, we review the design principles of SABER in terms of its hybrid stream processing model and its architecture for query execution. We also present a web front-end that monitors processing throughput.
Kalyvianaki E, Fiscato M, Salonidis T, et al., 2016, THEMIS: Fairness in Federated Stream Processing under Overload, 2016 International Conference on Management of Data (SIGMOD '16), Publisher: ACM, Pages: 541-553
Federated stream processing systems, which utilise nodes from multiple independent domains, can be found increasingly in multi-provider cloud deployments, internet-of-things systems, collaborative sensing applications and large-scale grid systems. To pool resources from several sites and take advantage of local processing, submitted queries are split into query fragments, which are executed collaboratively by different sites. When supporting many concurrent users, however, queries may exhaust available processing resources, thus requiring constant load shedding. Given that individual sites have autonomy over how they allocate query fragments on their nodes, it is an open challenge how to ensure global fairness on processing quality experienced by queries in a federated scenario. We describe THEMIS, a federated stream processing system for resource-starved, multi-site deployments. It executes queries in a globally fair fashion and provides users with constant feedback on the experienced processing quality for their queries. THEMIS associates stream data with its source information content (SIC), a metric that quantifies the contribution of that data towards the query result, based on the amount of source data used to generate it. We provide the BALANCE-SIC distributed load shedding algorithm that balances the SIC values of result data. Our evaluation shows that the BALANCE-SIC algorithm yields balanced SIC values across queries, as measured by Jain's Fairness Index. Our approach also incurs a low execution time overhead.
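Jain's Fairness Index, the metric this abstract uses to assess balance, has a standard closed form: for n non-negative values x, it is (Σx)² / (n · Σx²), ranging from 1/n (one query receives everything) to 1 (all values equal). The sketch below computes it for a list of per-query values; the function name `jain_fairness_index` and the choice to reject empty or all-zero input are assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence


def jain_fairness_index(values: Sequence[float]) -> float:
    """Return (sum x)^2 / (n * sum x^2) for non-negative values."""
    n = len(values)
    sum_of_squares = sum(x * x for x in values)
    if n == 0 or sum_of_squares == 0:
        # Assumption: treat the index as undefined rather than pick a value.
        raise ValueError("index is undefined for empty or all-zero input")
    total = sum(values)
    return (total * total) / (n * sum_of_squares)
```

For example, four queries with identical values score 1.0, while a single query holding all of the value among four scores 0.25.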
Koliousis A, Weidlich M, Fernandez R, et al., 2016, Saber: Window-based Hybrid Stream Processing for Heterogeneous Architectures, 2016 ACM SIGMOD/PODS Conference, Publisher: ACM
Modern servers have become heterogeneous, often combining multi-core CPUs with many-core GPGPUs. Such heterogeneous architectures have the potential to improve the performance of data-intensive stream processing applications, but they are not supported by current relational stream processing engines. For an engine to exploit a heterogeneous architecture, it must execute streaming SQL queries with sufficient data-parallelism to fully utilise all available heterogeneous processors, and decide how to use each in the most effective way. It must do this while respecting the semantics of streaming SQL queries, in particular with regard to window handling. We describe SABER, a hybrid high-performance relational stream processing engine for CPUs and GPGPUs. SABER executes window-based streaming SQL queries in a data-parallel fashion using all available CPU and GPGPU cores. Instead of statically assigning query operators to heterogeneous processors, SABER employs a new adaptive heterogeneous lookahead scheduling strategy, which increases the share of queries executing on the processor that yields the highest performance. To hide data movement costs, SABER pipelines the transfer of stream data between different memory types and the CPU/GPGPU. Our experimental comparison against state-of-the-art engines shows that SABER increases processing throughput while maintaining low latency for a wide range of streaming SQL queries with small and large window sizes.
Alim A, Clegg RG, Mai L, et al., 2016, FLICK: developing and running application-specific network services, 2016 USENIX Annual Technical Conference (USENIX ATC '16), Publisher: USENIX Association
Data centre networks are increasingly programmable, with application-specific network services proliferating, from custom load-balancers to middleboxes providing caching and aggregation. Developers must currently implement these services using traditional low-level APIs, which neither support natural operations on application data nor provide efficient performance isolation. We describe FLICK, a framework for the programming and execution of application-specific network services on multi-core CPUs. Developers write network services in the FLICK language, which offers high-level processing constructs and application-relevant data types. FLICK programs are translated automatically to efficient, parallel task graphs, implemented in C++ on top of a user-space TCP stack. Task graphs have bounded resource usage at runtime, which means that the graphs of multiple services can execute concurrently without interference using cooperative scheduling. We evaluate FLICK with several services (an HTTP load-balancer, a Memcached router and a Hadoop data aggregator), showing that it achieves good performance while reducing development effort.
Cherkasova L, Pietzuch P, Wang CL, 2016, Message from the IC2E 2016 program committee co-chairs, IC2E, Pages: x-xi
Fernandez RC, Garefalakis P, Pietzuch P, 2016, Java2SDG: Stateful big data processing for the masses, 32nd IEEE International Conference on Data Engineering (ICDE), Publisher: IEEE, Pages: 1390-1393
Big data processing is no longer restricted to specially-trained engineers. Instead, domain experts, data scientists and data users all want to benefit from applying data mining and machine learning algorithms at scale. A considerable obstacle towards this “democratisation of big data” are programming models: current scalable big data processing platforms such as Spark, Naiad and Flink require users to learn custom functional or declarative programming models, which differ fundamentally from popular languages such as Java, Matlab, Python or C++. An open challenge is how to provide a big data programming model for users that are not familiar with functional programming, while maintaining performance, scalability and fault tolerance. We describe JAVA2SDG, a compiler that translates annotated Java programs to stateful dataflow graphs (SDGs) that can execute on a compute cluster in a data-parallel and fault-tolerant fashion. Compared to existing distributed dataflow models, a distinguishing feature of SDGs is that their computational tasks can access distributed mutable state, thus allowing SDGs to capture the semantics of stateful Java programs. As part of the demonstration, we provide examples of machine learning programs in Java, including collaborative filtering and logistic regression, and we explain how they are translated to SDGs and executed on a large set of machines.
Cadar C, Keeton K, Pietzuch P, et al., 2016, Proceedings of the 11th European Conference on Computer Systems, EuroSys 2016: Preface, European Conference on Computer Systems (EuroSys 2016)
Goeschka KM, Oliveira R, Pietzuch P, et al., 2016, Special track on dependable and adaptive distributed systems, ACM SAC, Pages: 490-491
Baguena M, Pamboris A, Pietzuch PR, et al., 2016, Towards Enabling Hyper-Responsive Mobile Apps Through Network Edge Assistance, The 13th Annual IEEE Consumer Communications & Networking Conference, Publisher: IEEE
Poor Internet performance currently undermines the efficiency of hyper-responsive mobile apps such as augmented reality clients and online games, which require low-latency access to real-time backend services. While edge-assisted execution, i.e. moving entire services to the edge of an access network, helps eliminate part of the communication overhead involved, this does not scale to the number of users that share an edge infrastructure. This is due to a mismatch between the scarce availability of resources in access networks and the aggregate demand for computational power from client applications. Instead, this paper proposes a hybrid edge-assisted deployment model in which only part of a service executes on LTE edge servers. We provide insights about the conditions that must hold for such a model to be effective by investigating in simulation different deployment and application scenarios. In particular, we show that using LTE edge servers with modest capabilities, performance can improve significantly as long as at most 50% of client requests are processed at the edge. Moreover, we argue that edge servers should be installed at the core of a mobile network, rather than the mobile base station: the difference in performance is negligible, whereas the latter choice entails high deployment costs. Finally, we verify that, for the proposed model, the impact of user mobility on TCP performance is low.
Pamboris A, Pietzuch P, 2015, C-RAM: breaking mobile device memory barriers using the cloud, IEEE Transactions on Mobile Computing, Vol: 15, Pages: 2692-2705, ISSN: 1558-0660
Mobile applications are constrained by the available memory of mobile devices. We present C-RAM, a system that uses cloud-based memory to extend the memory of mobile devices. It splits application state and its associated computation between a mobile device and a cloud node to allow applications to consume more memory, while minimising the performance impact. C-RAM thus enables developers to realise new applications or port legacy desktop applications with a large memory footprint to mobile platforms without explicitly designing them to account for memory limitations. To handle network failures with partitioned application state, C-RAM uses a new snapshot-based fault tolerance mechanism in which changes to remote memory objects are periodically backed up to the device. After failure, or when network usage exceeds a given limit, the device rolls back execution to continue from the last snapshot. C-RAM supports local execution with an application state that exceeds the available device memory through a user-level virtual memory: objects are loaded on-demand from snapshots in flash memory. Our C-RAM prototype supports Objective-C applications on the unmodified iOS platform. With C-RAM, applications can consume 10× more memory than the device capacity, with a negligible impact on application performance. In some cases, C-RAM even achieves a significant speed-up in execution time (up to 9.7×).
Ogden, Thomas D, Pietzuch P, 2016, AT-GIS: highly parallel spatial query processing with associative transducers, ACM SIGMOD International Conference on Management of Data 2016, Publisher: ACM
Users in many domains, including urban planning, transportation, and environmental science want to execute analytical queries over continuously updated spatial datasets. Current solutions for large-scale spatial query processing either rely on extensions to RDBMS, which entails expensive loading and indexing phases when the data changes, or distributed map/reduce frameworks, running on resource-hungry compute clusters. Both solutions struggle with the sequential bottleneck of parsing complex, hierarchical spatial data formats, which frequently dominates query execution time. Our goal is to fully exploit the parallelism offered by modern multicore CPUs for parsing and query execution, thus providing the performance of a cluster with the resources of a single machine. We describe AT-GIS, a highly-parallel spatial query processing system that scales linearly to a large number of CPU cores. AT-GIS integrates the parsing and querying of spatial data using a new computational abstraction called associative transducers (ATs). ATs can form a single data-parallel pipeline for computation without requiring the spatial input data to be split into logically independent blocks. Using ATs, AT-GIS can execute, in parallel, spatial query operators on the raw input data in multiple formats, without any pre-processing. On a single 64-core machine, AT-GIS provides 3× the performance of an 8-node Hadoop cluster with 192 cores for containment queries, and 10× for aggregation queries.
O'Keeffe D, Salonidis T, Pietzuch PR, 2015, Network-Aware Stream Query Processing in Mobile Ad-Hoc Networks, MILCOM 2015, Publisher: IEEE, Pages: 1335-1340
Many real-time decision support and sensing applications can be expressed as continuous stream queries over time-varying data streams, following a data stream management model. We consider the problem of the efficient and resilient execution of continuous stream queries in tactical edge networks formed from mobile ad-hoc networks (MANETs) with limited backend connectivity. Previous approaches for distributed stream query execution target data center environments in which networks are static, and centralized control is feasible. The distributed, bandwidth-constrained and highly dynamic nature of MANETs renders such approaches insufficient: while a stream query executes in a MANET, changes in the network topology mean that any fixed query plan eventually becomes outdated. We introduce an adaptive, network-aware approach for stream query planning in MANETs, which supports both single- and multi-input windowed stream query operators. The basic idea is to increase the path diversity available when executing stream queries by replicating query operators across many nodes in the MANET. During execution, it becomes possible to dynamically switch between different operator replicas based on connectivity and other network path conditions. We evaluate our approach in emulated MANETs, showing that it can substantially increase the robustness of distributed stream query processing under mobility.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.