Minto L, Haller M, Haddadi H, et al., 2021, Stronger privacy for federated collaborative filtering with implicit feedback, 15th ACM Conference on Recommender Systems
Recommender systems are commonly trained on centrally collected user interaction data like views or clicks. This practice however raises serious privacy concerns regarding the recommender's collection and handling of potentially sensitive data. Several privacy-aware recommender systems have been proposed in recent literature, but comparatively little attention has been given to systems at the intersection of implicit feedback and privacy. To address this shortcoming, we propose a practical federated recommender system for implicit data under user-level local differential privacy (LDP). The privacy-utility trade-off is controlled by parameters $\epsilon$ and $k$, regulating the per-update privacy budget and the number of $\epsilon$-LDP gradient updates sent by each user respectively. To further protect the user's privacy, we introduce a proxy network to reduce the fingerprinting surface by anonymizing and shuffling the reports before forwarding them to the recommender. We empirically demonstrate the effectiveness of our framework on the MovieLens dataset, achieving a Hit Ratio at K=10 (HR@10) of up to 0.68 on 50k users with 5k items. Even on the full dataset, we show that it is possible to achieve reasonable utility with HR@10>0.5 without compromising user privacy.
Mo F, Haddadi H, Katevas K, et al., 2021, PPFL: privacy-preserving federated learning with trusted execution environments, Mobile Systems, Applications, and Services conference, Publisher: ACM, Pages: 94-108
We propose and implement a Privacy-preserving Federated Learning (PPFL) framework for mobile systems to limit privacy leakages in federated learning. Leveraging the widespread presence of Trusted Execution Environments (TEEs) in high-end and mobile devices, we utilize TEEs on clients for local training, and on servers for secure aggregation, so that model/gradient updates are hidden from adversaries. Challenged by the limited memory size of current TEEs, we leverage greedy layer-wise training to train each model's layer inside the trusted area until its convergence. The performance evaluation of our implementation shows that PPFL can significantly improve privacy while incurring small system overheads at the client-side. In particular, PPFL can successfully defend the trained model against data reconstruction, property inference, and membership inference attacks. Furthermore, it can achieve comparable model utility with fewer communication rounds (0.54x) and a similar amount of network traffic (1.002x) compared to the standard federated learning of a complete model. This is achieved while only introducing up to ~15% CPU time, ~18% memory usage, and ~21% energy consumption overhead in PPFL's client-side.
Mandalari AM, Dubois DJ, Kolcun R, et al., 2021, Blocking without breaking: identification and mitigation of non-essential IoT traffic, Publisher: arXiv
Despite the prevalence of Internet of Things (IoT) devices, there is little information about the purpose and risks of the Internet traffic these devices generate, and consumers have limited options for controlling those risks. A key open question is whether one can mitigate these risks by automatically blocking some of the Internet connections from IoT devices, without rendering the devices inoperable. In this paper, we address this question by developing a rigorous methodology that relies on automated IoT-device experimentation to reveal which network connections (and the information they expose) are essential, and which are not. We further develop strategies to automatically classify network traffic destinations as either required (i.e., their traffic is essential for devices to work properly) or not, hence allowing firewall rules to block traffic sent to non-required destinations without breaking the functionality of the device. We find that indeed 16 among the 31 devices we tested have at least one blockable non-required destination, with the maximum number of blockable destinations for a device being 11. We further analyze the destination of network traffic and find that all third parties observed in our experiments are blockable, while first and support parties are neither uniformly required nor non-required. Finally, we demonstrate the limitations of existing blocklists on IoT traffic, propose a set of guidelines for automatically limiting non-essential IoT traffic, and we develop a prototype system that implements these guidelines.
Aloufi R, Haddadi H, Boyle D, 2021, Configurable privacy-preserving automatic speech recognition, Publisher: arXiv
Voice assistive technologies have given rise to far-reaching privacy and security concerns. In this paper we investigate whether modular automatic speech recognition (ASR) can improve privacy in voice assistive systems by combining independently trained separation, recognition, and discretization modules to design configurable privacy-preserving ASR systems. We evaluate privacy concerns and the effects of applying various state-of-the-art techniques at each stage of the system, and report results using task-specific metrics (i.e. WER, ABX, and accuracy). We show that overlapping speech inputs to ASR systems present further privacy concerns, and how these may be mitigated using speech separation and optimization techniques. Our discretization module is shown to minimize paralinguistics privacy leakage from ASR acoustic models to levels commensurate with random guessing. We show that voice privacy can be configurable, and argue this presents new opportunities for privacy-preserving applications incorporating ASR.
Zhan Y, Haddadi H, 2021, MoSen: activity modelling in multiple-occupancy smart homes, Publisher: arXiv
Smart home solutions increasingly rely on a variety of sensors for behavioral analytics and activity recognition to provide context-aware applications and personalized care. Optimizing the sensor network is one of the most important approaches to ensure classification accuracy and the system's efficiency. However, the trade-off between the cost and performance is often a challenge in real deployments, particularly for multiple-occupancy smart homes or care homes. In this paper, using real indoor activity and mobility traces, floor plans, and synthetic multi-occupancy behavior models, we evaluate several multi-occupancy household scenarios with 2-5 residents. We explore and quantify the trade-offs between the cost of sensor deployments and expected labeling accuracy in different scenarios. Our evaluation across different scenarios shows that the performance of the desired context-aware task is affected by different localization resolutions, the number of residents, the number of sensors, and varying sensor deployments. To aid in accelerating the adoption of practical sensor-based activity recognition technology, we design MoSen, a framework to simulate the interaction dynamics between sensor-based environments and multiple residents. By evaluating the factors that affect the performance of the desired sensor network, we provide a sensor selection strategy and design metrics for sensor layout in real environments. Using our selection strategy in a 5-person scenario case study, we demonstrate that MoSen can significantly improve overall system performance without increasing the deployment costs.
Kolcun R, Popescu DA, Safronov V, et al., 2020, The case for retraining of ML models for IoT device identification at the edge, Publisher: arXiv
Internet-of-Things (IoT) devices are known to be the source of many security problems, and as such they would greatly benefit from automated management. This requires robustly identifying devices so that appropriate network security policies can be applied. We address this challenge by exploring how to accurately identify IoT devices based on their network behavior, using resources available at the edge of the network. In this paper, we compare the accuracy of five different machine learning models (tree-based and neural network-based) for identifying IoT devices by using packet trace data from a large IoT test-bed, showing that all models need to be updated over time to avoid significant degradation in accuracy. In order to effectively update the models, we find that it is necessary to use data gathered from the deployment environment, e.g., the household. We therefore evaluate our approach using hardware resources and data sources representative of those that would be available at the edge of the network, such as in an IoT deployment. We show that updating neural network-based models at the edge is feasible, as they require low computational and memory resources and their structure is amenable to being updated. Our results show that it is possible to achieve device identification and categorization with over 80% and 90% accuracy respectively at the edge.
Aloufi R, Haddadi H, Boyle D, 2020, Privacy-preserving Voice Analysis via Disentangled Representations, Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop
Saidi SJ, Mandalari AM, Kolcun R, et al., 2020, A haystack full of needles: scalable detection of IoT devices in the wild, Publisher: arXiv
Consumer Internet of Things (IoT) devices are extremely popular, providing users with rich and diverse functionalities, from voice assistants to home appliances. These functionalities often come with significant privacy and security risks, with notable recent large scale coordinated global attacks disrupting large service providers. Thus, an important first step to address these risks is to know what IoT devices are where in a network. While some limited solutions exist, a key question is whether device discovery can be done by Internet service providers that only see sampled flow statistics. In particular, it is challenging for an ISP to efficiently and effectively track and trace activity from IoT devices deployed by its millions of subscribers, all with sampled network data. In this paper, we develop and evaluate a scalable methodology to accurately detect and monitor IoT devices at subscriber lines with limited, highly sampled data in-the-wild. Our findings indicate that millions of IoT devices are detectable and identifiable within hours, both at a major ISP as well as an IXP, using passive, sparsely sampled network flow headers. Our methodology is able to detect devices from more than 77% of the studied IoT manufacturers, including popular devices such as smart speakers. While our methodology is effective for providing network analytics, it also highlights significant privacy consequences.
Siracusano G, Galea S, Sanvito D, et al., 2020, Running neural networks on the NIC, Publisher: arXiv
In this paper we show that the data plane of commodity programmable Network Interface Cards (NICs) can run neural network inference tasks required by packet monitoring applications, with low overhead. This is particularly important as the data transfer costs to the host system and dedicated machine learning accelerators, e.g., GPUs, can be more expensive than the processing task itself. We design and implement our system, N3IC, on two different NICs and we show that it can greatly benefit three different network monitoring use cases that require machine learning inference as a first-class primitive. N3IC can perform inference for millions of network flows per second, while forwarding traffic at 40Gb/s. Compared to an equivalent solution implemented on a general purpose CPU, N3IC can provide 100x lower processing latency, with 1.5x increase in throughput.
Osia SA, Shahin Shamsabadi A, Sajadmanesh S, et al., 2020, A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics, IEEE Internet of Things Journal, Vol: 7, Pages: 4505-4518, ISSN: 2327-4662
Lisi E, Malekzadeh M, Haddadi H, et al., 2020, Modelling and forecasting art movements with CGANs, Publisher: Royal Society
Shamsabadi AS, Gascon A, Haddadi H, et al., 2020, PrivEdge: from local to distributed private training and prediction, IEEE Transactions on Information Forensics and Security, Vol: 15, Pages: 3819-3831, ISSN: 1556-6013
Machine Learning as a Service (MLaaS) operators provide model training and prediction on the cloud. MLaaS applications often rely on centralised collection and aggregation of user data, which could lead to significant privacy concerns when dealing with sensitive personal data. To address this problem, we propose PrivEdge, a technique for privacy-preserving MLaaS that safeguards the privacy of users who provide their data for training, as well as users who use the prediction service. With PrivEdge, each user independently uses their private data to locally train a one-class reconstructive adversarial network that succinctly represents their training data. As sending the model parameters to the service provider in the clear would reveal private information, PrivEdge secret-shares the parameters among two non-colluding MLaaS providers, to then provide cryptographically private prediction services through secure multi-party computation techniques. We quantify the benefits of PrivEdge and compare its performance with state-of-the-art centralised architectures on three privacy-sensitive image-based tasks: individual identification, writer identification, and handwritten letter recognition. Experimental results show that PrivEdge has high precision and recall in preserving privacy, as well as in distinguishing between private and non-private images. Moreover, we show the robustness of PrivEdge to image compression and biased training data. The source code is available at https://github.com/smartcameras/PrivEdge.
Mo F, Shamsabadi AS, Katevas K, et al., 2020, DarkneTZ: towards model privacy at the edge using trusted execution environments, Publisher: arXiv
We present DarkneTZ, a framework that uses an edge device's Trusted Execution Environment (TEE) in conjunction with model partitioning to limit the attack surface against Deep Neural Networks (DNNs). Increasingly, edge devices (smartphones and consumer IoT devices) are equipped with pre-trained DNNs for a variety of applications. This trend comes with privacy risks as models can leak information about their training data through effective membership inference attacks (MIAs). We evaluate the performance of DarkneTZ, including CPU execution time, memory usage, and accurate power consumption, using two small and six large image classification models. Due to the limited memory of the edge device's TEE, we partition model layers into more sensitive layers (to be executed inside the device TEE), and a set of layers to be executed in the untrusted part of the operating system. Our results show that even if a single layer is hidden, we can provide reliable model privacy and defend against state of the art MIAs, with only 3% performance overhead. When fully utilizing the TEE, DarkneTZ provides model protections with up to 10% overhead.
Conditional generative adversarial networks (CGANs) are a recent and popular method for generating samples from a probability distribution conditioned on latent information. The latent information often comes in the form of a discrete label from a small set. We propose a novel method for training CGANs which allows us to condition on a sequence of continuous latent distributions f(1), …, f(K). This training allows CGANs to generate samples from a sequence of distributions. We apply our method to paintings from a sequence of artistic movements, where each movement is considered to be its own distribution. Exploiting the temporal aspect of the data, a vector autoregressive (VAR) model is fitted to the means of the latent distributions that we learn, and used for one-step-ahead forecasting, to predict the latent distribution of a future art movement f(K+1). Realizations from this distribution can be used by the CGAN to generate ‘future’ paintings. In experiments, this novel methodology generates accurate predictions of the evolution of art. The training set consists of a large dataset of past paintings. While there is no agreement on exactly what current art period we find ourselves in, we test on plausible candidate sets of present art, and show that the mean distance to our predictions is small.
Malekzadeh M, Clegg RG, Cavallaro A, et al., 2020, Privacy and utility preserving sensor-data transformations, Pervasive and Mobile Computing, Vol: 63, Pages: 1-13, ISSN: 1574-1192
Sensitive inferences and user re-identification are major threats to privacy when raw sensor data from wearable or portable devices are shared with cloud-assisted applications. To mitigate these threats, we propose mechanisms to transform sensor data before sharing them with applications running on users' devices. These transformations aim at eliminating patterns that can be used for user re-identification or for inferring potentially sensitive activities, while introducing a minor utility loss for the target application (or task). We show that, on gesture and activity recognition tasks, we can prevent inference of potentially sensitive activities while keeping the reduction in recognition accuracy of non-sensitive activities to less than 5 percentage points. We also show that we can reduce the accuracy of user re-identification and of the potential inference of gender to the level of a random guess, while keeping the accuracy of activity recognition comparable to that obtained on the original data.
We present and evaluate Deep Private-Feature Extractor (DPFE), a deep model which is trained and evaluated based on information theoretic constraints. Using the selective exchange of information between a user's device and a service provider, DPFE enables the user to prevent certain sensitive information from being shared with a service provider, while allowing them to extract approved information using their model. We introduce and utilize the log-rank privacy, a novel measure to assess the effectiveness of DPFE in removing sensitive information and compare different models based on their accuracy-privacy trade-off. We then implement and evaluate the performance of DPFE on smartphones to understand its complexity, resource demands, and efficiency trade-offs. Our results on benchmark image datasets demonstrate that under moderate resource utilization, DPFE can achieve high accuracy for primary tasks while preserving the privacy of sensitive information.
Zhao Y, Haddadi H, Skillman S, et al., 2020, Privacy-preserving Activity and Health Monitoring on Databox, 3rd ACM International Workshop on Edge Systems, Analytics and Networking (EdgeSys), Publisher: ACM, Pages: 49-54
Ren J, Dubois DJ, Choffnes D, et al., 2019, Information exposure from consumer IoT devices: a multidimensional, network-informed measurement approach, ACM Internet Measurement Conference (IMC), Publisher: ACM, Pages: 267-279
Internet of Things (IoT) devices are increasingly found in everyday homes, providing useful functionality for devices such as TVs, smart speakers, and video doorbells. Along with their benefits come potential privacy risks, since these devices can communicate information about their users to other parties over the Internet. However, understanding these risks in depth and at scale is difficult due to heterogeneity in devices' user interfaces, protocols, and functionality. In this work, we conduct a multidimensional analysis of information exposure from 81 devices located in labs in the US and UK. Through a total of 34,586 rigorous automated and manual controlled experiments, we characterize information exposure in terms of destinations of Internet traffic, whether the contents of communication are protected by encryption, what are the IoT-device interactions that can be inferred from such content, and whether there are unexpected exposures of private and/or sensitive information (e.g., video surreptitiously transmitted by a recording device). We highlight regional differences between these results, potentially due to different privacy regulations in the US and UK. Last, we compare our controlled experiments with data gathered from an in situ user study comprising 36 participants.
Aloufi R, Haddadi H, Boyle D, 2019, Emotion filtering at the edge, Publisher: arXiv
Voice controlled devices and services have become very popular in the consumer IoT. Cloud-based speech analysis services extract information from voice inputs using speech recognition techniques. Service providers can thus build very accurate profiles of users' demographic categories, personal preferences, emotional states, etc., and may therefore significantly compromise their privacy. To address this problem, we have developed a privacy-preserving intermediate layer between users and cloud services to sanitize voice input directly at edge devices. We use CycleGAN-based speech conversion to remove sensitive information from raw voice input signals before regenerating neutralized signals for forwarding. We implement and evaluate our emotion filtering approach using a relatively cheap Raspberry Pi 4, and show that performance accuracy is not compromised at the edge. In fact, signals generated at the edge differ only slightly (~0.16%) from cloud-based approaches for speech recognition. Experimental evaluation of generated signals shows that identification of the emotional state of a speaker can be reduced by ~91%.
Aloufi R, Haddadi H, Boyle D, 2019, Emotionless: privacy-preserving speech analysis for voice assistants, Publisher: arXiv
Voice-enabled interactions provide more human-like experiences in many popular IoT systems. Cloud-based speech analysis services extract useful information from voice input using speech recognition techniques. The voice signal is a rich resource that discloses several possible states of a speaker, such as emotional state, confidence and stress levels, physical condition, age, gender, and personal traits. Service providers can build a very accurate profile of a user's demographic category and personal preferences, and may compromise privacy. To address this problem, a privacy-preserving intermediate layer between users and cloud services is proposed to sanitize the voice input. It aims to maintain utility while preserving user privacy. It achieves this by collecting real-time speech data and analyzing the signal to ensure privacy protection prior to sharing of this data with service providers. Precisely, the sensitive representations are extracted from the raw signal by using transformation functions and then wrapped via voice conversion technology. Experimental evaluation based on emotion recognition to assess the efficacy of the proposed method shows that identification of sensitive emotional state of the speaker is reduced by ~96%.
Malekzadeh M, Clegg RG, Cavallaro A, et al., 2019, Mobile sensor data anonymization, ACM/IEEE International Conference on Internet of Things Design and Implementation (IoTDI 2019), Publisher: ACM, Pages: 49-58
Data from motion sensors such as accelerometers and gyroscopes embedded in our devices can reveal secondary undesired, private information about our activities. This information can be used for malicious purposes such as user identification by application developers. To address this problem, we propose a data transformation mechanism that enables a device to share data for specific applications (e.g. monitoring their daily activities) without revealing private user information (e.g. user identity). We formulate this anonymization process based on an information theoretic approach and propose a new multi-objective loss function for training convolutional auto-encoders (CAEs) to provide a practical approximation to our anonymization problem. This effective loss function forces the transformed data to minimize the information about the user's identity, as well as the data distortion to preserve application-specific utility. Our training process regulates the encoder to disregard user-identifiable patterns and tunes the decoder to shape the final output independently of users in the training set. Then, a trained CAE can be deployed on a user's mobile device to anonymize sensor data before sharing with an app, even for users who are not included in the training dataset. The results, on a dataset of 24 users for activity recognition, show a promising trade-off on transformed data between utility and privacy, with an accuracy for activity recognition over 92%, while reducing the chance of identifying a user to less than 7%.
Moore J, Arcia-Moret A, Yadav P, et al., 2019, Zest: REST over ZeroMQ, 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Pages: 1015-1019, ISSN: 2474-2503
Zhang C, Patras P, Haddadi H, 2019, Deep Learning in Mobile and Wireless Networking: A Survey, IEEE Communications Surveys and Tutorials, Vol: 21, Pages: 2224-2287
Mo F, Shamsabadi AS, Katevas K, et al., 2019, Poster: Towards Characterizing and Limiting Information Exposure in DNN Layers, ACM SIGSAC Conference on Computer and Communications Security (CCS), Publisher: ACM, Pages: 2653-2655
Zhan Y, Haddadi H, 2019, Activity Prediction for Improving Well-Being of Both The Elderly and Caregivers, ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp) / ACM International Symposium on Wearable Computers (ISWC), Publisher: ACM, Pages: 1214-1217
Zhan Y, Haddadi H, 2019, Activity Prediction for Mapping Contextual-Temporal Dynamics, ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp) / ACM International Symposium on Wearable Computers (ISWC), Publisher: ACM, Pages: 246-249
Zhan Y, Haddadi H, 2019, Towards Automating Smart Homes: Contextual and Temporal Dynamics of Activity Prediction, ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp) / ACM International Symposium on Wearable Computers (ISWC), Publisher: ACM, Pages: 413-417
Varvello M, Katevas K, Plesa M, et al., 2019, BatteryLab, A Distributed Power Monitoring Platform For Mobile Devices, 18th ACM Workshop on Hot Topics in Networks (HotNets), Publisher: ACM, Pages: 101-108
Osia SA, Rassouli B, Haddadi H, et al., 2019, Privacy Against Brute-Force Inference Attacks, Publisher: IEEE
Katevas K, Hansel K, Clegg R, et al., 2019, Finding Dory in the Crowd: Detecting Social Interactions using Multi-Modal Mobile Sensing, SenSys-ML'19: Proceedings of the First Workshop on Machine Learning on Edge in Sensor Systems, Pages: 37-42