Professor Hamed Haddadi

Faculty of Engineering, Department of Computing

Professor of Human-Centred Systems

Contact

h.haddadi Website

Location

2Translation & Innovation Hub Building, White City Campus

Publications

Huang Y, Zhao Y, Capstick A, Palermo F, Haddadi H, Barnaghi P et al., 2024, Analyzing entropy features in time-series data for pattern recognition in neurological conditions, Artif Intell Med, Vol: 150

In the field of medical diagnosis and patient monitoring, effective pattern recognition in neurological time-series data is essential. Traditional methods, predominantly based on statistical or probabilistic learning and inference, often struggle with multivariate, multi-source, state-varying, and noisy data while also posing privacy risks due to excessive information collection and modeling. Furthermore, these methods often overlook critical statistical information, such as the distribution of data points and inherent uncertainties. To address these challenges, we introduce an information theory-based pipeline that leverages specialized features to identify patterns in neurological time-series data while minimizing privacy risks. We select entropy methods according to the characteristics of each scenario and of each entropy measure. For stochastic state transition applications, we incorporate Shannon's entropy, entropy rates, entropy production, and the von Neumann entropy of Markov chains. When state modeling is impractical, we select and employ approximate entropy, increment entropy, dispersion entropy, phase entropy, and slope entropy. The pipeline's effectiveness and scalability are demonstrated through pattern analysis on a dementia care dataset as well as epilepsy and myocardial infarction datasets. The results indicate that our information theory-based pipeline can achieve average performance improvements across various models on the recall rate, F1 score, and accuracy by up to 13.08 percentage points, while enhancing inference efficiency by reducing the number of model parameters by a factor of 3.10 on average. Thus, our approach opens a promising avenue for improved and efficient pattern recognition in medical time-series data that takes critical statistical information into account.

Journal article
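
Two of the measures named in the abstract above, Shannon's entropy and the entropy rate of a Markov chain, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the first-order chain, and the use of empirical state frequencies in place of a true stationary distribution are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(states):
    """Shannon entropy (in bits) of the empirical distribution of a state sequence."""
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def markov_entropy_rate(states):
    """Entropy rate (bits per step) of a first-order Markov chain fitted to the sequence.

    Assumption: the stationary distribution is approximated by the empirical
    frequency of each source state among the observed transitions.
    """
    pair_counts = Counter(zip(states, states[1:]))  # counts of (from, to) transitions
    src_counts = Counter(states[:-1])               # how often each state is a source
    total = len(states) - 1                         # number of observed transitions
    rate = 0.0
    for (src, _dst), c in pair_counts.items():
        p_transition = c / src_counts[src]          # estimated P(dst | src)
        p_source = src_counts[src] / total          # estimated weight of src
        rate -= p_source * p_transition * math.log2(p_transition)
    return rate


# Hypothetical activity sequence that alternates strictly between two rooms.
sequence = ["bedroom", "kitchen"] * 4
print(shannon_entropy(sequence))      # both states equally frequent
print(markov_entropy_rate(sequence))  # every transition is fully predictable
```

The strictly alternating sequence shows why the two measures are complementary: the state distribution is maximally uncertain for two states, yet the next state is always determined by the current one, so the entropy rate is zero.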

McQuistin S, Snyder P, Perkins C, Haddadi H, Tyson G et al., 2023, A First Look at the Privacy Harms of the Public Suffix List, Pages: 383-390

The public suffix list is a community-maintained list of rules that can be applied to domain names to determine how they should be grouped into logical organizations or companies. We present the first large-scale measurement study of how the public suffix list is used by open-source software on the Web and the privacy harm resulting from projects using outdated versions of the list. We measure how often developers include out-of-date versions of the public suffix list in their projects, how old included lists are, and estimate the real-world privacy harm with a model based on a large-scale crawl of the Web. We find that incorrect use of the public suffix list is common in open-source software, and that at least 43 open-source projects use hard-coded, outdated versions of the public suffix list. These include popular, security-focused projects, such as password managers and digital forensics tools. We also estimate that, because of these out-of-date lists, these projects make incorrect privacy decisions for 1313 effective top-level domains (eTLDs), affecting 50,750 domains, by extrapolating from data gathered by the HTTP Archive project.

Conference paper
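
The effect of an outdated list described in the abstract above can be sketched with a simplified matcher. The function name and the two tiny rule sets are hypothetical, and the sketch deliberately ignores the wildcard and exception rules that the real public suffix list format supports; it only shows how a missing rule changes which hosts are grouped together.

```python
def registrable_domain(host, suffix_rules):
    """Return the eTLD+1 for host under a simplified rule set.

    Assumption: rules are plain suffixes only (no "*." wildcards, no "!" exceptions).
    """
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first so the most specific rule wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffix_rules:
            if i == 0:
                return None  # the host is itself a public suffix
            return ".".join(labels[i - 1:])
    # No rule matched: fall back to treating the last label as the suffix.
    return ".".join(labels[-2:]) if len(labels) >= 2 else None


# Illustrative rule sets: the "outdated" copy lacks the github.io entry.
current_rules = {"com", "io", "github.io"}
outdated_rules = {"com", "io"}

for rules in (current_rules, outdated_rules):
    a = registrable_domain("alice.github.io", rules)
    b = registrable_domain("bob.github.io", rules)
    print(a, b, "same site" if a == b else "different sites")
```

With the current rules the two hosts are treated as separate sites; with the outdated rules both collapse to the same registrable domain, which is the kind of incorrect grouping decision the study measures.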

Chouaki S, Goga O, Haddadi H, Snyder P et al., 2023, Understanding the Privacy Risks of Popular Search Engine Advertising Systems, Pages: 370-382

We present the first extensive measurement of the privacy properties of the advertising systems used by privacy-focused search engines. We propose an automated methodology to study the impact of clicking on search ads on three popular private search engines which have advertising-based business models: StartPage, Qwant, and DuckDuckGo, and we compare them to two dominant data-harvesting ones: Google and Bing. We investigate the possibility of third parties tracking users when clicking on ads by analyzing first-party storage, redirection domain paths, and requests sent before, during, and after the clicks. Our results show that privacy-focused search engines fail to protect users' privacy when clicking ads. Users' requests are sent through redirectors on 4% of ad clicks on Bing, 86% of ad clicks on Qwant, and 100% of ad clicks on Google, DuckDuckGo, and StartPage. Even worse, advertising systems collude with advertisers across all search engines by passing unique IDs to advertisers in most ad clicks. These IDs allow redirectors to aggregate users' activity on ads' destination websites in addition to the activity they record when users are redirected through them. Overall, we observe that both privacy-focused and traditional search engines engage in privacy-harming behaviors allowing cross-site tracking, even in privacy-enhanced browsers.

Conference paper

Zhan Y, Haddadi H, Mashhadi A, 2023, Analysing Fairness of Privacy-Utility Mobility Models, Pages: 359-365

Preserving individuals' privacy when sharing spatial-temporal datasets is critical to prevent re-identification attacks based on unique trajectories. Existing privacy techniques tend to propose ideal privacy-utility tradeoffs (PUT) but largely ignore the fairness implications of mobility models and whether such techniques perform equally for different groups of users. The relationship between fairness and privacy in PUT models is still unclear, and there exist limited metrics for measuring fairness in the spatial-temporal context. In this work, we define a set of fairness metrics designed explicitly for human mobility, based on structural similarity and entropy of the trajectories. Under these definitions, we examine the fairness of two state-of-the-art privacy-preserving models that rely on GAN and representation learning to reduce the re-identification rate of users. Our results show that these models violate individual fairness criteria, indicating that users with highly similar trajectories receive disparate privacy gain.

Conference paper

Parkinson M, Doherty R, Curtis F, Soreq E, Lai HHL, Serban A-I, Dani M, Fertleman M, Barnaghi PJ, Sharp DM, Li L et al., 2023, Using home monitoring technology to study the effects of traumatic brain injury in older multimorbid adults, Annals of Clinical and Translational Neurology, Vol: 10, Pages: 1688-1694, ISSN: 2328-9503

Internet of Things (IoT) based in-home monitoring systems can passively collect high temporal resolution data in the community, offering valuable insight into the impact of health conditions on patients' day-to-day lives. We used this technology to monitor activity and sleep patterns in older adults recently discharged after traumatic brain injury (TBI). The demographics of TBI are changing, and it is now a leading cause of hospitalisation in older adults. However, research in this population is minimal. We present three cases, showcasing the potential of in-home monitoring systems in understanding and managing early recovery in older adults following TBI.

Journal article

Li S, Liu P, Nascimento GG, Wang X, Leite FRM, Chakraborty B, Hong C, Ning Y, Xie F, Teo ZL, Ting DSW, Haddadi H, Ong MEH, Peres MA, Liu N et al., 2023, Federated and distributed learning applications for electronic health records and structured medical data: a scoping review, Journal of the American Medical Informatics Association, ISSN: 1067-5027

OBJECTIVES: Federated learning (FL) has gained popularity in clinical research in recent years to facilitate privacy-preserving collaboration. Structured data, one of the most prevalent forms of clinical data, has experienced significant growth in volume concurrently, notably with the widespread adoption of electronic health records in clinical practice. This review examines FL applications on structured medical data, identifies contemporary limitations, and discusses potential innovations. MATERIALS AND METHODS: We searched 5 databases, SCOPUS, MEDLINE, Web of Science, Embase, and CINAHL, to identify articles that applied FL to structured medical data and reported results following the PRISMA guidelines. Each selected publication was evaluated from 3 primary perspectives, including data quality, modeling strategies, and FL frameworks. RESULTS: Out of the 1193 papers screened, 34 met the inclusion criteria, with each article consisting of one or more studies that used FL to handle structured clinical/medical data. Of these, 24 utilized data acquired from electronic health records, with clinical predictions and association studies being the most common clinical research tasks that FL was applied to. Only one article exclusively explored the vertical FL setting, while the remaining 33 explored the horizontal FL setting, with only 14 discussing comparisons between single-site (local) and FL (global) analysis. CONCLUSIONS: The existing FL applications on structured medical data lack sufficient evaluations of clinically meaningful benefits, particularly when compared to single-site analyses. Therefore, it is crucial for future FL applications to prioritize clinical motivations and develop designs and methodologies that can effectively support and aid clinical practice and research.

Journal article

Hadjixenophontos S, Mandalari AM, Zhao Y, Haddadi H et al., 2023, PRISM: privacy preserving healthcare Internet of Things security management, 2023 IEEE Symposium on Computers and Communications (ISCC), Publisher: IEEE, ISSN: 2642-7389

Consumer healthcare Internet of Things (IoT) devices are gaining popularity in our homes and hospitals. These devices provide continuous monitoring at a low cost and can be used to augment high-precision medical equipment. However, major challenges remain in applying pre-trained global models for anomaly detection on smart health monitoring, for a diverse set of individuals that they provide care for. In this paper, we propose PRISM, an edge-based system for experimenting with in-home smart healthcare devices. We develop a rigorous methodology that relies on automated IoT experimentation. We use a rich real-world dataset from in-home patient monitoring from 44 households of People Living With Dementia (PLWD) over two years. Our results indicate that anomalies can be identified with accuracy up to 99% and mean training times as low as 0.88 seconds. While all models achieve high accuracy when trained on the same patient, their accuracy degrades when evaluated on different patients.

Conference paper

Mandalari AM, Haddadi H, Dubois DJ, Choffnes D et al., 2023, Protected or porous: a comparative analysis of threat detection capability of IoT safeguards, 44th IEEE Symposium on Security and Privacy (SP), Publisher: IEEE Computer Society, Pages: 3061-3078, ISSN: 1081-6011

Consumer Internet of Things (IoT) devices, from smart speakers to security cameras, are increasingly common in homes. Along with their benefits come potential privacy and security threats. To limit these threats, a number of commercial services have become available (IoT safeguards). The safeguards claim to provide protection against IoT privacy risks and security threats. However, the effectiveness and the associated privacy risks of these safeguards remain a key open question. In this paper, we investigate the threat detection capabilities of IoT safeguards for the first time. We develop and release an approach for automated safeguards experimentation to reveal their response to common security threats and privacy risks. We perform thousands of automated experiments using popular commercial IoT safeguards when deployed in a large IoT testbed. Our results indicate not only that these devices may be ineffective in preventing risks, but also that their cloud interactions and data collection operations may introduce privacy risks for the households that adopt them.

Conference paper

Huang Y, Haddadi H, 2023, Poster: Towards Battery-Free Machine Learning Inference and Model Personalization on MCUs, Pages: 571-572

Machine learning (ML) is moving towards edge devices. However, ML models with high computational demands and energy consumption pose challenges for ML inference in resource-constrained environments, such as the deep sea. To address these challenges, we propose a battery-free ML inference and model personalization pipeline for microcontroller units (MCUs). As an example, we performed fish image recognition in the ocean. We evaluated and compared the accuracy, runtime, power, and energy consumption of the model before and after optimization. The results demonstrate that our pipeline can achieve 97.78% accuracy with 483.82 KB Flash, 70.32 KB RAM, 118 ms runtime, 4.83 mW power, and 0.57 mJ energy consumption on MCUs, reducing these by 64.17%, 12.31%, 52.42%, 63.74%, and 82.67%, respectively, compared to the baseline. The results indicate the feasibility of battery-free ML inference on MCUs.

Conference paper

Aloufi R, Haddadi H, Boyle D, 2023, Paralinguistic privacy protection at the edge, ACM Transactions on Privacy and Security, Vol: 26, Pages: 1-27, ISSN: 2471-2566

Voice user interfaces and digital assistants are rapidly entering our lives and becoming singular touch points spanning our devices. These always-on services capture and transmit our audio data to powerful cloud services for further processing and subsequent actions. Our voices and raw audio signals collected through these devices contain a host of sensitive paralinguistic information that is transmitted to service providers regardless of deliberate or false triggers. As our emotional patterns and sensitive attributes like our identity, gender, and well-being are easily inferred using deep acoustic models, we encounter a new generation of privacy risks by using these services. One approach to mitigate the risk of paralinguistic-based privacy breaches is to exploit a combination of cloud-based processing with privacy-preserving, on-device paralinguistic information learning and filtering before transmitting voice data. In this article we introduce EDGY, a configurable, lightweight, disentangled representation learning framework that transforms and filters high-dimensional voice data to identify and contain sensitive attributes at the edge prior to offloading to the cloud. We evaluate EDGY's on-device performance and explore optimization techniques, including model quantization and knowledge distillation, to enable private, accurate, and efficient representation learning on resource-constrained devices. Our results show that EDGY runs in tens of milliseconds with 0.2% relative improvement in "zero-shot" ABX score or minimal performance penalties of approximately 5.95% word error rate (WER) in learning linguistic representations from raw voice signals, using a CPU and a single-core ARM processor without specialized hardware.

Journal article

Buchet A, Snyder P, Haddadi H, Pelsser C et al., 2023, Detecting IP-tracking proof interfaces by looking for NATs

In this poster, we propose an approach based on short-lived random identifiers to allow applications to detect when multiple users share the same IP address, such as when they are behind NATs. Using NATed interfaces could provide a cheap way to evade IP-based tracking, as the traffic of all users is merged into a single IP flow. As a result, it is harder for trackers to single out (and so re-identify by IP address) users behind a NAT. For many years, there has been a race between web trackers trying to find techniques to monitor user behaviour online, and privacy researchers looking for solutions to avoid such tracking. Despite progress in browser privacy-preserving techniques, IP tracking is still highly effective because current solutions to hide an IP address, such as VPNs or the Tor network, rely on external services and often induce a high cost in terms of performance. Our proposal could lead to solutions that are cheaper to deploy and don't affect performance as much. We developed an Android application that detects when an IP address is shared by multiple devices and reports the availability of such interfaces. We show that it is possible to identify networks where multiple users share the same IP address. We also discuss how our system can be protected from potential attackers.

Conference paper

Tavallaie O, Zandavi SM, Haddadi H, Zomaya AY et al., 2023, GT-TSCH: Game-Theoretic Distributed TSCH Scheduler for Low-Power IoT Networks, Pages: 475-486

Time-Slotted Channel Hopping (TSCH) is a synchronous medium access mode of the IEEE 802.15.4e standard designed for providing low-latency and highly-reliable end-to-end communication. TSCH constructs a communication schedule by combining frequency channel hopping with Time Division Multiple Access (TDMA). In recent years, IETF designed several standards to define general mechanisms for the implementation of TSCH. However, the problem of updating the TSCH schedule according to the changes of the wireless link quality and node's traffic load is left unresolved. In this paper, we use non-cooperative game theory to propose GT-TSCH, a distributed TSCH scheduler designed for low-power IoT applications. By considering selfish behavior of nodes in packet forwarding, GT-TSCH updates the TSCH schedule in a distributed approach with low control overhead by monitoring the queue length, the place of the node in the Directed Acyclic Graph (DAG) topology, the quality of the wireless link, and the data packet generation rate. We prove the existence and uniqueness of Nash equilibrium in our game model and we find the optimal number of TSCH Tx timeslots to update the TSCH slotframe. To examine the performance of our contribution, we implement GT-TSCH on Zolertia Firefly IoT motes and the Contiki-NG Operating System (OS). The evaluation results reveal that GT-TSCH improves performance in terms of throughput and end-to-end delay compared to the state-of-the-art method.

Conference paper

Snyder P, Karami S, Edelstein A, Livshits B, Haddadi H et al., 2023, Pool-Party: Exploiting Browser Resource Pools for Web Tracking, Pages: 7091-7106

We identify a class of covert channels in browsers that are not mitigated by current defenses, which we call "pool-party" attacks. Pool-party attacks allow sites to create covert channels by manipulating limited-but-unpartitioned resource pools. This class of attacks has been known to exist; in this work we show that they are more prevalent, more practical for exploitation, and allow exploitation in more ways, than previously identified. These covert channels have sufficient bandwidth to pass cookies and identifiers across site boundaries under practical and real-world conditions. We identify pool-party attacks in all popular browsers, and show they are practical cross-site tracking techniques (i.e., attacks take 0.6s in Chrome and Edge, and 7s in Firefox and Tor Browser). In this paper we make the following contributions: first, we describe pool-party covert channel attacks that exploit limits in application-layer resource pools in browsers. Second, we demonstrate that pool-party attacks are practical, and can be used to track users in all popular browsers; we also share open source implementations of the attack. Third, we show that in Gecko-based browsers (including the Tor Browser) pool-party attacks can also be used for cross-profile tracking (e.g., linking user behavior across normal and private browsing sessions). Finally, we discuss possible defenses.

Conference paper

Davidson A, Frei M, Gartner M, Haddadi H, Nieto JS, Perrig A, Winter P, Wirz F et al., 2022, Tango or square dance? how tightly should we integrate network functionality in browsers?, HotNets 2022: Twenty-First ACM Workshop on Hot Topics in Networks, Publisher: ACM, Pages: 205-212

The question at which layer network functionality is presented or abstracted remains a research challenge. Traditionally, network functionality was either placed into the core network, middleboxes, or into the operating system, but recent developments have expanded the design space to directly introduce functionality into the application (and in particular into the browser) as a way to expose it to the user. Given the context of emerging path-aware networking technology, an interesting question arises: which layer should handle the new features? We argue that the browser is becoming a powerful platform for network innovation, where even user-driven properties can be implemented in an OS-agnostic fashion. We demonstrate the feasibility of geo-fenced browsing using a prototype browser extension, realized by the SCION path-aware networking architecture, without introducing any significant performance overheads.

Conference paper

Davidson A, Snyder P, Quirk EB, Genereux J, Livshits B, Haddadi H et al., 2022, STAR: Secret sharing for private threshold aggregation reporting, CCS '22: 2022 ACM SIGSAC Conference on Computer and Communications Security, Publisher: ACM, Pages: 697-710

Threshold aggregation reporting systems promise a practical, privacy-preserving solution for developers to learn how their applications are used in-the-wild. Unfortunately, proposed systems to date prove impractical for wide scale adoption, suffering from a combination of requiring: i) prohibitive trust assumptions; ii) high computation costs; or iii) massive user bases. As a result, adoption of truly-private approaches has been limited to only a small number of enormous (and enormously costly) projects. In this work, we improve the state of private data collection by proposing STAR, a highly efficient, easily deployable system for providing cryptographically-enforced κ-anonymity protections on user data collection. The STAR protocol is easy to implement and cheap to run, all while providing privacy properties similar to, or exceeding the current state-of-the-art. Measurements of our open-source implementation of STAR find that it is 1773x quicker, requires 62.4x less communication, and is 24x cheaper to run than the existing state-of-the-art.

Conference paper
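
The κ-anonymity property that threshold aggregation targets can be shown with a plain, non-cryptographic sketch. This is not the STAR protocol: here a trusted aggregator sees every raw report, which is exactly the trust assumption that a cryptographically enforced scheme is meant to remove. The function name and sample reports are hypothetical.

```python
from collections import Counter


def threshold_aggregate(reports, k):
    """Release only values reported by at least k clients, with their counts.

    Assumption: the aggregator is trusted with raw reports (no secret sharing);
    the sketch shows the output guarantee, not how it is enforced.
    """
    counts = Counter(reports)
    return {value: n for value, n in counts.items() if n >= k}


# Hypothetical client reports, one value per client.
reports = ["feature_a", "feature_a", "feature_b", "feature_a", "feature_c", "feature_b"]
print(threshold_aggregate(reports, k=2))
```

Values reported by fewer than k clients are never released, so a rare or unique report cannot be tied back to a single user through the published output.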

Huang Y, Zhao Y, Haddadi H, Barnaghi P et al., 2022, Using entropy measures for monitoring the evolution of activity patterns, IEEE 8th World Forum on Internet of Things, Publisher: IEEE

In this work, we apply information theory inspired methods to quantify changes in daily activity patterns. We use in-home movement monitoring data and show how they can help indicate the occurrence of healthcare-related events. Three different types of entropy measures, namely Shannon's entropy, entropy rates for Markov chains, and entropy production rate, have been utilised. The measures are evaluated on a large-scale in-home monitoring dataset that has been collected within our dementia care clinical study. The study uses Internet of Things (IoT) enabled solutions for continuous monitoring of in-home activity, sleep, and physiology to develop care and early intervention solutions to support people living with dementia (PLWD) in their own homes. Our main goal is to show the applicability of the entropy measures to time-series activity data analysis and to use the extracted measures as new engineered features that can be fed into inference and analysis models. The results of our experiments show that in most cases the combination of these measures can indicate the occurrence of healthcare-related events. We also find that different participants with the same events may have different measures based on one entropy measure. So using a combination of these measures in an inference model will be more effective than any of the single measures.

Conference paper

Zhan Y, Haddadi H, Mashhadi A, 2022, Privacy-Aware Adversarial Network in Human Mobility Prediction, Publisher: ArXiv

As mobile devices and location-based services are increasingly developed in different smart city scenarios and applications, many unexpected privacy leakages have arisen due to geolocated data collection and sharing. User re-identification and other sensitive inferences are major privacy threats when geolocated data are shared with cloud-assisted applications. Significantly, four spatio-temporal points are enough to uniquely identify 95% of the individuals, which exacerbates personal information leakages. To tackle malicious purposes such as user re-identification, we propose an LSTM-based adversarial mechanism with representation learning to attain a privacy-preserving feature representation of the original geolocated data (i.e., mobility data) for a sharing purpose. These representations aim to maximally reduce the chance of user re-identification and full data reconstruction with a minimal utility budget (i.e., loss). We train the mechanism by quantifying privacy-utility trade-off of mobility datasets in terms of trajectory reconstruction risk, user re-identification risk, and mobility predictability. We report an exploratory analysis that enables the user to assess this trade-off with a specific loss function and its weight parameters. The extensive comparison results on four representative mobility datasets demonstrate the superiority of our proposed architecture in mobility privacy protection and the efficiency of the proposed privacy-preserving features extractor. We show that the privacy of mobility traces attains decent protection at the cost of marginal mobility utility. Our results also show that by exploring the Pareto optimal setting, we can simultaneously increase both privacy (45%) and utility (32%).

Working paper

Zhao Y, Barnaghi P, Haddadi H, 2022, Multimodal federated learning on IoT data, 2022 IEEE/ACM Seventh International Conference on Internet-of-Things Design and Implementation (IoTDI), Publisher: IEEE

Federated learning is proposed as an alternative to centralized machine learning since its client-server structure provides better privacy protection and scalability in real-world applications. In many applications, such as smart homes with Internet-of-Things (IoT) devices, local data on clients are generated from different modalities such as sensory, visual, and audio data. Existing federated learning systems only work on local data from a single modality, which limits the scalability of the systems. In this paper, we propose a multimodal and semi-supervised federated learning framework that trains autoencoders to extract shared or correlated representations from different local data modalities on clients. In addition, we propose a multimodal FedAvg algorithm to aggregate local autoencoders trained on different data modalities. We use the learned global autoencoder for a downstream classification task with the help of auxiliary labelled data on the server. We empirically evaluate our framework on different modalities including sensory data, depth camera videos, and RGB camera videos. Our experimental results demonstrate that introducing data from multiple modalities into federated learning can improve its classification performance. In addition, we can use labelled data from only one modality for supervised learning on the server and apply the learned model to testing data from other modalities to achieve decent F1 scores (e.g., with the best performance being higher than 60%), especially when combining contributions from both unimodal clients and multimodal clients.

Conference paper
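
The aggregation step that the abstract above builds on can be sketched with standard FedAvg, in which the server averages client parameters weighted by local sample count. This is the standard single-modality algorithm, not the multimodal variant the paper proposes; the function name, the flat parameter lists, and the sample numbers are assumptions for illustration.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by each client's sample count.

    Assumption: every client sends a flat list of floats of the same length.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients: the second holds three times as much local data.
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
print(global_weights)
```

Weighting by sample count means clients with more local data pull the global model further towards their parameters, while no raw data leaves any client.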

Varvello M, Katevas K, Plesa M, Haddadi H, Bustamante F, Livshits B et al., 2022, BatteryLab: A collaborative platform for power monitoring https://batterylab.dev, 23rd Annual International Passive and Active Measurement (PAM) Conference, Publisher: Springer International Publishing AG, Pages: 97-121, ISSN: 0302-9743

Advances in cloud computing have simplified the way that both software development and testing are performed. This is not true for battery testing, for which state-of-the-art test-beds simply consist of one phone attached to a power meter. These test-beds have limited resources and access, and are overall hard to maintain; for these reasons, they often sit idle with no experiment to run. In this paper, we propose to share existing battery test-beds and transform them into vantage points of BatteryLab, a power monitoring platform offering heterogeneous devices and testing conditions. We have achieved this vision with a combination of hardware and software that augments existing battery test-beds with remote capabilities. BatteryLab currently counts three vantage points, one in Europe and two in the US, hosting three Android devices and one iPhone 7. We benchmark BatteryLab with respect to the accuracy of its battery readings, system performance, and platform heterogeneity. Next, we demonstrate how measurements can be run atop BatteryLab by developing the "Web Power Monitor" (WPM), a tool which can measure website power consumption at scale. We released WPM and used it to report on the energy consumption of Alexa's top 1,000 websites across 3 locations and 4 devices (both Android and iOS).

Conference paper

Zhao Y, Afzal SS, Akbar W, Rodriguez O, Mo F, Boyle D, Adib F, Haddadi H et al., 2022, Towards battery-free machine learning and inference in underwater environments, HotMobile '22: The 23rd International Workshop on Mobile Computing Systems and Applications, Publisher: ACM, Pages: 29-34

This paper is motivated by a simple question: Can we design and build battery-free devices capable of machine learning and inference in underwater environments? An affirmative answer to this question would have significant implications for a new generation of underwater sensing and monitoring applications for environmental monitoring, scientific exploration, and climate/weather prediction. To answer this question, we explore the feasibility of bridging advances from the past decade in two fields: battery-free networking and low-power machine learning. Our exploration demonstrates that it is indeed possible to enable battery-free inference in underwater environments. We designed a device that can harvest energy from underwater sound, power up an ultra-low-power microcontroller and on-board sensor, perform local inference on sensed measurements using a lightweight Deep Neural Network, and communicate the inference result via backscatter to a receiver. We tested our prototype in an emulated marine bioacoustics application, demonstrating the potential to recognize underwater animal sounds without batteries. Through this exploration, we highlight the challenges and opportunities for making underwater battery-free inference and machine learning ubiquitous.

Conference paper

Smith M, Snyder P, Haller M, Livshits B, Stefan D, Haddadi H et al., 2022, Blocked or broken? automatically detecting when privacy interventions break websites, Proceedings on Privacy Enhancing Technologies, Vol: 2022, Pages: 6-23, ISSN: 2299-0984

A core problem in the development and maintenance of crowd-sourced filter lists is that their maintainers cannot confidently predict whether (and where) a new filter list rule will break websites. This is a result of the enormity of the Web, which prevents filter list authors from broadly understanding the impact of a new blocking rule before they ship it to millions of users. The inability of filter list authors to evaluate the Web compatibility impact of a new rule before shipping it severely reduces the benefits of filter-list-based content blocking: filter lists are both overly-conservative (i.e. rules are tailored narrowly to reduce the risk of breaking things) and error-prone (i.e. blocking tools still break large numbers of sites). To scale to the size and scope of the Web, filter list authors need an automated system to detect when a new filter rule breaks websites, before that breakage has a chance to make it to end users. In this work, we design and implement the first automated system for predicting when a filter list rule breaks a website. We build a classifier, trained on a dataset generated by a combination of compatibility data from the EasyList project and novel browser instrumentation, and find it is accurate to practical levels (AUC 0.88). Our open source system requires no human interaction when assessing the compatibility risk of a proposed privacy intervention. We also present the 40 page behaviors that most predict breakage in observed websites.

Journal article

Zhan Y, Haddadi H, Kyllo A, Mashhadi A et al., 2022, Privacy-Aware Human Mobility Prediction via Adversarial Networks, Pages: 7-12

As various mobile devices and location-based services are increasingly developed in different smart city scenarios and applications, many unexpected privacy leakages have arisen due to geolocated data collection and sharing. While these geolocated data could provide a rich understanding of human mobility patterns and address various societal research questions, privacy concerns for users' sensitive information have limited their utilization. In this paper, we design and implement a novel LSTM-based adversarial mechanism with representation learning to attain a privacy-preserving feature representation of the original geolocated data (i.e., mobility data) for a sharing purpose. We quantify the utility-privacy trade-off of mobility datasets in terms of trajectory reconstruction risk, user re-identification risk, and mobility predictability. Our proposed architecture reports a Pareto Frontier analysis that enables the user to assess this trade-off as a function of Lagrangian loss weight parameters. The extensive comparison results on four representative mobility datasets demonstrate the superiority of our proposed architecture and the efficiency of the proposed privacy-preserving feature extractor. Our results show that by exploring Pareto-optimal settings, we can simultaneously increase both privacy (45%) and utility (32%).

Citations: 2

Conference paper

Huang Y, Zhao Y, Haddadi H, Barnaghi P et al., 2022, Using Entropy Measures for Monitoring the Evolution of Activity Patterns, 8th IEEE World Forum on the Internet of Things (WF-IoT) - Sustainability and the Internet of Things, Publisher: IEEE

Conference paper

Zhan Y, Haddadi H, Kyllo A, Mashhadi A et al., 2022, Privacy-Aware Human Mobility Prediction via Adversarial Networks, 2ND INTERNATIONAL WORKSHOP ON CYBER-PHYSICAL-HUMAN SYSTEM DESIGN AND IMPLEMENTATION (CPHS 2022), Pages: 7-12

Citations: 1

Journal article

Siracusano G, Galea S, Sanvito D, Malekzadeh M, Antichi G, Costa P, Haddadi H, Bifulco R et al., 2022, Re-architecting Traffic Analysis with Neural Network Interface Cards, 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), Publisher: USENIX ASSOC, Pages: 513-533

Citations: 7

Conference paper

Thompson O, Mandalari AM, Haddadi H, 2021, Rapid IoT device identification at the edge, CoNEXT '21: The 17th International Conference on emerging Networking EXperiments and Technologies, Publisher: ACM, Pages: 22-28

Consumer Internet of Things (IoT) devices are increasingly common in everyday homes, from smart speakers to security cameras. Along with their benefits come potential privacy and security threats. To limit these threats we must implement solutions to filter IoT traffic at the edge. To this end the identification of the IoT device is the first natural step. In this paper we demonstrate a novel method of rapid IoT device identification that uses neural networks trained on device DNS traffic that can be captured from a DNS server on the local network. The method identifies devices by fitting a model to the first seconds of DNS second-level-domain traffic following their first connection. Since security and privacy threat detection often operate at a device specific level, rapid identification allows these strategies to be implemented immediately. Through a total of 51,000 rigorous automated experiments, we classify 30 consumer IoT devices from 27 different manufacturers with 82% and 93% accuracy for product type and device manufacturers respectively.

Conference paper

Aloufi R, Haddadi H, Boyle D, 2021, EDGY: On-device paralinguistic privacy protection, Pages: 3-5

Voice user interfaces and assistants are rapidly entering our lives and becoming singular touchpoints spanning our devices. Raw audio signals collected through these devices contain a host of sensitive paralinguistic information (e.g., emotional patterns) that is transmitted to service providers regardless of deliberate or false triggers. We thus encounter a new generation of privacy risks by using these services. To tackle this issue, we have developed EDGY, a configurable, lightweight, disentangled representation learning framework that transforms and filters high-dimensional voice data to identify and selectively filter sensitive attributes at the edge prior to offloading to the cloud. Our results show that EDGY runs in tens of milliseconds with 0.2% relative improvement in ABX score and minimal performance penalties in learning linguistic representations from raw signals on a CPU and single-core ARM processor with no specialized hardware.

Citations: 2

Conference paper

Kolcun R, Popescu DA, Safronov V, Yadav P, Mandalari AM, Mortier R, Haddadi H et al., 2021, Revisiting IoT device identification, Network Traffic Measurement and Analysis Conference 2021, Publisher: IFIP, Pages: 1-9

Internet-of-Things (IoT) devices are known to be the source of many security problems, and as such, they would greatly benefit from automated management. This requires robustly identifying devices so that appropriate network security policies can be applied. We address this challenge by exploring how to accurately identify IoT devices based on their network behavior, while leveraging approaches previously proposed by other researchers. We compare the accuracy of four different previously proposed machine learning models (tree-based and neural network-based) for identifying IoT devices. We use packet trace data collected over a period of six months from a large IoT test-bed. We show that, while all models achieve high accuracy when evaluated on the same dataset as they were trained on, their accuracy degrades over time, when evaluated on data collected outside the training set. We show that on average the models' accuracy degrades after a couple of weeks by up to 40 percentage points (on average between 12 and 21 percentage points). We argue that, in order to keep the models' accuracy at a high level, these need to be continuously updated.

Conference paper

Malekzadeh M, Clegg R, Cavallaro A, Haddadi H et al., 2021, DANA: Dimension-Adaptive Neural Architecture for Multivariate Sensor Data, PROCEEDINGS OF THE ACM ON INTERACTIVE MOBILE WEARABLE AND UBIQUITOUS TECHNOLOGIES-IMWUT, Vol: 5

Citations: 2

Journal article

Minto L, Haller M, Haddadi H, Livshits B et al., 2021, Stronger privacy for federated collaborative filtering with implicit feedback, 15th ACM Conference on Recommender Systems, Publisher: ACM, Pages: 342-350

Recommender systems are commonly trained on centrally collected user interaction data like views or clicks. This practice however raises serious privacy concerns regarding the recommender's collection and handling of potentially sensitive data. Several privacy-aware recommender systems have been proposed in recent literature, but comparatively little attention has been given to systems at the intersection of implicit feedback and privacy. To address this shortcoming, we propose a practical federated recommender system for implicit data under user-level local differential privacy (LDP). The privacy-utility trade-off is controlled by parameters $\epsilon$ and $k$, regulating the per-update privacy budget and the number of $\epsilon$-LDP gradient updates sent by each user respectively. To further protect the user's privacy, we introduce a proxy network to reduce the fingerprinting surface by anonymizing and shuffling the reports before forwarding them to the recommender. We empirically demonstrate the effectiveness of our framework on the MovieLens dataset, achieving a Hit Ratio at K=10 (HR@10) of up to 0.68 on 50k users with 5k items. Even on the full dataset, we show that it is possible to achieve reasonable utility with HR@10>0.5 without compromising user privacy.

Conference paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
