Faculty of Engineering, Dyson School of Design Engineering

Senior Lecturer

### Location

Dyson Building, South Kensington Campus

## Publications

79 results found

Hänsel K, Wilde N, Haddadi H, Alomainy A et al., Wearable Computing for Health and Fitness: Exploring the Relationship between Data and Human Behaviour

Health and fitness wearable technology has recently advanced, making it easier for an individual to monitor their behaviours. Previously self generated data interacts with the user to motivate positive behaviour change, but issues arise when relating this to long term mention of wearable devices. Previous studies within this area are discussed. We also consider a new approach where data is used to support instead of motivate, through monitoring and logging to encourage reflection. Based on issues highlighted, we then make recommendations on the direction in which future work could be most beneficial.

Journal article

Mejova Y, Abbar S, Haddadi H, Fetishizing Food in Digital Age: #foodporn Around the World

What food is so good as to be considered pornographic? Worldwide, the popular #foodporn hashtag has been used to share appetizing pictures of peoples' favorite culinary experiences. But social scientists ask whether #foodporn promotes an unhealthy relationship with food, as pornography would contribute to an unrealistic view of sexuality. In this study, we examine nearly 10 million Instagram posts by 1.7 million users worldwide. An overwhelming (and uniform across the nations) obsession with chocolate and cake shows the domination of sugary dessert over local cuisines. Yet, we find encouraging traits in the association of emotion and health-related topics with #foodporn, suggesting food can serve as motivation for a healthy lifestyle. Social approval also favors the healthy posts, with users posting with healthy hashtags having an average of 1,000 more followers than those with unhealthy ones. Finally, we perform a demographic analysis which shows nation-wide trends of behavior, such as a strong relationship (r=0.51) between the GDP per capita and the attention to healthiness of their favorite food. Our results expose a new facet of food "pornography", revealing potential avenues for utilizing this precarious notion for promoting healthy lifestyles.

Journal article

Clegg RG, Haddadi H, Landa R, Rio M et al., Towards Informative Statistical Flow Inversion

A problem which has recently attracted research attention is that of estimating the distribution of flow sizes in internet traffic. On high traffic links it is sometimes impossible to record every packet. Researchers have approached the problem of estimating flow lengths from sampled packet data in two separate ways. Firstly, different sampling methodologies can be tried to more accurately measure the desired system parameters. One such method is the sample-and-hold method where, if a packet is sampled, all subsequent packets in that flow are sampled. Secondly, statistical methods can be used to "invert" the sampled data and produce an estimate of flow lengths from a sample. In this paper we propose, implement and test two variants on the sample-and-hold method. In addition we show how the sample-and-hold method can be inverted to get an estimation of the genuine distribution of flow sizes. Experiments are carried out on real network traces to compare standard packet sampling with three variants of sample-and-hold. The methods are compared for their ability to reconstruct the genuine distribution of flow sizes in the traffic.

Journal article

Haddadi H, Ofli F, Mejova Y, Weber I, Srivastava J et al., 360 Quantified Self

Wearable devices with a wide range of sensors have contributed to the rise of the Quantified Self movement, where individuals log everything ranging from the number of steps they have taken, to their heart rate, to their sleeping patterns. Sensors do not, however, typically sense the social and ambient environment of the users, such as general life style attributes or information about their social network. This means that the users themselves, and the medical practitioners, privy to the wearable sensor data, only have a narrow view of the individual, limited mainly to certain aspects of their physical condition. In this paper we describe a number of use cases for how social media can be used to complement the check-up data and those from sensors to gain a more holistic view on individuals' health, a perspective we call the 360 Quantified Self. Health-related information can be obtained from sources as diverse as food photo sharing, location check-ins, or profile pictures. Additionally, information from a person's ego network can shed light on the social dimension of wellbeing which is widely acknowledged to be of utmost importance, even though they are currently rarely used for medical diagnosis. We articulate a long-term vision describing the desirable list of technical advances and variety of data to achieve an integrated system encompassing Electronic Health Records (EHR), data from wearable devices, alongside information derived from social media data.

Journal article

Mortier R, Haddadi H, Henderson T, McAuley D, Crowcroft J et al., Human-Data Interaction: The Human Face of the Data-Driven Society

The increasing generation and collection of personal data has created a complex ecosystem, often collaborative but sometimes combative, around companies and individuals engaging in the use of these data. We propose that the interactions between these agents warrant a new topic of study: Human-Data Interaction (HDI). In this paper we discuss how HDI sits at the intersection of various disciplines, including computer science, statistics, sociology, psychology and behavioural economics. We expose the challenges that HDI raises, organised into three core themes of legibility, agency and negotiability, and we present the HDI agenda to open up a dialogue amongst interested parties in the personal and big data ecosystems.

Journal article

Haddadi H, Howard H, Chaudhry A, Crowcroft J, Madhavapeddy A, Mortier R et al., Personal Data: Thinking Inside the Box

We propose there is a need for a technical platform enabling people to engage with the collection, management and consumption of personal data; and that this platform should itself be personal, under the direct control of the individual whose data it holds. In what follows, we refer to this platform as the Databox, a personal, networked service that collates personal data and can be used to make those data available. While your Databox is likely to be a virtual platform, in that it will involve multiple devices and services, at least one instance of it will exist in physical form such as on a physical form-factor computing device with associated storage and networking, such as a home hub.

Journal article

Fay D, Haddadi H, Seto MC, Wang H, Kling CC et al., An exploration of fetish social networks and communities

Online Social Networks (OSNs) provide a venue for virtual interactions and relationships between individuals. In some communities, OSNs also facilitate arranging online meetings and relationships. FetLife, the world's largest anonymous social network for the BDSM, fetish and kink communities, provides a unique example of an OSN that serves as an interaction space, community organizing tool, and sexual market. In this paper, we present a first look at the characteristics of European members of FetLife, comprising 504,416 individual nodes with 1,912,196 connections. We looked at user characteristics in terms of gender, sexual orientation, and preferred role. We further examined the topological and structural properties of groups, as well as the type of interactions and relations between their members. Our results suggest there are important differences between the FetLife community and conventional OSNs. The network can be characterised by complex gender based interactions both from a sexual market and platonic viewpoint which point to a truly fascinating social network.

Journal article

Haddadi H, Fay D, Jamakovic A, Maennel O, Moore AW, Mortier R, Rio M, Uhlig S et al., Beyond Node Degree: Evaluating AS Topology Models

Many models have been proposed to generate Internet Autonomous System (AS) topologies, most of which make structural assumptions about the AS graph. In this paper we compare AS topology generation models with several observed AS topologies. In contrast to most previous works, we avoid making assumptions about which topological properties are important to characterize the AS topology. Our analysis shows that, although matching degree-based properties, the existing AS topology generation models fail to capture the complexity of the local interconnection structure between ASs. Furthermore, we use BGP data from multiple vantage points to show that additional measurement locations significantly affect local structure properties, such as clustering and node centrality. Degree-based properties, however, are not notably affected by additional measurement locations. These observations are particularly valid in the core. The shortcomings of AS topology generation models stem from an underestimation of the complexity of the connectivity in the core caused by inappropriate use of BGP data.

Journal article

Osia SA, Shamsabadi AS, Sajadmanesh S, Taheri A, Katevas K, Rabiee HR, Lane ND, Haddadi H et al., A Hybrid Deep Learning Architecture for Privacy-Preserving Mobile Analytics

Internet of Things (IoT) devices and applications are being deployed in our homes and workplaces and in our daily lives. These devices often rely on continuous data collection and machine learning models for analytics and actuations. However, this approach introduces a number of privacy and efficiency challenges, as the service operator can perform arbitrary inferences on the available data. Recently, advances in edge processing have paved the way for more efficient, and private, data processing at the source for simple tasks and lighter models, though they remain a challenge for larger, and more complicated models. In this paper, we present a hybrid approach for breaking down large, complex deep neural networks for cooperative, privacy-preserving analytics. To this end, instead of performing the whole operation on the cloud, we let an IoT device run the initial layers of the neural network, and then send the output to the cloud to feed the remaining layers and produce the final result. We manipulate the model with Siamese fine-tuning and propose a noise addition mechanism to ensure that the output of the user's device contains no extra information except what is necessary for the main task, preventing any secondary inference on the data. We then evaluate the privacy benefits of this approach based on the information exposed to the cloud service. We also assess the local inference cost of different layers on a modern handset. Our evaluations show that by using Siamese fine-tuning and at a small processing cost, we can greatly reduce the level of unnecessary, potentially sensitive information in the personal data, and thus achieve the desired trade-off between utility, privacy, and performance.

Journal article

Nithyanand R, Khattak S, Javed M, Vallina-Rodriguez N, Falahrastegar M, Powles JE, Cristofaro ED, Haddadi H, Murdoch SJ et al., Ad-Blocking and Counter Blocking: A Slice of the Arms Race

Journal article

Sajadmanesh S, Jafarzadeh S, Osia SA, Rabiee HR, Haddadi H, Mejova Y, Musolesi M, Cristofaro ED, Stringhini G et al., Kissing Cuisines: Exploring Worldwide Culinary Habits on the Web

Food and nutrition occupy an increasingly prevalent space on the web, and dishes and recipes shared online provide an invaluable mirror into culinary cultures and attitudes around the world. More specifically, ingredients, flavors, and nutrition information become strong signals of the taste preferences of individuals and civilizations. However, there is little understanding of these palate varieties. In this paper, we present a large-scale study of recipes published on the web and their content, aiming to understand cuisines and culinary habits around the world. Using a database of more than 157K recipes from over 200 different cuisines, we analyze ingredients, flavors, and nutritional values which distinguish dishes from different regions, and use this knowledge to assess the predictability of recipes from different cuisines. We then use country health statistics to understand the relation between these factors and health indicators of different nations, such as obesity, diabetes, migration, and health expenditure. Our results confirm the strong effects of geographical and cultural similarities on recipes, health indicators, and culinary preferences across the globe.

Journal article

Osia SA, Shamsabadi AS, Taheri A, Katevas K, Rabiee HR, Lane ND, Haddadi H et al., Privacy-Preserving Deep Inference for Rich User Data on The Cloud

Deep neural networks are increasingly being used in a variety of machine learning applications applied to rich user data on the cloud. However, this approach introduces a number of privacy and efficiency challenges, as the cloud operator can perform secondary inferences on the available data. Recently, advances in edge processing have paved the way for more efficient, and private, data processing at the source for simple tasks and lighter models, though they remain a challenge for larger, and more complicated models. In this paper, we present a hybrid approach for breaking down large, complex deep models for cooperative, privacy-preserving analytics. We do this by breaking down the popular deep architectures and fine-tuning them in a particular way. We then evaluate the privacy benefits of this approach based on the information exposed to the cloud service. We also assess the local inference cost of different layers on a modern handset for mobile applications. Our evaluations show that by using certain kinds of fine-tuning and embedding techniques and at a small processing cost, we can greatly reduce the level of information available to unintended tasks applied to the data feature on the cloud, and hence achieve the desired tradeoff between privacy and performance.

Journal article

Amar Y, Haddadi H, Mortier R, Brown A, Colley J, Crabtree A et al., An Analysis of Home IoT Network Traffic and Behaviour

Internet-connected devices are increasingly present in our homes, and privacy breaches, data thefts, and security threats are becoming commonplace. In order to avoid these, we must first understand the behaviour of these devices. In this work, we analyse network traces from a testbed of common IoT devices, and describe general methods for fingerprinting their behavior. We then use the information and insights derived from this data to assess where privacy and security risks manifest themselves, as well as how device behavior affects bandwidth. We demonstrate simple measures that circumvent attempts at securing devices and protecting privacy.

Journal article

Katevas K, Hänsel K, Clegg R, Leontiadis I, Haddadi H, Tokarchuk L et al., Finding Dory in the Crowd: Detecting Social Interactions using Multi-Modal Mobile Sensing

Remembering our day-to-day social interactions is challenging even if you aren't a blue memory challenged fish. The ability to automatically detect and remember these types of interactions is not only beneficial for individuals interested in their behavior in crowded situations, but also of interest to those who analyze crowd behavior. Currently, detecting social interactions is often performed using a variety of methods including ethnographic studies, computer vision techniques and manual annotation-based data analysis. However, mobile phones offer easier means for data collection that is easy to analyze and can preserve the user's privacy. In this work, we present a system for detecting stationary social interactions inside crowds, leveraging multi-modal mobile sensing data such as Bluetooth Smart (BLE), accelerometer and gyroscope. To inform the development of such a system, we conducted a study with 24 participants, where we asked them to socialize with each other for 45 minutes. We built a machine learning system based on gradient-boosted trees that predicts both 1:1 and group interactions with 77.8% precision and 86.5% recall, a 30.2% performance increase compared to a proximity-based approach. By utilizing a community detection-based method, we further detected the various group formations that exist within the crowd. Using mobile phone sensors already carried by the majority of people in a crowd makes our approach particularly well suited to real-life analysis of crowd behavior and influence strategies.

Journal article

Lisi E, Malekzadeh M, Haddadi H, Lau FD-H, Flaxman S et al., Modeling and Forecasting Art Movements with CGANs

Conditional Generative Adversarial Networks (CGANs) are a recent and popular method for generating samples from a probability distribution conditioned on latent information. The latent information often comes in the form of a discrete label from a small set. We propose a novel method for training CGANs which allows us to condition on a sequence of continuous latent distributions $f^{(1)}, \ldots, f^{(K)}$. This training allows CGANs to generate samples from a sequence of distributions. We apply our method to paintings from a sequence of artistic movements, where each movement is considered to be its own distribution. Exploiting the temporal aspect of the data, a vector autoregressive (VAR) model is fitted to the means of the latent distributions that we learn, and used for one-step-ahead forecasting, to predict the latent distribution of a future art movement $f^{(K+1)}$. Realisations from this distribution can be used by the CGAN to generate "future" paintings. In experiments, this novel methodology generates accurate predictions of the evolution of art. The training set consists of a large dataset of past paintings. While there is no agreement on exactly what current art period we find ourselves in, we test on plausible candidate sets of present art, and show that the mean distance to our predictions is small.

Working paper

Mo F, Shamsabadi AS, Katevas K, Cavallaro A, Haddadi H et al., Towards Characterizing and Limiting Information Exposure in DNN Layers

Pre-trained Deep Neural Network (DNN) models are increasingly used in smartphones and other user devices to enable prediction services, leading to potential disclosures of (sensitive) information from training data captured inside these models. Based on the concept of generalization error, we propose a framework to measure the amount of sensitive information memorized in each layer of a DNN. Our results show that, when considered individually, the last layers encode a larger amount of information from the training data compared to the first layers. We find that, while the neuron of convolutional layers can expose more (sensitive) information than that of fully connected layers, the same DNN architecture trained with different datasets has similar exposure per layer. We evaluate an architecture to protect the most sensitive layers within the memory limits of Trusted Execution Environment (TEE) against potential white-box membership inference attacks without the significant computational overhead.

Working paper

Aloufi R, Haddadi H, Boyle D, Emotionless: Privacy-Preserving Speech Analysis for Voice Assistants

Voice-enabled interactions provide more human-like experiences in many popular IoT systems. Cloud-based speech analysis services extract useful information from voice input using speech recognition techniques. The voice signal is a rich resource that discloses several possible states of a speaker, such as emotional state, confidence and stress levels, physical condition, age, gender, and personal traits. Service providers can build a very accurate profile of a user's demographic category and personal preferences, and may compromise privacy. To address this problem, a privacy-preserving intermediate layer between users and cloud services is proposed to sanitize the voice input. It aims to maintain utility while preserving user privacy. It achieves this by collecting real time speech data and analyzing the signal to ensure privacy protection prior to sharing of this data with service providers. Precisely, the sensitive representations are extracted from the raw signal by using transformation functions and then wrapped via voice conversion technology. Experimental evaluation based on emotion recognition to assess the efficacy of the proposed method shows that identification of sensitive emotional state of the speaker is reduced by ~96%.

Working paper

Voice controlled devices and services have become very popular in the consumer IoT. Cloud-based speech analysis services extract information from voice inputs using speech recognition techniques. Service providers can thus build very accurate profiles of users' demographic categories, personal preferences, emotional states, etc., and may therefore significantly compromise their privacy. To address this problem, we have developed a privacy-preserving intermediate layer between users and cloud services to sanitize voice input directly at edge devices. We use CycleGAN-based speech conversion to remove sensitive information from raw voice input signals before regenerating neutralized signals for forwarding. We implement and evaluate our emotion filtering approach using a relatively cheap Raspberry Pi 4, and show that performance accuracy is not compromised at the edge. In fact, signals generated at the edge differ only slightly (~0.16%) from cloud-based approaches for speech recognition. Experimental evaluation of generated signals shows that identification of the emotional state of a speaker can be reduced by ~91%.

Working paper

Contextual bandit algorithms (CBAs) often rely on personal data to provide recommendations. This means that potentially sensitive data from past interactions are utilized to provide personalization to end-users. Using a local agent on the user's device protects the user's privacy by keeping the data local; however, the agent requires longer to produce useful recommendations, as it does not leverage feedback from other users. This paper proposes a technique we call Privacy-Preserving Bandits (P2B), a system that updates local agents by collecting feedback from other agents in a differentially-private manner. Comparisons of our proposed approach with a non-private, as well as a fully-private (local) system, show competitive performance on both synthetic benchmarks and real-world data. Specifically, we observed a decrease of 2.6% and 3.6% in multi-label classification accuracy, and a CTR increase of 0.0025 in online advertising for a privacy budget $\epsilon \approx 0.693$. These results suggest P2B is an effective approach to problems arising in on-device privacy-preserving personalization.