Publications
Thompson O, Mandalari AM, Haddadi H, 2021, Rapid IoT device identification at the edge, CoNEXT '21: The 17th International Conference on emerging Networking EXperiments and Technologies, Publisher: ACM, Pages: 22-28
Consumer Internet of Things (IoT) devices are increasingly common in everyday homes, from smart speakers to security cameras. Along with their benefits come potential privacy and security threats. To limit these threats we must implement solutions to filter IoT traffic at the edge. To this end the identification of the IoT device is the first natural step. In this paper we demonstrate a novel method of rapid IoT device identification that uses neural networks trained on device DNS traffic that can be captured from a DNS server on the local network. The method identifies devices by fitting a model to the first seconds of DNS second-level-domain traffic following their first connection. Since security and privacy threat detection often operate at a device specific level, rapid identification allows these strategies to be implemented immediately. Through a total of 51,000 rigorous automated experiments, we classify 30 consumer IoT devices from 27 different manufacturers with 82% and 93% accuracy for product type and device manufacturers respectively.
Kolcun R, Popescu DA, Safronov V, et al., 2021, Revisiting IoT device identification, Network Traffic Measurement and Analysis Conference 2021, Publisher: IFIP, Pages: 1-9
Internet-of-Things (IoT) devices are known to be the source of many security problems, and as such, they would greatly benefit from automated management. This requires robustly identifying devices so that appropriate network security policies can be applied. We address this challenge by exploring how to accurately identify IoT devices based on their network behavior, while leveraging approaches previously proposed by other researchers. We compare the accuracy of four different previously proposed machine learning models (tree-based and neural network-based) for identifying IoT devices. We use packet trace data collected over a period of six months from a large IoT test-bed. We show that, while all models achieve high accuracy when evaluated on the same dataset as they were trained on, their accuracy degrades over time, when evaluated on data collected outside the training set. We show that on average the models' accuracy degrades after a couple of weeks by up to 40 percentage points (on average between 12 and 21 percentage points). We argue that, in order to keep the models' accuracy at a high level, these need to be continuously updated.
Mandalari AM, Dubois DJ, Kolcun R, et al., 2021, Blocking without breaking: identification and mitigation of non-essential IoT traffic, Publisher: arXiv
Despite the prevalence of Internet of Things (IoT) devices, there is little information about the purpose and risks of the Internet traffic these devices generate, and consumers have limited options for controlling those risks. A key open question is whether one can mitigate these risks by automatically blocking some of the Internet connections from IoT devices, without rendering the devices inoperable. In this paper, we address this question by developing a rigorous methodology that relies on automated IoT-device experimentation to reveal which network connections (and the information they expose) are essential, and which are not. We further develop strategies to automatically classify network traffic destinations as either required (i.e., their traffic is essential for devices to work properly) or not, hence allowing firewall rules to block traffic sent to non-required destinations without breaking the functionality of the device. We find that indeed 16 among the 31 devices we tested have at least one blockable non-required destination, with the maximum number of blockable destinations for a device being 11. We further analyze the destination of network traffic and find that all third parties observed in our experiments are blockable, while first and support parties are neither uniformly required or non-required. Finally, we demonstrate the limitations of existing blocklists on IoT traffic, propose a set of guidelines for automatically limiting non-essential IoT traffic, and we develop a prototype system that implements these guidelines.
Kolcun R, Popescu DA, Safronov V, et al., 2020, The case for retraining of ML models for IoT device identification at the edge, Publisher: arXiv
Internet-of-Things (IoT) devices are known to be the source of many security problems, and as such they would greatly benefit from automated management. This requires robustly identifying devices so that appropriate network security policies can be applied. We address this challenge by exploring how to accurately identify IoT devices based on their network behavior, using resources available at the edge of the network. In this paper, we compare the accuracy of five different machine learning models (tree-based and neural network-based) for identifying IoT devices by using packet trace data from a large IoT test-bed, showing that all models need to be updated over time to avoid significant degradation in accuracy. In order to effectively update the models, we find that it is necessary to use data gathered from the deployment environment, e.g., the household. We therefore evaluate our approach using hardware resources and data sources representative of those that would be available at the edge of the network, such as in an IoT deployment. We show that updating neural network-based models at the edge is feasible, as they require low computational and memory resources and their structure is amenable to being updated. Our results show that it is possible to achieve device identification and categorization with over 80% and 90% accuracy respectively at the edge.
Saidi SJ, Mandalari AM, Kolcun R, et al., 2020, A haystack full of needles: scalable detection of IoT devices in the wild, Publisher: arXiv
Consumer Internet of Things (IoT) devices are extremely popular, providing users with rich and diverse functionalities, from voice assistants to home appliances. These functionalities often come with significant privacy and security risks, with notable recent large scale coordinated global attacks disrupting large service providers. Thus, an important first step to address these risks is to know what IoT devices are where in a network. While some limited solutions exist, a key question is whether device discovery can be done by Internet service providers that only see sampled flow statistics. In particular, it is challenging for an ISP to efficiently and effectively track and trace activity from IoT devices deployed by its millions of subscribers--all with sampled network data. In this paper, we develop and evaluate a scalable methodology to accurately detect and monitor IoT devices at subscriber lines with limited, highly sampled data in-the-wild. Our findings indicate that millions of IoT devices are detectable and identifiable within hours, both at a major ISP as well as an IXP, using passive, sparsely sampled network flow headers. Our methodology is able to detect devices from more than 77% of the studied IoT manufacturers, including popular devices such as smart speakers. While our methodology is effective for providing network analytics, it also highlights significant privacy consequences.
Ren J, Dubois DJ, Choffnes D, et al., 2019, Information exposure from consumer IoT devices: a multidimensional, network-informed measurement approach, ACM Internet Measurement Conference (IMC), Publisher: ASSOC COMPUTING MACHINERY, Pages: 267-279
Internet of Things (IoT) devices are increasingly found in everyday homes, providing useful functionality for devices such as TVs, smart speakers, and video doorbells. Along with their benefits come potential privacy risks, since these devices can communicate information about their users to other parties over the Internet. However, understanding these risks in depth and at scale is difficult due to heterogeneity in devices' user interfaces, protocols, and functionality. In this work, we conduct a multidimensional analysis of information exposure from 81 devices located in labs in the US and UK. Through a total of 34,586 rigorous automated and manual controlled experiments, we characterize information exposure in terms of destinations of Internet traffic, whether the contents of communication are protected by encryption, what are the IoT-device interactions that can be inferred from such content, and whether there are unexpected exposures of private and/or sensitive information (e.g., video surreptitiously transmitted by a recording device). We highlight regional differences between these results, potentially due to different privacy regulations in the US and UK. Last, we compare our controlled experiments with data gathered from an in situ user study comprising 36 participants.
Mandalari AM, Lutu A, Dhamdhere A, et al., Tracking the Big NAT across Europe and the U.S
Carrier Grade NAT (CGN) mechanisms enable ISPs to share a single IPv4 address across multiple customers, thus offering an immediate solution to the IPv4 address scarcity problem. In this paper, we perform a large scale active measurement campaign to detect CGNs in fixed broadband networks using NAT Revelio, a tool we have developed and validated. Revelio enables us to actively determine from within residential networks the type of upstream network address translation, namely NAT at the home gateway (customer-grade NAT) or NAT in the ISP (Carrier Grade NAT). We demonstrate the generality of the methodology by deploying Revelio in the FCC Measuring Broadband America testbed operated by SamKnows and also in the RIPE Atlas testbed. We enhance Revelio to actively discover from within any home network the type of upstream NAT configuration (i.e., simple home NAT or Carrier Grade NAT). We ran an active large-scale measurement study of CGN usage from 5,121 measurement vantage points within over 60 different ISPs operating in Europe and the United States. We found that 10% of the ISPs we tested have some form of CGN deployment. We validate our results with four ISPs at the IP level and, reported to the ground truth we collected, we conclude that Revelio was 100% accurate in determining the upstream NAT configuration for all the corresponding lines. To the best of our knowledge, this represents the largest active measurement study of (confirmed) CGN deployments at the IP level in fixed broadband networks to date.
Mandalari AM, Kolcun R, Haddadi H, et al., Towards Automatic Identification and Blocking of Non-Critical IoT Traffic Destinations
The consumer Internet of Things (IoT) space has experienced a significant rise in popularity in the recent years. From smart speakers, to baby monitors, and smart kettles and TVs, these devices are increasingly found in households around the world while users may be unaware of the risks associated with owning these devices. Previous work showed that these devices can threaten individuals' privacy and security by exposing information online to a large number of service providers and third party analytics services. Our analysis shows that many of these Internet connections (and the information they expose) are neither critical, nor even essential to the operation of these devices. However, automatically separating out critical from non-critical network traffic for an IoT device is nontrivial, and requires expert analysis based on manual experimentation in a controlled setting. In this paper, we investigate whether it is possible to automatically classify network traffic destinations as either critical (essential for devices to function properly) or not, hence allowing the home gateway to act as a selective firewall to block undesired, non-critical destinations. Our initial results demonstrate that some IoT devices contact destinations that are not critical to their operation, and there is no impact on device functionality if these destinations are blocked. We take the first steps towards designing and evaluating IoTrimmer, a framework for automated testing and analysis of various destinations contacted by devices, and selectively blocking the ones that do not impact device functionality.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.