Publications from our Researchers

Several of our current PhD candidates and fellow researchers at the Data Science Institute have published, or are in the process of publishing, papers presenting their research.

Citation

BibTeX format

@inproceedings{Gadotti:2019,
author = {Gadotti, A and Houssiau, F and Rocher, L and Livshits, B and de Montjoye, Y-A},
booktitle = {28th USENIX Security Symposium (USENIX Security 19)},
publisher = {USENIX},
title = {When the Signal is in the Noise: Exploiting Diffix's Sticky Noise},
url = {http://hdl.handle.net/10044/1/69958},
year = {2019}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB  - Anonymized data is highly valuable to both businesses and researchers. A large body of research has however shown the strong limits of the de-identification release-and-forget model, where data is anonymized and shared. This has led to the development of privacy-preserving query-based systems. Based on the idea of “sticky noise”, Diffix has been recently proposed as a novel query-based mechanism satisfying alone the EU Article 29 Working Party’s definition of anonymization. According to its authors, Diffix adds less noise to answers than solutions based on differential privacy while allowing for an unlimited number of queries. This paper presents a new class of noise-exploitation attacks, exploiting the noise added by the system to infer private information about individuals in the dataset. Our first differential attack uses samples extracted from Diffix in a likelihood ratio test to discriminate between two probability distributions. We show that using this attack against a synthetic best-case dataset allows us to infer private information with 89.4% accuracy using only 5 attributes. Our second cloning attack uses dummy conditions that conditionally strongly affect the output of the query depending on the value of the private attribute. Using this attack on four real-world datasets, we show that we can infer private attributes of at least 93% of the users in the dataset with accuracy between 93.3% and 97.1%, issuing a median of 304 queries per user. We show how to optimize this attack, targeting 55.4% of the users and achieving 91.7% accuracy, using a maximum of only 32 queries per user. Our attacks demonstrate that adding data-dependent noise, as done by Diffix, is not sufficient to prevent inference of private attributes. We furthermore argue that Diffix alone fails to satisfy Art. 29 WP’s definition of anonymization. We conclude by discussing how non-provable privacy-preserving systems can be combined with fundamental security principles such as defense-in-depth.
AU  - Gadotti, A
AU  - Houssiau, F
AU  - Rocher, L
AU  - Livshits, B
AU  - de Montjoye, Y-A
PB  - USENIX
PY  - 2019///
TI  - When the Signal is in the Noise: Exploiting Diffix's Sticky Noise
UR  - http://hdl.handle.net/10044/1/69958
ER  -
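
The differential attack described in the abstract is, at its core, a likelihood ratio test: the attacker collects several noisy answers and decides which of two candidate noise distributions better explains them. The Python sketch below illustrates only that decision rule under a deliberately simplified assumption of Gaussian noise; the means mu0 and mu1 and the scale sigma are hypothetical placeholders, and it does not model Diffix's actual sticky-noise mechanism or reproduce the paper's attack code.

import numpy as np
from scipy import stats

# Illustrative sketch only: a likelihood ratio test between two Gaussian
# hypotheses, mirroring the idea behind the paper's differential attack.
# All parameters below are made up for illustration.

rng = np.random.default_rng(0)

# H0: private attribute is 0 -> noisy answers centred at mu0.
# H1: private attribute is 1 -> noisy answers centred at mu1.
mu0, mu1, sigma = 10.0, 11.0, 1.0  # assumed values, not Diffix's parameters

def log_likelihood_ratio(samples):
    # Sum of per-sample log ratios: log p(x | H1) - log p(x | H0).
    ll1 = stats.norm.logpdf(samples, loc=mu1, scale=sigma).sum()
    ll0 = stats.norm.logpdf(samples, loc=mu0, scale=sigma).sum()
    return ll1 - ll0

# Simulate the noisy answers an attacker might collect for a user whose
# true attribute value is 1.
observed = rng.normal(mu1, sigma, size=5)

# With a uniform prior, decide H1 whenever the log ratio is positive.
guess = 1 if log_likelihood_ratio(observed) > 0 else 0
print(f"inferred attribute: {guess}")

In the real attack the two distributions are derived from Diffix's layered noise and the samples come from carefully crafted queries; the Gaussian setup here only shows how the likelihood ratio discriminates between the two hypotheses.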