“We can’t not use data. But we need to find new ways to protect people’s privacy”

Dr Yves-Alexandre de Montjoye, Department of Computing

The problem

Twenty years ago, it was relatively easy to protect your data, but not anymore. Recently, for example, Australian researchers studying a database of 2.9 million ‘anonymised’ medical records (with names and addresses removed, among other things) were easily able to re-identify patients simply by comparing the dataset with other information in the public domain, such as athletes having operations or celebrity mums giving birth. In another study, data from the fitness app Polar was used to identify the names and home addresses of soldiers and spies.

And at Imperial, Dr Yves-Alexandre de Montjoye recently found that 95 per cent of us can be uniquely identified from apparently anonymous datasets, using just the times and places our phones show we have visited.
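To make the idea concrete, the short Python sketch below (an illustration with invented data and names, not de Montjoye's actual study or dataset) checks how many people in a tiny set of mobility traces can be singled out from just two place-and-time points.

from itertools import combinations

# Toy 'anonymised' traces: user id -> set of (place, hour) points.
# Real traces would come from phone antenna records; these are invented.
traces = {
    "user_a": {("cafe", 9), ("office", 10), ("gym", 18), ("home", 22)},
    "user_b": {("cafe", 9), ("office", 10), ("park", 18), ("home", 22)},
    "user_c": {("station", 8), ("office", 10), ("gym", 18), ("home", 23)},
}

def is_unique(points, all_traces):
    """True if exactly one trace contains every one of the given points."""
    matches = [user for user, trace in all_traces.items() if points <= trace]
    return len(matches) == 1

unique_users = set()
for user, trace in traces.items():
    # Is any pair of points from this user's trace enough to single them out?
    if any(is_unique(set(pair), traces) for pair in combinations(trace, 2)):
        unique_users.add(user)

print(f"{len(unique_users)} of {len(traces)} users identifiable from just two points")

Even in this toy example every user is pinned down by two points; at the scale of millions of real traces, a handful of observations is typically enough.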

“Data anonymisation just doesn’t work anymore,” says de Montjoye, Head of the Computational Privacy Group at Imperial’s Data Science Institute. “There is no longer any guarantee of not being identified. We think of privacy in terms of the specific decisions we make about what to share, but that’s no longer the case – we are identified by our movements, the places we visit, the things we buy, or even the things our friends do that link back to us.”

The role of artificial intelligence

Artificial intelligence (AI) is changing every aspect of the way we live. AI systems learn from data – so if anonymisation is no longer practically possible, how can AI algorithms continue to learn from large-scale, sensitive datasets while preserving our privacy?

“AI has tremendous potential for good,” says de Montjoye. “But we need to find a way to protect data.” An initiative led by the Data Science Institute at Imperial is doing just that. The Open Algorithms (OPAL) project provides a secure platform that allows researchers to analyse datasets without ever receiving a copy of them.

The aim of the project, which is currently being tested with the phone firms Orange in Senegal and Telefónica in Colombia, is that instead of releasing large datasets, companies give governments access to interfaces that allow them to ‘ask questions of the data without being able to access the raw files’.
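As an illustration of that idea, here is a minimal sketch (my own hypothetical interface, not the actual OPAL platform or its API; the class name and threshold are invented) in which the data holder answers only aggregate questions and refuses any answer that would describe too few people.

class SafeQueryInterface:
    """Illustrative stand-in for an OPAL-style query interface: analysts ask
    approved aggregate questions; the raw records never leave the data holder."""

    MIN_GROUP_SIZE = 10  # refuse answers that describe too few people

    def __init__(self, records):
        self._records = records  # stays on the data holder's side

    def count_where(self, **conditions):
        """Answer 'how many records match these conditions?' and nothing
        finer-grained than that."""
        n = sum(
            all(record.get(key) == value for key, value in conditions.items())
            for record in self._records
        )
        if n < self.MIN_GROUP_SIZE:
            raise ValueError("Query refused: the result would be too identifying")
        return n

# The operator keeps the records; outside analysts only see aggregate answers.
records = [{"region": "Dakar", "age_band": "20-29"}] * 12 + \
          [{"region": "Thies", "age_band": "30-39"}] * 3
interface = SafeQueryInterface(records)
print(interface.count_where(region="Dakar"))   # 12 matches: answer released
# interface.count_where(region="Thies")        # only 3 matches: query refused

The design choice is the point: the analyst never downloads the dataset, only the answers to vetted questions, so the raw records stay with the company that collected them.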