Why big data can be dangerous

Written by

Gilles Chemla

Published

3 May 2018

Key topics

Artificial Intelligence, Big Data, Cyber Security

Data is the revolutionary tool of our age, but it is not the objective measure many think: it is inextricably tied up in bias, prejudice and subjectivity

We live in a data-fixated age – the Age of Big Data, as some have called it – one in which the consensus seems to be that data can solve all of our problems. It is true we have access to an unprecedented scale and depth of information; this is in fact one of the key challenges we face.

But more on that later. Let’s start by considering the very nature of the data at our disposal. We collect it through scientific means, using experiments that yield objective findings. The assumption is that, if you design the experiment correctly, ask the right questions, or measure the right behaviour, you will gain insights into human behaviour: how people react to a product or even a medicine, how they behave in certain situations, or what they value or believe.

This, however, is intellectually lazy because bias can be extremely difficult to circumvent in the process of collecting, measuring or reading the data from experiments. You’ll be familiar with the placebo effect in medicine: in business terms, we might consider the Hawthorne effect, which highlights the difficulty of conducting large-scale workplace experiments due to the changed behaviours of those aware an experiment is taking place.

But this effect was named after a series of experiments conducted in the 1920s and 1930s – surely, you might say, with the scope of big data, we needn’t worry about such clouding of the dataset. Bias, however, applies even with an infinite quantity of data. Data points are not inert, or exogenous – they are not chosen by nature, but always by the analyst.
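To make the point about sample size concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than data from any study: the uniform "satisfaction scores", the cut-off of 3, and the function name `sample_means` are assumptions chosen only to show that a selection rule imposed by the analyst distorts the answer no matter how many data points are collected.

```python
import random


def sample_means(n, seed=0):
    """Return (full_sample_mean, selected_mean) for n simulated respondents.

    Hypothetical setup: true scores are uniform on [0, 10], so the true
    mean is 5.0. The analyst only records respondents scoring above 3,
    for example because unhappy customers never answer the survey.
    """
    rng = random.Random(seed)
    population = [rng.uniform(0, 10) for _ in range(n)]
    # The analyst's selection rule: this is where the bias enters.
    selected = [x for x in population if x > 3]
    full_mean = sum(population) / len(population)
    selected_mean = sum(selected) / len(selected)
    return full_mean, selected_mean


if __name__ == "__main__":
    # More data narrows the noise, but not the gap caused by selection.
    for n in (100, 10_000, 100_000):
        full, sel = sample_means(n)
        print(f"n={n:>7,}  full sample: {full:.2f}  selected: {sel:.2f}")
```

Under these assumptions the mean of the full sample settles near 5 as the sample grows, while the mean of the selected respondents settles near 6.5. Collecting more data makes the wrong answer more precise, not more correct.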

People then assume the problem is not with the experiment itself, but with the sample. If only they could get the perfect sample, then the experiment would yield results. It is not so simple, though – the bias can equally well exist in the methodology; methodologies that do not take into consideration that people have expectations and emotions.

To see how results can be affected by wording alone, take this example from Princeton. When Americans were asked if they felt the government spent too little on “welfare”, only 25 per cent said they did. Swap welfare for “assistance to the poor”, however, and the figure shot up to 65 per cent.

This is not to say it is impossible to gain truly objective insights, only that understanding why people act as they do requires rigorous analysis of the entire pattern of behaviour.

Humans have intuition: robots don’t

For managers and decision makers in organisations of any size, it is deeply problematic to set one’s stall on decisions made purely on the basis of so-called objective data. What’s the solution, then? Well, actually it’s quite simple: alongside data where appropriate, or sometimes even in lieu of it, the best bet can be to place faith in the intuition gained from your experience (or from someone who has it if you do not).

Steve Jobs, problematic a figure as he might have been, was totally averse to employing market research in product innovation or development. This might be read as an awareness of the fallibility of data, which could only be coloured by bias and limited by the imaginative scope of the potential customer base. The latter could scarcely be expected to know they wanted something they were as yet unaware existed.

The same challenge applies also to AI and machine learning. The latter is limited to past data, so what can be done by AI without humans is, by definition, limited. It is only from humans that new ways of thinking, which can give rise to truly original ideas, can arise.

Artificial intelligence might then be viewed as a potential impediment to true innovation. It also has huge ramifications for employment at all levels – a problem that is often acknowledged, though scarcely, as yet, acted upon. Which leads to the second half of our case…

Data as the abuse of power

Leaving aside for one moment the difficulty of gaining access to completely objective and reliable data, let’s consider another related though somewhat different phenomenon. It is a key danger associated with the forensic levels of data gathering taking place in the modern world: privacy.

Perhaps you’ll already be familiar with the story of American retail store Target’s accidental revelation to a teenage girl’s father that his daughter was pregnant. This story broke in 2012 – the relative dark ages in terms of data science.

Today, we see companies promising to trace your genealogy or assess your susceptibility to certain medical conditions using nothing more than a tiny sample of your DNA. Amazing, no doubt, but who else might be interested in such data? What about insurance companies, for whom such deep genetic information would be of no small interest?

That’s something into which one must quite proactively opt, but what about the gradual spread of the Internet of Things? These are devices wired to respond to your needs in real time – and to build up a repository of data on your behaviour to feed back to the manufacturers. Studies show said manufacturers are somewhat coy about what exactly is being done with this information.

Without wanting to get carried away, when you have devices that can monitor your vital signs, collecting data that can give indications of your mental health, it’s important to ensure that information doesn’t fall into unscrupulous hands. Think of the scope of data held by the likes of Google or Facebook (we hardly need reminding), which now includes your actual face.

And let’s not discount the huge amount of data lost in breaches. Perhaps GDPR will ameliorate the situation, but tech companies are run by intelligent people – would it be a huge surprise if we saw some clever circumvention of the new regulations? It’s not just business either: certain governments and public bodies also hold what might be considered a worrying amount of information. China’s “scoring” of citizens or the unregulated, error-prone “virtual line-up” held by various US police forces (in which 50 per cent of adults unwittingly feature) are two such examples.

It would be wrong to say we should shy away from data; when collected and analysed scientifically it can yield insights and connections that would have previously been impossible to glean. It’s also wrong to get carried away and fail to take a considered approach to the collection and usage of data: there are ramifications for the economy, for privacy, and even potentially for democracy. We also need to think carefully about the way we collect, measure, and read data – because as much as we assume data is immutable, when it comes to measuring human behaviour in particular, it is intrinsically subject to bias, prejudice and subjectivity.

Gilles is a Professor of Finance at Imperial College Business School, a research fellow at Centre National de la Recherche Scientifique, a research fellow at Centre for Economic Policy Research (CEPR), a programme director at CEPREMAP (a French equivalent to CEPR), and a member of the American Finance Association, American Economic Association, Western Finance Association, and European Finance Association. He has also worked in corporate finance at BNP Paribas, as an independent consultant for a variety of corporate, financial, and governmental institutions and professional and international organisations, and as an Assistant Professor of Finance at the Sauder School of Business, University of British Columbia. Gilles holds a PhD in economics from the London School of Economics, an MSc in economics from the Paris School of Economics, and a degree in mathematics from the University Paris-Diderot, and is a graduate engineer from the Ecole Nationale des Ponts et Chaussées.

You can find the author's full profile, including publications, at their Imperial Profile
