When the Big Data Sums Do Not Add Up

None of this might matter if we could be sure – really sure – that such data gathering was always in our individual interests. But of course we cannot be sure.

Already, large corporations, governments and intelligence agencies are using algorithms to ‘interrogate’ big data for usable patterns, trends and information. Companies and governments look for large groups of people to target with a particular service they want to offer, while intelligence agencies sift the same information to find smaller groups of people on whom to focus their law-and-order attention.

Curiously, the intelligence agencies and financial institutions often work in similar ways to profile their ‘targets’. Dr Björn Rupp, advisor to the German Government, says that so-called ‘data robots’ work their way through government-intercepted data and a mix of other databases, some obtained from the commercial sector, to seek patterns.

The algorithms are fed the financial and communications records of a known terrorist or criminal organisation and then used to analyse the huge data pool that has been collected. ‘The current data hides a lot of information inside so you can easily determine not just who called who, but who was travelling from where and when and how many people that they were in contact with,’ says Rupp.

Adding further information from the internet makes the potential profiling ability of the systems even more powerful. ‘From the data you can employ advanced data mining technology and then find out, for instance, not only who the person is but you can then profile that person to find similar people to them in the database. You can effectively say to [the algorithm] I’m trying to find a certain person and the system will generate that person for me even though I don’t know them yet,’ says Rupp.
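The ‘find similar people’ capability Rupp describes can be illustrated with a minimal sketch: each person is reduced to a feature vector and the database is ranked by similarity to a target profile. All names and features here are hypothetical, and real systems are far more elaborate; this only shows the principle.

```python
# A toy illustration of similarity-based profiling: rank everyone in a
# (hypothetical) database by cosine similarity to a target profile.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Invented feature vectors: [calls per day, trips per month, contacts]
database = {
    "person_a": [30, 8, 120],
    "person_b": [2, 1, 15],
    "person_c": [28, 7, 110],
}
target = [29, 8, 115]  # the profile the analyst is trying to match

# Sort candidates so the closest profile comes first.
ranked = sorted(database, key=lambda p: cosine_similarity(database[p], target),
                reverse=True)
print(ranked[0])
```

On this invented data the closest match is returned first; in Rupp’s account, the same idea lets a system ‘generate’ a person the analyst does not yet know.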

Such group profiling is made easier for the intelligence agencies because it is also information that financial institutions are looking for. Algorithms used by the financial services sector now seek to discover groups of friends and associations between particular groups in their records. These can then be overlaid with data from social media groups to fine-tune marketing activities. In the case of the banks, the algorithms will seek out information showing that groups of debit or credit cards are used at the same time and place, indicating that the potential marketing targets are taking part in a group activity – for example swimming, football or watching sports events.
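The pattern the banks look for – cards that repeatedly turn up at the same time and place – can be sketched as a simple co-occurrence count. This is an illustration with invented transactions, not any bank’s actual system.

```python
# Toy sketch: group card transactions into (time, place) buckets, then
# flag pairs of cards that appear together more than once.
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (card_id, time_bucket, venue)
transactions = [
    ("card1", "sat_10", "pool"), ("card2", "sat_10", "pool"),
    ("card1", "sat_18", "stadium"), ("card2", "sat_18", "stadium"),
    ("card3", "sat_10", "cafe"),
]

# Collect which cards were used in each time/place bucket.
buckets = defaultdict(set)
for card, time_bucket, venue in transactions:
    buckets[(time_bucket, venue)].add(card)

# Count how often each pair of cards co-occurs across buckets.
pair_counts = defaultdict(int)
for cards in buckets.values():
    for pair in combinations(sorted(cards), 2):
        pair_counts[pair] += 1

# Pairs seen together repeatedly suggest a shared group activity.
groups = [pair for pair, n in pair_counts.items() if n >= 2]
print(groups)
```

Here card1 and card2 co-occur twice (pool, then stadium), so they would be flagged as a likely group – exactly the cluster that, per Rupp, both marketers and intelligence agencies would then examine.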

According to Rupp, data robots working for the intelligence agencies will also focus on such data clusters, though in their case they will try to determine whether, for example, participation in a five-a-side football team is not simply cover for a more sinister purpose.

A specific concern will be the way that insurance companies use big data, perhaps to minimise their exposure to risk – or to put it another way, to prevent them having to pay out on claims by effectively disenfranchising entire groups.

An example concerns the insurance industry in Britain and its attitude towards people who live in houses built on flood plains. Concerned about human-induced climate change and the high incidence of flooding and damage to houses built on flood plains, the insurance industry is threatening to withdraw cover from such homes. This means that anyone wanting to buy such a home would be unable to get a mortgage. Those already owning them will not be able to sell them.


The insurance industry – using statistics culled from the emerging internet of things: patterns of climate change, previous flooding incidents, likely flooding models and readings from sensors now placed in all of the major rivers to record river flow and flood water patterns, which are showing year-on-year increases – is demanding that the UK Government underwrite the cover it provides.

Thus, without government intervention those people living on flood plains will become an unprotected group. The insurance industry has sought to strengthen its argument by pointing out that projected patterns suggest that areas hitherto not considered at high risk of flooding could one day be under threat.

The same scenario could be developed for people suffering from obesity or other similar conditions. Big data, as Cheok says, can produce a wonderful world but not when you are on the receiving end of it.

‘We have to consider that there are already cases of ID theft and people running up a bill for you without your knowing. But when you are extending more and more of your intellect onto the internet what we are going to see is that there is going to be a battle between the people in society who genuinely want to make good use of this data, as in the case of healthcare, and those who don’t.’

‘It is genuinely useful if you can be connected to your doctor for 24 hours of the day and they can see all of the information about your body, but on the other hand health insurance companies, if they have that data, can eliminate from their policies anyone who has the remotest chance of getting sick in the next ten years or so. So there is a bad side to having so much data on the internet,’ says Cheok.

Minority Report myth

There is growing concern among experts about the current approach to analysing the ‘big data’ that is being gathered. The root of this concern, according to Professor Mayer-Schönberger, stems from a misunderstanding of big data. Mayer-Schönberger says it is being used wrongly to predict people’s patterns of behaviour instead of simply being seen as a record of what has happened.

‘The problem is that as human beings we want to see the world as a series of causes and effects and therefore we are tempted to abuse big data analysis – which can only tell us what is going on – to know why this is going on, so that we can then connect guilt and individual responsibility to individuals. This is precisely some of the abuse that we see coming out of the NSA and Prism debates,’ says Mayer-Schönberger.

Indeed, many politicians have fallen prey to the temptation to see big data as a universal panacea, according to Mayer-Schönberger.

‘As we have seen not only is [big data analysis] being used in the US to prevent terrorist attacks, but it is also being used to go after petty crime by the FBI and local police forces. Then you have a very powerful tool that cannot tell you anything about individual responsibility – it only tells you “what”, not “why” – and it is being used for the purpose of assigning individual responsibility and causality,’ says Mayer-Schönberger. While at present there is continuing uncertainty about exactly how information is derived from data, given the speed with which the new world of data and its attendant IoT world is developing it makes sense to err on the side of caution before using data in this way.

The academic says there is a risk of falling for the myth depicted in the 2002 Steven Spielberg film Minority Report, in which the police apprehend criminals before they commit a crime. ‘Minority Report has a very strong rosy premise and that is to avoid having victims,’ says Mayer-Schönberger. ‘The problem with it is that we don’t let fate play out and we don’t know whether a person would have committed a crime; we make an assumption that they will because every prediction based on big data is probabilistic.

‘With big data, there is a risk of predictive social control and a system of social control which slaughters human volition at the altar of collective fear,’ he argues.
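Why a probabilistic prediction cannot bear the weight of individual guilt can be made concrete with a standard base-rate calculation (all numbers here are hypothetical, chosen only to illustrate the arithmetic): even a highly accurate predictor, applied to a rare behaviour across a whole population, flags mostly innocent people.

```python
# Illustrative base-rate arithmetic (hypothetical numbers): a predictor
# that catches 99% of real offenders and wrongly flags only 1% of
# innocents still produces overwhelmingly false accusations when the
# behaviour it predicts is rare.
population = 10_000_000
actual_offenders = 100            # the predicted behaviour is rare
sensitivity = 0.99                # fraction of real offenders flagged
false_positive_rate = 0.01       # fraction of innocents wrongly flagged

true_positives = actual_offenders * sensitivity
false_positives = (population - actual_offenders) * false_positive_rate

# Precision: the chance that a flagged person really is an offender.
precision = true_positives / (true_positives + false_positives)
print(f"people flagged: {true_positives + false_positives:.0f}")
print(f"chance a flagged person is a real offender: {precision:.2%}")
```

On these numbers roughly 100,000 people are flagged, of whom fewer than one in a thousand is a genuine offender – the ‘what, not why’ problem in miniature.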

Another issue surrounding ‘big data’ is the extent to which it is ‘anonymised’ – in other words, stripped of the information that would identify an individual. Though the data processing industry claims that anonymising means it is not using an individual’s data against their wishes, as already noted this is challenged by a wide range of experts, including Mayer-Schönberger. The critics suggest that individuals can be ‘reversed out’ of the data and identified. Some claim that ‘anonymisation’ is one of the data processing industry’s greatest lies.
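The critics’ ‘reverse out’ claim rests on a well-known linkage technique: an ‘anonymised’ dataset that retains quasi-identifiers such as postcode, birth year and sex can be joined against a public record to put names back on the rows. The sketch below uses entirely invented records and names.

```python
# Toy re-identification by linkage: join "anonymised" records to a
# public register on quasi-identifiers. All data are invented.
anonymised = [
    {"postcode": "SW1A", "birth_year": 1975, "sex": "F", "diagnosis": "X"},
    {"postcode": "EC2M", "birth_year": 1990, "sex": "M", "diagnosis": "Y"},
]
public_register = [
    {"name": "Alice", "postcode": "SW1A", "birth_year": 1975, "sex": "F"},
    {"name": "Bob", "postcode": "EC2M", "birth_year": 1990, "sex": "M"},
]

quasi = ("postcode", "birth_year", "sex")
reidentified = {}
for record in anonymised:
    key = tuple(record[k] for k in quasi)
    matches = [p for p in public_register
               if tuple(p[k] for k in quasi) == key]
    if len(matches) == 1:  # a unique match names the individual
        reidentified[matches[0]["name"]] = record["diagnosis"]

print(reidentified)
```

When the quasi-identifier combination is unique – as it often is for real postcode/birthdate/sex triples – the ‘anonymous’ medical row is tied back to a named person, which is precisely the failure the experts point to.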

This article is part of a series of articles published from the Netopia report Can We Make the Digital World Ethical? Exploring the Dark Side of the Internet of Things and Big Data, by Peter Warren, Michael Streeter and Jane Whyatt.