
New ICO report: It Is Not About Big Data Versus Data Protection

The UK's data protection authority, the Information Commissioner's Office (ICO), has published a new, well-written report on big data, artificial intelligence, machine learning and data protection. It singles out five areas with particular implications for data protection.

“It is not a case of big data ‘or’ data protection, or big data ‘versus’ data protection. That would be the wrong conversation. Privacy is not an end in itself, it is an enabling right. Embedding privacy and data protection into big data analytics enables not only societal benefits such as dignity, personality and community, but also organisational benefits like creativity, innovation and trust. In short, it enables big data to do all the good things it can do. Yet that’s not to say someone shouldn’t be there to hold big data to account.”

These are the words of the Information Commissioner, Elizabeth Denham, in the ICO's March 2017 report on big data, artificial intelligence, machine learning and data protection. The report updates and replaces a 2014 paper on big data.

The ICO defines AI as "the analysis of data to model some aspect of the world. Inferences from these models are then used to predict and anticipate possible future events … giving computers behaviours which would be thought intelligent in human beings."
It defines machine learning as "the set of techniques and tools that allow computers to ‘think’ by creating mathematical algorithms based on accumulated data."

The ICO then identifies five areas that need special attention because of their possible implications for data protection:

  • Use of algorithms (What is new is that we are now ‘thinking with data’: a form of machine learning in which the system ‘learns’ which criteria are relevant by analysing the data.)
  • Opacity of the processing (The processing often seems opaque because humans may not be able to comprehend how the decisions are reached.)
  • Tendency to collect ‘all the data’ (For example, in a retail context this could mean analysing all the purchases made by shoppers using a loyalty card and using them to find correlations, rather than asking a sample of shoppers to take part in a survey.)
  • Repurposing of data (With analytics we can mine data for new insights and find correlations between apparently disparate datasets. For example, geotagged photos on Flickr, together with the profiles of contributors, have been used as a reliable proxy for estimating visitor numbers at tourist sites and where the visitors have come from.)
  • Use of new types of data (Traditionally, people have consciously provided their personal data, but this is no longer the only or main way in which personal data is collected. In many cases the data used for analytics has been generated automatically, for example by tracking online activity or by Internet of Things devices.)

Using personal data in analytics means you have to consider the following:

Fairness: Does the processing have intrusive effects on your customers? Is it within their expectations? Are you transparent about it?

Conditions for processing data: Do you have meaningful consent? If you rely on legitimate interests, have you balanced your own interests against those of the individuals concerned? Is the processing strictly necessary for the performance of a contract?

When it comes to consent, you can, for example, practise graduated consent or set a time limit so that the data are no longer used once it has expired.
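To make graduated, time-limited consent a little more concrete, here is a minimal sketch of how an organisation might record it. Everything in it is a hypothetical illustration, not something taken from the ICO report: the `ConsentRecord` class, the purpose names and the 365-day validity period are assumptions. "Graduated" is modelled as a separate opt-in per purpose, and the time limit as an expiry after which the data may no longer be used for that purpose.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class ConsentRecord:
    """Hypothetical consent record for one person, with one entry per purpose."""
    subject_id: str
    # Graduated consent: each purpose is opted into separately (purpose -> time granted).
    granted: dict[str, datetime] = field(default_factory=dict)
    # Hypothetical time limit after which consent lapses and must be renewed.
    validity: timedelta = timedelta(days=365)

    def grant(self, purpose: str, when: datetime) -> None:
        self.granted[purpose] = when

    def withdraw(self, purpose: str) -> None:
        self.granted.pop(purpose, None)

    def allows(self, purpose: str, now: datetime) -> bool:
        """True only if consent was given for this purpose and has not yet expired."""
        when = self.granted.get(purpose)
        return when is not None and now < when + self.validity


t0 = datetime(2017, 3, 1, tzinfo=timezone.utc)
rec = ConsentRecord("customer-42")
rec.grant("loyalty_analytics", t0)
print(rec.allows("loyalty_analytics", t0 + timedelta(days=30)))   # True
print(rec.allows("loyalty_analytics", t0 + timedelta(days=400)))  # False: consent expired
print(rec.allows("third_party_marketing", t0))                    # False: never granted
```

The point of checking `allows` before every use, rather than once at collection time, is that expiry and withdrawal then take effect automatically.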

As for social media, just because people have put data onto social media without restricting access does not necessarily legitimise all further use of it.

Purpose Limitation: Have you assessed whether the new processing purpose is compatible with the purpose for which the data were originally collected?

Data Minimisation: You should not collect data that is excessive for the processing purpose, nor be tempted to retain personal data for longer than necessary, even though it is easy to do so.
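As a rough illustration of data minimisation and limited retention in an analytics pipeline, the sketch below keeps only the fields a stated purpose needs and drops records older than a retention period. The field names, the purpose (basket analysis) and the two-year retention period are hypothetical assumptions chosen for the example; the report does not prescribe them.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical: the only fields the stated purpose (e.g. basket analysis) needs.
NEEDED_FIELDS = {"customer_id", "product", "timestamp"}
# Hypothetical retention period chosen for that purpose.
RETENTION = timedelta(days=730)


def minimise(records, now):
    """Keep only the needed fields and drop records older than the retention period."""
    kept = []
    for rec in records:
        if now - rec["timestamp"] > RETENTION:
            continue  # past retention: do not keep it just because storage is cheap
        kept.append({k: v for k, v in rec.items() if k in NEEDED_FIELDS})
    return kept


utc = timezone.utc
records = [
    {"customer_id": "c1", "email": "a@example.com", "product": "tea",
     "timestamp": datetime(2017, 1, 10, tzinfo=utc)},
    {"customer_id": "c2", "email": "b@example.com", "product": "coffee",
     "timestamp": datetime(2013, 5, 2, tzinfo=utc)},
]
print(minimise(records, now=datetime(2017, 3, 1, tzinfo=utc)))
# Only c1 survives, and without its email address.
```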

Accuracy: The accuracy of personal data matters at every stage of a big data project. The results of data analysis may not be representative of the population as a whole, and datasets can contain hidden biases.

The report also covers the new rights of individuals, security measures, anonymisation, encryption and the risk of re-identification, as well as how to write privacy notices, carry out privacy impact assessments, apply privacy by design and set up ethics boards.

Finally, the ICO reaches a conclusion that is right up the alley of DataEthics’ work:

“We welcome the trend towards organisations developing their own ethical principles and building relationships of trust with the public, because putting this into practice will assist compliance with data protection requirements. Recent moves towards setting up ‘councils of ethics’, within organisations and nationally, are a positive development that should also support this.”

The final recommendations are:

  1. Use appropriate techniques to anonymise the personal data
  2. Be transparent about processing of personal data
  3. Embed a privacy impact assessment framework into the big data processing activities
  4. Adopt a privacy by design approach
  5. Develop ethical principles
  6. Develop auditable machine learning algorithms
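The first recommendation concerns anonymisation. One common building block is replacing direct identifiers with a keyed hash (here HMAC-SHA256), sketched below with a hypothetical `pseudonymise` helper. Note that this is pseudonymisation, not full anonymisation: anyone holding the key can re-link records, and the remaining fields may still allow re-identification, so the output should generally still be treated as personal data. The key handling shown (a random 32-byte key kept separately from the dataset) is an assumption for illustration.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed SHA-256 hash (hex string)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key management: generate once and store separately from the data.
key = secrets.token_bytes(32)

token_a = pseudonymise("customer-42", key)
token_b = pseudonymise("customer-42", key)
print(token_a == token_b)  # True: the same person maps to the same token, so analysis still works
print(len(token_a))        # 64 hex characters
```

A keyed hash is used rather than a plain hash because unkeyed hashes of identifiers such as e-mail addresses can often be reversed simply by hashing a list of likely values.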

Read the whole discussion paper