
How To Address Data Ethics at the Design Stage of AI

Analysis. We ask artificially intelligent systems to bring order to our messy modern realities, but very rarely do we question what type of order. This is a call for a historical awareness of the social systems that we are building: Which social and cultural norms, values and interests do they represent, reinforce and enact?

A recent study showed how machine learning methods (so-called word embeddings), trained on e.g. Google News articles, amplify gender stereotypes. With this method used broadly in online search engines, heavily stereotyped female occupations (such as receptionist, housekeeper, nanny) are placed in one cluster and male occupations (such as boss, philosopher or financier) in another. Imagine a recruiter using online search to look for possible candidates for a position. The researchers behind this study also propose a solution at the design stage of the systems: a method to debias word embeddings.
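To make the idea of debiasing at the design stage concrete, here is a minimal sketch of one approach: estimate a "gender direction" from a word pair and remove that component from occupation vectors. It is not the study's actual procedure. The three-dimensional vectors and all function names below are hypothetical, chosen only for illustration; real embeddings have hundreds of dimensions and are learned from text.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def gender_direction(vec_he, vec_she):
    """Unit vector pointing from 'she' towards 'he' (a simplifying assumption)."""
    return normalize([a - b for a, b in zip(vec_he, vec_she)])

def neutralize(vec, direction):
    """Remove the component of vec that lies along the given unit direction."""
    proj = dot(vec, direction)
    return [a - proj * d for a, d in zip(vec, direction)]

# Toy, hand-made vectors (hypothetical, not taken from any trained model).
embeddings = {
    "he": [1.0, 0.2, 0.1],
    "she": [-1.0, 0.2, 0.1],
    "receptionist": [-0.6, 0.5, 0.3],
    "philosopher": [0.7, 0.4, 0.6],
}

g = gender_direction(embeddings["he"], embeddings["she"])

for word in ("receptionist", "philosopher"):
    before = dot(embeddings[word], g)
    after = dot(neutralize(embeddings[word], g), g)
    print(word, "gender component before:", before, "after:", after)
```

In this toy setup the occupation words no longer lean towards either end of the gender direction after neutralizing, while their other components are left unchanged. Whether such a step is sufficient in a real system is a separate question that the ethics review has to answer.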

Data processing algorithms are the language of the Big Data Age. Just like any other language, this language does not transcend the context in which it is used; it is an expression of a given human community and its priorities. This also means that the AI engineer, more than being just a scientific engineer, is a social engineer.


The engineer can address data ethics issues at the design stage of AI by conducting an ethics review and social impact assessment:

1. INTERESTS: Who or what does the AI benefit? What is the explicit rationale? (E.g. who supplies the data, and who benefits from it?)

2. TRAINING DATA: Which cultural values/biases does the training data represent? Can we live with them? Are they transparent? Can the training data be manipulated?

3. TRANSPARENCY: Can data processes be traced and explained?

4. ACCOUNTABILITY: Can the system rectify errors? Can it be audited by an external auditor?

5. HUMAN EMPOWERMENT/AGENCY: How can the needs and values of the communities and stakeholders affected by the AI be met in the design?
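One way to keep these five questions visible during design is to track them as a simple review record that shows which questions are still unanswered. The sketch below is only an illustration: the class and function names are hypothetical, and the questions are copied from the list above (lightly shortened).

```python
from dataclasses import dataclass, field

@dataclass
class ReviewItem:
    """One topic of the ethics review with its questions and recorded answers."""
    topic: str
    questions: list
    answers: dict = field(default_factory=dict)

    def open_questions(self):
        """Questions that have no non-empty answer yet."""
        return [q for q in self.questions if not self.answers.get(q, "").strip()]

def new_review():
    """Create a fresh, unanswered review covering the five topics."""
    return [
        ReviewItem("INTERESTS", [
            "Who or what does the AI benefit?",
            "What is the explicit rationale?",
        ]),
        ReviewItem("TRAINING DATA", [
            "Which cultural values/biases does the training data represent?",
            "Can the training data be manipulated?",
        ]),
        ReviewItem("TRANSPARENCY", [
            "Can data processes be traced and explained?",
        ]),
        ReviewItem("ACCOUNTABILITY", [
            "Can the system rectify errors?",
            "Can it be audited by an external auditor?",
        ]),
        ReviewItem("HUMAN EMPOWERMENT/AGENCY", [
            "How can the needs and values of affected communities be met in the design?",
        ]),
    ]

def open_items(review):
    """Map each topic that still has unanswered questions to those questions."""
    return {item.topic: item.open_questions()
            for item in review if item.open_questions()}

review = new_review()
# Hypothetical answer recorded during a design meeting.
review[2].answers["Can data processes be traced and explained?"] = (
    "Yes, every processing step is logged with its inputs."
)
for topic, questions in open_items(review).items():
    print(topic, "->", len(questions), "open question(s)")
```

A record like this does not answer the questions; it only makes it harder to ship a system while some of them remain open.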

Data Processing Systems Raise Data Ethics Issues

We might be fooled by the name, but artificially intelligent systems are not mastermind agents with their own free will (yet), nor are they in any sense objective tools. Machine learning and AI systems are data processing systems that represent interests and power relations, influence society, and create or limit opportunities. This also means that they are systems that can be influenced and designed and, as Frank Pasquale puts it in this article, regulated. But first we need to understand the data ethical implications of AI in general:

Firstly, an intelligent personal technology becomes intelligent when it has a detailed data profile on a person that it remembers and acts on. When creating these data profiles, are people being treated fairly? Are decisions accurate and free from bias? Is there a legal basis for the profiling? (Look at the new EU data protection regulation (GDPR), which has much stricter provisions on data profiling than the previous directive.) We also need to ask what type of decisions we are asking the system to make for us. What are the potential future consequences of e.g. combining the data profiling and algorithmic decision-making of the internet of toys with AI job matching and recruiting? Do we truly understand the implications of combining and correlating the powerful profiling and decision-making features of AI?

Secondly, AI innovation is moving in legal grey zones. For example, a research team at the Alan Turing Institute in London argues that the GDPR does not contain a right to explanation, only a right to notification: “There is an idea that the GDPR will deliver accountability and transparency for AI, but that’s not at all guaranteed. It all depends on how it is interpreted in the future by national and European courts,” one of the researchers told the Guardian. How do we actually ensure that the law keeps pace and can be interpreted to the benefit of the individual? Do we need specific laws, as recently suggested in the robotics and AI resolution from the EU Parliament?

Thirdly, and perhaps most pressingly, the proprietary aspects of AI innovation must be addressed. Not only do algorithms kept hidden as trade secrets limit the accountability of the systems being built, but there is also a tendency to centralize and monopolize data in proprietary clouds. This leaves us with a core question with countless implications: Who owns our data? It further leads us to the (Foucault) “information is power” discussion on the importance of data symmetry in the current big data environment. Currently, there is none. Data is in the hands of the few and to the benefit of a few powerful institutions (companies and governments). Very little insight, control and data benefit is actually shared with the individuals who provide the personal data that fuel the systems. How do we tip the balance with the development of new technology and business (such as the Personal Data Store/MyData movement)?

Back to the issue of transparency, or rather what Jenna Burrell from the UC Berkeley School of Information refers to as the “Opacity of Algorithms”. She talks about opacity as “intentional corporate or state secrecy”, as “technical illiteracy” or “opacity as the way algorithms operate at the scale of application”. A Mr. Loomis, sentenced to six years in prison by a US court, asked to see the criteria behind his sentence, but was not allowed to. The judge had based the sentence in part on Mr. Loomis’s rating on the COMPAS assessment, an algorithm used to calculate the likelihood that someone will commit another crime. Loomis’s risk assessment was high. Why? He doesn’t know and he will not know, because the COMPAS algorithm is a trade secret. What he does know is that for some reason, men’s and women’s risk assessments differ. He also knows that COMPAS has recently been reviewed by researchers and described as racially biased. Was Mr. Loomis also discriminated against in his sentencing? We don’t know, because we don’t know how the COMPAS algorithm reaches its decisions, on what data it is trained, which criteria are used or how data is clustered and categorised.

This post is based on a workshop presentation made at the Nordic Association of Engineers Jubilee 2017. See prezi here.

See also IEEE Ethically Aligned Design of Autonomous Systems