Translation of the chapter (written in 2017) from the book Eksponeret (Danish, Gads Forlag, May 2018). Citation: Hasselbalch G., "AI: The Data Ethics Perspective", translation from Eksponeret, Gads Forlag, 2018.
Artificially intelligent technologies are complex data processing systems that pose several ethical challenges. We should consider the data intensity of these technologies and find solutions to their ethical implications in legislation, design and society in general.
When Jeff died he left a hole in the universe. There was so much she couldn’t do. She couldn’t touch him anymore, talk to him, share everyday details with him. But in a strange way he was still there somehow. In the days following the accident, she sat with her phone, re-reading his messages and status updates and, to her, he felt alive. His humour, his thoughts and his idiosyncrasies, it was as if he was right there. That’s when the advertisement from VivaRobots came to mind, about artificially intelligent resurrection. ‘We generate huge amounts of data online through social media, online searches, purchases and entertainment’, the nice lady in the promo video had said. VivaRobots can retrieve all this personal data, amalgamate it, and use it to recreate an artificially intelligent robot version of ourselves or our loved ones. Feed it with our data, and it learns from that data and develops a personality. That way, we can all live forever:
VivaRobots– the body may be gone but the data lives on.
The first time she saw Jeff again, she gave him an awkward hug, and even though he had a new car smell, she thought it was him. He let her stand there with her arms around him, and he said, “Do you remember last year when we watched X-Factor and Paul won? I got so angry and got into arguments with people on FootHook.com.” “Yes, and I held you like I’m doing now,” she said. “Yes, I remember that,” replied Jeff. “Did it make you happy?”
Like most examples we reach for when we think of artificial intelligence, the story of Jeff is pure science fiction, like the episode of the TV series Black Mirror that inspired it. Think of HAL from 2001: A Space Odyssey, R2-D2, Terminator, the loveable operating system Her or Wall-E, all of which are artificially intelligent human-like technologies that represent visions of the future, society and our way of life as human beings. In reality, we are not far from being able to develop the type of technologies that can represent us, learn from our experiences and act on our behalf, even after our physical bodies no longer exist. There are already patents on technologies that we can interact with and that can perform functions we currently consider only humans capable of: artificially intelligent musicians and artists, software lawyers, robot assistants and chatbots.
TV shows, movies, literature and media stories about robots like Jeff inspire thought. In the future, will our physical life just be a small part of an endless life of robots? Do we need to decide, before we die, whether the data we generate online can be used for other purposes after our death? Or is it the survivors who get to decide? Will Jeff’s resurrected robot be able to represent him as the complete person he was? Will it respect his values and opinions? What about the data he didn’t share on social media? What human qualities are misrepresented in the algorithms that recreated him? Or what about the false data that’s got nothing to do with Jeff’s “real” self (do we even have a real self?) that has been mixed in with Jeff’s data profile? When the new Jeff begins to interact with the world, what are his rights? Are they the same as human Jeff’s? Or should he have new robot rights? Who is responsible for new Jeff’s actions? VivaRobots? Should robots like Jeff have their own legal status? These are all exciting questions and thoughts about the future. But we need to move beyond the usual discussion about artificially intelligent robots from the future and address the same questions we just asked about Jeff to the intelligent technologies that are being developed here and now. The technologies that already influence the development of everything from our economy and politics to our social relationships and culture.
Today, most of the things we use in our everyday lives are, one way or another, connected to the internet, where we share large volumes of personal data, sometimes consciously but most of the time without our knowledge. Our physical whereabouts, internet habits, faces and fingerprints are transformed into data. Data regarding our interests, political convictions and health is shared with companies, public institutions and researchers around the world. Artificial intelligence is already an integral part of our everyday lives and one of the largest commercial bets and innovation areas in the private sector. Google, Facebook, IBM and Microsoft dedicate large parts of their budgets to the development of artificially intelligent technologies that can make sense of, and act on, data. What makes this development possible are the endless amounts of data that we generate in society, also known as big data. Intelligent technologies are shaping all areas of our everyday life, used in politics, finance and cultural production to make sense of large amounts of data, predict patterns, analyse risks and act on that knowledge.
On an individual level, intelligent algorithms analyse our internet habits and personalise the content we see and interact with online. At the same time, macro-social structures are shaped and developed on the basis of advanced intelligent computer analysis of data. For example, the majority of all trades on the global stock market today are performed by algorithms that calculate the potential risks associated with a trade. In 2017, the Guardian journalist Carole Cadwalladr (1) wrote about how Donald Trump, while campaigning for the American presidency, had used a data analysis company, Cambridge Analytica, to help sway American voters with individually targeted messages based on algorithms’ psychological analysis of millions of Americans’ Facebook profiles. Considering the significance that advanced data systems have for our everyday life and society today, we should move beyond the existential ethical questions about a sci-fi future and address the “here and now” data ethics challenges.
Artificial Intelligence and Data Profiling
Artificial intelligence is a sophisticated technology that is widely used to categorise, predict and find patterns in data. From a computer science perspective, one can see artificial intelligence as a kind of logical agent that perceives its environment and acts on it to maximise the chance of reaching a specific goal. But one can also choose to look at artificial intelligence as data – that is, complex data-handling systems with input data (data we provide, for example, when we talk to chatbots, data from online news articles and social media, or data the systems collect by trawling the internet) that the systems learn from, evolve by and act on in the form of output data. One example is Apple’s Siri, which deals with verbal questions and orders either directly on the device or by searching the internet. What makes Siri intelligent is that the program evolves and learns from the data you provide through the questions you ask. The program thus adapts by creating a data profile of the user, eventually becoming more of a personal assistant.
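The input → learn → output loop described above can be sketched in a few lines of code. This is a deliberately toy illustration of a profile-building assistant (all names and the "interest" heuristic are invented for the example; real assistants such as Siri are vastly more complex):

```python
from collections import Counter

class ToyAssistant:
    """A toy sketch of a profile-building assistant: it 'learns' from
    input data (the questions asked) and uses the accumulated data
    profile to shape its output data (the answers given)."""

    def __init__(self):
        self.profile = Counter()  # the user's growing data profile

    def ask(self, question: str) -> str:
        # Input data: every word of the question feeds the profile.
        self.profile.update(question.lower().split())
        # Output data: the answer is personalised by the profile.
        top_interest, _ = self.profile.most_common(1)[0]
        return f"Noted. You seem interested in '{top_interest}'."

assistant = ToyAssistant()
assistant.ask("weather today")
assistant.ask("weather tomorrow")
# After a few questions, the profile already shapes the response:
print(assistant.ask("football scores"))  # → Noted. You seem interested in 'weather'.
```

The point of the sketch is the asymmetry it makes visible: the user only ever sees the answers, while the profile that produces them accumulates silently inside the system.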
Data profiles are a prerequisite for intelligent personal technology to function optimally. However, intelligent data profiling is not only used to make life easier for us. Today, intelligent technologies are used for more or less everything, from predicting an individual’s future medical condition and thereby calculating insurance premiums, to recommending whether someone should be paroled from prison based on a calculation of their likelihood to reoffend. In that context, intelligent technologies (which go by many names, such as decision-making algorithms, cognitive computing or machine learning) are, in a way, also gatekeepers of society. They are personal and can help us because they know us. But they also decide what information we are presented with when we search the internet, how much we pay for a flight ticket, or which political party we hear most from and what we hear from them. Their complex computer processes are developing so quickly that it will become harder and harder to follow the rationale behind the processing of our data.
A technology is intelligent because it has a detailed data profile of the material it analyses, remembers and acts upon. When that material is people – whether in a virtual assistant like Google Now or Cortana, or in a piece of software that performs risk assessments on defendants in the legal system – it creates detailed data profiles that the individual typically has neither understanding of, nor control over. Intelligent-system data profiling therefore has data ethics implications, as it has a decisive influence on the individual and the opportunities afforded them in society.
In the new EU General Data Protection Regulation (the GDPR), there are increased requirements for personal data profiling, that is, the automated assessments and analyses based on personal data which can, for example, be used to predict an individual’s economic situation, health, personal interests or behaviour. Intelligent technologies challenge these principles, as the technology develops, “trains” and “learns” using large amounts of data and, if created to personalise services, works by creating profiles out of this data. This raises legal questions: Is there a legal basis for the profiling? Are decisions taken on the basis of data profiles and algorithms accurate and free of bias? Are individuals treated fairly? Can we gain insight into the processes that lead to specific decisions and actions based on data profiles? And what about our consent, when our data is processed and recycled continuously in intelligent data processes? A core challenge is that innovation within intelligent technologies often moves in legal grey zones. A research group affiliated with the Alan Turing Institute and the University of Oxford claims, for example, that the new EU data protection regulation does not contain a “right to explanation”, just a “right to information”. (2) With a data ethics perspective we can look to the future and ask how we can ensure that the law keeps up with the development and is interpreted for the benefit of the individual. For example, do we need specific laws for the development of intelligent technologies, as proposed by the European Parliament in its 2017 resolution on robots and artificial intelligence? (3)
Data ethics challenges in intelligent systems
A search on an online search engine initiates a complex series of data processing tasks. The word or phrase you search for is considered by the search engine’s language processing algorithm, which, among other actions, categorises and prioritises search results based on what it has learned from the language data it was trained on. A study from 2016 (4) showed how one of the most common methods of language processing in search engines, known as “word embedding”, places words in gender-biased groupings. Boss, philosopher, financier and similar job titles are thus categorised as male job positions, whilst words such as receptionist, housekeeper and nanny are grouped in a female job position category. The algorithm had learnt its categorisations from, among other things, data derived from Google News articles. Similar studies of data-handling systems have, over the last few years, shown that intelligent systems reproduce social and cultural biases because those biases are contained in the data they use to develop and learn from. This means, for example, that when an intelligent system learns language by reviewing news articles, the stereotypical language, specific priorities and cultural values contained in those articles will be reproduced in the intelligent system’s processing of the data. Another frequently cited study, “Machine Bias” (5), published by ProPublica, showed a bias against black defendants in the COMPAS algorithm used to perform risk assessments of defendants in the U.S. judicial system and assess the likelihood of recidivism after release. The COMPAS algorithm was roughly twice as likely to falsely flag black defendants as probable reoffenders as it was white defendants. At the same time, white defendants were more often classified as low risk than black defendants were. The example from ProPublica highlights an important data ethics perspective on intelligent systems.
They create limits and opportunities for individuals and often serve to support existing power dynamics in society.
An American man, Eric L. Loomis, asked to see the criteria behind his sentence of six years in prison by a U.S. court. The judge had, among other things, reached the conclusion based on a score from the COMPAS algorithm, which showed a high probability that Loomis would commit a crime again in the future. The ProPublica study had demonstrated one type of bias in the COMPAS algorithm, but was Mr. Loomis also being discriminated against in his scoring, perhaps on other grounds? It was fully within Mr. Loomis’s rights to ask for insight into the criteria behind his punishment, but the COMPAS algorithm itself is a business secret, so his appeal could not be met. (6) We do not know how COMPAS reached its conclusion, what criteria were used, or even how the data was categorised, because the software was developed as a commercial tool. One of the most pressing data ethics issues today is the proprietary character of innovation within artificial intelligence, which has a decisive influence on system transparency. Today, the public generally has very little insight into data systems, and the people whom these data systems analyse and operate on have very little control over the systems in which decisions about them are made. Jenna Burrell, from UC Berkeley’s School of Information, explains the algorithms’ opacity as a problem caused by the complexity of the systems, by society’s lack of skills to understand the systems, or by corporations (or governments) defining the algorithms’ design as a business secret because they are part of corporate competitiveness. (7) The American professor Frank Pasquale has, in his book The Black Box Society (2015), described this lack of transparency in algorithms as a systematic trend. He believes that legislation has not kept pace with the development of complex algorithmic systems, and that the lack of transparency in algorithms masquerading as business secrets is simply a way to avoid regulation.
Considerations for Developers
“The limits of my language mean the limits of my world”, the language philosopher Ludwig Wittgenstein wrote in 1922. We develop intelligent systems to create order in our messy contemporary reality, but very rarely do we put demands on what kind of order they create. Data processing algorithms can be described as the language of the big data age, creating structure and meaning out of unstructured data. This language is not independent of the context in which it is used but is an expression of given cultural and social norms, values and priorities. Artificially intelligent systems are therefore not free agents with free will, acting inscrutably on data from their own computer logic; they are social systems that represent and amplify community values and specific interests. The main message is therefore that this is a technology that we can create and have an influence on. Viewed through these eyes, those who design the systems are designers of social systems, more than just designers of objective mathematical systems. Therefore, as early as the design phase, an analysis and assessment can be made of the social and ethical consequences of the data processing systems being developed: (8)
- INTERESTS: This is about who or what benefits from the data processing, as well as the explicit rationale behind it. For example, one can assess whether the data processes first and foremost have a commercial purpose, or whether human values, rights and needs have also been incorporated into the systems.
- TRAINING DATA: What cultural values/biases does the data on which the intelligent technology is trained represent? Are they transparent, and can we live with them? Can the training data be manipulated? (Like when Microsoft launched their intelligent chatbot Tay on Twitter, and users taught Tay to post racist and misogynistic tweets by feeding it with similar tweets.)
- OPENNESS/INTELLIGIBILITY: Can data processes be investigated, tracked and explained?
- RESPONSIBILITY: Can the system help take care of and eliminate the ethical data implications? Would an external “auditor” be able to understand the processes if needed?
- HUMAN CONTROL: Are the needs and values of the people affected by the technology addressed? Which power dynamics does the technology support? For example: Is the information distribution fair? Who has what knowledge about whom? Does the technology support data symmetry between the individual and community institutions?
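One concrete way a development team might operationalise the TRAINING DATA point above is to compare outcome rates across groups in a dataset before any model learns from it. The sketch below is a rough, hypothetical audit helper (the function name, the example records and the 0.8 threshold convention are assumptions for illustration; real bias audits use far richer metrics):

```python
def selection_rate_ratio(records, group_key, positive_key):
    """Compare the rate of positive outcomes between groups in a
    training set. A ratio far below 1.0 is a warning sign that a
    model trained on this data may inherit a skew toward one group."""
    counts = {}
    for rec in records:
        g = rec[group_key]
        total, positives = counts.get(g, (0, 0))
        counts[g] = (total + 1, positives + (1 if rec[positive_key] else 0))
    per_group = {g: p / t for g, (t, p) in counts.items()}
    return min(per_group.values()) / max(per_group.values()), per_group

# Invented example: historical hiring records reused as training data.
training_data = [
    {"group": "A", "hired": True}, {"group": "A", "hired": True},
    {"group": "A", "hired": True}, {"group": "A", "hired": False},
    {"group": "B", "hired": True}, {"group": "B", "hired": False},
    {"group": "B", "hired": False}, {"group": "B", "hired": False},
]
ratio, rates = selection_rate_ratio(training_data, "group", "hired")
print(rates)            # → {'A': 0.75, 'B': 0.25}
print(round(ratio, 2))  # → 0.33, well below the commonly cited 0.8 threshold
```

A check like this does not settle the ethical question, but it forces the question into the design phase, which is exactly what the list above calls for.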
The ethical implications surrounding the design of intelligent technologies are a main focus of several initiatives launched in recent years to develop industry standards and policies in the field of artificial intelligence. Several of these are initiated by the industry itself, such as the Partnership on AI. Others have been started by organisations or research centres. The world’s largest association of technical professionals, the IEEE, is behind the Global Initiative on Ethics of Autonomous and Intelligent Systems, which consists of over a hundred experts from industry, universities and public institutions. Under this initiative they have, among other things, developed a charter with input from several committees on topics such as personal data and finances, as well as working groups to outline concrete ethical design standards for intelligent systems. The purpose of the overall initiative is described as follows:
“We need to make sure that these technologies are aligned to humans in terms of our moral values and ethical principles. AI/AS must behave in a way that is beneficial to people beyond reaching functional goals and addressing technical problems. This will allow for an elevated level of trust between humans and our technology that is needed for a fruitful pervasive use of AI/AS in our daily lives”.(9)
Data Ethics Analysis
In the book Data Ethics – the New Competitive Advantage (Hasselbalch, Tranberg 2016), data ethics is described as follows:
“Ethical companies in today’s big data era are doing more than just complying with data protection legislation. They also follow the spirit and vision of the legislation by listening closely to their customers. They’re implementing credible and clear transparency policies for data management. They’re only processing necessary data and developing privacy-aware corporate cultures and organisational structures. Some are developing products and services using Privacy by Design… A data-ethical company sustains ethical values relating to data, asking: Is this something I myself would accept as a consumer? Is this something I want my children to grow up with?” (10)
The introduction of new technologies has always, at some point during their implementation, led society to revise existing legislation, but also to revisit our ethical value systems. This is not the first time that society has discovered that man-made technologies are not natural facts but have ethical and societal implications. Nowadays, we are experiencing a development in data-driven services, products and analytics at a pace never seen before. This means that legislation in many areas cannot keep up with the ethical consequences, and so we are currently undergoing what could be called an ethical review in society. One example is Facebook’s social mood experiment on its users, which became publicly known in 2014. Facebook researchers, without the users’ knowledge, placed respectively negative and positive status updates in thousands of users’ news feeds to test whether it would affect their mood. Facebook’s technology had given the company the opportunity to collect, analyse and manipulate data on its users. The consent to Facebook’s terms of service made it legal, but was it ethically sound? It was not, according to the subsequent public debate.
The data ethics perspective on new data-intensive technologies is an approach that goes a step further than what lies within the limits of the law when dealing with data in the digital age. It takes into account new user requirements for control as well as the ethical implications and risks for the individual. It is a constructive, action-oriented and people-centric approach, in which one looks at the processes from an interdisciplinary perspective that includes technology, legislation, and social and cultural aspects.
The Intelligent Toy
Toys are an example of everyday objects that increasingly incorporate intelligent technology, and this raises a number of data ethics issues. (11) Also known as “the internet of toys”, “intelligent toys” or “smart toys”, they are internet-connected toys that remember, find patterns in and respond to data from children. For example, a child talks to the toy; the toy captures the child’s voice, records what the child says and sends it, via an internet connection, back to the manufacturer, whose server analyses what the child says and sends an answer. At the same time, the intelligent toy learns and develops in line with the data it processes.
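The round trip just described – child speaks, the recording travels to the manufacturer’s server, an answer comes back while the stored profile grows – can be sketched schematically. All names are invented, and a plain string stands in for the child’s voice recording; real toys stream audio to proprietary cloud APIs:

```python
# A deliberately simplified sketch of a "smart toy" data flow.
child_profiles = {}  # server-side store: toy id -> everything the child has said

def manufacturer_server(toy_id: str, utterance: str) -> str:
    """Stands in for the manufacturer's cloud service: it stores the
    child's words (the profile keeps growing) and returns a reply."""
    child_profiles.setdefault(toy_id, []).append(utterance)
    n = len(child_profiles[toy_id])
    return f"Fun! I will remember that. (stored utterances: {n})"

def toy_hears(toy_id: str, what_child_said: str) -> str:
    # The toy itself does almost nothing: it forwards the data over
    # the network and speaks whatever the server sends back.
    return manufacturer_server(toy_id, what_child_said)

print(toy_hears("toy-123", "I got a new bike"))
print(toy_hears("toy-123", "My dog is called Rex"))
# Everything the child said now lives on the server, not in the toy:
print(child_profiles["toy-123"])
```

The sketch makes the data ethics point visible in the architecture itself: the “memory” that makes the toy feel personal is not in the child’s hands, or even in the toy, but on a server the family never sees.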
In 1931, the toy manufacturing giant Fisher-Price launched the wooden duck Dr. Doodle, which quacked and raised and lowered its head when a child pulled it across the floor with a string. Today the company is investing heavily in “smart toys” which, using different types of technologies, listen to, recognise and recall children, and form part of a unique personal relationship with them. For example, their “Smart Toy” is, according to their website, “as unique as your child. It actually responds to what your child says and remembers things. It takes cues from him or her, then invites play, talk, movement, imagination and learning”.
Another type of intelligent internet-connected toy that can be purchased online for 5- to 9-year-olds is the CogniToys dinosaur. It is based on one of the world’s most potent intelligence technologies, IBM’s Watson computer. The computer is world famous because it once beat its human contestants in the TV quiz Jeopardy!. This is a computer that can recognise, analyse and find patterns in human speech using complex language algorithms, and respond to a question both in detail and correctly.
Within the last couple of years, many data ethics issues in connection with intelligent toys have been raised and reported in the media, but the coverage has primarily focused on the toys’ lack of technical security. Mattel’s Hello Barbie, a doll that listens to, answers and stores a child’s voice, was immediately dubbed Surveillance Barbie after its launch. A security expert discovered, according to The Guardian, that it was very easy to hack the doll when it was connected to wireless internet, which could give anybody easy access to the stored files of children’s voices, or direct access to the doll’s microphone.
The first element in a data ethics analysis of internet-connected toys is indeed the assessment of their security and their compliance with data protection legislation, given that collection, storage and processing of personal data are essential to their functionality. But a data ethics analysis also needs to take into account the social and ethical implications. First of all, one should look at how power relations are distributed, supported and enhanced, as well as the status of the individual and the possibility of self-control over the data processes, both now and in the future. There is very little information available regarding the creation of children’s data profiles associated with personal intelligent toys. We do not know exactly where the data profiles are stored, how they are combined with other data, or whether they are used in connection with toy manufacturers’ or their commercial partners’ other innovation areas. (For example, the Norwegian Consumer Council’s investigation into the internet-connected doll My Friend Cayla found that the manufacturer was sharing data with partners who could reuse this data in their own innovation areas, such as the company Nuance, which develops voice recognition technologies for the military, intelligence services and police. (12)) However, we know that for the toy to function optimally, the personal and intimate communication between a child and its toy needs to be collected and analysed – and potentially it can be monitored by concerned parents, by the corporations or by others with their own motives. We generally know nothing about the criteria and principles behind the programming of the custom-designed algorithms that inform and direct the toy’s interaction with a child. In general, especially in the light of the new EU General Data Protection Regulation, we should consider whether the development of detailed data profiles on children is even legal.
As previously described, one of the greatest data ethics challenges created here is the lack of transparency of commercial proprietary development within intelligent technologies, including toys. One can only guess, for example, at the consequences of centralising and monopolising the data generated by intelligent toys. Who owns the children’s data? What can it be used for? How can we ensure children have a “clean slate” when they grow up? Do they and their parents have control over their data? Development is moving in a direction in which the various internet-connected toys are increasingly associated with each other. This is, for example, the idea behind the company Dynepic’s internet of toys, which gives an overview of and insight into the child’s play and most intimate details. It is thus evident that there are heavy commercial interests in children’s private domains, but these are very difficult to discern in the actual data processes of the toys.
In early December 2016, a number of European and American consumer organisations got together and complained to the authorities over internet-connected toys (see note 12). They had examined toys from the company Genesis, which among other products manufactures the doll Cayla, and found that, as with Hello Barbie, one can access the doll’s microphone and listen in from a long distance. They also found that data was being sent to the United States, where it is processed by voice recognition technologies, and that the company reserved the right to use the data for various purposes; for example, product placement from Disney is incorporated into what the doll says to the child.
Another important element in a data ethics analysis is an assessment of the social implications of a particular data-driven technology: what opportunities does the technology afford us, and what ethical choices do we make as individuals when using it? The balance between parental participation in children’s lives and overprotection is ethically and culturally determined. Parents usually have access to the child’s private interaction with the intelligent toys via parental dashboards, i.e. apps that save and give an overview of the child’s play. The technology thus supports a social space in which parents have a particularly detailed insight into their children’s private lives, something we have not had in earlier times. The toy manufacturers describe these features as an option for parents to follow the development of their children and help them in a positive way. But a data ethics analysis of the intelligent toy also includes weighing the child’s right to privacy against the parents’ obligation to protect and secure their healthy development.
Conclusion: The Data Ethics Meta Narrative
It is evident that intelligent technologies’ data-intensive processes have legal implications for an individual’s privacy protection. But going beyond the immediate legal (and security) challenges, we should also look at what types of decisions we ask the systems to make for us and what influence they have on our control and empowerment as individuals, today and tomorrow. One example might be the future implications of combining the many different data profiles that we build up through different services with algorithmic decision making (what consequences would it have, for example, if data profiles on children created through intelligent toys were combined with intelligent job matching and recruitment?). One of the most important data ethics questions we should ask of the intelligent technologies is the meta-narrative that can be linked to the intelligent development of society in general: What story about ourselves as people, and about society, do we support with a data-driven intelligent development of our societies? Order, structure and control over nature, based on a logical analysis of information about the world, is a societal project that has been pursued since the Enlightenment. We have now reached a fundamental point in human history where, with intelligent technologies, we can potentially create order in, and control, the chaotic troves of big data that represent – and increasingly also define – our environment, including the ‘mess’ that human beings and their biology represent. There are many advantages, but hidden in the imposed order there will always be a risk of manipulation and control. Take the imaginative play world of children as an example: this may be one area, among many others, where data intelligence is not the most intelligent solution.
And maybe Jeff should also be allowed to rest in peace?
1 Cadwalladr, C., “The great British Brexit robbery: how our democracy was hijacked”, The Guardian, 7.5.2017.
2 Wachter, S., Mittelstadt, B., Floridi, L., “Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation” (28 December 2016), International Data Privacy Law, 2017.
3 European Parliament resolution of 16 February 2017 with recommendations to the Commission on Civil Law Rules on Robotics (2015/2103(INL)), accessed 23.12.2017. http://www.europarl.europa.eu/sides/getDoc.do?pubRef=-//EP//TEXT+TA+P8-TA-2017-0051+0+DOC+XML+V0//EN#BKMD-12.
4 Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., Kalai, A. T., “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings”, NIPS, 2016.
5 Angwin, J., Larson, J., Mattu, S., Kirchner, L., “Machine Bias”, ProPublica, 2016.
6 Liptak, A., “Sent to Prison by a Software Program’s Secret Algorithms”, New York Times, 2017.
7 Burrell, J., “How the machine ‘thinks’: Understanding opacity in machine learning algorithms”, Big Data & Society, 2016.
8 Hasselbalch, G., 2017, “How to Address Data Ethics in the Design Stage of AI”, accessed at https://dataethics.eu/en/call-ethics-social-impact-analysis-ai/.
9 Ethically Aligned Design – A Vision for Prioritizing Human Wellbeing with Artificial Intelligence and Autonomous Systems (EAD), version 1, IEEE.
10 Hasselbalch, G., Tranberg, P., Data Ethics – the New Competitive Advantage, 2016, p. 6.
11 Hasselbalch, G., 2016, www.dataethics.eu, “Data Ethical Considerations for the Internet of Toys”, https://dataethics.eu/wp-content/uploads/INTERNET-OF-TOYS-data-ethical-considerations.pdf
Hasselbalch, G., 2015, www.dataethics.eu, “A Toy that Wants to Phone Home”, https://dataethics.eu/en/a-toy-that-wants-to-phone-home/
12 Toyfail – An analysis of consumer and privacy issues in three internet-connected toys, The Norwegian Consumer Council, 2016. https://consumermediallc.files.wordpress.com/2016/12/toyfail_report_desember2016.pdf.