An AI-based health system in Copenhagen, which helps identify cardiac arrests during emergency calls, suffers from ten ethical issues, according to an ethics self-assessment of the system based on the EU High-Level Expert Group's Trustworthy AI guidelines. In the future, all AI systems should be assessed by independent experts before they are deployed, according to the assessment paper, which also gives five recommendations for improving the system.
An independent team of philosophers, policy makers, social scientists, and technical, legal, and medical experts from all over the world, led by Professor Roberto V. Zicari from Arcada University of Applied Sciences in Helsinki, conducted the assessment of the AI-based health system in Copenhagen. It was a self-assessment carried out together with the key stakeholders, namely the medical doctors of the Emergency Medical Services Copenhagen and the Department of Clinical Medicine, University of Copenhagen, Denmark.
The main contribution of their new paper is to demonstrate how to use the EU Trustworthy AI guidelines in practice in the healthcare domain. For the assessment, they used a process for assessing trustworthy AI, called Z-Inspection®, to identify specific challenges and potential ethical trade-offs that arise when AI is considered in practice. Thus, this is not a ‘certification’: it neither gives the system a green light nor concludes that it is unethical, but highlights the ethical issues and gives recommendations.
The case study was an AI system in use in Copenhagen, Denmark. The system uses machine learning as a supportive tool to recognize cardiac arrest in emergency calls by listening to the calls and the patterns of conversation. The AI system was designed, trained, and tested using an archive of audio files of emergency calls provided by Emergency Medical Services Copenhagen from the year 2014. The prime aim of this AI system, which was introduced in the fall of 2020, is to assist medical dispatchers answering 112 emergency calls by helping them detect out-of-hospital cardiac arrest (OHCA) early during the calls, thereby possibly saving lives.
The research questions of the paper were: Is the AI system trustworthy? Is the use of this AI system trustworthy?
The team defined three major phases.
1. Set-Up phase starts by verifying that no conflict of interest exists, both direct and indirect, between independent experts and the primary stakeholders of the use case. This phase continues by creating a multi-disciplinary assessment team composed of a diverse range of experts.
2. The Assess Phase is composed of four tasks:
A. The creation and analysis of Socio-Technical Scenarios for the AI system under assessment.
B. A list of ethical, technical, and legal “issues” is identified and described using an open vocabulary.
C. To reach consolidation, such “issues” are then mapped to some of the four ethical principles and the seven requirements defined in the EU framework for trustworthy AI.
D. Verification of claims is performed. A number of iterations of the four tasks may be necessary in order to arrive at a final consolidated rubric of issues mapped onto the trustworthy AI framework.
3. The Resolve Phase could be called ‘ethical maintenance’ and is about monitoring that the AI system fulfills the requirements over time.
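The mapping step of the Assess Phase (tasks B and C) can be illustrated with a small sketch. This is not code from the paper; the issue texts and tags below are illustrative examples, while the requirement names are the seven requirements from the EU Trustworthy AI guidelines.

```python
# Sketch of the Assess Phase mapping step: free-text "issues" (task B)
# are mapped onto the EU requirements they touch (task C), producing a
# consolidated rubric. Issue descriptions here are illustrative only.
from collections import defaultdict

EU_REQUIREMENTS = [
    "Human agency and oversight",
    "Technical robustness and safety",
    "Privacy and data governance",
    "Transparency",
    "Diversity, non-discrimination and fairness",
    "Societal and environmental well-being",
    "Accountability",
]

# Each issue, described in an open vocabulary, maps to one or more requirements.
issues = [
    ("Unclear whether the dispatcher is advised or controlled by the AI",
     ["Human agency and oversight"]),
    ("No description of data retention and anonymization",
     ["Privacy and data governance"]),
    ("Model outputs cannot be interpreted",
     ["Transparency"]),
]

def consolidate(issues):
    """Group issues by requirement to form the consolidated rubric."""
    rubric = defaultdict(list)
    for description, requirements in issues:
        for req in requirements:
            assert req in EU_REQUIREMENTS, f"unknown requirement: {req}"
            rubric[req].append(description)
    return dict(rubric)

rubric = consolidate(issues)
for requirement, mapped in rubric.items():
    print(f"{requirement}: {len(mapped)} issue(s)")
```

Iterating the four tasks would amount to refining the issue list and its tags until the rubric stabilizes.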
Ten ethical issues arose in the assessment:
- It is unclear whether the dispatcher should be advised or controlled by the AI, and it is unclear how the ultimate decision is made. The system was not accompanied by a clear definition of its intended use.
- To what extent is the caller’s personally identifying information protected, and who has access to information about the caller? Although the AI system follows GDPR standards, there was no description of how data will be used and stored, how long it will be retained before disposal, and what form(s) of anonymization will be maintained.
- There was no formal ethical review or community consultation process to address the ethical implications for trial patients, as is common in comparable studies reviewed by institutional review boards in the United States and the United Kingdom.
- The training data is likely not sufficient to account for relevant differences in languages, accents, and voice patterns, potentially generating unfair outcomes. There is likely empirical bias, since the tool was developed in a predominantly white Danish patient group. It is unclear how the tool would perform for patients with different accents, ages, sexes, and other specific subgroups.
- The algorithm did not appear to reduce the effectiveness of emergency dispatchers, but it also did not significantly improve it. In general, the algorithm has a higher sensitivity but also leads to more false positives.
- Lack of explainability. The system’s outputs cannot be interpreted, leading to challenges when the dispatcher and the tool disagree. This lack of transparency, together with the limited training given to users, may have contributed to the noted lack of trust among the dispatchers.
- The data was not adequately protected against potential cyber-attacks. In particular, since the model is not interpretable, it seems hard to determine its resistance to adversarial attack scenarios, such as those exploiting age, gender, accent, or bystander type.
- The AI system did not significantly improve the dispatchers’ ability to recognize cardiac arrests. AI should improve medical practice rather than disrupt it or make it more complicated.
- The trials conducted did not include a diverse group of patients or dispatchers.
- It is unclear whether the Danish authorities and the involved ethics committees assessed the safety of the tool sufficiently.
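The sensitivity/false-positive trade-off noted among the issues above can be made concrete with a small computation. The confusion-matrix counts below are hypothetical and do not reproduce the Copenhagen trial figures; they only illustrate how a tool can raise sensitivity (catching more real cardiac arrests) while also raising the false-positive rate.

```python
# Illustrative (made-up) counts for OHCA detection; the real trial
# figures are not reproduced here.
def sensitivity(tp, fn):
    """Fraction of actual cardiac arrests that are detected."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """Fraction of non-arrest calls wrongly flagged as arrests."""
    return fp / (fp + tn)

# Hypothetical dispatcher-alone vs. AI-assisted counts
# (tp/fn over 100 true arrests, fp/tn over 900 other calls):
dispatcher = {"tp": 73, "fn": 27, "fp": 2,  "tn": 898}
with_ai    = {"tp": 85, "fn": 15, "fp": 15, "tn": 885}

for name, c in [("dispatcher alone", dispatcher), ("with AI", with_ai)]:
    sens = sensitivity(c["tp"], c["fn"])
    fpr = false_positive_rate(c["fp"], c["tn"])
    print(f"{name}: sensitivity={sens:.2f}, false positive rate={fpr:.3f}")
```

With these made-up numbers, the AI-assisted setup detects more true arrests but sends more unnecessary alerts, which is the shape of the trade-off the assessment describes.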
The paper comes with five recommendations covering the ethical issues above:
1. Use a model for explainability.
2. Use data sets that are built to represent the whole population and thus avoid bias.
3. Involve stakeholders.
4. Deploy a better protocol on what does or does not influence the accuracy.
5. Assess the legal aspects of the AI system.
“We are very grateful that the medical doctors at the Emergency Medical Services Copenhagen decided to work with us in order to learn the implications of the use of their AI system and to improve it in the future. They were very collaborative and it was a really interesting experience”, said Roberto V. Zicari.
An important lesson from this use case is that there should be a requirement that independent experts assess such systems before deployment.