
Error Handling in AI Healthcare Systems

AI healthcare systems are the new craze in the medical world. For instance, AI-based screening tests are increasingly being used in the field of oncology (cancer research), and AI-based monitoring systems are on the rise for diabetics, epileptics, out-patients, and so on. These systems have the potential to transform healthcare, but they also raise important issues in terms of privacy, discrimination, and data security. In the following, however, I will focus on another problem with these systems, namely the handling of errors.

Two Types of Errors

Importantly, there are two different kinds of errors AI healthcare systems can make. The first is a false positive. For example, an AI screening test might identify the presence of a disease where there is none, similar to a metal detector in an airport going off even though the relevant passenger is not carrying anything made of metal.

Conversely, the second kind of error is a false negative. For example, an AI disease monitoring system might fail to raise an alarm even though there is in fact cause for concern (e.g. due to low blood sugar levels). The analogy here would be an airport metal detector staying silent even though the relevant passenger is actually carrying a gun.
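
To make the two error types concrete, here is a minimal sketch in Python (with made-up labels and predictions, not real clinical data) that counts false positives and false negatives for a hypothetical screening test:

```python
# Toy illustration of the two error types for a hypothetical screening test.
# 1 = disease present / test flags disease, 0 = no disease / test stays silent.
actual    = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]  # ground truth (made up)
predicted = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0]  # what the AI system reports

false_positives = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
false_negatives = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(f"False positives (alarm, but no disease): {false_positives}")
print(f"False negatives (disease, but no alarm): {false_negatives}")
```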

Obviously, errors are bad no matter whether they are false positives or false negatives. We should just design systems for a minimum of mistakes, right?

The Problem

The problem is that there is a trade-off between these two kinds of errors. That is, lowering the frequency of one type of error tends to increase the number of errors of the other kind. This is because a system designed to catch every (true) instance of a disease will typically also produce a lot of false positives due to its high sensitivity. On the other hand, a system designed never to identify a disease where there is none will typically produce a lot of false negatives due to its high specificity.
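
One way to see why the trade-off arises: many AI systems internally produce a risk score and only raise a flag when the score crosses a decision threshold. The sketch below, using invented scores purely for illustration, shows how lowering the threshold removes false negatives at the cost of more false positives, and raising it does the opposite:

```python
# Illustration of the sensitivity/specificity trade-off via the decision threshold.
# Risk scores are invented; a real system would produce them from patient data.
patients = [
    # (risk score from the model, does the patient actually have the disease?)
    (0.95, True), (0.80, True), (0.60, True), (0.40, True),
    (0.70, False), (0.50, False), (0.30, False), (0.10, False),
]

def errors_at(threshold):
    fp = sum(1 for score, sick in patients if score >= threshold and not sick)
    fn = sum(1 for score, sick in patients if score < threshold and sick)
    return fp, fn

for threshold in (0.2, 0.5, 0.8):
    fp, fn = errors_at(threshold)
    print(f"threshold {threshold:.1f}: {fp} false positives, {fn} false negatives")
```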

In any event, as should be clear from these examples, errors can have widely different consequences depending on whether they are false positives or false negatives. Moreover, the prevalence of each error type is to some extent a matter of design. How to handle errors in AI-based healthcare systems is therefore not just a technical question; it is also (or perhaps even primarily) an ethical one.

Do No Harm

One of the basic principles in ethics is the principle of non-maleficence (do no harm), which can be found not only in the context of data ethics but also within the fields of medical ethics, population ethics, bioethics, and so on. As such, this principle is usually a good point of departure for ethical reflection and discussion.

How does this principle apply to AI error handling?

In many instances, false negatives have the potential to do a lot of harm when it comes to disease monitoring or diagnostics. Missing a critical alarm or a significant diagnosis can be fatal for patients if it means that they will not get critical care.

However, false positives can also cause harm. For instance, false positives can cause needless worry, unnecessary health examinations, and alarm fatigue. When enough people are involved, such harms can be massive in total, even if each individual only suffers a small amount of harm.

This raises an interesting moral problem. Is it worse when a few people suffer a lot, as opposed to a large number of people each suffering only a small amount (provided that the overall amount of suffering is roughly the same)? Should we just look at the problem mathematically, so to speak, and pick the lesser evil?

A Possible Solution

One way to move forward with the dilemma is through the ideas of the late and great Derek Parfit, one of the most academically influential moral philosophers of the last century.

According to Parfit, benefits and losses in ethics should not only be measured in accordance with their intensity, duration, certainty, fecundity, and so on (see the Hedonic Calculus) but also in accordance with how well off the relevant individuals would be in the absence of a particular benefit or loss. The worse off the individual would be, the greater the weight that should be given to their well-being.

In other words, we should give priority to the well-being of the least advantaged individuals according to Parfit’s view, which has fittingly been named prioritarianism.

Accordingly, on the prioritarian view, we should not only look at the aggregate or total well-being under different AI error handling schemes. We also need to look at how well off different individuals would be – especially those who are worst off in the absence of a particular benefit or loss. Both things matter morally on the prioritarian view.

This means that a small number of patients who would be significantly negatively affected by a certain error handling scheme should, on Parfit’s view, be given priority (at least up to a certain point) over a larger number of individuals each suffering only a small amount, even if the latter suffer more in aggregate.
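
As a rough illustration of how this could inform the comparison of error handling schemes, the sketch below contrasts a plain sum of well-being with a priority-weighted sum in which the worse off count for more. The population sizes, well-being numbers, and the square-root weighting are all invented for illustration; they are not Parfit’s own formalism:

```python
# Toy comparison of two hypothetical error handling schemes (numbers are invented).
# Each value is an individual's resulting well-being on a 0-100 scale; baseline is 90.
scheme_a = [10, 10, 10] + [90] * 57  # a few patients harmed severely (e.g. missed diagnoses)
scheme_b = [85] * 60                 # many patients harmed slightly (e.g. needless worry)

def total_value(outcomes):
    # Plain aggregation: simply sum well-being across all individuals.
    return sum(outcomes)

def prioritarian_value(outcomes):
    # Priority weighting via a concave transform (square root, chosen for illustration):
    # the same unit of well-being counts for more, the worse off the individual is.
    return sum(level ** 0.5 for level in outcomes)

for name, outcomes in (("Scheme A", scheme_a), ("Scheme B", scheme_b)):
    print(f"{name}: total = {total_value(outcomes)}, "
          f"priority-weighted = {prioritarian_value(outcomes):.1f}")
```

On the plain sum, Scheme A comes out ahead, but the priority-weighted sum favors Scheme B, reflecting the extra moral weight given to the three patients who would otherwise end up very badly off.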

Consequently, for someone with a particular concern for the least advantaged individuals, prioritarianism may provide an ethically sound approach to error handling in the context of AI-based healthcare.

Photo: Nik, Unsplash.com