In a recent article in JAMA, Derek C. Angus writes about the Hypotension Prediction During Surgery (HYPE) trial, one of the first randomized controlled trials of an artificial intelligence (AI) intervention. Angus discusses how this type of study can provide good evidence for using a particular, specific AI intervention, but he also highlights the limitations of this type of research.
Although Angus focuses on an AI intervention for hypotension, the principles he explores are relevant to mental health as well. Mental health apps that use AI are already in use, despite a lack of evidence that they improve outcomes. Ethical concerns have also been raised about the use of AI across medical contexts. Angus writes:
“Although many express excitement regarding the promise of AI, others express concern about adverse consequences, such as loss of physician and patient autonomy or unintended bias, and still others claim that the entire endeavor is largely hype, with virtually no data that actual patient outcomes have improved. One issue complicating this debate is that the classic measure of clinical benefit, the randomized clinical trial (RCT), is rare in this field, if not entirely absent.”
Nonetheless, these technologies are entering the market, and some have even been approved by the FDA, without good evidence of improved outcomes. According to Angus, the FDA “recently approved AI-enabled decision support tools […] for diagnosis of diabetic retinopathy on digital fundoscopy and early warning of stroke on computed tomography scans. In neither instance was approval based on any RCT evidence that the information provided by the SaMD improved care.” (SaMD stands for software as a medical device.)
Angus then considers what such evidence would look like, asking how we would know whether an AI could actually improve care.
He offers a well-conducted RCT of an AI intervention as an example and explains what we can, and cannot, learn from such a trial.
In the HYPE trial, patients undergoing surgery were randomly assigned either to standard care or to have their blood pressure monitored by a new AI program intended to predict whether they would experience a sudden drop in blood pressure. When the AI determined that a drop was imminent, it raised the alarm so that the surgical staff could intervene.
The primary outcomes concerned how long patients experienced low blood pressure and how far their blood pressure dropped. Compared with standard care, the AI-guided approach appeared to improve both.
According to Angus, the trial illustrates the conditions under which AI is most likely to prove effective in health care.
First, Angus writes that this problem was ideally suited to AI. Predicting hypotension relies on a host of known, validated measurements, but the sheer volume of data makes them difficult for humans to interpret.
Second, considerable effort went into ensuring that the surgical staff knew how to respond when the AI alerted them to a problem. It helped that a known, effective intervention exists for low blood pressure, so when the AI raised the alarm, the staff knew exactly what to do.
In short, this was a very good use of AI: a known problem in a specific context, with too much information for humans to handle adequately under time pressure, but with known, effective interventions to solve it.
These are precisely the conditions that are unlikely to be met when AI is applied in psychiatric settings. An AI in psychiatry cannot be trained on specific, measurable physiological markers, because no such markers are consistently associated with psychological health. And if an AI in psychiatry issues a warning, there is no immediate, consistently effective intervention that can be delivered to solve the problem it detects.
Psychiatric problems do not arise in a brief, controlled setting with a straightforward intervention, such as raising blood pressure in an operating room. They are contextual and unfold over the long term, and the effectiveness of every potential intervention is debated.
Angus, D. C. (2020). Randomized clinical trials of artificial intelligence. JAMA, 323(11), 1043–1045. https://doi.org/10.1001/jama.2020.1039