Recently, a number of studies have purported to use machine learning to identify people experiencing psychosis based on scans of their brains. However, these studies have been plagued by numerous issues, including tiny samples, overfitted models, wildly different techniques from one study to the next, and the potential confounding effect of brain changes induced by antipsychotic medication.
An international group of researchers set out to determine how accurate these machine learning techniques actually are, while avoiding some of the methodological issues found in other studies. They found that these approaches were no better than chance at identifying people experiencing psychosis. They described their findings in an article in Schizophrenia Bulletin:
“Contrary to expectation, the performances of all methodological approaches tested were poor to modest across all sites. […] Current evidence for the diagnostic value of ML and structural neuroimaging should be reconsidered toward a more cautious interpretation.”
The researchers tested two types of machine learning on three different types of brain scans. They did their analyses on five different datasets to avoid overfitting, which occurs when an algorithm is very good at detecting something in the exact sample it was trained on, but terrible at doing so in any other situation.
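To make the idea of overfitting concrete, here is a minimal sketch in Python. It is not the researchers' method or data. It assumes a toy 1-nearest-neighbor classifier and purely random "scans" and labels (all names below are invented for illustration), so there is nothing real to learn. The classifier still scores perfectly on the sample it memorized and falls to roughly chance on new data.

```python
import random

def make_noise_dataset(n, n_features, rng):
    """Random features and random labels: there is no real signal to learn."""
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y

def predict_1nn(train_X, train_y, x):
    """Return the label of the training point closest to x."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]

def accuracy(train_X, train_y, X, y):
    """Fraction of points in (X, y) that the memorized training set labels correctly."""
    hits = sum(predict_1nn(train_X, train_y, x) == label for x, label in zip(X, y))
    return hits / len(y)

rng = random.Random(0)
train_X, train_y = make_noise_dataset(100, 20, rng)   # sample the model is "trained" on
test_X, test_y = make_noise_dataset(200, 20, rng)     # new people it has never seen

train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(f"accuracy on the training sample: {train_acc:.2f}")
print(f"accuracy on a new sample:        {test_acc:.2f}")
```

The perfect training score here reflects memorization only, which is why accuracy measured on an independent dataset is the number that matters.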
They expected that they would find their methods to be 70-80% accurate at detecting who was experiencing psychosis.
Instead, they found that the approaches had a range of accuracies—but all of the ranges began at the low end: 50-51% accurate, which is the accuracy that would be expected by pure chance.
For instance, if 50 people had experienced psychosis, and 50 people had not (in a sample of 100), all you’d need to do is simply guess that every single person had experienced psychosis, and you would be 50% accurate. The researchers call this “poor” accuracy.
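The arithmetic behind that example can be written out as a short Python sketch. The function name is invented for illustration and the numbers are the ones from the example above, not from the study.

```python
def always_guess_psychosis_accuracy(n_psychosis: int, n_controls: int) -> float:
    """Accuracy of labelling every participant as 'experienced psychosis'."""
    total = n_psychosis + n_controls
    # Every person with psychosis is labelled correctly; every control is wrong.
    return n_psychosis / total

print(always_guess_psychosis_accuracy(50, 50))  # 0.5
```

Any method that claims to detect psychosis has to beat this kind of no-information guess to be useful.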
The researchers found that one method, a “deep learning” technique performed on a specific scan of “surface-based regional volumes and cortical thickness,” had an accuracy reaching 70%. Unfortunately, when they tested this method on their other samples (to prevent overfitting), they found that it also became no better than chance.
The researchers refer to this by saying the technique “generalized poorly to other sites.”
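A toy simulation can illustrate what poor generalization across sites looks like. This is a sketch under loudly stated assumptions, not a reconstruction of the study. It invents a single "brain feature" that separates patients from controls at site A only (for example, because of some site-specific quirk) and carries no information at site B. The classifier is a simple threshold, not the deep learning model the researchers used.

```python
import random
from statistics import mean

def simulate_site(n, signal, rng):
    """Toy site: one feature per person; `signal` shifts the feature for patients."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)                  # 1 = patient, 0 = control
        feature = rng.gauss(0, 1) + signal * label
        data.append((feature, label))
    return data

def fit_threshold(data):
    """Learn a cut-off halfway between the patient and control means."""
    patients = [f for f, y in data if y == 1]
    controls = [f for f, y in data if y == 0]
    return (mean(patients) + mean(controls)) / 2

def accuracy(threshold, data):
    """Predict 'patient' when the feature exceeds the threshold."""
    return sum((f > threshold) == bool(y) for f, y in data) / len(data)

rng = random.Random(1)
site_a_train = simulate_site(300, signal=2.0, rng=rng)  # hypothetical site-specific signal
site_a_test = simulate_site(300, signal=2.0, rng=rng)   # new people, same site
site_b = simulate_site(300, signal=0.0, rng=rng)        # different site, no such signal

threshold = fit_threshold(site_a_train)
same_site_acc = accuracy(threshold, site_a_test)
other_site_acc = accuracy(threshold, site_b)
print(f"same site:  {same_site_acc:.2f}")
print(f"other site: {other_site_acc:.2f}")
```

In this made-up setup the model looks strong when tested at the site it was trained on and drops to about chance elsewhere, which is the pattern the phrase "generalized poorly to other sites" describes.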
According to the researchers, “When methodological precautions are adopted to avoid over-optimistic results, detection of individuals in the early stages of psychosis is more challenging than originally thought.”
There were 956 participants, coming from five studies across four countries: China, Spain, the Netherlands, and the UK. Participants were either “healthy controls” or were identified as “experiencing their first psychotic episode.” The brain scans were magnetic resonance imaging (MRI) of three varieties: voxel-based gray matter volume (GMV), voxel-based cortical thickness (VBCT), and surface-based regional volumes and cortical thickness. They were analyzed with either “traditional” machine learning, or a type of machine learning called “deep learning” based on a “deep neural network.”
The researchers also discussed other reasons that previous studies may have appeared to find good accuracy. “Previous studies may have reported overoptimistic accuracies due to the use of inadequate sample size,” they write. Additionally, they performed an analysis that found a statistically significant publication bias, meaning that positive results are more likely to be published than negative ones, which skews the research literature.
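The way small samples and publication bias can combine to inflate reported accuracy can be sketched with a short simulation. Everything here is an assumption made for illustration: the "classifier" is a coin flip, the study sizes (20 and 500 participants) are arbitrary, and the 65% publication cut-off is hypothetical. None of it comes from the paper.

```python
import random
from statistics import mean

def chance_study_accuracy(n_participants, rng):
    """Accuracy of a coin-flip classifier in one study of n participants."""
    correct = sum(rng.random() < 0.5 for _ in range(n_participants))
    return correct / n_participants

rng = random.Random(42)
small_studies = [chance_study_accuracy(20, rng) for _ in range(2000)]
large_studies = [chance_study_accuracy(500, rng) for _ in range(2000)]

# Hypothetical filter: only studies reporting at least 65% accuracy get published.
published_small = [a for a in small_studies if a >= 0.65]
published_large = [a for a in large_studies if a >= 0.65]

print(f"mean accuracy, all small studies:       {mean(small_studies):.2f}")
print(f"mean accuracy, published small studies: {mean(published_small):.2f}")
print(f"small studies clearing the cut-off:     {len(published_small)}")
print(f"large studies clearing the cut-off:     {len(published_large)}")
```

Even though the true accuracy is 50% in every simulated study, a sizeable share of the small studies clears the cut-off by luck alone, so a literature built only from those would look far more promising than the method deserves. Large studies almost never get that lucky.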
They also write that even if these techniques were highly accurate, they would “be of limited clinical utility. This is because, from a clinical translation perspective, the real challenge is not to distinguish between patients and disease-free individuals, but to develop biological tests that could be used to choose between alternative diagnoses and optimize treatment.”
They conclude, “We encourage researchers to continue pursuing the integration of ML and neuroimaging while exercising caution to avoid inflated results and ultimately a distorted view of the potential of this approach in psychiatric neuroimaging.”
****
Vieira, S., Gong, Q., Pinaya, W. H. L., Scarpazza, C., Tognin, S., Crespo-Facorro, B., . . . & Mechelli, A. (2020). Using machine learning and structural neuroimaging to detect first episode psychosis: Reconsidering the evidence. Schizophrenia Bulletin, 46(1), 17-26.