New Algorithms Fail to Predict Antidepressant Treatment Outcomes

Researchers suggest that because most antidepressant “success” is due to the placebo effect, they may never find a way to predict outcomes.

Peter Simons

Computer science researchers from Tufts teamed up with Harvard and Mass General medical scientists to try to create new predictive algorithms (machine learning models) to determine whether antidepressant treatment would result in success. But the predictive value was just slightly better than chance—and information about specific drugs added no predictive value at all.

The researchers reported that their algorithms could guess correctly about 60% of the time, only slightly above the 50% expected from a coin toss. Researchers in other fields of medicine suggest that a predictive value of 70% to 80% is considered “acceptable” for this type of test, with 80% to 90% considered “excellent.”

Nonetheless, the researchers write that their results “suggest that coded clinical data may facilitate prediction of antidepressant treatment outcomes.”
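
To make the comparison against chance concrete, here is a minimal Python sketch. It assumes the percentages quoted above refer to a ranking-style discrimination score (area under the ROC curve), which this article does not state explicitly. The function names `auc` and `band` and the toy data are illustrative and are not taken from the study.

```python
def auc(scores, labels):
    """Chance that a randomly chosen positive case gets a higher score
    than a randomly chosen negative case (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def band(value):
    """Map a score to the ranges quoted in the article (assumed cut-offs)."""
    if value >= 0.9:
        return "above the ranges quoted in the article"
    if value >= 0.8:
        return "excellent"
    if value >= 0.7:
        return "acceptable"
    if value > 0.5:
        return "better than chance, below acceptable"
    return "no better than chance"


# Toy data: a model that gives every patient the same score cannot rank anyone.
uninformative = auc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
print(uninformative, band(uninformative))
print(band(0.60))
```

On this reading, a score near 0.60 falls in the "better than chance, below acceptable" range, which matches how the result is described above.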

The research was led by Michael C. Hughes, a computer science researcher at Tufts University, and published in JAMA Network Open.

Hughes and his colleagues used data on 81,630 people with a diagnosis of ‘major depressive disorder’ who were treated with antidepressants between the years 1997 and 2017. The researchers split the participants into two groups; one group was used to develop the algorithm, and the other group was used to test the already-developed algorithm. This follows best practices for developing machine learning predictive models because it helps prevent overfitting.
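
The development/test split described above can be sketched in a few lines of Python. This is an illustration only: the article does not report the proportions or the procedure the researchers used, so the 80/20 ratio, the random seed, and the name `split_holdout` are all assumptions.

```python
import random


def split_holdout(patient_ids, test_fraction=0.2, seed=0):
    """Shuffle the IDs and return (development, test) lists with no overlap.

    test_fraction and seed are hypothetical defaults, not values from the study.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_test = int(round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


# 81,630 is the cohort size reported in the article; the IDs are placeholders.
dev_ids, test_ids = split_holdout(range(81630))
print(len(dev_ids), len(test_ids))
```

Because the test group plays no part in building the model, its results give a fairer estimate of how the model would perform on patients it has never seen.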

The information given to the models included diagnostic data and data on treatment notes but did not include specific measures to rate depression. This was a deliberate choice: the researchers wanted to know if they could predict whether someone would respond to treatment based only on existing data, rather than collecting more and more data as treatment continued.

The outcome measure was “stable treatment response,” which the researchers defined as continuing the same prescription for an antidepressant for 90 days. The researchers believed that continuing to take the same antidepressant for 90 days could serve as a proxy for antidepressant treatment success—assuming that if the drug were not working, patients would discontinue taking it or be prescribed a different medication within that period.

However, there are other explanations for why someone might take the same drug for 90 days—such as the doctor or patient wanting to take at least three months to see if the drug might start working.
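
The 90-day proxy can be written down as a simple rule, which also shows how little it says about symptoms. The sketch below is one hypothetical reading of "continuing the same prescription for 90 days." The record layout (fill date, drug name, days supplied), the rule that any gap in supply breaks stability, and the name `is_stable_response` are assumptions made for illustration, not the study's actual definition.

```python
from datetime import date, timedelta


def is_stable_response(fills, window_days=90):
    """Return True if only the first-prescribed drug is filled during the
    window and its supply runs without a gap to the end of the window.

    fills: list of (fill_date, drug_name, days_supply) tuples (assumed layout).
    """
    if not fills:
        return False
    fills = sorted(fills)
    start, index_drug, _ = fills[0]
    end = start + timedelta(days=window_days)
    covered_until = start
    for fill_date, drug, days_supply in fills:
        if fill_date >= end:
            break  # fills after the window do not affect the outcome
        if drug != index_drug:
            return False  # a different antidepressant was prescribed
        if fill_date > covered_until:
            return False  # supply ran out before the next fill
        covered_until = max(covered_until,
                            fill_date + timedelta(days=days_supply))
    return covered_until >= end


start = date(2015, 1, 1)
continued = [(start + timedelta(days=d), "citalopram", 30) for d in (0, 30, 60)]
switched = [(start, "citalopram", 30),
            (start + timedelta(days=30), "fluoxetine", 30)]
print(is_stable_response(continued), is_stable_response(switched))
```

Nothing in the rule looks at how the patient is feeling: a patient who keeps filling the same prescription while waiting to see whether it works would be counted as a success.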

The researchers could not find a way to use an actual outcome measure for treatment success, such as reductions in depression symptoms, remission of depression entirely, or patient-reported outcomes such as improved quality of life. So the algorithm was not designed to predict actual depression outcomes, just a proxy that might have other explanations.

Even with this poorly-designed outcome measure, their results were just slightly better than chance—below the threshold for “acceptable” prediction that appears in the research literature for other medical specialties.

The researchers hypothesized that their second algorithm, which included data on the specific drugs used, might better predict outcomes. For instance, they thought that some people might improve more on citalopram and others on fluoxetine.

But this was not the case. Adding this extra data provided no additional predictive ability. The average predictive power of the models that included drug-specific information was slightly lower than that of the general prediction models. They write:

“Contrary to our hypothesis, the development of treatment-specific predictors instead of general predictors did not meaningfully improve prediction. This may reflect the observation that much of antidepressant response may be considered to be placebo-like or nonspecific.”

According to the researchers, the reason that they could not predict response to specific antidepressants is that response to antidepressants is almost entirely due to the placebo effect. They explain:

“Placebo response is substantial such that nonspecific predictors may outperform drug-specific ones.”

The researchers write that their study should provide a jumping-off point for future studies of machine learning algorithms, which might improve the predictive value. Still, they express caution:

“Once such models emerge, prospective investigation will be needed to assess the extent to which they meaningfully improve outcomes, if at all.”



Hughes, M. C., Pradier, M. F., Ross, A. S., McCoy Jr, T. H., Perlis, R. H., & Doshi-Velez, F. (2020). Assessment of a prediction model for antidepressant treatment stability using supervised topic models. JAMA Netw Open, 3(5), e205308. DOI:10.1001/jamanetworkopen.2020.5308


  1. “According to the researchers, the reason that they could not predict response to specific antidepressants is that response to antidepressants is almost entirely due to the placebo effect.”

    Why should it be legal to force-drug people with a drug class when the scientists already know, and freely admit, “that response to antidepressants is almost entirely due to the placebo effect”? That makes no logical sense.

    They should try to test the antipsychotics, since their effects are not “almost entirely due to the placebo effect.” Although if they did, they’d likely find that almost no one should ever be given that class of drugs.

  2. If I were permitted to use antidepressants, the first thing I’d look for is the number and frequency of perceptual distortions my would-be patient was experiencing, as the chances of a bad antidepressant experience go up with the number (and kind) of distortions the would-be patient is experiencing.