Reanalysis of STAR*D Study Suggests Overestimation of Antidepressant Efficacy

Reanalysis of the original primary outcome measure in the STAR*D study suggests that the STAR*D findings inflate improvement on antidepressant medication and that exclusion criteria in conventional clinical trials result in an overestimation of antidepressant efficacy.

Shannon Peters

A new study, led by Irving Kirsch, Associate Director of the Program in Placebo Studies at Harvard Medical School, reanalyzes primary outcome data from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study. Results of the study, published in Psychology of Consciousness: Theory, Research, and Practice, suggest inflation of antidepressant efficacy both in the STAR*D trial reports and in conventional clinical trials.

“Comparisons of [Hamilton Rating Scale for Depression] HRSD improvement in the STAR*D trial with improvement reported in conventional trials indicate that the improvement following antidepressant treatment is substantially lower in this highly generalizable sample than it is in conventional clinical trials,” Kirsch and his colleagues write. “The actual real-world effectiveness of antidepressants is approximately half that reported in efficacy trials.”

Photo Credit: “Falling Star,” Pixabay

The STAR*D study attempted to mimic “real world” patients by recruiting from routine medical/psychiatric outpatient treatment centers and not including a placebo control. Additionally, the STAR*D did not exclude patients with comorbid diagnoses, as is often done in clinical trials. With over 4,000 participants, the STAR*D study “is the largest and most expensive antidepressant effectiveness trial ever conducted,” note the authors. Since the STAR*D was published, many have critiqued the study’s methodology and interpretation of findings.

The first step of the STAR*D, which is the focus of the present study, was a 12-week trial of citalopram. The authors note that the STAR*D research protocol identifies the Hamilton Rating Scale for Depression (HRSD) as the primary outcome measure. However, in the initial report, HRSD results are not provided. Instead, the STAR*D presents outcomes on the Quick Inventory of Depressive Symptomatology (QIDS). The authors address issues with swapping outcome measures, as well as limitations of the QIDS (e.g., it was developed by STAR*D researchers and therefore had not been used in previous studies).

In the present study, the authors acquired the STAR*D raw data through NIMH and reanalyzed the HRSD results. Because reporting only response or remission rates has shortcomings, the authors also report the average improvement in HRSD scores. A total of 3,110 patients are included in the analysis. The researchers then compared their findings to a large meta-analysis of antidepressant comparator trials that also used the HRSD measure. A major difference between the STAR*D and comparator clinical trials is that conventional clinical trials often have more stringent exclusion criteria (e.g., they exclude individuals with comorbid diagnoses).

Findings show that 26% of STAR*D participants achieved remission (i.e., an exit HRSD score of 7 or less) and 33% were treatment responders (i.e., 50% or more improvement in HRSD score). The average improvement in HRSD score from baseline to exit was 6.6 points. The authors also note that this average improvement translates to only “minimally improved” in terms of clinical significance. The authors compare these results to outcomes in the meta-analysis of clinical trials, which showed a 49% remission rate, a 65% response rate, and a mean HRSD improvement of 14.8 points.

“These results suggest that the exclusion criteria used in conventional clinical trials inflate remission rates by 89%, response rates by 101%, and continuous improvement scores by 126%,” the researchers write.
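The percentages in that quote can be roughly reproduced from the rounded figures reported in this article. The short Python sketch below assumes that inflation is computed as the conventional-trial value divided by the STAR*D value, minus one; the function and variable names are illustrative and do not come from the paper. With these rounded inputs it gives about 88%, 97%, and 124%, close to but not identical to the quoted 89%, 101%, and 126%, which were presumably calculated from unrounded data.

```python
def relative_inflation(trial_value: float, stard_value: float) -> float:
    """Percent by which the conventional-trial value exceeds the STAR*D value.

    Assumption: inflation = (trial / STAR*D - 1) * 100.
    """
    return (trial_value / stard_value - 1.0) * 100.0


# Rounded figures as given in the article:
# (conventional trials meta-analysis, STAR*D HRSD reanalysis)
outcomes = {
    "remission rate (%)": (49.0, 26.0),
    "response rate (%)": (65.0, 33.0),
    "mean HRSD improvement (points)": (14.8, 6.6),
}

inflation = {
    name: relative_inflation(trial, stard)
    for name, (trial, stard) in outcomes.items()
}

for name, pct in inflation.items():
    print(f"{name}: about {pct:.0f}% higher in conventional trials")
```

Viewed the other way, 6.6 is a little under half of 14.8, which is consistent with the authors’ statement that real-world effectiveness is approximately half that reported in efficacy trials.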

The researchers also compare their results to the reported outcomes on the QIDS in the STAR*D study. In the STAR*D, the QIDS remission rate was 30%, and the response rate was 43%. Again, these numbers are higher than the corresponding HRSD-based rates of 26% and 33%.

“These results indicate that remission and response rates are substantially inflated on the QIDS-SR relative to the HRSD and that scores from studies reporting one of these measures cannot be compared validly with scores reported in studies using the other,” state the authors.

The researchers note some limitations of their study. First, neither the STAR*D nor the studies in the comparator meta-analysis included a placebo. Second, the comparator meta-analysis included studies of various antidepressants, while the STAR*D only used citalopram. Lastly, the authors note that the study did not take into account contextual factors that may influence participants’ depression and recovery.

The present study is the first to report on the original outcome measure of the STAR*D: the HRSD. As the authors note, “such an analysis is long overdue.” The researchers demonstrate that (a) STAR*D reporting on the QIDS rather than the HRSD significantly inflated depression improvement scores, and (b) conventional clinical trials lead to inflated estimates of antidepressant efficacy compared to “real world” clinical practice.



Kirsch, I., Huedo-Medina, T. B., Pigott, H. E., & Johnson, B. T. (2018). Do outcomes of clinical trials resemble those “real world” patients? A reanalysis of the STAR*D antidepressant data set. Psychology of Consciousness: Theory, Research, and Practice. Advance online publication.


  1. So the placebo in the RCT was something like 80% more effective than the drug in the STAR*D study. What rubbish performance in the real world, and what a fishy result in the RCT. What’s going on? To me, it looks like it’s all down to placebo effects in different monitoring settings. It’s getting harder to imagine a real drug effect.

    • The real drug effect has to do with ruining your sex life, making you into a zombie by distancing you from your feelings and emotions, and oftentimes making you suicidal or homicidal. There is very little to no effect on the depression that people take these damned things for in the first place. Frankly, these things are the devil’s tic tacs and nothing else. Many people find that they can’t get off them or have a very difficult time getting off them when there’s no benefit and they want to quit them. There are many real drug effects but they have absolutely nothing to do with helping depression. But GPs and psychiatrists are prescribing the things like candy to everyone.

  2. I don’t disagree with any of that. To be honest, I don’t think we have any idea what blocking serotonin reuptake does, and certainly not a clue about the NRSAs, NASSAs, etc. Disabling part of your brain, well, anything can happen.

    But I do think it’s important to expose the very reason these drugs were ever approved: the question of efficacy. They are there, not because of a credible mechanism of action, but because RCTs showed a statistical, although not meaningful, superiority over placebo. The fact is that the placebo effect and simple remission time have always been the predominant effects observed in the trials. So people did improve compared to baseline, i.e., where they started, but it wasn’t really due to the drug. What we are seeing is that the closer they look at the effect size, the smaller it gets. What this paper shows is that something happens in antidepressant RCTs to make drugs look better than they really are. In the UK, this finding should surely inform new NICE guidance on antidepressants.