A few months ago I was surprised to receive a request from an obscure journal of consciousness studies to review a paper. I was surprised because, although it was not immediately obvious from the title, the paper contained the first reports of the primary outcome measure of the massive and notorious STAR-D study, 14 years after the study was finished.
What in the world were the main findings of the world’s largest-ever antidepressant trial doing being presented now in a little-known journal? The answer may lie in the fact that they show how miserably poor the results of standard medical treatment for depression really are!
With 4041 participants, the STAR-D study is by far the largest and most expensive study of antidepressants ever conducted. The intention of the study was to see how antidepressant treatment combined with high-quality care performed in usual clinical conditions. It did not involve a placebo or any sort of control. All treatment was provided free for the duration of the study to maximise engagement.
The paper I was asked to review, which is now published in Psychology of Consciousness: Theory, Research and Practice, is written by a group led by Irving Kirsch and based on the original data obtained through the NIMH (1). Shannon Peters describes its findings in detail.
Kirsch’s group point out that the paper that describes the design of the STAR-D trial clearly identifies the Hamilton Rating Scale for Depression (HRSD) as the primary outcome (2). This makes sense since the HRSD is one of the most commonly used rating scales in trials of treatment of depression, especially trials of antidepressants. As the STAR-D authors note in the study protocol, “the HAM-D17 (HRSD), the primary outcome, allows comparison to the vast RCT literature” (cited in (1)).
Yet the outcome that was presented in almost all the study papers was the QIDS (Quick Inventory of Depressive Symptomatology), a measure made up especially for the STAR-D study, with no prior or subsequent credentials. In fact, as the authors of the present paper point out, this measure was devised not as an outcome measure, but as a way of tracking symptoms during the course of treatment, and the original study protocol explicitly stated that it should not be used as an outcome measure.
The analysis found that over the first 12 weeks of antidepressant treatment, people in the STAR-D study showed an improvement of 6.6 points on the HRSD. This level of change fails to reach the threshold required to indicate a ‘minimal improvement’ according to the Clinical Global Impressions scale (a global rating scale), which would be 7 points. It is also below the average improvement on placebo in placebo-controlled trials of antidepressants. A meta-analysis of paroxetine trials, for example, found that the average improvement in placebo-treated patients was 8.4 points on the HRSD (3). A meta-analysis of trials of fluoxetine and venlafaxine reported average levels of improvement on placebo of 9.3 points over just 6 weeks (4). Another meta-analysis found improvement of between 6.7 and 8.9 points in placebo groups across trials involving a variety of antidepressants (5).
The proportion of people classified as showing a ‘response’ (using the arbitrary but commonly used definition of a 50% decrease in HRSD score as per the original protocol) was 32.5% in the STAR-D study, and the proportion classified as showing remission (HRSD score ≤7) was 25.6%. The meta-analysis of placebo-controlled trials of fluoxetine and venlafaxine reported response rates of 39.9% among people allocated to placebo, and remission rates of 29.3%. In another antidepressant meta-analysis, the response rate on placebo was just above the STAR-D level at 34.7% (6), and in another it was just below at 30.0% (7).
The authors of the current paper point out, however, that improvement is lower in placebo-controlled trials, even in the people treated with antidepressants, than it is in trials that compare one antidepressant against another without placebo controls. This is presumably because people in placebo-controlled trials are told that there is a chance that they will receive a dummy tablet, while in comparative trials they know they will receive some sort of active drug. The authors therefore compare the results of the STAR-D study to the results of a large meta-analysis of comparative trials (cited in (6)). That meta-analysis found an average HRSD improvement of 14.8 points, response rates of 65.2% and remission rates of 48.4%. The STAR-D results are therefore approximately half the magnitude of those obtained in standard comparative drug trials.
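For anyone who wants to check the "approximately half" claim, the arithmetic can be sketched in a few lines of Python. The figures are the ones quoted above for STAR-D and for the meta-analysis of comparative trials; the variable and function names are illustrative only and do not come from the STAR-D papers or the reanalysis.

```python
# Figures as quoted in the text; names here are illustrative assumptions.
star_d = {"hrsd_improvement": 6.6, "response_pct": 32.5, "remission_pct": 25.6}
comparative = {"hrsd_improvement": 14.8, "response_pct": 65.2, "remission_pct": 48.4}


def ratios(numerator, denominator):
    """Return numerator/denominator for each shared measure, rounded to 2 places."""
    return {key: round(numerator[key] / denominator[key], 2) for key in numerator}


star_d_vs_comparative = ratios(star_d, comparative)

for measure, ratio in star_d_vs_comparative.items():
    print(f"{measure}: STAR-D is {ratio:.2f} of the comparative-trial figure")
```

The three ratios come out at roughly 0.45, 0.50 and 0.53, which is where "approximately half" comes from.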
The authors propose that the poor performance of antidepressants in the STAR-D study is due to the selection of more complex patients. Industry studies in particular exclude people with ‘co-morbid’ conditions and symptoms or a history of self-harm, and often recruit people via advertisements. The better results in industry-funded studies may also be due to the intensive attention and assessment procedures people undergo in them, and the added placebo effect of being in a trial of a ‘new’ treatment, which most trials involve.
Whatever the reason, STAR-D suggests that in real-life situations (which STAR-D mimicked better than other trials) people taking antidepressants do not do very well. In fact, given that for the vast majority of people depression is a naturally remitting condition, it is difficult to believe that people treated with antidepressants do any better than people who are offered no treatment at all.
It seems this may be the reason why the results of the main outcome of the STAR-D study have remained buried for so long. Instead, a measure was selected that showed results in a slightly better light. Incidentally, even then results were pretty poor, especially over the long term, as Pigott et al. have shown in a previous analysis (8).
Whether this was deliberate on the part of the original STAR-D authors or not, it was certainly not made explicit. There should surely be uproar about the withholding of information about one of the world’s most widely prescribed classes of drugs. We must be grateful to Kirsch and his co-authors for finally putting this data in the public domain.
1. Kirsch I, Huedo-Medina TB, Pigott HE, Johnson BT. Do outcomes of clinical trials resemble those of “real world” patients? A reanalysis of the STAR-D antidepressant dataset. Psychology of Consciousness: Theory, Research and Practice 2018 Sept.
2. Rush AJ, Fava M, Wisniewski SR, Lavori PW, Trivedi MH, Sackeim HA, et al. Sequenced treatment alternatives to relieve depression (STAR*D): rationale and design. Controlled Clinical Trials 2004;25:119-42.
3. Sugarman MA, Loree AM, Baltes BB, Grekin ER, Kirsch I. The efficacy of paroxetine and placebo in treating anxiety and depression: a meta-analysis of change on the Hamilton Rating Scales. PLoS One 2014;9(8):e106337.
4. Gibbons RD, Hur K, Brown CH, Davis JM, Mann JJ. Benefits from antidepressants: synthesis of 6-week patient-level outcomes from double-blind placebo-controlled randomized trials of fluoxetine and venlafaxine. Arch Gen Psychiatry 2012 Jun;69(6):572-9.
5. Kirsch I, Moore TJ, Scoboria A, Nicholls SS. The emperor’s new drugs: an analysis of antidepressant medication data submitted to the US Food and Drug Administration. Prevention and Treatment 2002;5.
6. Rutherford BR, Sneed JR, Roose SP. Does study design influence outcome? The effects of placebo control and treatment duration in antidepressant trials. Psychother Psychosom 2009;78:172-81.
7. Walsh BT, Seidman SN, Sysko R, Gould M. Placebo response in studies of major depression: variable, substantial, and growing. JAMA 2002 Apr 10;287(14):1840-7.
8. Pigott HE, Leventhal AM, Alter GS, Boren JJ. Efficacy and effectiveness of antidepressants: current status of research. Psychother Psychosom 2010;79(5):267-79.