German psychiatrist Stefan Leucht and colleagues have produced another really important paper (1). The results indicate that the small differences usually found between antidepressants and placebo are far below the level that would be clinically detectable or meaningful.
Leucht et al. have conducted the first thorough, systematic attempt to provide some empirical evidence about what constitutes a clinically meaningful difference in scores on depression rating scales, such as the Hamilton Rating Scale for Depression. Although the study did not set out to explore antidepressant effects, these are the scales that are used to assess the efficacy of antidepressants in placebo-controlled trials. In 2004, the National Institute for Clinical Excellence declared that a Hamilton score difference of three points was clinically significant (2). This estimate seems to have been plucked out of the air, however. At least, the Institute never provided any explanation of what it was based on, and it was removed from the updated Guidance published in 2009. Leucht et al’s analysis shows this estimate was wildly optimistic!
The study used data on the antidepressant mirtazapine gathered from 43 trials in people diagnosed with ‘major depressive disorder.’ The authors used a ‘linking’ method to look for correspondences between scores on the commonly used Hamilton Rating Scale for Depression and another commonly used instrument, the Clinical Global Impressions (CGI) Scale (3). The Hamilton is one of the most widely used rating scales for assessing the effects of antidepressants. The usual version includes 17 questions and has a maximum score of 52 (4). The CGI consists of two clinician-rated scales, one focusing on the severity of the condition and the other on the level of improvement, both rated on a scale of 1 to 7. The CGI is said to be ‘intuitively understood by clinicians’ (1, p 243) and has good inter-rater reliability (5).
The paper concentrates on evaluating the commonly used criteria for ‘response’ (50% reduction in Hamilton baseline score), and ‘remission’ (Hamilton score of 6 or less). The authors conclude that these criteria are valid because they correspond to a CGI-improvement score of 2 (‘much improved’) and a CGI severity score of 1-2 (‘not at all’ to ‘borderline mentally ill’), respectively.
The most interesting results are buried in the middle of the paper, however. This is the section that reports on the linking of absolute change in Hamilton scores to CGI-improvement scores. A reduction of 3 points on the Hamilton corresponded to a score of 4 or ‘no change’ in the CGI-improvement scale. In other words, clinicians could not detect differences of 3 points on the Hamilton when asked to rate a patient’s overall improvement.
Reading from Figure 3 in the paper, a CGI-improvement score of 3 (‘minimally improved’) corresponded to a change in Hamilton score of around 8 points. Attaining a CGI-improvement score of 2 (‘much improved’) required a change of 14 points.
In a well-publicised meta-analysis by Irving Kirsch and colleagues, the overall difference between antidepressants and placebo was only 1.7 points on the Hamilton scale (6). Subsequent studies have reproduced these small effects (7). These effects are obviously well below the 8 points or so corresponding to a ‘minimal improvement’ on the CGI. A difference of less than 2 points is even lower than the 3 point difference that corresponded to ratings of ‘no change’.
When considering the CGI-severity scale, the reduction in Hamilton scores associated with moving from one category of severity to another (from moderately ill to mildly ill, for example) was between 5 and 6 points. Thus, differences between antidepressants and placebo would not be sufficient for people to go from one category of severity to another.
Much has been made of the fact that people with severe depression in Kirsch et al’s analysis showed slightly larger differences between drug and placebo. Even in this group, however, the difference was only 4 points, only just over the threshold of detectability on the CGI, and not reaching the criteria for minimal improvement or for moving from one category of severity to another (8).
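To make the comparison concrete, here is a minimal Python sketch that simply lines up the figures quoted in this post. The numbers are the approximate values given in the text (some read off a figure), the 5 point value is the lower end of the 5 to 6 point range, and the names `CGI_IMPROVEMENT_ANCHORS` and `benchmarks_reached` are my own labels for illustration, not anything taken from Leucht et al’s paper.

```python
# Approximate Hamilton changes linked to each CGI-improvement rating,
# as quoted in this post (not recomputed from the trial data).
CGI_IMPROVEMENT_ANCHORS = {
    "no change (CGI-improvement 4)": 3.0,
    "minimally improved (CGI-improvement 3)": 8.0,
    "much improved (CGI-improvement 2)": 14.0,
}

# Lower end of the 5-6 point range linked to a shift of one CGI-severity category.
SEVERITY_CATEGORY_SHIFT = 5.0

# Drug-placebo differences on the Hamilton scale, as quoted in this post.
DRUG_PLACEBO_DIFFERENCES = {
    "overall (Kirsch et al. 2002)": 1.7,
    "severe depression (Kirsch et al. 2008)": 4.0,
}


def benchmarks_reached(difference):
    """Return the benchmarks that a Hamilton difference meets or exceeds."""
    reached = [
        label
        for label, points in CGI_IMPROVEMENT_ANCHORS.items()
        if difference >= points
    ]
    if difference >= SEVERITY_CATEGORY_SHIFT:
        reached.append("one CGI-severity category")
    return reached


for label, difference in DRUG_PLACEBO_DIFFERENCES.items():
    reached = benchmarks_reached(difference)
    summary = ", ".join(reached) if reached else "none of the benchmarks"
    print(f"{label}: {difference} points -> reaches {summary}")
```

On these figures, the overall difference reaches none of the benchmarks, and the difference in severe depression reaches only the level that clinicians rated as ‘no change’.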
The analysis of ‘response’ criteria also reveals that making a ‘response’ to treatment equates to a reduction of 12 points on the Hamilton scale. This is 50% of the mean baseline Hamilton score, which was 24 points in the included trials. The authors suggest this is fairly typical of severity levels in antidepressant trials in general. Thus the amount of change considered to constitute a ‘response’ to treatment is also far greater than the differences between antidepressants and placebo.
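The arithmetic behind that comparison can be sketched in a few lines of Python. It uses only the figures quoted in this post; the variable and function names are illustrative, and dividing a group-level difference by an individual-level threshold is a rough way of conveying scale, not a formal statistical comparison.

```python
MEAN_BASELINE_HAMILTON = 24.0  # mean baseline score in the included trials
RESPONSE_FRACTION = 0.5  # 'response' is defined as a 50% reduction from baseline
OVERALL_DRUG_PLACEBO_DIFFERENCE = 1.7  # Kirsch et al. (2002), as quoted above


def response_threshold(baseline, fraction=RESPONSE_FRACTION):
    """Hamilton points a patient must drop to count as a 'response'."""
    return baseline * fraction


threshold = response_threshold(MEAN_BASELINE_HAMILTON)
share = OVERALL_DRUG_PLACEBO_DIFFERENCE / threshold

print(f"Response threshold: {threshold:.0f} points")
print(f"Drug-placebo difference as a share of that threshold: {share:.0%}")
```

With a baseline of 24 points, the threshold comes out at 12 points, and the 1.7 point drug-placebo difference is roughly a seventh of it.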
A reduction of 2 points on a scale of 52 has always seemed like an insignificant amount, but Leucht et al provide some empirical evidence to support this hunch. There are problems with this analysis, of course. The CGI may seem intuitive, but it is likely to be highly subjective and it reflects what clinicians observe, and not what patients feel. There are many problems with the Hamilton scale too, however.
I have written previously about the methodological flaws in placebo-controlled antidepressant studies, and particularly about the lack of blinding and the possibility that some drug-induced psychoactive effects may modify depression rating scale scores independent of any effect on underlying biological processes (“Why There’s no Such Thing as an ‘Antidepressant’”). Leucht et al’s analysis suggests that the very modest differences between antidepressants and placebo, even if they are real and not artefacts of trial design, are not large enough to be clinically meaningful.
Does this matter? If antidepressants really were just Smarties, and had no adverse effects, perhaps not (although there are psychological consequences from taking tablets, which might cause problems too). Antidepressants are not inert, however. Like all active drugs they change the body in ways that we are not fully aware of, and can have rare and long-term consequences that do not show up readily in clinical trials. For that reason alone, we need to be sure that antidepressants really do have worthwhile effects. This is the latest research to suggest they do not.
1. Leucht S, Fennema H, Engel R, Kaspers-Janssen M, Lepping P, Szegedi A. What does the HAMD mean? J Affect Disord 2013 Jun;148(2-3):243-8. http://www.jad-journal.com/article/S0165-0327(12)00834-8/abstract
2. National Institute for Health and Clinical Excellence. Depression: Management of depression in primary and secondary care. Clinical practice guideline Number 23. London: National Institute for Clinical Excellence; 2004.
3. Guy W. The Clinical Global Impression Scale. ECDEU Assessment Manual for Psychopharmacology - Revised. Rockville, MD: US Department of Health, Education, and Welfare; 1976. p. 218-22.
4. Hamilton M. A rating scale for depression. J Neurol Neurosurg Psychiatry 1960 Feb;23:56-62.
5. Nierenberg AA, DeCecco LM. Definitions of antidepressant treatment response, remission, nonresponse, partial response, and other relevant outcomes: a focus on treatment-resistant depression. J Clin Psychiatry 2001;62 Suppl 16:5-9.
6. Kirsch I, Moore TJ, Scoboria A, Nicholls SS. The emperor’s new drugs: an analysis of antidepressant medication data submitted to the US Food and Drug Administration. Prevention and Treatment 2002;5.
7. Turner EH, Matthews AM, Linardatos E, Tell RA, Rosenthal R. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 2008 Jan 17;358(3):252-60. http://www.nejm.org/doi/full/10.1056/NEJMsa065779
8. Kirsch I, Deacon BJ, Huedo-Medina TB, Scoboria A, Moore TJ, Johnson BT. Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration. PLoS Med 2008 Feb;5(2):e45. http://www.plosmedicine.org/article/info%3Adoi%2F10.1371%2Fjournal.pmed.0050045