Are Different Depression Scales Measuring the Same Thing?

A new study points out the heterogeneity of what is commonly considered to be a consistent experience.


In a new article published online ahead of print in the Journal of Affective Disorders, psychologist Eiko Fried examines the wide range of symptoms that appear on depression rating scales. Fried, from the University of Amsterdam (UvA), analyses the degree to which symptoms overlap across seven major depression scales, and his results call into question the reliability of research on depression.

“These findings imply that the routine practice of using scales as interchangeable measurements of depression severity is problematic and may pose a major threat to the generalizability and replicability of depression research,” he writes.

“Given the high prevalence rates and burden caused by MDD [major depressive disorder], and the size of the research field – a non-exhaustive search of a few databases and a small number of journals identified around 50,000 depression articles published between 1990 and 1999 alone – the severity of this situation can hardly be overstated.”

Photo credit: Pexels

Depression is a highly researched topic. Researchers from the social sciences to neuroscience and genetics use depression severity scales in their studies. For example, three different papers, each establishing a measurement tool for depression severity, find their way into the 100 most cited papers in all of science.

It is common for researchers to pick one such measurement tool for their study (it is estimated that there are over 280 different instruments to choose from). Often, studies then draw conclusions about depression in general, rather than making narrower statements about the type of depression measured by a particular scale. This wouldn’t pose a problem, necessarily, if it could be assumed that all of these different tools were similarly quantifying the same underlying experience of depression.

“If this assumption does not hold,” Fried writes, then “results of depression studies may be idiosyncratic to the particular scale used, posing a major challenge to the replicability and generalizability of depression research.”

Fried points to several reasons why this assumption may be misguided.

  • Studies that use multiple depression severity scales often find that individuals score significantly differently on different scales
  • Psychometric analyses suggest that these scales are multidimensional, meaning they are simultaneously measuring several constructs and not just ‘depression’
  • Research shows that depression can present very differently in different people and that different symptoms appear to respond differently to various treatment approaches

To assess the degree to which the symptoms being measured overlap with one another, he examined seven common rating scales: the Beck Depression Inventory (BDI-II), the Hamilton Rating Scale for Depression (HRSD), the Center for Epidemiologic Studies Depression Scale (CES-D), the Inventory of Depressive Symptomatology (IDS), the Quick Inventory of Depressive Symptomatology (QIDS), the Montgomery-Åsberg Depression Rating Scale (MADRS), and the Zung Self-Rating Depression Scale (SDS).

The results show a total of 52 specific disparate depression symptoms measured across the scales. For instance, the HRSD tends to look at more physical symptoms like weight loss and delayed movements, while the BDI-II focuses on cognitive symptoms, like feeling guilty or worthless.

Overlap between the symptoms measured by each scale was low. The results reveal that 40% of all symptoms appeared on only a single scale and that only 12% of symptoms were common to all instruments.
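To make these percentages concrete, here is a minimal sketch of how such overlap figures can be computed. The scale names and symptom lists are invented toy data, not the items from Fried's study, and the function names are hypothetical. The pairwise Jaccard index is included as one common way to express overlap between two sets; this article does not say which overlap statistic the paper used.

```python
from collections import Counter

# Hypothetical toy data: these are NOT the symptom lists from the study.
toy_scales = {
    "scale_a": {"sad mood", "insomnia", "guilt", "fatigue"},
    "scale_b": {"sad mood", "insomnia", "weight loss"},
    "scale_c": {"sad mood", "fatigue", "irritability"},
}


def overlap_summary(scales):
    """Count how many distinct symptoms appear on one scale only, and on every scale."""
    # For each symptom, count the number of scales that include it.
    counts = Counter(symptom for items in scales.values() for symptom in items)
    total = len(counts)
    single = sum(1 for n in counts.values() if n == 1)
    shared_by_all = sum(1 for n in counts.values() if n == len(scales))
    return {
        "total": total,
        "single_scale_share": single / total,
        "all_scales_share": shared_by_all / total,
    }


def jaccard(a, b):
    """Pairwise overlap: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


summary = overlap_summary(toy_scales)
pair_overlap = jaccard(toy_scales["scale_a"], toy_scales["scale_b"])
print(summary)
print(pair_overlap)
```

In this toy example, half of the six distinct symptoms appear on only one scale and one of six appears on all three, which mirrors the kind of statement the study makes about its 52 symptoms.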

“Since different instruments capture different aspects of the heterogeneous depressive syndrome, there is the risk that the selection of a particular scale for a study may severely bias results,” Fried concluded. “Considering the persistent lack of progress in core research areas such as antidepressant efficacy and biomarkers robustly associated with depression diagnosis, this topic deserves more attention in contemporary research.”



Eiko I. Fried, The 52 symptoms of major depression: lack of content overlap among seven common depression scales, Journal of Affective Disorders, (Abstract)


Mad in America hosts blogs by a diverse group of writers. These posts are designed to serve as a public forum for a discussion—broadly speaking—of psychiatry and its treatments. The opinions expressed are the writers’ own.




    • I think that is up to the person concerned. Some people find it useful to have a measure of how they are doing as it can inspire hope if they see an improvement over time. Some do not.

      These scales are blunt instruments.

      What the article does, however, is cast doubt on the validity of a huge swath of antidepressant and other biomedical research into the effectiveness of treatment for depression.


  1. Actually Hamilton, of the HRSD, doesn’t consider his own scale to be much good. Of course he keeps the royalties. I thought there wasn’t much reliable evidence anyway – particularly if we’re talking about actual observable clinical changes such as returning to work, improved family/personal relations etc. And, in the end, isn’t everything subjective? If someone says I don’t feel better, it doesn’t matter what the test says, or what an observer sees. I’ve seen a person told by staff that they are much better, and the person says but I’m not, but the record shows that they’re much better. How can any of it be reliable if there is so much reporting bias?


  2. JohnH wrote, “… a measure of how they are doing … ”

    So are you saying that the objective is to feel better, become a happy camper? Doing these sorts of tests on people is certainly going to have the effect of making them feel that.

    How beaten down and broken do they have to be to even be willing to cooperate?

    I continue to say that the best response, and the only response we need, is the middle finger.

