Differing Depression Diagnostic Tools May Influence Research Findings

The type of diagnostic assessment used in research settings, whether a fully structured or a semi-structured interview, may affect which participants receive a diagnosis of major depression

Shannon Peters

A new study, led by Brett Thombs, compares semi-structured versus fully structured diagnostic interviews to assess for major depression in research settings. Thombs is a professor at McGill University and a senior investigator at the Lady Davis Institute for Medical Research at Jewish General Hospital. The results of the study, published in The British Journal of Psychiatry, suggest that the type of diagnostic interview used may result in different prevalence rates of depression in a given study, depending on the symptom severity of the sample. The researchers write:

“By standardizing all questions and probes and removing clinical judgment, fully structured interviews are designed to be as reliable as possible, but this may reduce advantages of semi-structured interviews related to the inclusion of a framework for incorporating clinical judgment.”


In the past, major depression was diagnosed for research purposes using clinical judgment or unstructured interviews. Over time, standardized diagnostic interviews were developed to increase agreement across diagnosticians. Diagnostic interviews can be semi-structured, where standardized questions are asked, but interviewers can also ask additional questions and use their clinical judgment. Common semi-structured interviews include the Structured Clinical Interview for DSM (SCID) and the Schedules for Clinical Assessment in Neuropsychiatry (SCAN).

Alternatively, diagnostic interviews can be fully structured, consisting of fully scripted questions that are asked with no follow-up questions, and they can be administered by a layperson rather than a clinician. Common fully structured interviews include the Composite International Diagnostic Interview (CIDI) and the Diagnostic Interview Schedule (DIS). The Mini International Neuropsychiatric Interview (MINI) is another fully structured interview. The MINI is very brief, so it can be administered quickly, but it was designed to be over-inclusive and therefore produces high rates of false-positive diagnoses (e.g., diagnosing someone with depression who does not meet the full criteria).

According to the authors, “existing meta-analyses on depression screening tool accuracy have treated both interview types as equivalent reference standards.” However, the authors question whether the two interview types actually produce different patterns of depression diagnosis. Three previous studies found that rates of depression were twice as high when diagnosed using a fully structured interview rather than a semi-structured one.

Therefore, the researchers state, “the objective of this study was to examine the association between diagnostic interview method and major depression classification.” The researchers used data originally collected to assess the diagnostic accuracy of the Patient Health Questionnaire-9 (PHQ-9). They conducted an individual participant data meta-analysis, which synthesizes participant-level data from multiple studies.

The researchers analyzed data from 57 studies with a total of 17,158 participants, of whom 2,287 were diagnosed with major depression. Of the 57 studies, 29 used semi-structured interviews and 28 used fully structured interviews. The SCID was by far the most common semi-structured interview (n=26). The MINI was the most common fully structured interview (n=14), followed by the CIDI (n=11).

Consistent with previous research, the authors find “participants interviewed with the MINI were substantially and statistically significantly more likely to be classified as having major depression” than participants interviewed with other fully structured interviews. Participants were twice as likely to be diagnosed with depression via the MINI versus the CIDI.

When the MINI was excluded from the analyses, a pattern emerged: participants with low-level depressive symptoms (based on PHQ-9 scores) were more likely to be diagnosed with major depression via a fully structured interview than a semi-structured interview, although this finding was not statistically significant. On the other end of the spectrum, participants with high-level depressive symptoms were statistically significantly less likely to be diagnosed with major depression via a fully structured interview than a semi-structured one.

“This suggests that, in practice, the effect of the diagnostic interview that is selected on the prevalence that is generated likely depends on the underlying distribution of symptom levels in the population,” write the researchers.

The authors cite existing literature that corroborates their results. Two studies found that in the general population, where depression symptom levels are low, fully structured interviews tend to overestimate the prevalence of major depression. In contrast, a study at an alcohol treatment unit, where depression symptom levels were higher, did not find a difference in prevalence rates between fully structured and semi-structured interviews.

Scholars have questioned the reliability of the criteria for major depression and raised concerns that screening for depression can result in false positives. Therefore, a better understanding of how the type of assessment used can impact depression diagnosis is essential. The researchers conclude:

“Based on the findings of the present study, caution is warranted when deciding which interview to use. Prevalence estimates may be influenced, potentially substantially, by this choice.”


Levis, B., Benedetti, A., Riehm, K. E., Saadat, N., Levis, A. W., Azar, M., … Thombs, B. D. (2018). Probability of major depression diagnostic classification using semi-structured versus fully structured diagnostic interviews. The British Journal of Psychiatry, 1–9. Advance online publication. doi:10.1192/bjp.2018.54

MIA-UMB News Team: Shannon Peters is a doctoral student at the University of Massachusetts Boston and has a master’s degree in mental health counseling. She is particularly interested in exploring the impacts of medicalization and pathologizing the experiences of individuals who have been affected by trauma. She is engaged in research on the effects of institutional corruption and financial conflicts of interest on research and practice.


  1. In reality, all the doctors do is ask every patient if they are depressed. If the person says yes or thinks about it, they get an antidepressant prescription. One question only, no “interviews” used. And all smokers get an antidepressant prescription, under the guise they are “safe smoking cessation meds.”

  2. A big number of SSRI clinical trials use the first 17 items of the HAM-D. One of those pertains to mood. A whole bunch pertain to sleep. People who stop losing weight are coded as getting better, while losing weight counts as a sign of depression. Perfect.

    The HAM-D was created before hating yourself for being such a fat pig had emerged as the leading cause of declaring oneself depressed and heading off to the shrinkie. (In such cases one should mention Wellbutrin off-handedly, then hazard a sidelong glance at the Drug God to see if it’s going to require more effort than one off-hand mention.) Wellbutrin is the short-term ticket if you’re willing to risk seizures and rage attacks. (I mean irrational blow-outs directed at strangers in public, the kind you get arrested for.)

    So many drug trials offer sleeping pills to patients that improved HAM-D scores could easily be explained by the sleeping pills.

  3. It is almost laughable to see these efforts to “standardize” the “diagnosis” of an abstract entity that can’t be defined in any kind of objective terms. Researchers substitute consistency (do different raters come up with the same answer for the same people?) for validity (do these “measurements” actually represent a legitimate homogeneous grouping of people based on that which is being measured?). If you know which people have cancer by some objective means, you can run tests to see whether a screening tool detects cancer and perhaps start identifying it earlier. But these checklists don’t identify anything, or more specifically, they identify people who answer the questions in a certain way that the researchers have decided means something they want it to mean. If you find out whether a person is depressed by asking them if they feel depressed, you’ll get pretty high agreement among raters – they will diagnose “depression” if the person says “I’m depressed.” These checklists are a little more subtle than that, but it amounts to exactly the same thing. You might as well just talk to the person and ask them how they feel and not bother with the trappings of pseudo-objectivity. At least you’re being more honest that way.