Screening Instruments Do Not Reflect Individual Experiences of Depression

Researchers detect discrepancies between the language used to describe lived experiences of mental health and the language used in modern screening tools.

Hannah Emerson

A new study investigates the relationship between how people discuss their mental health and the language used to describe, label, and categorize it. The researchers use a mixed-methods analysis of 698 interviews on emotional health, alongside a review of depression screening instruments, to explore the disconnect between lived experiences of depression and computational measures of it.

“Categories for mental health risk being so articulated and abstracted that they lose touch with the diversity of illness experiences,” write the authors, Arseniev-Koehler, Mozgai, and Scherer.

“This paper re-examines the detection of depression from language and revisits old and current debates in mental health classification. Along the way, we highlight strengths and weaknesses of modeling approaches and propose several strategies for more reflexive modeling.”


While the prospect of valid tools to detect mental health disorders has inspired a vast amount of research over the years, this study calls attention to the discrepancy between these tools and descriptions of individuals’ personal experiences.

Arseniev-Koehler and colleagues explain that nearly a century of research has produced modern screening instruments for detecting depression, but they note, “particularly in the realm of mental health, we can’t take labels at face-value.” They argue that mental health labels are too often understood as “objective truth,” but “unlike a ‘broken bone,’ or a ‘sprained wrist,’ mental health is a gray area. Mental health is largely defined by our conceptions of what is ‘normal’ and what is ‘disordered’— conceptions which can change across culture and time.” Thus, while psychiatric diagnoses are designed to maximize reliability, they remain weak on validity.

The authors reviewed efforts to detect depression from written text and transcribed speech by examining peer-reviewed research that identifies and predicts depression from text data. Additional quantitative and qualitative evidence is drawn from 698 interviews in the Distress Analysis Interview Corpus (DAIC), obtained from two populations living in Los Angeles: the general public and veterans of the U.S. armed forces. The interviews are conducted by Ellie, a virtual interviewer, in a way that simulates a mental health screening. The 8-item version of the Patient Health Questionnaire (PHQ-8) is included in these interviews.

For their qualitative data, the authors open-coded a subset of interviews in which participants talked about mental health and emotions, then searched for a lexicon of terms relevant to depression (e.g., depressed, sad, blue, happy, content), and finally open-coded the interview sections containing this lexicon and compared them to the interviewees’ PHQ-8 scores.

Participants in this study averaged a PHQ-8 score of 6 (a score of 10 or higher is considered indicative of depression), and 25% scored as currently meeting criteria for depression according to the scale. Additionally, individuals with higher PHQ-8 scores used more words expressing negative emotions, as well as more first-person singular pronouns (“I”) rather than first-person plural pronouns (“we”), in accordance with extant research on individuals experiencing depression.
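The kind of word counting described above can be sketched in a few lines. This is an illustrative example, not the authors’ code: the word lists below are tiny hypothetical stand-ins for a full psycholinguistic lexicon, and the tokenization is deliberately simple.

```python
# Illustrative sketch: counting first-person-singular pronouns and
# negative-emotion words in a transcript. The word sets are hypothetical
# stand-ins for a full lexicon; real studies use validated word lists.

FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}
NEGATIVE_EMOTION = {"sad", "depressed", "hopeless", "tired", "lonely", "blue"}

def linguistic_features(transcript: str) -> dict:
    """Return simple per-transcript counts of the kind used in depression-language studies."""
    tokens = [t.strip(".,!?;:'\"").lower() for t in transcript.split()]
    return {
        "n_tokens": len(tokens),
        "first_person_singular": sum(t in FIRST_PERSON_SINGULAR for t in tokens),
        "negative_emotion": sum(t in NEGATIVE_EMOTION for t in tokens),
    }

features = linguistic_features("I feel sad and tired. I think my life is hard.")
# features counts 3 first-person-singular tokens and 2 negative-emotion words
```

Counts like these are typically normalized by transcript length before being compared across interviewees.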

The qualitative and quantitative data for this study were often “mismatched.” For example, some participants rated as low risk of depression according to the PHQ described struggling with depression in their interviews. Arseniev-Koehler admits that what we label as depression remains “enigmatic in medicine and psychiatry.” The researchers write:

“In modern psychiatry, diagnoses are descriptive, co-occurring clusters of symptoms. They do not reference underlying mechanisms or causes, and categories provide little information on treatment responses.”

The data also points to inconsistent understandings of terms like depression, happiness, contentment, and other states of mood. For example, one participant asked for clarification when asked the last time they were happy, inquiring, “What type of happiness are you looking for?” Another stated that they are seeking contentment rather than happiness, an attempt to put into words the meaning of personal experiences and feelings.

Concerning the PHQ and other self-report diagnostic scales, the researchers write, “implicitly, these scales are proxies for psychiatric ratings from structured interviews. Of course, self-report diagnostic scales are an imperfect proxy.” An algorithm used to predict PHQ scores from language “likely has a wide margin for errors for detecting depression when compared to a mental health professional rather than the proxy measure on which it is trained.”

The authors suggest the following modifications to current diagnostic approaches:

  • Underlying models should depict depression as continuous and more dimensional, including duration of the depressive episode, depression history, and level of impairment to livelihood.
  • The focus should be on detecting symptoms of depression rather than detecting depression itself.
  • In developing models, it may be necessary to accept low specificity (the proportion of those without depression who are correctly detected as not having it) and low precision in order to achieve greater sensitivity (accurately identifying those who do have depression).
  • For valid constructs of mental health, incorporate multiple clinicians’ ratings, along with other clinical/non-clinical measures.
  • Consider how to develop predictive models that include “the uncertainty in our understanding of depression and other cultural idioms of distress.”
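The trade-off named in the list above can be made concrete with a small calculation. The counts below are invented for illustration; they are not from the study.

```python
# Illustrative sketch: sensitivity, specificity, and precision from a
# confusion matrix. The counts are hypothetical, chosen to show how a
# screener tuned for high sensitivity can have low specificity and precision.

def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    return {
        "sensitivity": tp / (tp + fn),  # of those with depression, how many are flagged
        "specificity": tn / (tn + fp),  # of those without, how many are correctly cleared
        "precision": tp / (tp + fp),    # of positive screens, how many are true cases
    }

# Hypothetical screener over 100 people (25 with depression, 75 without):
# it catches 24 of 25 cases but also flags 30 of the 75 non-cases.
m = screening_metrics(tp=24, fp=30, tn=45, fn=1)
# sensitivity = 0.96, specificity = 0.60, precision ≈ 0.44
```

Here 96% of true cases are caught, but over half of the positive screens are false alarms, which is the kind of trade-off the authors suggest may be acceptable for a screening (rather than diagnostic) tool.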

Ultimately, this study joins others in urging researchers and clinicians to engage reflexively with the culturally constructed labels used in mental health. Arseniev-Koehler and colleagues conclude:

“While research in this area has recently focused on the production of high-performing models, it seems likely that literature will soon reach saturation in the number of published models. Now, models will need to be reflexively tuned, borrowing additional insights from areas such as medicine and social sciences.”



Arseniev-Koehler, A., Mozgai, S., & Scherer, S. (2018). What type of happiness are you looking for? – A closer look at detecting mental health from language. Proceedings of the Fifth Workshop on Computational Linguistics and Clinical Psychology: From Keyboard to Clinic (pp. 1–12). New Orleans, LA.


  1. Another sad example of how these professionals assume they have power to diagnose and determine some are mentally inferior. Note the language used in the quoted passages, the use of “we” when describing those who diagnose.

    “We,” therefore, does not refer to the entire human population, but a group of elites who claim they know better than the rest of humanity. Apparently they hold so much power that they are the authorities on who is suffering and who isn’t. Baloney.

    • Apparently, these professionals didn’t bother themselves with the 50-year-old Hoffer/Osmond (HOD) Diagnostic, which is quantitative and inquires directly about experiences. Maybe this is because it’s inconsistent with initial diagnoses, but more consistent with final diagnoses than the initial diagnoses are, which is humiliating to the professional diagnostician, who prides himself on his diagnostic skill. That, and the fact that both Hoffer and Osmond were advocates of megavitamin B3 as a primary treatment for schizophrenia in lieu of psychiatric drugs, puts their test into the world of crystal gazing and evil sorcery, instead of modern organized psychiatry.

    • It seems the proper conclusion would be, “We (the professionals) really suck at predicting anything to do with ‘depression’ and should give up on our ridiculous tests and just ASK people what’s going on, since that appears to give much more accurate and useful results.” Your point about pronouns is very well taken, as well – why does “we” not include the client “we” are supposed to be helping? Perhaps this is the center of “our” difficulty in predicting “depression?” Perhaps “we” need to give up on the idea that “depression” is a thing to be measured in the first place?

    • Agreed Julie. Researchers and clinicians validate scales before they are used. They often become the exclusive “we.” However, I see it as super messy because not only might the “we” be blind to the patient’s perspective, but in general, the different ideas and experiences that any person is exposed to will shape what elements are seen as problematic and therefore worthy of measurement by a scale.

      Let me illustrate this with some of my problems with scales. The first time that I was introduced to evaluation of my mood was when I was younger and my well-meaning parents said something like “we are worried that you might have depression”; however, prior to that, I had often thought of my unhappiness as being because “my life sucks”. If I were to have made a scale as a teen it might have involved elements for my dislike of various things. My perspective changed with involvement in the system, and I then learned to think more in terms of symptoms. I had assessments, PHQ-9s, and treatment for depression which followed me into adulthood. After many years of treatment I was even enrolled in something where I had to take a form of the BDI almost every day. Overall, this approach generated by other people harmed me.

      I have since departed from thinking like that and I now have a similar but more mature view than when I was a teen. I’m now working to improve both the circumstances of my life and my acceptance of circumstances that I cannot change. (As well as other things.)

      Would I want to take a scale now? No, because at this point in my life, I actively resist thinking about my life in terms of “symptoms” which seems to be serving me well. Who would I trust to create a useful scale to be used for everyone? I don’t know because every person has a different perspective. Could some people or computers somewhere make a scale that could be useful for some people? I think so.

      • I think you make a great point – not only the scales themselves, but the decisions of what to “measure” are very much culturally bound, which prevents them from ever really being “scientific” in the sense of truly objective. And I also have found, for me and for others, that thinking in terms of what I don’t like and want to change, and what I do like (learned this one a LOT later in life) and want to preserve and appreciate, is much more helpful than thinking of “what is wrong with me?”

    • Glad to hear you got away Julie :). I believed their evaluations of me for waaay too long, I think in part because it all seemed so official. At one place I had to go where they made me do questionnaires, the doc brought me into his office at one month of treatment. He said that my results indicated a 30% improvement. They were very confident and professional about their evaluations. They appeared as though they were “helping” and measuring my progress. I now see that clinic as just another place that harmed me bigtime. I’m still struggling to make sense of my time in the system, still coming off the drugs too, it’s like waking up from a really bad dream.