Using AI to Find Vocal Biomarkers of ‘Mental Illness’ is Likely to Deepen Bias

Efforts to automate psychiatric screening through AI analysis of vocal biomarkers may deepen rather than mitigate bias in psychiatric diagnosis.

Emaline Friedman, PhD

Research at the MIT Language and Technology Lab examines nascent, AI-enabled pattern recognition of vocal biomarkers of ‘mental illness’ for systematic bias. MIT anthropologist Beth Semel joined this research team to understand how voice analysis technologies sustain the U.S. mental health care system’s logic of capture and containment, and how that logic disproportionately harms marginalized groups and non-U.S. citizens.

Semel writes that these efforts to use technology to diagnose psychiatric disorders, based on the “tangled associations” of tone and inflection, threaten to promote a new “phrenology of the throat.” She adds:

“While the suggestion that AI might make these choices easier is seductive, the historical and ethnographic record demonstrates that automation in the name of efficacy tends to deepen, not mitigate, inequities that fall upon racialized, gendered, and colonial fault lines.”

Biological indicators of mental illness are a long-sought “final word” on diagnosis, promising to rid the mental health professions of the uncertainty and judgment calls involved in applying diagnostic categories to patients. This striving has led to recent efforts such as the NIMH-funded Research Domain Criteria (RDoC) project, which attempts to locate specific brain processes that produce dysfunction. In another example, mental health app developers are trying to ground psychiatric diagnoses in people’s patterns of online behavior.

Computational psychiatry resides at the edges of mental health care research. Seeking alternatives to the hypothesis-driven methods of diagnosis used by DSM adherents, its practitioners favor a data-driven approach more common in computer science and engineering.

Computational psychiatric researchers believe it is only a matter of time until enough observable data are collected on patients to find biological, etiological “keys” to accurate diagnoses. Such data may bear no relationship to conventional diagnostic criteria; the idea is simply to find new patterns linking behavior to the onset of ‘mental illness.’

The psychiatric researchers whose laboratory Semel observed treat speech as if it were another bio-behavioral indicator, like gait or response time to a stimulus. In the lab, engineers trained in signal processing are hard at work amassing speech data and applying mathematical analyses to glean information about the neuronal sources of spoken language.

“In theory, because vocal biomarkers index the faulty neural circuitry of mental illness, they are agnostic to language difference, speaker intentionality, and semantic, sociocultural meaning. Neurobiological essentialism and language universalism collide,” writes Semel.

Through fieldwork in the laboratories of computational psychiatric researchers, Semel examined the possible unintended effects of automating psychiatric screening. She notes that automation sounds like an effective way to democratize access to treatment but may instead deepen systemic biases that disproportionately harm members of marginalized groups during screening.

Semel’s ethnographic research suggests that the decisions involved in building automated psychiatric screening tools risk replicating the inequities of non-automated screening.

The embodied, interactional dimensions of listening are central to automating psychiatric screening. Voice analysis systems are carefully developed from hand-labeled talk data and brain scans of recruited participants. Semel notes that the interactional settings and sociocultural scripts used to collect such data are never neutral choices. Yet the resulting systems will be used to recognize speech patterns across potential patients, especially the underprivileged people whom automation is supposed to help gain access to mental health treatment.

This ethnographic research points to the persistence of the computational logic underlying scientific racism. Semel’s work issues a warning to institutions and practitioners eager to adopt the latest intake automation technology. It also invites critical historical, political, and economic excavation of the classification schemes taken for granted in algorithmic systems applied to mental health.



Semel, Beth. (2020). The Body Audible: From Vocal Biomarkers to a Phrenology of the Throat. Somatosphere. Retrieved September 24, 2020.


    • bcharris,
      it seems first they label a voice as being “mentally ill” and design the program to fit that mold.
      Interesting how the design would be human-made, and then they have the audacity to name it AI.

      I consider that they should call it SI for “superficial intelligence”
      Or just plain S, for “stupidity”

  1. Did these researchers even take into account code-switching? I just honestly can’t see how this would ever work because our voices are so variable.

    For example, anyone who’s worked with customer service knows that you put up this false front and an overly friendly voice. What’s to stop people from just doing something like that during their test?

    Or what if English isn’t a person’s native language? Tone differs wildly from one language to another and would most certainly color the test.

    • It’s “biomarkers” that is really the catchword.
      It’s the prop.
      People tend not to discern the word “intelligence”. If they
      are impressed by something or someone, they consider that the marker
      of “intelligence”.

      And don’t forget that “researchers” are in the industry to help prop it up.
      As psychiatry sits in the boardroom, the “researchers” look for what the 30 mere people
      invented. The high priests reside in the boardroom, ALL lying to each other.
      It’s amazing really.

      You realize that ALL organizations work this way, to lie to each other about the thing to hit the market.

  2. Are these people freakin’ serious? What about a voice pattern could POSSIBLY be considered a “biomarker” for anything? I suppose that they will discover that “depressed” people speak in a flatter and less variable tone. Or we could just ASK the person how s/he is feeling instead of using all this technology to analyze their voices? How does this kind of idiocy pass for science?

  3. I highly recommend “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy” by data analyst Cathy O’Neil.

    Screening technology is only as good as its programming, which is socially biased and, therefore, self-perpetuating. This was proved with criminal risk assessment algorithms (computer formulas) that claim to predict criminality. Because Black people are disproportionately incarcerated, it is falsely assumed that they are more criminally minded when, in reality, they’re disproportionately incarcerated because of systemic racism.

    Screening technology to detect ‘mental illness’ would also be skewed by social bias and would also confirm and strengthen that bias.

    Recognition technology cannot solve social problems because it targets the victims, not the social conditions that create their suffering.