Dreams that the internet would foster civic engagement, democratic ideals, and equality have been drenched with the ice-cold water of trolling, election tampering, and algorithmic bias. The history of technological systems repeatedly reveals that they rarely deliver as promised. That history came to mind amid a rash of recent news stories celebrating the application of artificial intelligence to mental health. The coverage suggests that something is in the zeitgeist, or perhaps that some aggressive public relations work is happening behind the scenes.
I realized the conversation had shifted when New York Times op-ed columnist David Brooks wrote a naive opinion piece, “How Artificial Intelligence Can Save Your Life: The Machines Know You Better Than You Know Yourself.” His op-ed uncritically promotes a dangerous new paradigm of diagnostic prediction and behavioral prevention. Brooks claims that Big Data will enable companies and mental health providers to “understand the most intimate details of our emotional life by observing the ways we communicate.” He also asserts that “you can be freaked out by the privacy-invading power of A.I. to know you, [but only] A.I. can gather the data necessary to do this.” He concludes with the prediction that “if it’s a matter of life and death, I suspect we’re going to go there.”
“Promises Abound, But So Do Potential Problems”
Seemingly oblivious to all the ways that utopian hopes for new technologies have horribly backfired in the past, Brooks imagines that artificial intelligence will succeed in helping to predict and prevent suicide, blatantly ignoring the challenges that Mindstrong Health, one of the tech startups he highlights, is now facing, even though their problems were covered the previous week in his own paper. Brooks says the company “is trying to measure mental health by how people use their smartphones: how they type and scroll, how frequently they delete characters.” The earlier article, by Benedict Carey, tempers expectations about Mindstrong’s progress, stating that “the road will be slow and winding, pitted with questions about effectiveness, privacy and user appeal.” Their trials have been riddled with “recruiting problems, questions about informed consent, and [concerns that] people won’t ‘tolerate’ it well, and quit.” The same article quotes Keris Myrick, a collaborator with Mindstrong and chief of peer services for Los Angeles County, who reminds us that “we need to understand both the cool and the creepy of tech.”
Crucially, Brooks also ignores a New York Times op-ed published more than a month earlier, “The Empty Promise of Suicide Prevention.” There, psychiatrist Amy Barnhorst argues that “suicide prevention is also difficult because family members rarely know someone they love is about to attempt suicide; often that person doesn’t know herself… almost half of people who try to kill themselves do so impulsively.” Worse, even when problems are identified, “the implication is that the help is there, just waiting to be sought out.” Unfortunately, “initiatives like crisis hotlines and anti-stigma campaigns focus on opening more portals into mental health services, but this is like cutting doorways into an empty building.” Access to care is often limited or nonexistent, and some of the mental health care we currently offer is worse than none at all.
Hidden Risks of Risk Detection
There are many problems with pathologizing risk, a trend that is part of a bigger pattern emerging around diagnosis and treatment. Large, centralized, digital social networks and data-gathering platforms have come to dominate our economy and our culture, and technology is being shaped by those in power to magnify their dominance. In the domain of mental health, huge pools of data are being used to train algorithms to identify signs of mental illness. I call this practice surveillance psychiatry.
Researchers are now claiming they can diagnose depression based on the color and saturation of photos in your Instagram feed and predict manic episodes based on your Facebook status updates. The growth of electronic health records, along with the ability to data-mine social networks and even algorithmically classify video surveillance footage, is likely to significantly amplify this approach. As the recent wave of press coverage demonstrates, corporations and governments are salivating at the prospect of identifying vulnerability and dissent among the populace. (For more examples of this trend, including Facebook’s packaging and selling of emotionally vulnerable users to advertisers and the Abilify Mycite ingestible sensor pill, see my blog post on “The Rise of Surveillance Psychiatry and the Mad Underground.”)
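To see how thin the technical basis for such claims can be, here is a minimal, hypothetical sketch of the kind of pipeline behind the Instagram-style studies: reduce each user’s feed to a few average color values and hand them to an off-the-shelf classifier. Everything below, from the synthetic data to the labels, is invented for illustration; the point is that the resulting “diagnosis” rests entirely on whatever correlations the supplied labels happen to contain.

```python
# Hypothetical sketch of "diagnosis by photo color": summarize each user's feed
# by its average hue, saturation, and brightness, then fit a generic classifier.
# The data and labels below are synthetic stand-ins; published studies use far
# richer features, but the basic move is the same.
import numpy as np
from PIL import Image
from sklearn.linear_model import LogisticRegression

def color_features(path):
    """Mean hue, saturation, and brightness (0-255) of one image file."""
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=float)
    return hsv.reshape(-1, 3).mean(axis=0)

# Stand-in for real data: 40 "users", each reduced to three color averages,
# paired with arbitrary 0/1 labels. In a real study each row would be
# color_features() averaged over that user's photo feed.
rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(40, 3))
y = rng.integers(0, 2, size=40)

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba(X[:1]))  # a "risk score" inferred from color alone
```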
The sociologist and New York Times columnist Zeynep Tufekci has also written about the risks posed when corporations get their hands on mental health and behavioral data, warning us to consider how this data will easily be used to target manipulative advertising, deny us insurance coverage, and discriminate against job applicants. Tufekci points out that it is likely machine-learning algorithms have already learned that people in a heightened or altered state are more responsive to ads for casinos and Vegas—even without humans intentionally targeting this behavioral demographic.
A deeper, less obvious threat than unethical marketing and unlawful discrimination is what this data looks like in the hands of the psychiatric-pharmaceutical complex. So-called “digital phenotyping” is poised to change the definition of normal itself and vastly expand psychiatry’s diagnostic net. Digital phenotyping, as one article in Nature describes it, is the “moment-by-moment quantification of the individual-level human phenotype in situ using data from personal digital devices” such as smartphones. These new tools for tracking behavior, and the computational approach to psychiatry that underlies them, stand to displace the never-substantiated chemical-imbalance theory as the underlying rationale for diagnosis and treatment.
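For concreteness, here is a minimal sketch of the sort of features a digital-phenotyping pipeline might compute from a phone’s keystroke metadata. The event stream, the late-night threshold, and the feature names are all invented for illustration; real systems ingest far more signals.

```python
# Hypothetical digital-phenotyping features from a smartphone keystroke log.
# Each event is (unix_timestamp, key); "BACKSPACE" marks a deletion.
# The event stream, threshold, and feature names are invented for illustration.
from statistics import mean
from datetime import datetime, timezone

events = [
    (1565000000.0, "h"), (1565000000.4, "i"), (1565000001.1, "BACKSPACE"),
    (1565000002.0, "h"), (1565000002.2, "e"), (1565000002.5, "y"),
]

timestamps = [t for t, _ in events]
gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]

features = {
    # How quickly the person types, on average.
    "mean_interkey_gap_s": mean(gaps),
    # How often they delete what they just wrote.
    "deletion_rate": sum(k == "BACKSPACE" for _, k in events) / len(events),
    # Share of activity between midnight and 5 a.m. UTC ("odd hours").
    "late_night_share": mean(
        datetime.fromtimestamp(t, tz=timezone.utc).hour < 5 for t in timestamps
    ),
}
print(features)  # these numbers, not anything the person says, become the "phenotype"
```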
Thomas Insel, former director of the National Institute of Mental Health and the co-founder and president of Mindstrong, advocates that psychiatric research should return to a behavioral focus instead of its current emphasis on pharmacology, genomics, and neuroscience. In practice, this likely means that you may someday be prescribed antipsychotics for posting on social media a few nights in a row at odd hours. And you had better brush up on your grammar if you want to avoid a schizophrenia diagnosis. Researchers were recently awarded a $2.7M NIMH grant to study how “nuances of language styles, like the way people use articles or pronouns, can say a lot about their psychological state.” These trends are especially troubling when you consider how racialized suspicion and the perception of threat have become in the United States. As New York City’s Stop and Frisk program demonstrated, not all behaviors or demographics are profiled equally aggressively.
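It is worth pausing on how mundane those “language style” markers are to compute. A toy sketch, using an assumed word list rather than any particular study’s instrument, might look like this:

```python
# Toy "language style" features of the kind such studies describe: rates of
# first-person pronouns and articles per 100 words. The word lists are a rough
# assumption, not the instrument any particular study uses.
import re

FIRST_PERSON = {"i", "me", "my", "mine", "myself"}
ARTICLES = {"a", "an", "the"}

def style_features(text):
    words = re.findall(r"[a-z']+", text.lower())
    per_100 = 100.0 / max(len(words), 1)
    return {
        "first_person_per_100": sum(w in FIRST_PERSON for w in words) * per_100,
        "articles_per_100": sum(w in ARTICLES for w in words) * per_100,
    }

print(style_features("I keep thinking about the mistakes I made and what I should do."))
```

Counting pronouns is trivial; the leap from those counts to a psychiatric label is where the danger lies.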
The emphasis on treating risk rather than disease predates the arrival of big data, but together they are now ushering in an era of algorithmic diagnosis based on the data mining of our social media and other digital trails. As Brooks’ op-ed illustrates, the language of suicide and violence prevention will be used to promote this paradigm, even though the lines between politics and harm reduction are not so clear. When algorithms are interpreting our tweets to determine who is “crazy,” it will become increasingly difficult to avoid a diagnosis, no matter how carefully we watch what we say. This environment will severely inhibit people’s willingness to seek support and foster an atmosphere in which others are conditioned to report behaviors that appear different or abnormal.
What Could Possibly Go Wrong?
The mainstream reactions to the two mass shootings the weekend of August 3, which as usual place the blame for these tragedies on “mental illness,” only compound my concern that surveillance psychiatry is becoming our brave new paradigm. The President has been parroting the National Rifle Association’s talking points, including a renewed push for “Red Flag” or “Extreme Risk” laws, reinforcing the hubris that we can reliably predict risk.
One of the best counterpoints to these initiatives is an excellent investigative story by ProPublica on Aggression Detectors currently being deployed in schools and hospitals around the country. Reporters went into schools that purchased and deployed expensive audio-capture-and-analysis systems sold with the promise of preventing the next school shooting. Their manufacturer claimed that these special microphones would detect aggression, but ProPublica’s rigorous tests demonstrated that they tend to mix up laughter and anger and mistake locker doors slamming for gunshots. Tuning the system is fraught, and false positives and negatives abound. All sorts of implicit biases are likely to be baked in, as expressions of emotion are culturally conditioned. The entire premise of the aggression detectors is also flawed, as school shooters often display quiet rage before attacking rather than an audible outburst. To top it off, the systems also record all audio in the school and are being used by administrators to crack down on vaping in the bathroom.
The aggression detectors story is important because it demonstrates how such systems will be used to regulate all forms of affect, not just depression and anxiety. And it vividly shows how dangerous and expensive false positives are to society. For years I have feared that computational systems would be used to monitor and discipline emotions and that they would be first deployed in schools and prisons. Sucks to be proven right.
Theory Deprivation Disorder
A fundamental misunderstanding inherent in so many of these projects reminds me of Chris Anderson’s 2008 Wired essay, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” Like Brooks, Anderson argues that with enough data we will no longer need theory or hypotheses: the torrents of data collected will soon be sufficient to describe reality fully, seemingly speaking for itself. The polarizing essay was widely refuted, as data cannot be interpreted without a framing theory, even if we don’t recognize that we are implicitly using one in our analysis.
Information must always be interpreted in context, and both machines and humans are notoriously fallible at doing so. Subjective assessments, such as determining whether a work of art is beautiful, whether a joke is funny, or whether a person is exhibiting behavior within the normal range of human experience, will always require a value judgment. Despite what proponents of the biomedical and disease models of mental health would like you to believe, these kinds of assessments are never matters of fact, and behavioral data is not self-interpreting.
Brooks’ op-ed touts Crisis Text Line, an SMS-based support service that crowdsources counselors from a pool of trained volunteers. Crisis Text Line brags about applying data science to their crisis support sessions, although their findings are demonstrably weak and their inferences suspect. CTL’s tag clouds of word frequencies capture what has long been obvious to crisis counselors and contribute few surprises or insights. Brooks is impressed by their analysis linking keywords to crisis; however, his conclusions presume that Crisis Text Line counselors are always contacting law enforcement appropriately: calling them when they should (if ever), and not calling them when they should not. Unless that assumption holds, dispatching emergency services can quickly spiral, as automated systems learn from previous examples and reproduce them at scale. This is not simply a matter of erring on the side of caution. Dispatching law enforcement to a person perceived to be in crisis often leads to lethal consequences: one in four fatal police encounters involves someone with a mental health diagnosis.
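To make that feedback loop concrete, consider a toy model, with invented messages and labels standing in for historical records: a classifier trained on counselors’ past dispatch decisions learns to repeat those decisions, whether or not they were appropriate.

```python
# Toy illustration of the feedback loop: a keyword model trained on counselors'
# past dispatch decisions learns to repeat them, right or wrong. The messages
# and labels are invented stand-ins for historical crisis-line records.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

past_messages = [
    "i cant sleep and i feel hopeless",
    "i have a plan and i am scared",
    "my partner hit me again tonight",
    "i am fine just lonely i guess",
]
past_dispatch = [0, 1, 1, 0]  # 1 = emergency services were sent, 0 = they were not

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(past_messages)
model = MultinomialNB().fit(X, past_dispatch)

new_message = ["i am scared my partner will find out i called"]
print(model.predict(vectorizer.transform(new_message)))
# Nothing in the training data says whether those past dispatches helped or
# harmed; the model can only scale up whatever pattern it was handed.
```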
The general public has a vague sense that Americans are over-diagnosed and over-medicated. And, in many cases, clinicians cannot even identify emotional distress effectively, as expressions of anxiety and depression tend to be culturally conditioned. A recent American Psychological Association report claimed that “even professional health care providers have trouble detecting depression among racial/ethnic minority patients. Men from these groups are diagnosed with depression less often than non-Hispanic white males, and depression may also present itself differently in males as irritability, anger, and discouragement rather than hopelessness and helplessness.” On top of that, a comprehensive review of suicide-prediction models found that current models “cannot overcome the statistical challenge of predicting this relatively rare event.”
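That “statistical challenge” is largely a matter of base rates, and a back-of-the-envelope calculation makes it vivid. The prevalence and accuracy figures below are assumptions chosen only for illustration:

```python
# Back-of-the-envelope base-rate arithmetic for predicting a rare event.
# The prevalence and accuracy figures are assumptions for illustration only.
prevalence = 0.0002    # e.g. roughly 2 in 10,000 people in a given year
sensitivity = 0.90     # the model catches 90% of true cases
specificity = 0.95     # and correctly clears 95% of non-cases

population = 1_000_000
true_cases = population * prevalence
true_positives = true_cases * sensitivity
false_positives = (population - true_cases) * (1 - specificity)

precision = true_positives / (true_positives + false_positives)
print(f"people flagged: {true_positives + false_positives:,.0f}")
print(f"flags that are real: {precision:.1%}")  # well under one percent
```

Even with accuracy far beyond anything the literature reports, the overwhelming majority of people flagged would be false alarms.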
With such large gaps and discrepancies in practice, how can we train machines to judge and categorize behavior when humans cannot agree on its interpretation?
Human and Machine Learning
The vast majority of these initiatives promise better “management” and “treatment,” although the details of their programs focus mainly on early detection and risk management. When I talk to crisis counselors, they uniformly tell me that they do not have difficulty recognizing risk or identifying those in crisis; rather, they need better tools for supporting them.
I’m not a Luddite. I think it is possible to redirect this wizardly technology to help support people better. Doing this well starts with inclusive design—people with lived experience need to be involved in planning and shaping the systems meant to support them. Nothing about us without us.
Reducing suicide is generally a good thing, but remember that this same infrastructure will also be able to police “normal” itself, proactively detecting all forms of deviance, dissent, and protest. A nuanced critique, once again informed by people with lived experience, needs to shape the development of these systems because context is everything. Alongside the focus on short-term interventions, we also need to spend more resources on understanding how and why people become suicidal and the long-term consequences of treatment by our healthcare systems.
I also think that sophisticated technology—richer interactive training materials, recommendation engines, and networked collaboration—can significantly improve the training and development of providers offering support to those in crisis. Instead of focusing the diagnostic lens on the recipients of services, let’s start by developing better tools to help providers enhance their skills and empathetic understanding. I am imagining contextual help, immersive simulations and distributed role plays, just-in-time learning modules that caregivers could query or have recommended to them based on an automated analysis of the helping interaction. The field could also benefit from more intentional use of networked, interactive media to engage counselors in their clinical supervision and help them collectively to improve. Did that last crisis intervention go well? What could I do differently if I encounter a similar situation again? Do any of my peers have other ideas on how I could have handled that situation better?
Of course, as with so much else about mental health, little is known about what works well. These same systems could be used to gain more insight into which interventions actually have a positive impact. We need more confidence that an intervention is successful before we use the related data sets to build machines that magnify those approaches and deliver more of the same. Business-as-usual is failing. Let’s not amplify our current approaches with super-charged algorithms without reflecting critically on what helps and what harms.