AI Therapy App Fails to Beat Other Interventions in New Study

Woebot failed to beat ELIZA, journaling, and even psychoeducation for depression, anxiety, and positive/negative affect.

Samantha Lilly

December 29, 2023

In a new study, researchers compared an AI-powered therapy app (Woebot) with three other interventions: a non-smart conversational program from the 1960s (ELIZA), a journaling app (Daylio), and basic psychoeducation (which they considered the control group). They found no differences between the groups in terms of improvements in mental health.

“The main analysis failed to detect differences between any of the four treatment conditions in improving symptoms of depression, anxiety, and positive/negative affect,” the researchers write.

The study was conducted by Laura Eltahawy, Nils Myszkowski, and Leora Trub at Pace University, and Todd Essig at the William Alanson White Institute of Psychiatry, Psychoanalysis and Psychology. The article was published in Computers in Human Behavior: Artificial Humans.

The researchers first recruited 120 college/graduate students (ages 18-29) on Facebook who self-identified as having anxiety or depression. Many dropped out of the study or did not fully complete the measures. The final tally included 65 participants: 18 in the Woebot group, 18 in the ELIZA group, 15 in the Daylio journaling group, and 14 in the psychoeducation group.

The researchers used the same measures as in a previous Woebot study: the GAD-7 for anxiety, the PHQ-9 for depression, and the PANAS to measure positive and negative affect. The participants took these measures at the beginning of the study and after two weeks.

The researchers found that everyone, on average, experienced an improvement in anxiety, depression, and affect over the course of the two-week study. However, there was no difference between the groups—Woebot was no better or worse than simple psychoeducation, Daylio, or ELIZA.

In a further secondary analysis of more specific outcomes, the researchers assessed the change over time for each individual group. They found that ELIZA and Daylio both resulted in more “robust” outcomes than Woebot, while psychoeducation had the least “robust” outcomes.

Specifically, users of ELIZA experienced statistically significant improvements in all four outcomes over time; users of Daylio experienced improvement in depression and negative affect; users of Woebot experienced improvements in anxiety; and those who received psychoeducation did not improve on any measure. However, these specific differences were extremely small—none of the four groups experienced a statistically significant difference from the others in the main analysis.

Woebot, developed in 2017, is a publicly available therapy app driven by artificial intelligence. Its creators claim that it can deliver cognitive-behavioral therapy (CBT). It is designed to “check in” with users every day and provide guided exercises that are adapted from CBT worksheets. ELIZA, developed in the 1960s, was a proof-of-concept program that used ideas from humanistic therapist Carl Rogers to imitate empathy by rephrasing what the user wrote. Daylio is a publicly available app that encourages users to keep a daily interactive journal. Psychoeducation involved reading educational materials about depression.

These results raise the question of whether the public is being duped by AI hype. This study did not find that an AI-powered CBT bot led to better outcomes than the first conversational program from the 1960s, or even psychoeducation. It provides reason to conclude that the claims around mental health apps are not evidence-based. Yet, despite concerns around privacy and coercion, these apps continue to grow in popularity.

Eltahawy and colleagues argue that future research must demonstrate that chatbots are at least as good as existing psychotherapies (such as CBT delivered by a human therapist) before they are offered as supposedly “effective” interventions.

They write, “Using a no-treatment control group study design to market clinical services should no longer be acceptable nor serve as an acceptable precursor to marketing a chatbot as functionally equivalent to psychotherapy.”

****

Eltahawy, L., Essig, T., Myszkowski, N., & Trub, L. (2023). Can robots do therapy?: Examining the efficacy of a CBT bot in comparison with other behavioral intervention technologies in alleviating mental health symptoms. Computers in Human Behavior: Artificial Humans, 100035.

12 COMMENTS

Miranda Spencer December 29, 2023 at 8:38 am

Not surprised at these results. The most obvious issue with therapy bots is that they do not care, and they cannot care. They also cannot read body language and they cannot intuit. Therapy or other types of counseling for human emotional suffering isn’t like checking out groceries (which, for that matter, doesn’t work well with bots either). The human touch is needed.

- Birdsong December 30, 2023 at 1:45 pm
  
  I got tired of having to pay some idiot to wrongly “intuit” me, which happened 99.9% of the time.
  
Dogworld December 29, 2023 at 10:37 am

The introduction of AI in therapy might serve as the pivotal moment that underscores the demonstrative nature of the entire process.

Bill Wells December 29, 2023 at 12:06 pm

Samantha:

In contrast to therapy, do you have research that affords insights into the practice of the ARTS? Art for Arts sake? So, when a therapist begins to work with a client, does that diminish or open up the internal conversation for resolving the inner dynamics?

With the image that accompanies the article, I am reminded of the idea of life long questions and in particular, an interview with Dr. John Nef, who authored a book, Search for Meaning. He would start The Committee on Social Thought at the University of Chicago and deeply knew of the sort of thinking required for a better, emergent world.

Booplesnoot December 29, 2023 at 3:38 pm

The study was too small and didn’t go on long enough to be reliable. I believe the results would be the same in a larger, longer term study, but two weeks isn’t long enough, and the sample size is too small.

KateL December 29, 2023 at 4:33 pm

At least an AI won’t make sexual advances toward the client or go on about their own life struggles for 50 minutes…one would hope.

- Steve McCrea December 29, 2023 at 8:03 pm
  
  I thought similarly – AI at least can’t work out its childhood issues in your sessions!
  
  - Birdsong December 30, 2023 at 1:29 pm
    
    I think that’s what most therapists do but don’t know it and wouldn’t admit it they did.
    
    - Birdsong December 30, 2023 at 4:46 pm
      
      …and wouldn’t admit it IF they did.
      
Arsal December 31, 2023 at 1:25 pm

In a recent study, an AI-powered therapy app called Woebot was compared with three other interventions: the non-smart conversational program ELIZA from the 1960s, a journaling app called Daylio, and basic psychoeducation (considered the control group). The study found no significant differences in improving symptoms of depression, anxiety, and positive/negative affect among the groups, challenging the effectiveness of Woebot compared to other interventions.

- Birdsong December 31, 2023 at 7:37 pm
  
  ELIZA sounds better than talking to a stranger who charges money for god knows what.
  
RR January 7, 2024 at 11:12 am

AI has told people to kill themselves and adopt eating disorder behaviors.
