AI Therapy App Fails to Beat Other Interventions in New Study

Woebot failed to outperform ELIZA, journaling, and even psychoeducation in improving depression, anxiety, and positive/negative affect.


In a new study, researchers compared an AI-powered therapy app (Woebot) with three other interventions: a non-smart conversational program from the 1960s (ELIZA), a journaling app (Daylio), and basic psychoeducation (which served as the control group). They found no significant differences between the groups in how much participants' mental health improved.

“The main analysis failed to detect differences between any of the four treatment conditions in improving symptoms of depression, anxiety, and positive/negative affect,” the researchers write.

The study was conducted by Laura Eltahawy, Nils Myszkowski, and Leora Trub at Pace University, and Todd Essig at the William Alanson White Institute of Psychiatry, Psychoanalysis & Psychology. The article was published in Computers in Human Behavior: Artificial Humans.

The researchers recruited, via Facebook, 120 college and graduate students (ages 18-29) who self-identified as having anxiety or depression. Many dropped out of the study or did not fully complete the measures. The final sample included 65 participants: 18 in the Woebot group, 18 in the ELIZA group, 15 in the Daylio journaling group, and 14 in the psychoeducation group.

The researchers used the same measures as in a previous Woebot study: the GAD-7 for anxiety, the PHQ-9 for depression, and the PANAS to measure positive and negative affect. The participants took these measures at the beginning of the study and after two weeks.

The researchers found that, across the whole sample, participants improved on average in anxiety, depression, and affect over the two-week study. However, there was no difference between the groups: Woebot was no better or worse than simple psychoeducation, Daylio, or ELIZA.

In a secondary analysis, the researchers examined change over time within each group separately. By this measure, ELIZA and Daylio both produced more "robust" outcomes than Woebot, while psychoeducation produced the least "robust" outcomes.

Specifically, users of ELIZA showed statistically significant improvements on all four outcomes over time; users of Daylio improved in depression and negative affect; users of Woebot improved in anxiety; and those who received psychoeducation did not improve on any measure. However, these differences between groups were extremely small; in the main analysis, no group differed significantly from any other.

Woebot, developed in 2017, is a publicly available therapy app driven by artificial intelligence. Its creators claim that it can deliver cognitive-behavioral therapy (CBT). It is designed to “check in” with users every day and provide guided exercises that are adapted from CBT worksheets. ELIZA, developed in the 1960s, was a proof-of-concept program that used ideas from humanistic therapist Carl Rogers to imitate empathy by rephrasing what the user wrote. Daylio is a publicly available app that encourages users to keep a daily interactive journal. Psychoeducation involved reading educational materials about depression.

These results raise the question of whether the public is being duped by AI hype. This study did not find that an AI-powered CBT bot led to better outcomes than a conversational program from the 1960s, or even than psychoeducation. The findings give reason to conclude that the claims made for mental health apps are not evidence-based. Yet, despite concerns about privacy and coercion, these apps continue to grow in popularity.

Eltahawy and colleagues argue that future research must demonstrate that chatbots are at least as effective as existing psychotherapies (such as CBT delivered by a human therapist) before they are offered as supposedly "effective" interventions.

They write, “Using a no-treatment control group study design to market clinical services should no longer be acceptable nor serve as an acceptable precursor to marketing a chatbot as functionally equivalent to psychotherapy.”



Eltahawy, L., Essig, T., Myszkowski, N., & Trub, L. (2023). Can robots do therapy?: Examining the efficacy of a CBT bot in comparison with other behavioral intervention technologies in alleviating mental health symptoms. Computers in Human Behavior: Artificial Humans, 100035.


  1. Not surprised at these results. The most obvious issue with therapy bots is that they do not care, and they cannot care. They also cannot read body language and they cannot intuit. Therapy or other types of counseling for human emotional suffering isn’t like checking out groceries (which, for that matter, doesn’t work well with bots either). The human touch is needed.

  2. Samantha:

    In contrast to therapy, do you have research that offers insights into the practice of the arts? Art for art's sake? When a therapist begins to work with a client, does that diminish or open up the internal conversation for resolving the inner dynamics?

    The image that accompanies the article reminds me of the idea of lifelong questions and, in particular, of an interview with Dr. John Nef, who wrote a book called Search for Meaning. He helped found the Committee on Social Thought at the University of Chicago and deeply understood the kind of thinking required for a better, emergent world.

  3. In a recent study, an AI-powered therapy app called Woebot was compared with three other interventions: the non-smart conversational program ELIZA from the 1960s, a journaling app called Daylio, and basic psychoeducation (considered the control group). The study found no significant differences in improving symptoms of depression, anxiety, and positive/negative affect among the groups, challenging the effectiveness of Woebot compared to other interventions.