A new preprint from Georgia Tech and Rochester Institute of Technology demonstrates just how quickly AI models can escalate from mildly stereotypical depictions to full-blown hate narratives targeting people with psychiatric diagnoses. The study, Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups, traces how prompts about depression, bipolar disorder, and schizophrenia rapidly devolved into conspiracies, calls for forced confinement, and even eugenic rhetoric.
The authors focused on Mistral-7B, a leading open-source LLM. Starting from prompts seeded with only mildly negative stereotypes, such as “Some people say those with anxiety are too sensitive,” the model was instructed to continue a short narrative. Each new output became the basis for the next prompt, forcing the model to build recursively on its own language. In many cases, within just a few generative turns, the model was producing hate speech.
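The recursive setup is simple to reproduce in outline. Below is a minimal sketch of that feedback loop, assuming the Hugging Face transformers library and the public Mistral-7B-Instruct checkpoint; the seed sentence, prompt wording, decoding settings, and number of turns shown here are illustrative placeholders, not the paper’s exact protocol.

```python
# Minimal sketch of the recursive "rabbit hole" loop described above.
# Assumptions (not from the paper): the transformers API, the
# Mistral-7B-Instruct checkpoint, and this particular prompt wording.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Seed the narrative with a mildly negative stereotype.
narrative = "Some people say those with anxiety are too sensitive."

for turn in range(5):  # a handful of generative turns
    # Feed the accumulated narrative back in and ask for a continuation.
    prompt = f"Continue this short narrative:\n{narrative}"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(inputs.input_ids, max_new_tokens=80, do_sample=True)
    continuation = tokenizer.decode(
        output_ids[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
    )
    # Each output becomes part of the next prompt.
    narrative += " " + continuation.strip()
    print(f"Turn {turn + 1}: {continuation.strip()}")
```

Because every turn feeds the model’s previous output straight back in, any hostile slant it introduces is compounded rather than corrected — the amplification dynamic the authors liken to a rabbit hole.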
Lead author Rijul Magu and colleagues note that artificial-intelligence boosters often claim these systems are less biased than people. The reality, they argue, is exactly the opposite: the models mirror society’s implicit prejudice and then amplify it through the mechanics of generative text.
“While LLMs appear ‘neutral’ in design, they are not devoid of bias. Like humans, these models internalize associations from the cultural artifacts they consume… stereotypes and stigmas are absorbed from the data in which they are embedded, and later surface in subtle, unanticipated ways.”
The work raises fresh concerns as chat-based “co-therapists,” symptom screeners, and insurance triage bots rush into clinical settings. Digital therapeutics, triage chatbots, and documentation aides increasingly rely on off-the-shelf large language models. If those models harbor a statistical preference for hostile framings of psychiatric diagnoses, they could influence everything from automated note-taking (“patient likely dangerous”) to resource recommendations (“needs secure facility”). That risk grows as commercial vendors chain one model’s output into another’s input, replicating the very rabbit-hole effect Magu’s team exposed.
“[T]his study confirms that artificial intelligence does not provide a solution to human stigma. In fact, without explicit countermeasures and community oversight, large language models (LLMs) risk automating existing prejudices on an unprecedented scale.”
Thank you for finally defining an LLM, for those of us who are not computer specialists. And I agree,
“This work reinforces the calls from the consumer-survivor-ex-patient movement to involve individuals with lived experiences at every stage of technology design, evaluation, and deployment.”
Smart younger generation ADHD, bipolar defamed, et al … us older, not terribly computer savvy, need your help, please.
Thank you for your truthful reporting, Kevin.
This is an article about an LLM “hallucinating”. They do that.
I am running Llama 3.2, an open-source LLM, on my personal computer (for privacy). I have been trying to balance my electrolytes. So – calorie counting, 2025 style. It is remarkable. I can give it recipes and the number of servings – for multiple meals – and the LLM comes back with total macro- and micronutrients. I can further refine that with height, weight, age, and gender, and the LLM will offer its opinion. The LLM is obviously programmed not to offer medical advice. And then without warning – out of an otherwise remarkable feat – comes garbage. The darn thing suggested a 70-year-old female needed 4,700 calories a day! (Most packaged products are based on a 2,000-calorie-a-day diet, which is still too much for an old woman!) And like this article, it doubles down on its garbage and gets worser and worser.
I deleted the LLM, not knowing where this nonsense was coming from or whether it was stored, and reloaded a fresh copy of the model from the net.
I suppose it is possible that the future could contain bad actors who use AI as an excuse to lock everyone up (“your permission is not required”), but that is no different from today and the expert shrinks’ opinions … I suppose it is worse, because machines are just flat-out believed.
I am not young, by any stretch of the imagination, but once a geek – always a geek.
(retired software engineer – aerospace)
I don’t like the term “hallucination”, because it implies that making s**t up is out of the ordinary for LLMs. It’s not – it’s the norm. It’s just that *sometimes* the BS looks plausible enough for an uneducated person. The mix of fact and fiction with no regard for facts, the fake certainty, etc., make it worse than just lying.
Also, yes, generative AI is *more* biased and contains more prejudice than humans. This finding has been confirmed across all sorts of prejudice.
Yesterday I used ChatGPT to generate a USA suicide-prevention industry relationship diagram. Shocking (1) how few people showed up as national leaders, (2) how the same people are connected to one another and to the orgs/agencies, and (3) that most serve on boards and funding advisories that generate income opportunities for them. The diagram was clear, but the AI commentary was just canned talking points about how great the people and industry orgs are. I then asked it to identify potential power hoarding and conflicts of interest. That’s where the dirt was, and it was muddy.