A new preprint from Georgia Tech and Rochester Institute of Technology demonstrates just how quickly AI models can escalate from mildly stereotypical depictions to full-blown hate narratives targeting people with psychiatric diagnoses. The study, Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups, traces how prompts about depression, bipolar disorder, and schizophrenia rapidly devolved into conspiracies, calls for forced confinement, and even eugenic rhetoric.
The authors focused on Mistral-7B, a leading open-source LLM. Starting from prompts embedded with only mildly negative stereotypes, such as “Some people say those with anxiety are too sensitive,” the model was instructed to continue a short narrative. Each new output became the basis for the next prompt, forcing the model to recursively build on its own language. In many cases, within just a few generative turns, the model was producing hate speech.
Lead author Rijul Magu and colleagues note that artificial-intelligence boosters often claim these systems are less biased than people. The reality, they argue, is exactly the opposite: the models mirror society’s implicit prejudice and then amplify it through the mechanics of generative text.
“While LLMs appear ‘neutral’ in design, they are not devoid of bias. Like humans, these models internalize associations from the cultural artifacts they consume… stereotypes and stigmas are absorbed from the data in which they are embedded, and later surface in subtle, unanticipated ways.”
The work raises fresh concerns as chat-based “co-therapists,” symptom screeners, and insurance triage bots rush into clinical settings. Digital therapeutics, triage chatbots, and documentation aides increasingly rely on off-the-shelf large language models. If those models harbor a statistical preference for hostile framings of psychiatric diagnoses, they could influence everything from automated note-taking (“patient likely dangerous”) to resource recommendations (“needs secure facility”). That risk grows as commercial vendors chain one model’s output into another’s input, replicating the very rabbit-hole effect Magu’s team exposed.