“Data Pollution” Hinders Psychiatric Research

In JAMA Psychiatry, researchers argue that many studies are corrupted by data pollution and that the field is unable to manage these issues.


In a new article in JAMA Psychiatry, researchers suggest that “data pollution” impedes psychiatric research. They write that data pollution has many sources, and that researchers who specialize in one area of psychiatric research are ill-equipped to account for all of them.

“Neuropsychiatric research is substantially impeded by issues surrounding data collection and analysis. While these issues have been extensively discussed, their severe impact on neuropsychiatric effect sizes is not as widely recognized,” the researchers write.

The researchers were Alessandro S. De Nadai at Texas State University, Yueqin Hu at Beijing Normal University, and Wesley K. Thompson at the University of California, San Diego.

De Nadai, Hu, and Thompson focus on data pollution, which they define as “inadvertent errors” in the data. This is distinct from “data poisoning,” which involves “intentional attempts to feed inaccurate data into models.” The current article focuses on well-intentioned researchers whose results are misleading by accident.

Such errors are common, according to De Nadai, Hu, and Thompson. Moreover, they write, researchers in neuropsychiatry come from such varied backgrounds that none of them is an expert in every potential form of data pollution and how to mitigate it.

For instance, data pollution can come from any of the following areas: “(1) unreliable measurement, (2) heterogeneous construct definition, (3) population mixtures with differing biopsychosocial mechanisms, (4) behavioral reporting bias by both patients and clinicians, (5) selection bias, and (6) data that are not missing at random.”

What these have in common is unreliability, or “noise.” All of the tests and definitions in psychiatry involve some degree of subjectivity and are influenced by an almost infinite array of factors in a person’s life. This noise compounds, especially when a study uses multiple tests or examines moderation (whether the relationship between two variables depends on a third) or mediation (whether one variable affects another through an intermediate one). In the end, the effects that researchers report are unreliable and often inflated.

“Inconsistent and inaccurate effect size estimation pollutes the research literature and makes it nearly impossible to build incrementally on small but important findings, which will be critical for future progress,” the authors explain.

They note that if physics research had the same level of unreliability, systems like GPS would be impossible to develop.

De Nadai, Hu, and Thompson also focus on the reliability and validity of psychiatric diagnoses. They note that even clinicians often disagree with one another about whether a patient meets the criteria for a specific diagnosis, and that patients often have a very different perspective. They add that diagnoses like depression and schizophrenia are extremely heterogeneous, lumping people together who have very different traits, feelings, and behaviors. This makes it very difficult to do research that might generalize to real-world patients.
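Disagreement between clinicians is commonly quantified with Cohen's kappa, which corrects raw agreement for the agreement two raters would reach by chance. The sketch below is a plain-Python illustration; the function name, the labels `"dx"` and `"no_dx"`, and the ratings of ten patients are all hypothetical and do not come from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the raters gave the same label
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical diagnoses from two clinicians for the same ten patients
clinician_1 = ["dx"] * 5 + ["no_dx"] * 5
clinician_2 = ["dx", "dx", "dx", "no_dx", "no_dx", "dx", "no_dx", "no_dx", "no_dx", "no_dx"]

if __name__ == "__main__":
    print(f"kappa = {cohens_kappa(clinician_1, clinician_2):.2f}")
```

In this made-up example the clinicians agree on 7 of 10 patients (70%), but chance agreement is already 50% given how often each one used the diagnosis, so kappa is only 0.4.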

The authors suggest that there are specific ways of accounting for the various types of data pollution and that researchers should have a “data pollution mitigation plan” before beginning their study.

“Without attending to data pollution,” they write, “much of our progress will be illusory, and true findings that improve patient welfare will remain undetected.”



De Nadai, A. S., Hu, Y., & Thompson, W. K. (2021). Data pollution in neuropsychiatry—an under-recognized but critical barrier to research progress. JAMA Psychiatry. Published online December 1, 2021.


  1. “They add that diagnoses like depression and schizophrenia are extremely heterogeneous, lumping people together who have very different traits, feelings, and behaviors. This makes it very difficult to do research that might generalize to real-world patients.”

    Part of why the DSM should be flushed.

  2. “De Nadai, Hu, and Thompson focus on data pollution, which they define as “inadvertent errors” in the data. This is distinct from “data poisoning,” which involves “intentional attempts to feed inaccurate data into models.” The current article focuses on well-intentioned researchers whose results are misleading by accident.”


    This is where I get mad. Or perhaps not so much mad as frustrated.

    Especially when I hear the words “well-intentioned.” Because I feel like saying “no, there IS ill intent.”

    Somewhere, at least. I feel that passivity and not caring IS ill intent, or should be regarded as such. Or people should be afraid enough of being accused of acting intentionally that they stop being well-meaning dopes and think hard enough to avoid the “well-intentioned error.”

    There is also such a thing as ill intent in education. Perhaps no one person has malicious intent, but everyone shows a lack of responsibility and passes the buck: “So long as I get my money, I don’t care.”

    But, you know what? Sometimes you have an affirmative obligation to take responsibility: when your work has an impact on people’s lives and isn’t just about you climbing the ladder and building your career.

    Actually, quite often, researchers will do just about anything to get published, because they need publications to have careers. And they become slaves to “what the media will publish.” That comes first, ahead of the whole notion of doing good.

    And then they go have families and want to support them financially and all that. Well, ok, their children come first. They need to get published and make a splash so they can get promoted.

    And they were well-intentioned and thought the research data was sound, and as for selection bias or other problems, well they just hadn’t thought of that.

    Actually, how do you know someone is well-intentioned? A lot of people are very good at coming across as well-intentioned, even while they are systematically doing everything in their self-interest. How do we know they didn’t privately think of some selection-bias problem that could only have been fixed by getting a second grant to investigate or clarify certain issues, so that they could know FOR SURE?

    Well, it’s not going to make as much of a splash if researchers say “we did a study that suggests one thing — however, we realized, if you look at it this way and if you look at it that way, it’s possible the data might be ambiguous, and we can only find out for sure if we get more funds to do a more careful analysis.”

    I have a feeling their departments tell them “NO, don’t go there.” In other words, don’t ask those tough questions in the first place. Because there is a system and it’s like a machine and you have to work it in order to get ahead.

  3. Data pollution clouds research, and horrible media coverage encourages people to say things like this commenter’s reaction to another crappy, one-sided piece on psychiatry, this one in the Seattle Times:


    Ghost of Hitler speaks: “This is why we need a robust involuntary commitment program where parents, teachers, and the medical community can have a person put into a treatment center. A good indication of mental illness is someone living on the street.”

  4. Peter, thanks for another great article! This study (hi Alex!) makes some great points that together help explain why decades of biological psychiatry research have accomplished essentially nothing of value “that might generalize to real-world patients.” But undoubtedly the same kind of research will continue with the same enthusiasm: unfulfilled, never-accountable promises that we are on the cusp of a revolution that will transform everything, plus publications, grants, academic jobs, tenure and promotion, strong reputations, pharmaceutical company gifts, and so on.

    Why? Because this entire research enterprise is not and never has been about helping “real-world patients.” It is about acquiring resources for researchers. That is the point of a publication for a researcher: to pad the CV, get a job, get a grant, become a journal editor, get a book deal, get invited to present highly paid seminars, have people stare at you in awe at conferences, and so on. That is why researchers, naively or intentionally, engage in scientific sloppiness and misconduct. Trust me, I was trained to do so and accepted this fake world as a seductive and valid reality before a few years of real-world practice knocked some sense into me.

    Research is not about the patients. It’s about the researchers. And it’s not even really the researchers’ fault unless their work is deliberately fraudulent. It’s how the entire system is designed. It’s the natural consequence of incentives being followed.

    The sooner we can all understand this, the sooner we can collectively dismiss almost all of 40 years of biological psychiatry research (and that’s just for starters) into the dustbin of history and start over. But that won’t happen because the influential leaders needed to do this are all beholden to the incentives that drive shitty research. It’s a vicious cycle and it’s difficult to imagine a way out.

    There is a reason why wise practitioners ignore 99% of what passes for psychiatric research, and it’s not that they are ignorant anti-science buffoons. It’s that they are switched on enough to see through the con. Because for them, it’s all about the patients.

  5. Good to see some honesty about the absolute mess that is psychiatric research. A steaming toxic dump of unintended consequences. Still somehow millions are being ploughed into genetic research that will eradicate madness from the Earth.

    The difficult part is coming to terms with your own personal take on madness, your own sub-routines which link in and branch off. And then locating your personal madness within a broad historical trend of cultural insanity. There are legitimate and sensible reasons why most people put a lot of energy and effort into avoiding that process.

    Fun also to see that naughty word “pollutant” being used metaphorically, while at the same time lots of good, legitimate science is revealing that a lot of mental illness might be better described as environmental illness or climate illness or diet illness, as unaccountable suffering and agony is no doubt being caused by the polluted air we breathe, the polluted soils and the polluted food chains and the polluted rains and the polluted rivers that flow into the polluted seas.

    But hey, let’s not be too pessimistic about all this. We’ve got 5 years to turn all this around. It simply requires science, positive thinking, setting personal vanity goals, and finding new and inventive ways to distract oneself from one’s self.

    It’s going to be okay because it has to be okay. Keep crunching the numbers, popping the pills or resisting the pills. At some point it will all somehow resolve itself, magically.

    It’s why psychiatry does so well, despite all the many shortcomings. It offers a straight face to magical thinking and the belief that the apocalyptic “soul” of human beings can be resolved, long-term, with careful thinking and sustained determined effort.

