In a new article in JAMA Psychiatry, researchers suggest that “data pollution” impedes psychiatric research. They write that data pollution takes many forms, and experts in any one area of psychiatric research are ill-equipped to account for all of them.
“Neuropsychiatric research is substantially impeded by issues surrounding data collection and analysis. While these issues have been extensively discussed, their severe impact on neuropsychiatric effect sizes is not as widely recognized,” the researchers write.
The researchers were Alessandro S. De Nadai at Texas State University, Yueqin Hu at Beijing Normal University, and Wesley K. Thompson at the University of California, San Diego.
De Nadai, Hu, and Thompson focus on data pollution, which they define as “inadvertent errors” in the data. This is distinct from “data poisoning,” which involves “intentional attempts to feed inaccurate data into models.” The current article focuses on well-intentioned researchers whose results are misleading by accident.
Such accidental errors are common, according to De Nadai, Hu, and Thompson. Moreover, they write that researchers in neuropsychiatry come from such varied backgrounds that no one is an expert in every potential form of data pollution and how to mitigate it.
For instance, data pollution can come from any of the following areas: “(1) unreliable measurement, (2) heterogeneous construct definition, (3) population mixtures with differing biopsychosocial mechanisms, (4) behavioral reporting bias by both patients and clinicians, (5) selection bias, and (6) data that are not missing at random.”
What these have in common is unreliability or “noise.” All of the tests and definitions in psychiatry have varying levels of subjectivity and are influenced by an almost infinite array of factors in a person’s life. This noise can add up, especially when a study uses multiple tests or attempts to account for moderation (whether an effect depends on a third factor) or mediation (whether an effect operates through an intermediate factor). In the end, the effects that researchers find are unreliable and often inflated.
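To make the idea of noise adding up concrete, here is a minimal Python sketch of one well-known mechanism, Spearman's classical attenuation formula. Under that formula, an observed correlation equals the true correlation multiplied by the square root of the product of the two measures' reliabilities. The formula, the function name, and all numbers below are illustrative assumptions and are not taken from the JAMA Psychiatry article. The sketch shows only how unreliable measurement distorts an effect size; it does not model the inflation that can arise from small or selectively reported samples.

```python
import math


def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Observed correlation under classical test theory (Spearman's attenuation).

    true_r        -- hypothetical correlation between the error-free constructs
    reliability_x -- reliability of the first measure (0 to 1)
    reliability_y -- reliability of the second measure (0 to 1)
    """
    return true_r * math.sqrt(reliability_x * reliability_y)


# Hypothetical true effect, chosen only for illustration.
TRUE_R = 0.50

# Both measures share the same reliability in this toy example.
for reliability in (1.0, 0.9, 0.7, 0.5):
    observed = attenuated_correlation(TRUE_R, reliability, reliability)
    print(f"reliability={reliability:.1f} -> observed r={observed:.3f}")
```

In this toy setup, when both measures have a reliability of 0.5, a true correlation of 0.50 is observed as 0.25. Each additional noisy measure in an analysis contributes its own reliability term, which is one way noise compounds.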
“Inconsistent and inaccurate effect size estimation pollutes the research literature and makes it nearly impossible to build incrementally on small but important findings, which will be critical for future progress,” the authors explain.
They note that if physics research had the same level of unreliability, systems like GPS would be impossible to develop.
De Nadai, Hu, and Thompson also focus on the reliability and validity of psychiatric diagnoses. They note that even clinicians often disagree about whether a patient meets the criteria for a specific diagnosis, and patients often have a very different perspective. They add that diagnoses like depression and schizophrenia are extremely heterogeneous, lumping together people who have very different traits, feelings, and behaviors. This makes it very difficult to do research that generalizes to real-world patients.
The authors suggest that there are specific ways of accounting for the various types of data pollution and that researchers should have a “data pollution mitigation plan” before beginning their study.
“Without attending to data pollution,” they write, “much of our progress will be illusory, and true findings that improve patient welfare will remain undetected.”
De Nadai, A. S., Hu, Y., & Thompson, W. K. (2021). Data pollution in neuropsychiatry—an under-recognized but critical barrier to research progress. JAMA Psychiatry. Published online December 1, 2021.