Bias and Deception in Behavioral Research

Jay Joseph, PsyD

In my 2017 e-book Schizophrenia and Genetics: The End of an Illusion (and in previous publications), I showed that the famous Danish-American schizophrenia adoption studies of the 1960s-1990s were environmentally confounded, methodologically flawed, and genetically biased to an extreme degree, and therefore provide no scientifically acceptable evidence in favor of genetic influences on schizophrenia.1 Given these problems, along with the questionable validity of the “schizophrenia” concept, the faulty assumptions underlying psychiatric twin studies, and the failure to identify causative genes, a thorough reevaluation of the “genetics of schizophrenia” debacle is long overdue.

While I was working on the book, cognitive neuroscientist Chris Chambers of Cardiff University published The Seven Deadly Sins of Psychology: A Manifesto for Reforming the Culture of Scientific Practice.2 In this book, Chambers pointed to several problem areas in the research/publication process in psychology and other fields. These include the “deadly sins” of “bias,” “hidden flexibility,” “unreliability,” “data hoarding,” “corruptibility” (fraud), and “bean counting” (funding and publication issues). Although many other problem areas in social and behavioral science research were not covered in this book, Chambers provided a valuable framework for describing biased, deceptive, and even fraudulent research practices.

Although Chambers focused on research in psychology, his message is clearly relevant to most other areas of research, including psychiatric and behavioral research, as well as drug safety and effectiveness trials. Contrary to popular belief, science is not immune to the corrupting influences of the society it operates in. On an individual level, research findings and conclusions are influenced by confirmation bias, which is the tendency for people to search for, interpret, favor, and recall information in a way that confirms their preexisting beliefs or theories.

As in other areas of research, psychiatric investigators claim statistically significant findings when their results fall below the conventional .05 level of statistical significance. This means that, if there were truly no difference between the groups, a result at least as extreme as the one observed would be expected to occur by chance less than 5% of the time. Larger sample sizes increase the likelihood that differences between groups will reach statistical significance; smaller samples have the opposite effect. In scientific research, a probability value (“p-value”) below the .05 threshold is the researchers’ make-or-break gold standard, enabling them to conclude that they found statistically significant results.
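The sample-size point can be illustrated with a short sketch. It uses a pooled two-proportion z-test, which is one common choice and not a method taken from any study discussed here. The function name `two_proportion_p` and the 10% versus 20% diagnosis rates are hypothetical, chosen only for illustration. The same proportional difference between groups moves from nonsignificant to significant purely because the groups get larger.

```python
import math

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test (normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: 20% of the index group vs. 10% of controls receive a diagnosis.
p_values = {}
for n in (20, 50, 200):
    index_cases = n // 5      # 20% of n
    control_cases = n // 10   # 10% of n
    p_values[n] = two_proportion_p(index_cases, n, control_cases, n)
    print(f"n per group = {n:3d}  p = {p_values[n]:.3f}")
```

With these made-up rates, only the largest groups cross the .05 threshold, even though the observed rates are identical in all three cases.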

In normal experiments based on the standard “hypothetico-deductive” (H-D) scientific method, researchers proceed in sequence: they generate hypotheses, design a study, collect data, analyze data and test hypotheses, interpret data and determine statistical significance, and submit their findings for publication. In the process they perform “null hypothesis significance testing” (NHST). The “null hypothesis” is a default position which states that there is no difference between the specified populations under study, and that any observed differences are due to chance, or to experimental error. In schizophrenia adoption research, for example, the null hypothesis states that there is no difference in schizophrenia diagnoses between the schizophrenia experimental group and the control group, meaning that genetic factors play no role in causing the condition. If researchers find group comparisons below the .05 threshold in the genetic direction, they reject the null hypothesis and conclude that hereditary factors are responsible for the group differences.

Researchers are expected to formulate their hypotheses before they obtain their data. After they collect, review, and analyze the data, they determine whether their results point to the acceptance or rejection of these hypotheses. Although a “cardinal rule in experimental design” is “that any decision regarding the treatment of data must be made prior to an inspection of the data,” in behavioral research as currently practiced it is difficult to verify this.3

P-Hacking, HARKing, and Data Dredging

P-Hacking. P-hacking is the practice of consciously or unconsciously manipulating data to produce results that fall below the .05 level of statistical significance. Researchers have “degrees of freedom” that allow them the “hidden flexibility” to change various aspects of their study after reviewing the data, but before submitting their paper for publication and peer review.4 As Chambers defined it, p-hacking is “exploiting researcher degrees of freedom to generate statistical significance.” A “key feature” of researchers’ decisions “is that they are hidden and never published.”5 P-hacking occurs, as a group assessing its impact put it, “when researchers collect or select data or statistical analyses until nonsignificant results become significant.”6 

Some ways that researchers can p-hack data include (1) conducting analyses midway through experiments to decide whether to continue collecting data (“peeking” at data), and stopping the collection of data if an analysis yields a statistically significant p-value; (2) recording many response variables and deciding which to report after the fact; (3) deciding after the fact whether to include or remove outliers; (4) excluding, combining, or splitting treatment groups after the fact; and (5) continuing to collect data past the planned stop point if significant comparisons are not found.7 Because social and behavioral science researchers have the hidden flexibility to change definitions and methods without anyone else knowing, as Chambers noted, they are able to decide when to stop counting participants (subjects), and are able to redefine the condition or characteristic they are studying. This enables researchers to “navigate either deliberately or unconsciously in order to generate statistically significant effects.”8 Surveys suggest that “questionable research practices” are common in psychology, and occur in part because there are many built-in incentives and pressures in academic research to p-hack, but few safeguards in place to prevent it.
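The effect of “peeking” at data, the first practice listed above, can be sketched with a small simulation. This is an illustrative sketch and not an analysis of any real study: the data are random numbers drawn under a true null hypothesis, the test is a simple z-test that assumes a known standard deviation of 1, and the names `run_study` and `false_positive_rate`, the sample sizes, and the seed are all hypothetical choices.

```python
import math
import random

ALPHA = 0.05

def two_sided_p(sample):
    """Two-sided z-test p-value for H0: mean = 0, assuming a known SD of 1."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))

def run_study(rng, max_n=100, peek_every=None):
    """Return True if the study reports p < ALPHA even though H0 is true."""
    data = []
    for i in range(1, max_n + 1):
        data.append(rng.gauss(0.0, 1.0))  # null data: the true mean is 0
        if peek_every and i % peek_every == 0 and two_sided_p(data) < ALPHA:
            return True  # stop collecting as soon as a peek is "significant"
    return two_sided_p(data) < ALPHA

def false_positive_rate(peek_every, n_studies=2000, seed=1):
    """Share of simulated null studies that end with a significant result."""
    rng = random.Random(seed)
    hits = sum(run_study(rng, peek_every=peek_every) for _ in range(n_studies))
    return hits / n_studies

fixed_rate = false_positive_rate(peek_every=None)   # one planned test at n = 100
peeking_rate = false_positive_rate(peek_every=10)   # test after every 10 subjects
print(f"fixed n: {fixed_rate:.3f}   peeking every 10: {peeking_rate:.3f}")
```

In this setup the single planned test should produce false positives at roughly the nominal 5% rate, while stopping at the first significant peek should produce them considerably more often, although no real effect exists in either case.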

HARKing. The term “HARKing” was introduced by psychologist Norbert Kerr in 1998, and stands for “hypothesizing after the results are known.”9 Kerr defined HARKing “as presenting a post hoc hypothesis in the introduction of a research report as if it were an a priori hypothesis.”10 In other words HARKing occurs when, after researchers inspect their data, they create a new hypothesis which they claim or imply was created before they inspected their data. In Chambers’ words, “HARKing is a form of academic deception in which the experimental hypothesis (H1) of a study is altered after analyzing the data in order to pretend that the authors predicted results that, in reality, were unexpected.” This method produces the “clean and confirmatory papers that psychology journals prefer while also maintaining the illusion that the research is hypothesis driven and thus consistent with the H-D method.” Chambers concluded that “deliberate HARKing . . . lie on the same continuum of malpractice as research fraud.”11 Again, there are few safeguards in place to prevent HARKing. The peer-review process in science, which usually takes place after a paper is submitted for publication, is not equipped to detect HARKing or p-hacking, even if peer reviewers wish to do so.

Data Dredging. Another unsound research practice is “data dredging” (also known as a “fishing expedition”), which involves investigators searching through data in an attempt to find statistically significant trends or differences, without testing a prior hypothesis. Identifying correlations and potential factors can be useful to help arrive at a hypothesis, but that hypothesis must then be tested on a different set of data. As the authors of a medical textbook emphasized, a hypothesis cannot be developed and tested in the same study. If this happens, data dredging has occurred:

“The scientific process requires that hypothesis development and hypothesis testing be based on different data sets. One data set is used to develop the hypothesis or model, which is used to make predictions, which are then tested on a new data set.”12 [italics in original]

Data dredging is related to the “Texas sharpshooter’s fallacy,” which describes a sharpshooter who fires his gun at the side of a barn, and later draws targets around a cluster of points that were hit. Although people viewing the barn might think that he hit his targets, the sharpshooter drew these targets after he fired his gun. According to Wikipedia, this “fallacy is characterized by a lack of a specific hypothesis prior to the gathering of data, or the formulation of a hypothesis only after data have already been gathered and examined.” It is a fallacy in part because, in a large dataset based on multiple comparisons, we would expect to find statistically significant correlations by chance alone.
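The arithmetic behind that last point can be sketched briefly. The sketch assumes that every null hypothesis is true and, for the first function, that the comparisons are statistically independent, which real datasets often are not. The function names are hypothetical.

```python
def familywise_error_rate(n_comparisons, alpha=0.05):
    """Chance of at least one p < alpha when every null hypothesis is true.

    Assumes the comparisons are statistically independent.
    """
    return 1 - (1 - alpha) ** n_comparisons

def expected_false_positives(n_comparisons, alpha=0.05):
    """Average number of comparisons that reach p < alpha by chance alone."""
    return n_comparisons * alpha

for k in (1, 5, 20, 100):
    print(f"{k:3d} comparisons: "
          f"P(at least one 'hit') = {familywise_error_rate(k):.3f}, "
          f"expected 'hits' = {expected_false_positives(k):.2f}")
```

Under these assumptions, a researcher who makes 20 comparisons has about a 64% chance of finding at least one “significant” result to draw a target around.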

Although data dredging is a form of p-hacking, researchers can select statistically significant results or comparisons after the fact without manipulating their data to do so. Data dredging also differs from HARKing because, although researchers are pointing to comparisons that they did not plan to make or highlight, they are not necessarily claiming that they are testing a prior hypothesis.

The Urgent Need for the Preregistration of Research in the Social and Behavioral Sciences

P-hacking, HARKing, and data dredging are methods that some researchers use to achieve statistically significant results even though the null hypothesis may in fact be true, thereby misleading science and the public. There are a number of possible motivations for doing this. Scientific researchers are under pressure to produce statistically significant findings in order to get their studies published in prestigious journals, which might tempt them to use their “degrees of freedom” to produce results that these journals will publish. Other possible motivations include financial motives, a desire to achieve career advancement and prestige, the need to obtain research funding (grants), supporting their field against critics, helping the companies they work for increase profits, and ideological motives. Genetic (biological) determinism is an ideology, although its adherents usually deny this and claim that their beliefs are based on nothing more than objective scientific evidence.

Building on calls by previous authors going back to the 1960s, including my own 2000 proposal co-authored with the late psychologist Steve Baldwin, Chambers called for the establishment of psychology research “preregistration,” where investigators would be required to submit an introduction, and their proposed methods, definitions, and analyses, before they collect their data.13 As Chambers described it:

“The essence of preregistration is that the study rationale, hypotheses, experimental methods, and analysis plan are stated publically in advance of collecting data. . . . Since authors will have stated their hypotheses in advance, preregistration prevents HARKing and ensures adherence to the H-D [normal] model of the scientific method. . . . Preregistration also prevents researchers from cherry-picking results that they believe generate a desirable narrative.”14

The preregistration of research would greatly reduce p-hacking, HARKing, data dredging, and other deceptive methods. Fortunately, a movement is now underway to make preregistration the norm in the social and behavioral sciences. Although “we may never be able to eliminate bias altogether from human nature,” Chambers wrote, a “sure way to immunize ourselves against its consequences . . . is peer-reviewed study preregistration.”15 And yet, it is likely that people and institutions with a vested interest in maintaining the current system will oppose research preregistration and, if it is implemented, might attempt to work around it.

In Schizophrenia and Genetics I reviewed the widely cited Danish-American adoption studies in depth, and examined how the researchers arrived at their conclusions. In addition to the major problems found in psychiatric adoption studies in general,16 I pointed to several instances where the Danish-American researchers clearly or likely resorted to p-hacking, HARKing, or data dredging in order to arrive at conclusions in favor of genetics. When false results produced by p-hacked research have social, scientific, and political importance, and affect or harm the lives of millions of people while entire fields look on, it constitutes a scientific scandal.

* * *

My 20 years of analyzing genetic research in the social and behavioral sciences have led me to conclude that the practices described by Chambers and others are common, and have contributed to the acceptance of false conclusions about the role of genetic influences on psychiatric disorders and other behavioral characteristics (IQ, personality, criminality, and so on). These practices may also be occurring in psychiatric drug trials. Chris Chambers has performed a valuable service to science and society by helping us better understand, explain, uncover, and reduce biased, deceptive, and fraudulent methods in scientific research. The Seven Deadly Sins of Psychology is a must-read for the consumers of scientific research, and also for the current and future debunkers of pseudoscience.

  1. Joseph, J., (2017), Schizophrenia and Genetics: The End of an Illusion, e-book.
  2. Chambers, C., (2017), The Seven Deadly Sins of Psychology: A Manifesto for Reforming the Culture of Scientific Practice, Princeton, NJ: Princeton University Press.
  3. Walster, G. W., & Cleary, T. A., (1970), A Proposal for a New Editorial Policy in the Social Sciences, The American Statistician, 24, 16-19, p. 18.
  4. John, L. K., Loewenstein, G., & Prelec, D., (2012), Measuring the Prevalence of Questionable Research Practices with Incentives for Truth Telling, Psychological Science, 23, 524-532.
  5. Chambers, 2017, p. 25.
  6. Head et al., (2015), The Extent and Consequences of P-Hacking in Science, PLoS Biology, 13(3): e1002106. doi:10.1371/journal.pbio.1002106
  7. Head et al., 2015.
  8. Chambers, 2017, p. 25.
  9. Kerr, N. L., (1998), HARKing: Hypothesizing After the Results Are Known, Personality and Social Psychology Review, 2, 196-217.
  10. Chambers, 2017, p. 25.
  11. Chambers, 2017, pp. 18-19.
  12. Jekel et al., (2007), Epidemiology, Biostatistics, and Preventive Medicine (3rd ed.), Philadelphia: Saunders-Elsevier, p. 206.
  13. Joseph, J., & Baldwin, S., (2000), Four Editorial Proposals to Improve Social Sciences Research and Publication, International Journal of Risk and Safety in Medicine, 13, 109-116.
  14. Chambers, 2017, p. 21.
  15. Chambers, 2017, p. 174.
  16. Psychiatric adoption studies are subject to several major environmental confounds and biases. These include (1) the selective placement of adoptees on the basis of a child’s socioeconomic status and perceived genetic background; (2) the shared birthmother–child prenatal environment; (3) late separation from the birthparent(s); (4) late placement with the adoptive family; and (5) that birthparents who give up a child for adoption, and the adoptive parents who reared them, are not representative of birthparents and rearing parents in the general population. It is therefore not true, as adoption study supporters usually claim, that these studies are able to make a clean separation between (“disentangle”) the potential influences of genes and environments.

12 COMMENTS

  1. The “gold standard treatments” for “schizophrenia,” the antipsychotics/neuroleptics, can create both the negative and positive symptoms of “schizophrenia.” The negative symptoms can be created via neuroleptic induced deficit syndrome and the positive symptoms can be created via antipsychotic induced anticholinergic toxidrome. This means the etiology of most “schizophrenia” is likely iatrogenesis, rather than of “genetic” origin.

    But the “mental health professionals” are largely ignorant of this fact, since neither of these neuroleptic induced illnesses is listed in their scientifically invalid DSM billing code “bible.” Plus the majority of people labeled as “schizophrenic” are actually child abuse victims. Unbeknownst to the majority of “mental health professionals” and mainstream doctors, drugs don’t cure people of legitimate distress caused by rape of children. The only doctor I could find who was cognizant of this fact was an oral surgeon.

    I do hope the medical community gets out of the business of profiteering off of covering up rape of children, by turning millions of child abuse victims into the “mentally ill” with the psychiatric drugs, on a massive scale. But I do know today’s “mental health industry” is a multi billion dollar pedophile empowerment industry, despite this being illegal. But kudos to you, you now have a world ruled by “luciferian pedophiles.” Was that the goal of all the pedophilia covering up “mental health professionals”?

  2. This same manipulation of statistical outcomes is found in Neonatology. They include infanticide, delayed treatment, selective treatment, and discontinued treatment as poor survival rates but all these reasons are not attributed as the cause of death. Preemies do survive with appropriate care, and people get back on their feet without labels and drugs.

  3. Thank you for drawing attention to these three methods of “cooking the books” in scientific research.

    Transparency is the only way to prevent these bad practices. Pre-registration is a great idea, but I would also go further by requiring researchers to publish their entire data sets at an easily accessible web location, not only in an unadulterated, pre-analysis form, but in the forms that their analysis and modeling is performed on. This would not only allow the statistical manipulations to be audited, but will allow others to conduct their own analysis of the data and present confirming or contradictory conclusions.

  4. This article is an important resource for activists. In layperson’s language it explains how confirmation bias plays itself out in social and behavioral science research.

    Activists who advocate for choice, alternatives, and human rights in the mental health system are rooting for higher scientific standards and a long overdue overhaul of what constitutes an ‘evidence based practice’. Higher scientific standards can only help our movement to revolutionize the mental health system.

    The author has laid out a central problem with behavioral science research that even a layperson can grasp, its vulnerability to corrupting influences, and proposed a valid solution: establishment of psychology research “preregistration,” where investigators would be required to submit an introduction, and their proposed methods, definitions, and analyses, before they collect their data.

    May I suggest that well intended researchers and readers of MIA who do manage to obtain independent sources of funding for alternative approaches to psychosis (out of what little exists) apply the higher standards laid out in this article voluntarily by pre-registering and declaring their methodology before they collect the data? Even if the status quo researchers are not required to do so, that doesn’t mean we can’t apply the higher standards ourselves.

  5. …A HUGE, WELCOMING FIRST STEP to alert others that numbers and findings can be manipulated in science. I discovered this while conducting doctoral & post-doc research at the University of Michigan and Johns Hopkins University in the 20th Century. Too often you have to go along to get along in the scientific community. And that’s a shame because in the mental health field, alone, real progress and alternative treatment strategies are neither encouraged nor proposed. …Thank you, Jay Joseph for your courage to speak truth to power!