Multiple Researchers Examining the Same Data Find Very Different Results

A new study demonstrates how the choice of statistical techniques when examining data plays a large role in scientific outcomes.


If quantitative psychological science delivers objective facts, one might assume that different quantitative researchers examining the same data set would arrive at the same results. Unfortunately, this does not appear to be the case. A new study finds that the choices researchers make during statistical analysis can lead to different results, even when they analyze the same data set.

“The process of certifying a particular result on the basis of an idiosyncratic analytic strategy might be fraught with unrecognized uncertainty, and research findings might be less trustworthy than they at first appear to be,” the authors write.

In the study, twenty-nine teams, made up of a total of 61 international researchers, were given the same data set, and each team was asked to analyze it. The question was relatively straightforward: are soccer referees more likely to give red cards to players with darker skin than to those with lighter skin?

The answer, on the other hand, was not as straightforward. Twenty teams (69%) found a significant effect (referees were more likely to give red cards to darker-skinned players), while nine teams (31%) found no significant effect (referees did not appear to discriminate according to skin tone). Even among the teams that found a significant effect, its size ranged from very slight to very large.

So how did these researchers—looking at the exact same data—arrive at such different results? The answer lies in the choice of statistical analysis and the covariates examined by the researchers.

The statistical methods chosen by the research teams varied widely, including linear regression, multilevel regression, and Bayesian statistics. The covariates (other factors that might influence the results) also varied widely, with 21 different combinations being chosen. Three teams, for example, used no covariates, while three other teams controlled only for player position.

The researchers suggest that the different decisions made during this process, including which other factors to control for, what type of analysis to run, and how to deal with randomness and outliers in the dataset, all contributed to the divergent conclusions.
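To make the role of covariates concrete, here is a minimal Python sketch using synthetic data. It does not use the study's data or any team's actual method: the variable names (x, z, y), the effect sizes, and the closed-form least-squares helpers are all illustrative assumptions. It shows how the estimated effect of one predictor can shift depending on whether a correlated covariate is included in the model.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def slope_unadjusted(x, y):
    """Least-squares slope of y on x with no covariates."""
    return cov(x, y) / cov(x, x)


def slope_adjusted(x, z, y):
    """Least-squares slope of y on x when z is also in the model."""
    denom = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return (cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)) / denom


# Synthetic data (assumed values, chosen only for illustration):
# z is a covariate, x is correlated with z, and y depends weakly on x
# and more strongly on z.
rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
y = [0.1 * xi + 0.5 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

b_unadjusted = slope_unadjusted(x, y)
b_adjusted = slope_adjusted(x, z, y)

print(f"slope of x, no covariates:     {b_unadjusted:.3f}")
print(f"slope of x, controlling for z: {b_adjusted:.3f}")
```

In this toy setup, the analysis that omits z attributes part of z's influence to x, so two defensible-looking models report noticeably different slopes from the same data.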

It should be noted that none of the analyses were simply wrong or poorly conducted. Peer statisticians rated the quality of the analyses, and whether an analysis had flaws did not correlate with whether the team found an effect. Also, these researchers had no stake in this particular question; it was chosen because it offered a large dataset with multiple covariates that many different teams could examine. Indeed, tests run to rule out other explanations showed that the differences in results could not be explained by the researchers' pre-existing beliefs about the answer or by their level of experience.

In the typical research process, only one analysis is conducted, and only one team is involved. That team uses its best judgment to select the most appropriate way of analyzing the data. However, had that been done here, the single analysis would have been quite misleading. For instance, a lone analysis indicating no effect would be contradicted by the twenty teams that found one. Likewise, it would be misleading to claim a definite effect, given that nine different ways of analyzing the data found no significant effect.

The researchers suggest that, in light of this finding, their approach of having multiple research teams analyze the data could be a way of determining how confident we can be in the conclusions. For instance, if numerous groups analyze the dataset and almost all of them find a significant result, we can be more confident of the results.
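As a rough illustration of that idea, the sketch below tallies the team-level outcomes reported above (20 of 29 teams finding a significant effect). The function name and the choice to summarize agreement as a simple share are assumptions made for illustration, not a metric defined by the study's authors.

```python
def share_finding_effect(n_significant: int, n_teams: int) -> float:
    """Fraction of analysis teams that reported a significant effect."""
    if n_teams <= 0:
        raise ValueError("n_teams must be positive")
    return n_significant / n_teams


# Counts taken from the article: 20 of 29 teams found a significant effect.
share = share_finding_effect(20, 29)
print(f"{share:.0%} of teams found a significant effect")
print(f"{1 - share:.0%} of teams did not")
```

A share near 100% or near 0% would suggest the conclusion is robust to analytic choices; a split like this one signals that the result depends heavily on how the data are analyzed.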

The study was published in Advances in Methods and Practices in Psychological Science, but the researchers had previously released a commentary on the same subject in Nature.



Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E. . . . Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1, 337–356. doi:10.1177/2515245917747646




  1. Fascinating. Just one point. The cynic in me questions whether researchers always use the most appropriate way to analyze the data. With so many statistical approaches available, it seems they would have the opportunity to choose the statistics that give them the result they want. Would they always avoid that temptation?

    • The problem is that people are approaching the data with an agenda. A true scientist will believe what the data says, even if it contradicts expectations or future income potential. Conflicts of interest have fueled a massive breakdown in scientific integrity across all areas of science.

  2. At the same time the company improperly encouraged off-label uses, it willfully downplayed voiced physician concern and study findings indicating side effects of weight gain and diabetes in patients receiving olanzapine (Zyprexa) therapy for approved use. As a whistleblower summed, “If you torture the data long enough, it’ll tell you anything you want to hear.”

    This practice is, to use Eli Lilly’s own internal email communication-cum-court-document, “a small price to pay for the molecule.”

  3. OMG, more psychobabble and gobbledygook!
    Once again proving the TRUTH Mark Twain is credited with:
    “There are 3 kinds of lies: lies, damn lies, and statistics.”
    We may as well call the DSM-5 the “DLM-5”, or “Diagnostic LIES Manual”
    That would make more sense, except there are NO statistics in the DSM-5.
    Which of course makes no sense, but that’s psychiatry for ya!
    “Psychiatry, the nonsense science”
    Really, I think “nonsense science” has a friendlier ring to it than “pseudoscience”…..
