If quantitative psychological science delivers objective facts, then it might be assumed that several different quantitative researchers examining the same data set would arrive at the same conclusions. Unfortunately, it appears that this is not the case. A new study finds that the choices researchers make during statistical analysis can lead to different results, even when they analyze the same data set.
“The process of certifying a particular result on the basis of an idiosyncratic analytic strategy might be fraught with unrecognized uncertainty, and research findings might be less trustworthy than they at first appear to be,” the authors write.
In the study, 29 teams, comprising 61 researchers from around the world, were given the same data set, and each team was asked to analyze it. The question was relatively straightforward: are soccer referees more likely to give red cards to players with darker skin than to those with lighter skin?
The answer, however, was not as straightforward. Twenty teams (69%) found a significant effect (referees were more likely to give red cards to darker-skinned players), while nine teams (31%) did not (referees did not appear to discriminate by skin tone). Even among the teams that found an effect, its estimated size ranged from very slight to very large.
So how did these researchers—looking at the exact same data—arrive at such different results? The answer lies in the choice of statistical analysis and the covariates examined by the researchers.
The statistical methods chosen by the research teams varied widely, including linear regression, multilevel regression, and Bayesian approaches. The covariates (other factors that might influence the results) also varied widely, with 21 different combinations chosen across the teams. Three teams, for example, used no covariates at all, while three others controlled only for player position.
The researchers suggest that the different decisions made during this process, including which other factors to control for, what type of analysis to run, and how to handle randomness and outliers in the data set, all led to the divergent conclusions.
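To see how the choice of covariates alone can flip a conclusion, consider a sketch with purely invented numbers (they are not from the study's data set). Here "league" is a hypothetical stand-in for any covariate, such as player position: within each league, darker-skinned players receive red cards at a lower rate, yet pooling the leagues together reverses the comparison, a pattern known as Simpson's paradox.

```python
# Invented counts, not from the actual study: (red cards, total player-referee pairs)
data = {
    "league_A": {"darker": (90, 1000), "lighter": (80, 800)},
    "league_B": {"darker": (10, 500),  "lighter": (30, 1000)},
}

def rate(cards, total):
    """Red-card rate as a fraction."""
    return cards / total

# Controlling for the covariate: within EACH league, the darker-skinned rate is lower.
for league, groups in data.items():
    for tone, (cards, total) in groups.items():
        print(f"{league} {tone}: {rate(cards, total):.1%}")

# Ignoring the covariate: pool the leagues, and the comparison reverses.
pooled = {}
for tone in ("darker", "lighter"):
    cards = sum(data[league][tone][0] for league in data)
    total = sum(data[league][tone][1] for league in data)
    pooled[tone] = rate(cards, total)
    print(f"pooled {tone}: {pooled[tone]:.1%}")
# pooled darker ≈ 6.7% vs pooled lighter ≈ 6.1%: the pooled rate is higher for
# darker-skinned players even though it is lower within every league.
```

Both analyses are arithmetically correct; they simply answer different questions, which is one way two careful teams can reach opposite conclusions from the same data.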
It should be noted that none of the analyses were simply wrong or poorly conducted. Peer statisticians rated the quality of each analysis, and those ratings did not correlate with whether a team found an effect. Moreover, the researchers had no stake in this particular question; it was chosen because it offered a large data set with multiple covariates that many different teams could examine. Indeed, follow-up tests showed that the differences in results could not be explained by the researchers' pre-existing beliefs about the answer or by their level of experience.
In the typical research process, only one analysis is conducted, by a single team, which uses its best judgment to select the most appropriate way of analyzing the data. Had that been the case here, the single analysis could have been quite misleading. An analysis indicating no effect, for instance, would be belied by the twenty teams that found one; likewise, it would be misleading to claim a definite effect, given that nine different ways of analyzing the data found no significant effect.
The researchers suggest that, in light of this finding, their approach of having multiple teams analyze the same data could serve as a way of gauging how much confidence a conclusion deserves. If many independent groups analyze a data set and almost all of them find a significant result, for instance, we can be more confident in that result.
The study was published in Advances in Methods and Practices in Psychological Science, but the researchers had previously released a commentary on the same subject in Nature.
Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., . . . Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1, 337–356. doi:10.1177/2515245917747646
Mad in America hosts blogs by a diverse group of writers. These posts are designed to serve as a public forum for a discussion—broadly speaking—of psychiatry and its treatments. The opinions expressed are the writers’ own.