If quantitative psychological science delivers objective facts, it might be assumed that different quantitative researchers examining the same data set would arrive at the same results. Unfortunately, this appears not to be the case. A new study finds that the choices researchers make during statistical analysis can lead to different results, even when the same data set is being analyzed.
“The process of certifying a particular result on the basis of an idiosyncratic analytic strategy might be fraught with unrecognized uncertainty, and research findings might be less trustworthy than they at first appear to be,” the authors write.
In the study, twenty-nine teams, comprising 61 international researchers in total, were given the same data set, and each team was asked to analyze it independently. The question was relatively straightforward: are soccer referees more likely to give red cards to players with darker skin than to those with lighter skin?
The answer, however, was not as straightforward. Twenty teams (69%) found a significant effect (referees were more likely to give red cards to darker-skinned players), while nine teams (31%) did not (referees did not appear to discriminate according to skin tone). Even among the teams that found an effect, its estimated size ranged from very slight to very large.
So how did these researchers—looking at the exact same data—arrive at such different results? The answer lies in the choice of statistical analysis and the covariates examined by the researchers.
The statistical methods chosen by the research teams varied widely, including linear regression, multilevel regression, and Bayesian approaches. The covariates (other factors that might influence the results) chosen by the teams also varied widely, with 21 different combinations selected. Three teams, for example, used no covariates at all, while three others controlled only for player position.
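To make the point concrete, here is a minimal sketch in Python of how covariate choice alone can change a conclusion. The data are synthetic and the variable names hypothetical; this is not the study's data or code. When a covariate such as player position is correlated with both skin tone and red-card risk, a model that omits it can mistakenly attribute position's influence to skin tone:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 2000

# Synthetic players: position is correlated with skin tone, and position
# (not skin tone) is what mainly drives red-card risk in this simulation.
skin_tone = rng.uniform(0, 1, n)                        # 0 = light, 1 = dark
position = (skin_tone + rng.normal(0, 0.5, n) > 0.5).astype(int)
true_logit = -3 + 1.2 * position + 0.1 * skin_tone      # true skin-tone effect is tiny
red_card = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

df = pd.DataFrame({"red_card": red_card, "skin_tone": skin_tone,
                   "position": position})

# "Team A": no covariates.
team_a = smf.logit("red_card ~ skin_tone", data=df).fit(disp=False)
# "Team B": controls for player position.
team_b = smf.logit("red_card ~ skin_tone + position", data=df).fit(disp=False)

for name, model in [("Team A (no covariates)", team_a),
                    ("Team B (position controlled)", team_b)]:
    coef = model.params["skin_tone"]
    p = model.pvalues["skin_tone"]
    print(f"{name}: skin-tone coefficient = {coef:.2f}, p = {p:.3f}")

In a setup like this, the unadjusted model will typically report a much larger, apparently significant skin-tone coefficient than the adjusted one. Neither "team" has made a computational error; they have simply answered subtly different questions.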
The researchers suggest that the different decisions made during this process (what other factors to control for, what type of analysis to run, and how to handle randomness and outliers in the dataset) all contributed to the divergent conclusions.
It should be noted that none of the analyses was simply wrong or poorly conducted. Peer statisticians rated the quality of each analysis, and those ratings did not correlate with whether a team found an effect. These were also researchers with no stake in this particular question; the dataset was chosen because it was large, contained multiple covariates, and could be examined by many different teams. Indeed, tests ruled out other explanations as well: the differences in results could not be accounted for by the researchers' pre-existing beliefs about the answer or by their level of experience.
In the typical research process, only one analysis is conducted, and only one team is involved. That team uses its best judgment to select the most appropriate way of analyzing the data. Had that been done here, the single published analysis could have been quite misleading. An individual analysis indicating no effect would be belied by the twenty teams of researchers who found one; likewise, it would be misleading to claim a definite effect, given that nine different ways of analyzing the data found no significant effect.
The researchers suggest that, in light of this finding, their approach of having multiple teams analyze the same data could serve as a gauge of how confident we can be in a conclusion. If numerous groups analyze a dataset and almost all of them find a significant result, for instance, that conclusion deserves more confidence.
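The kind of summary such a crowdsourced approach yields can be sketched in a few lines. The per-team numbers below are illustrative placeholders, not the study's reported estimates; the point is that the output is a spread of effect sizes and a share of teams reaching significance, rather than a single verdict:

from statistics import median

# Hypothetical per-team results: (odds ratio, p-value) pairs.
team_results = [(0.95, 0.61), (1.10, 0.21), (1.18, 0.04), (1.22, 0.03),
                (1.28, 0.02), (1.31, 0.01), (1.39, 0.02), (1.45, 0.01),
                (1.71, 0.004), (2.90, 0.001)]

odds_ratios = [orat for orat, _ in team_results]
n_significant = sum(p < 0.05 for _, p in team_results)

print(f"Teams reporting: {len(team_results)}")
print(f"Significant at p < .05: {n_significant} "
      f"({n_significant / len(team_results):.0%})")
print(f"Odds ratios span {min(odds_ratios):.2f} to {max(odds_ratios):.2f}, "
      f"median {median(odds_ratios):.2f}")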
The study was published in Advances in Methods and Practices in Psychological Science, but the researchers had previously released a commentary on the same subject in Nature.
****
Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., . . . Nosek, B. A. (2018). Many analysts, one data set: Making transparent how variations in analytic choices affect results. Advances in Methods and Practices in Psychological Science, 1, 337–356. doi:10.1177/2515245917747646
Fascinating. Just one point. The cynic in me questions whether researchers always use the most appropriate way to analyze the data. With so many statistical approaches available, it seems they would have the opportunity to choose the one that gives them the result they want. Would they always avoid that temptation?
The problem is that people are approaching the data with an agenda. A true scientist will believe what the data says, even if it contradicts expectations or future income potential. Conflicts of interest have fueled a massive breakdown in scientific integrity across all areas of science.
At the same time the company was improperly encouraging off-label uses, it willfully downplayed physicians' voiced concerns and study findings indicating side effects of weight gain and diabetes in patients receiving olanzapine (Zyprexa) therapy for its approved use. As a whistleblower summed it up, “If you torture the data long enough, it’ll tell you anything you want to hear.” https://shadowproof.com/2013/12/18/over-easy-a-small-price-to-pay-for-the-molecule/
This practice is, to quote Eli Lilly’s own internal email communication turned court document, “a small price to pay for the molecule.” http://web.archive.org/web/20120122013020/http://www.furiousseasons.com/zyprexa%20documents/ZY100378062.pdf
OMG, more psychobabble and gobbledygook!
Once again proving the TRUTH Mark Twain is credited with:
“There are 3 kinds of lies: lies, damn lies, and statistics.”
We may as well call the DSM-5 the “DLM-5”, or “Diagnostic LIES Manual”
That would make more sense, except there are NO statistics in the DSM-5.
Which of course makes no sense, but that’s psychiatry for ya!
“Psychiatry, the nonsense science”
Really, I think “nonsense science” has a friendlier ring to it than “pseudoscience”…..
I don’t know if this is applicable, as it involves asking the public questions in an opinion poll. But asking the question to get the answer you want still might work. Comedy: https://www.youtube.com/watch?v=wySaC_z12GY