Scientific journalism is one of the main ways that the public comes to understand the results and societal impacts of scientific studies. This includes online periodicals like Mad in America, printed and TV news, radio, podcasts, and many other forms. Therefore, any bias in the information scientific journalists present to the public has serious consequences for what the general public considers to be true.
In a new study, Julia Bottesini, Christie Aschwanden, Mijke Rhemtulla, and Simine Vazire explored which characteristics of quantitative studies influence journalists’ likelihood of reporting on them and their belief that the research is trustworthy. The authors highlighted the importance of having “watchdogs” for science – gatekeepers between the available information and the public’s understanding of the truth. They suggest peer reviewers are the “internal watchdogs” while science journalists are equally necessary “external watchdogs.” They write:
“They are the primary external watchdogs that can monitor scientists and scientific institutions for problematic practices and call out dubious claims without much fear of harming their career prospects… However, for science journalists to play this important role, they need to have access to, and know how to use, relevant information when deciding whether to trust a research finding, and whether and how to report on it.”
History of Criticism in Science Journalism
Bottesini and colleagues write that early science writing tended to frame the scientific process, its accuracy, and its impacts with extreme positivity. However, the 1960s and 70s saw gradual increases in more diverse perspectives, criticisms, and acknowledgment of harms caused by the scientific process and ‘progress.’ The authors do not explicitly state from whom these diverse perspectives came, but the 1960s and 70s were when women, racially minoritized people, and low-income white Americans gained access to higher education and scientific fields.
Despite representations of studies becoming more balanced, journalists and scientists still had reasons to mount critiques into the 1990s that journalists were too connected to the scientists they put forward. For example, John Crewdson referred to journalists as “perky cheerleaders” for science. He plainly stated that “by accepting research reports without adequate checking, science writers do a disservice to the public” in a 1993 article. The authors of the present study argue:
“Science journalists now have more opportunity to become good science watchdogs who can help the public consume scientific research through a critical lens and draw the public’s attention to more rigorous research. A more informed public with access to more nuanced scientific information is a social benefit of having more critical science journalists.”
The following are among the benefits of criticism the authors suggest:
- Science could lose credibility without context, caveats, and information about its limitations
- If science journalists falsely imply that all findings are completely and equally valid, the inevitable nuances and differences in results based on context would erode the public’s trust in science.
- Accurate journalism helps ensure that better science gets more attention, leading to more social and financial rewards for scientists conducting similarly rigorous and thoughtful work.
- It helps keep scientists honest. Scientists are human beings with biases who are incentivized by research funding to exaggerate accomplishments. Knowing work will be scrutinized as it is transmitted to the public incentivizes researchers to be more accurate in their claims.
The Current Study
Bottesini, Aschwanden, Rhemtulla, and Vazire’s study explored factors that influence science journalists’ reporting and critique of studies and how journalists determine studies to be trustworthy or newsworthy. Using one-paragraph descriptions of fictitious behavioral psychology studies, they manipulated four variables within each study vignette for the reasons below (as explicitly shared by the authors):
- The study’s sample size: the larger the sample size, the more precise an estimate is likely to be
- The representativeness of the study’s sample impacts the generalizability of the study to other populations. If the study is missing whole populations that are more represented in the real world, there is no evidence that the findings will apply to populations that were left out. Consequently, good science journalists should favor studies with samples more representative of the corresponding real-world populations.
- The p-value associated with the finding: a p-value closer to 0 means the observed result would be less likely to arise from noise alone if there were no real effect, which the authors treat as stronger evidence of a real phenomenon. This suggests good science journalists should favor studies with lower p-values.
- The institutional prestige of the researcher who conducted the study: based on the theory that journalists may have biases and find unknown scientists more likely to be credible when associated with elite institutions
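The reasoning behind the sample size and p-value bullets can be made concrete with a short sketch. It is an illustration only, not the authors' analysis: it assumes a simple z-test with a known standard deviation, and the function names and the example effect of 0.1 standard deviations are hypothetical.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sd / math.sqrt(n)


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_test_p_value(mean_diff: float, sd: float, n: int) -> float:
    """p-value for an observed mean difference (assumes sd is known)."""
    return two_sided_p_value(mean_diff / standard_error(sd, n))


if __name__ == "__main__":
    # Hypothetical effect: a mean difference of 0.1 standard deviations.
    for n in (50, 500, 5000):
        se = standard_error(1.0, n)
        p = z_test_p_value(0.1, 1.0, n)
        print(f"n={n:5d}  standard error={se:.3f}  p-value={p:.4f}")
```

Under these assumptions, the same observed effect yields a smaller standard error and a smaller p-value as the sample grows, which is why both numbers are usually read together rather than in isolation.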
Real journalists were presented with eight randomly selected vignettes out of 16 used in the study. Journalists were then asked to rate each study’s trustworthiness (4 questions) and newsworthiness (2 questions). This was followed by three open-ended questions (how do they typically evaluate research findings, how did they evaluate the information presented within this study, and did they have guesses about what characteristics of the fictitious studies the researchers were trying to test). A power analysis showed sufficient power to detect a real effect when presenting eight vignettes to 150-200 participants, and examples of the vignettes can be found here: https://osf.io/xej8k.
Given the authors’ arguments for the importance of studies’ samples to their credibility, their reporting of their own sample and its representativeness is surprisingly lacking. Their final sample of 181 science journalists was predominantly women (76.8%; 19.3% men, 2.8% non-binary, and 1.1% preferred not to say). Though the authors did not report this, women were slightly overrepresented compared to the general population of science journalists. Journalists represented a variety of disciplines (life sciences, health & medicine, general science, physical sciences, psychology, social science, lifestyle & wellbeing, and others) and mediums (predominantly online news and print news).
We can probably safely assume the journalists were predominantly white. Still, the researchers presented no information about the race or ethnicity of their sample, contrary to basic standards set by the American Psychological Association. This is a tremendous oversight, given the conspicuous underrepresentation and continued gatekeeping of people of color in psychology and other social sciences.
White and Asian people are overrepresented in science writing. In contrast, Black, Latine, Southwest Asian, and North African (SWANA) people are underrepresented compared to the general public (based on a combination of 2021 estimates and science writer organization membership data with the US census). Knowing whose ethnoracial cultural views the journalists may have represented is essential to interpreting the following findings.
The Results and Missing Context
Sample size was the only variable that ostensibly impacted journalists’ ratings of trustworthiness and newsworthiness. The researchers found that the sample’s representativeness, the university’s prestige, and the statistical p-value of the finding had little impact on journalists’ trust in the fictitious studies or on how worthy of rebroadcasting they judged them to be. The fake studies all made claims about the study being applicable to the general population, regardless of the representativeness of the sample. This means that journalists should have found studies with less representative samples to be less trustworthy in their claims.
This table was taken directly from their article and showed the percentage of responses to each of the three open-ended questions for each of the four factors the researchers tested:
Table 1. Percent of science journalist participants who identified each of the four manipulated variables in their answers to each of three open-ended questions (answered after participants rated the eight fictitious vignettes).
| Question | Sample size | Sample type | p-value | University prestige |
| --- | --- | --- | --- | --- |
| What characteristics do you consider when evaluating the trustworthiness of a scientific article? | 66.9% | 27.1% | 30.9% | 16.0% |
| What characteristics did you weigh in judging the trustworthiness of the findings presented? | 79.0% | 34.3% | 38.1% | 9.4% |
| Before we tell you what [characteristics we varied], do you think you know any of them? | 83.2% | 38.7% | 64.7% | 30.3% |
Participants generally agreed that better sample size, sample representativeness, and p-values could increase the validity and newsworthiness of studies, but not university prestige. However, when asked about their familiarity with each factor, participants also reported less familiarity with institutional affiliation as a metric for trustworthiness or newsworthiness.
The authors also conducted a “subjective, non-systematic exploration of the topics brought up by the participants” for each of the three open-ended questions, even though multiple well-established and systematic methods exist for analyzing qualitative data.
Characteristics journalists typically consider about studies, in general, include the prestige of the journal of publication and comments from other researchers. Characteristics they weighed about the vignettes included the study design or methods and the plausibility or relevance of findings and claims. The themes journalists thought the researchers were assessing included the study design and the perceived ethnicity of the fictitious researcher based on the last name.
Bottesini, Aschwanden, Rhemtulla, and Vazire claim to have analyzed whether the implied ethnicity of the researcher in vignettes played a role in ratings. However, all last names they attributed to people of color were of Asian, Latine, and SWANA origin to the complete exclusion of African and African-American last names.
Additionally, the authors assert that last names like Carter (and, presumably, Davis and Lewis) would be “unlikely to be perceived as non-white or as Hispanic.” However, many African-Americans currently possess those last names due to slavers’ naming practices for people they kidnapped from Africa during the 300-year transatlantic slave trade. Consequently, journalists aware of this may have perceived those names as belonging to Black researchers. This highlights another way that ethnoracial demographic information about the journalists themselves was crucial to interpreting these results.
What to Take Away
Overall, one should hold the results of this study lightly, and more research should be done to verify how much these results apply to science journalists in general. Bottesini and colleagues present an excellent case for the importance of science journalists’ standards to what the public sees as the “truth.” However, the study has significant shortcomings: it does not report racial demographics, it recruited journalists primarily from within one of the authors’ networks, and it does not acknowledge the breadth of rigorous qualitative methods available in the social sciences. For example, the authors sometimes say experimental studies were considered more credible than “observational” or “correlational” studies. In addition, several qualitative observational methods were neither tested nor mentioned by journalists.
The authors conclude that journalists prefer studies with a sample size larger than 500, experimental studies over correlational studies, more prestigious journals, and p values that are statistically significant. Journalists in this sample prioritize p values significant at the .05 level generally without prioritizing studies with p values closer to 0. This suggests that many journalists like the ones in this study may not have the statistical expertise or may not fully employ their knowledge to appraise studies’ validity.
However, Bottesini and colleagues recognize that their results may partially reflect what journalists think they are supposed to use to evaluate research rather than what they actually use. Therefore, they suggest more qualitative and observational studies on how journalists evaluate research to increase the relevance of the concepts studied to what journalists tend to actually do. This includes studies on how journalists are taught about evaluating research.
Bottesini and colleagues lament the presence of “much talk and little action” about improving the representativeness of samples. They suggest the lack of action could be partially due to a lack of consequences.
“Findings based on samples that are very unrepresentative of the population that researchers claim to be studying (e.g., “undergraduates at the university”) were rated as just as trustworthy and newsworthy as findings from studies where the sample and population are more similar (e.g., “people from a nationwide sample”). Given the extra effort often required to recruit more representative samples, if the consequences are trivial, at least as far as media exposure and criticism from journalists go, then this could help perpetuate the status quo.”
These findings highlight the need for research on science journalists’ standards but commit some of the same faux pas the authors call out. Rigorous science requires reporting and understanding how our samples relate to real-world demographics and the demographics of the specific populations researched (in this case, journalists). Being a “good watchdog” and delivering the truth to the public requires not only statistical understanding and contextualizing interpretations but also awareness of how history affects perceptions and reporting all relevant demographic information.
Bottesini, J. G., Aschwanden, C., Rhemtulla, M., & Vazire, S. (2022, July 19). How Do Science Journalists Evaluate Psychology Research? https://doi.org/10.31234/osf.io/26kr3