Reproducibility in Science: Study Finds Psychology Experiments Fail Replication Test

Scientists toiling away in their laboratories, observatories and offices don't, as a rule, fabricate data, plagiarize other research, or invent questionable conclusions when publishing their work. Any of these dishonest activities would violate a kind of scientific Hippocratic oath. So why do so many scientific studies and papers turn out to be unreliable or flawed?

(Credit: Shutterstock/Lightspring)

In a massive analysis of 100 recently published psychology papers spanning different research designs and authors, University of Virginia psychologist Brian Nosek and his colleagues find that more than half fail replication tests. Only 39% of the psychology experiments could be replicated unambiguously, and studies claiming surprising effects, or effects that were challenging to replicate, proved less reproducible. The results appear in the new issue of Science.

Nosek began crowdsourcing the Reproducibility Project in 2012, when he reached out to nearly 300 members of the psychology community. Scientists lead and work on many projects simultaneously, for which they receive credit when publishing their own papers, so taking part requires some sacrifice: the replication paper lists the authors of the Open Science Collaboration alphabetically, rather than in order of their contributions, and working with so many people presents logistical difficulties. Nevertheless, considering the importance of scientific integrity and of investigating the reliability of analyses and results, such an undertaking is worthwhile to the community. (In the past, I have participated in similarly large collaborative projects, which I believe have benefited the astrophysical community.)

The researchers evaluated five complementary indicators of reproducibility, using significance and p-values, effect sizes, subjective assessments by the replication teams, and meta-analyses of effect sizes. Although a failure to reproduce does not necessarily mean that the original report was incorrect, they state that such "replications suggest that more investigation is needed to establish the validity of the original findings." This is diplomatic scientist-speak for: "people have reason to doubt the results." In the end, the scientists find that in the majority of cases, the replication p-values are higher (making the results less significant or statistically insignificant) and the effect sizes are smaller, or even go in the opposite direction of the claimed trend!
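Why would replication effect sizes be systematically smaller? Publication bias alone can produce this pattern, a phenomenon sometimes called the "winner's curse." The toy simulation below is a minimal sketch of that statistical effect, not the paper's actual method or data: many studies of the same small true effect are run, only those clearing p < 0.05 get "published," and each published study is then replicated with a fresh sample.

```python
import math
import random

random.seed(42)

def run_study(true_effect, n):
    """One simulated study: n observations with mean true_effect, sd 1.
    Returns (estimated effect, two-sided z-test p-value)."""
    xs = [random.gauss(true_effect, 1.0) for _ in range(n)]
    est = sum(xs) / n
    z = est * math.sqrt(n)                 # known sd = 1
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return est, p

TRUE_EFFECT, N, STUDIES = 0.2, 30, 20000
published, replications = [], []

for _ in range(STUDIES):
    est, p = run_study(TRUE_EFFECT, N)
    if p < 0.05:                           # only "positive" results get published
        published.append(est)
        rep_est, _ = run_study(TRUE_EFFECT, N)  # independent replication
        replications.append(rep_est)

mean_pub = sum(published) / len(published)
mean_rep = sum(replications) / len(replications)
print(f"true effect:             {TRUE_EFFECT:.2f}")
print(f"mean published effect:   {mean_pub:.2f}")
print(f"mean replication effect: {mean_rep:.2f}")
```

With these (arbitrary) parameters, the published studies overestimate the true effect by roughly a factor of two, while the unfiltered replications land near the true value. No fraud is required: the significance filter alone selects for lucky overestimates.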

Effects claimed in the majority of studies cannot be reproduced. Figure shows density plots of original and replication p-values and effect sizes (correlation coefficients).

Note that this meta-analysis has a few limitations and shortcomings. Some studies or analysis methods that are difficult to replicate involve research that pushes the limits or tests very new or little-studied questions; if scientists only asked easy questions, or questions to which they already knew the answer, the research would not be particularly useful to the advancement of science. In addition, I could find no comment in the paper about situations in which scientists face the prospect of replicating their own or competitors' previous papers; presumably they avoided potential conflicts of interest.

These contentious conclusions could shake up the social sciences and subject more papers and experiments to scrutiny. This isn’t necessarily a bad thing; according to Oxford psychologist Dorothy Bishop in the Guardian, it could be “the starting point for the revitalization and improvement of science.”

In any case, scientists must acknowledge the publication of so many questionable results. Since scientists generally strive for honesty, integrity and transparency, and cases of outright fraud are extremely rare, we must investigate the causes of these problems. As pointed out by Ed Yong in the Atlantic, like many sciences, “psychology suffers from publication bias, where journals tend to only publish positive results (that is, those that confirm the researchers’ hypothesis), and negative results are left to linger in file drawers.” In addition, some social scientists have published what first appear to be startling discoveries but turn out to be cases of “p-hacking…attempts to torture positive results out of ambiguous data.”
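How does p-hacking "torture positive results out of ambiguous data"? One common route is measuring many outcomes and reporting whichever happens to be significant. The sketch below is my own illustration, not taken from any of the cited pieces: it simulates studies with no true effect at all, comparing the false-positive rate of a single pre-registered test against the rate when ten outcomes are tried and any significant one is reported.

```python
import math
import random

random.seed(7)

def null_p_value(n):
    """Two-sided z-test p-value for n observations drawn with NO true effect."""
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = (sum(xs) / n) * math.sqrt(n)       # known sd = 1
    return math.erfc(abs(z) / math.sqrt(2))

TRIALS, OUTCOMES, N = 5000, 10, 30

# Honest analysis: one pre-registered outcome per study.
honest_rate = sum(null_p_value(N) < 0.05 for _ in range(TRIALS)) / TRIALS

# "p-hacked" analysis: measure 10 outcomes, report if ANY is significant.
hacked_rate = sum(
    any(null_p_value(N) < 0.05 for _ in range(OUTCOMES)) for _ in range(TRIALS)
) / TRIALS

print(f"false-positive rate, one outcome: {honest_rate:.1%}")
print(f"false-positive rate, best of {OUTCOMES}:  {hacked_rate:.1%}")
```

The honest rate stays near the nominal 5%, but cherry-picking from ten outcomes pushes the false-positive rate toward 1 - 0.95^10, about 40%, even though nothing real is being measured.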

Unfortunately, this could also provide more fuel for critics of science, who already seem to have enough ammunition judging by overblown headlines pointing to increasing numbers of scientists retracting papers, often due to misconduct, such as plagiarism and image manipulation. In spite of this trend, as Christie Aschwanden argues in a FiveThirtyEight piece, science isn't broken! Scientists should, though, be cautious about unreliable statistical tools, and p-values fall into that category. The psychology meta-analysis shows that p < 0.05 tests are too easy to pass, but scientists knew that already: the journal Basic and Applied Social Psychology banned p-values earlier this year.

Furthermore, larger trends may be driving the publication of such problematic science papers. Increasing competition among scientists for high-status jobs, federal grants, and speaking opportunities at high-profile conferences pressures them to publish more, and to publish provocative results in major journals. To quote the Open Science Collaboration's paper, "the incentives for individual scientists prioritize novelty over replication." Meanwhile, overextended peer reviewers and editors often lack the time to properly vet and examine submitted manuscripts, making it more likely that problematic papers slip through and carry much more weight upon publication. At that point, it can take a while to refute an influential published paper or reduce its impact on the field.

Source: American Society for Microbiology, Nature

When I worked as an astrophysics researcher, I carefully reviewed numerous papers for many different journals and considered that work an important part of my job. Perhaps utilizing multiple reviewers per manuscript and paying reviewers for their time would improve that situation. In any case, most scientists recognize that though peer review plays an important role in the process, it is no panacea.

I know that I am proud of all of my research papers, but at times I wished I had more time for additional or more comprehensive analysis, to be more thorough and certain about some results. Such thoroughness can be prohibitively time-consuming for any scientist (theorists, observers and experimentalists alike), and scientists draw the line in different places when deciding whether or when to publish research. I also feel that I have sometimes been too conservative in presenting my conclusions, while some scientists make claims that go far beyond the limited implications of uncertain results.

Some scientists jump on opportunities to publish the most provocative results they can find, and science journalists and editors love a great headline, but we should express skepticism when people announce unconvincing or improbable findings, as many of them turn out to be wrong. (Remember when OPERA physicists thought that neutrinos could travel faster than light?)

When conducting research and writing and reviewing papers, scientists should aim for as much transparency and openness as possible. The Open Science Framework demonstrates how such research could be done, with data accessible to everyone and individual scientists' contributions tracked. With such a "GitHub-like version control system, it's clear exactly who takes responsibility for what part of a research project, and when—helping resolve problems of ownership and first publication," writes Katie Palmer in Wired. As Marcia McNutt, editor in chief of Science, says, "authors and journal editors should be wary of publishing marginally significant results, as those are the ones that are less likely to reproduce."

If some newly published paper is going to attract the attention of the scientific community and news media, then it must be sufficiently interesting, novel or even contentious, so scientists and journalists must work harder to strike that balance. We should also remember that, for better or worse, science rarely yields clear answers; it usually leads to more questions.

Journalism and Science Groups Criticize EPA’s Policy Muzzling Science Advisers

As reported by the Associated Press and The Hill, a coalition of journalism and science groups is calling on the US Environmental Protection Agency (EPA) to end a policy that restricts independent science advisers from contacting and communicating with media outlets, Congress, and others without permission. The organizations include the Union of Concerned Scientists (UCS), the Society of Environmental Journalists (SEJ), the American Geophysical Union, the Society of Professional Journalists, the Society for Conservation Biology, Investigative Reporters and Editors, and the Reporters Committee for Freedom of the Press. (Full disclosure: I am a UCS member and obtained some of my information from them.)


In a letter sent to the agency last week, they said that the new policy

requir[es] advisory committee members who receive requests from the public and the press ‘to refrain from responding in an individual capacity’ regarding issues before the committee. The policy requires all requests…to be routed through EPA officials. This prevents many of our nation’s top independent environmental science experts from sharing their expertise, unfiltered, with the public…The new policy undermines EPA’s efforts to increase transparency. It also contradicts the EPA’s new scientific integrity policy…[It] only reinforces any perception that the agency prioritizes message control over the ability of scientists who advise the agency to share their expertise with the public. On July 8, 38 journalism and good government organizations wrote the president expressing concern about ‘the stifling of free expression’ across many agencies, including the EPA.

The language of the policy is sufficiently vague that a scientist could easily interpret it to mean that she or he can't speak publicly about any scientific issue under consideration. In addition, as pointed out by Andrew Rosenberg, scientists who work for the EPA also face barriers in communicating with the public.

What are the implications of this and why is it important? As the letter points out, this is clearly related to the issue of scientific integrity. We need scientists to serve on advisory committees, work with agencies and policy-makers, and speak transparently about their work and expertise, but such policies will discourage some from participating and will make the EPA less democratic. Government agencies, journalists, and the public deserve access to independent advice and free speech of scientists. (However, we scientists should be careful about speaking about issues beyond our expertise.) That way agencies can make informed decisions when developing or reforming relevant policies and regulations, and journalists and the public can form their own opinions about them as well.

In an update on the situation, the EPA Chief of Staff Gwendolyn Keyes-Fleming responded to say that their Science Advisor, Dr. Bob Kavlock, would review the matter and engage with people in the organizations involved. Let’s hope that the dialogue results in changing the policy.


Finally, in recent related news, political scientist James Doyle says that he was fired from the Department of Energy’s (DOE’s) Los Alamos National Laboratory (LANL) in New Mexico after publishing a scholarly article questioning US nuclear weapons doctrine. Lab officials claimed that the article, which criticized the political theories behind the nuclear arms race and defended President Obama’s embrace of a nuclear weapons-free future, contained classified information. (We should note, though, that unfortunately the DOE’s policy on scientific integrity is much shorter and may be more restrictive than the EPA’s.) I’ll keep you updated on this situation and, time permitting, may write about it further in another post.

Scientific Integrity

In this blog post, let’s discuss scientific integrity: specifically, efforts to keep scientific research as independent as possible from political, corporate, or other influence. Such influences matter for a variety of policy areas, including energy policy (especially related to climate change), health and drugs, food and nutrition, and education, when particular companies or organizations have a financial or other stake in the outcome. For example, fossil fuel companies support the “denial industry”, claiming that the science of global warming is inconclusive; agribusinesses promote genetically modified crops; and drug companies promote antidepressant and ADHD drugs, all while funding scientific research that often supports their campaigns.

Science informs political officials and agencies when they’re designing regulations for air and water pollution, when determining whether a particular drug is safe and efficacious, when assessing whether particular foods or products are safe for consumers, etc. In my opinion, science can rarely be completely “objective” and “unbiased”; scientists are humans, after all, and they have their own motivations and considerations that can affect their work. The important thing, however, is to reduce political and commercial influence as much as possible so that scientists can do their research and then present their results as clearly and accurately as possible.

In all fields of science, scientists are to some extent affected by funding constraints and grant agencies. These constraints can affect exactly what is studied, how it is researched, and how the results are presented in the media and to the public. Nonetheless, scientific research is particularly important, and susceptible to more outside influences, when it is related to public policy, including the topics above. In addition, politically related work in the social sciences, especially economics, can be contentious as well.

In the US under the Bush administration, many felt that scientists were under attack. For example, a “revolving door” appeared to be in place when former lobbyists and spokespeople for industries later worked at the agencies tasked with regulating their former industries; in particular cases, they appeared to write or advocate for policy shifts that benefited these industries. In 2004, the Union of Concerned Scientists (UCS) released a report, “Scientific Integrity in Policymaking: An Investigation into the Bush Administration’s Misuse of Science”, claiming that the White House censors and suppresses reports by its own scientists, stacks advisory committees, and disbands government panels. There later appeared to be political influence on the Food and Drug Administration (FDA), on researchers working on embryonic stem cells, on sex education (because of arguments about the effectiveness of abstinence-based programs), and on the teaching of biological evolution.

Although the Obama administration appears to have more respect for science and scientists (see this 2013 UCS report), the politicization of some scientific work continues. The assessment of the social and environmental impact of the Keystone XL pipeline may be such an example. The final environmental impact statement, which was released by the State Department yesterday, appears to endorse the pipeline, but the interpretation is unclear (see this coverage in the Wall Street Journal and Scientific American blog).

In any case, these contentious situations will be easier to navigate when government agencies have explicit policies on scientific integrity and when the affiliations and employment histories of officials are transparent. It’s also important to keep in mind that the struggle for independent and transparent science never ends. Scientists should always try to be as clear as possible about their views or beliefs when they are relevant to their work (see this NYT blog for useful advice), and results and data should be made publicly available whenever possible.