Reproducibility in Science: Study Finds Psychology Experiments Fail Replication Test

Scientists toiling away in their laboratories, observatories and offices rarely fabricate data, plagiarize other research, or invent questionable conclusions in their published work. Engaging in any of these dishonest activities would be like violating a scientific Hippocratic oath. So why do so many scientific studies and papers turn out to be unreliable or flawed?

(Credit: Shutterstock/Lightspring)

In a massive analysis of 100 recently published psychology papers with different research designs and authors, University of Virginia psychologist Brian Nosek and his colleagues found that more than half failed replication tests. Only 39% of the psychology experiments could be replicated unambiguously, and studies claiming surprising effects, or effects that were challenging to replicate, proved less reproducible. The collaboration published its results in the new issue of Science.

Nosek began crowdsourcing the Reproducibility Project in 2012, when he reached out to nearly 300 members of the psychology community. Scientists lead and work on many projects simultaneously, for which they receive credit when publishing their own papers, so taking part requires some sacrifice: the replication paper lists the authors of the Open Science Collaboration alphabetically rather than in order of their contributions, and working with so many people presents logistical difficulties. Nevertheless, considering the importance of scientific integrity and of investigating the reliability of analyses and results, such an undertaking is worthwhile to the community. (In the past, I have participated in similarly large collaborative projects, which I too believe have benefited the astrophysical community.)

The researchers evaluated reproducibility with five complementary indicators, including statistical significance and p-values, effect sizes, subjective assessments by the replication teams, and meta-analyses of effect sizes. Although a failure to reproduce does not necessarily mean that the original report was incorrect, they state that such “replications suggest that more investigation is needed to establish the validity of the original findings.” This is diplomatic scientist-speak for: “people have reason to doubt the results.” In the end, the scientists in this study find that in the majority of cases, the p-values are higher (making the results less significant or statistically insignificant) and the effect sizes are smaller, or even go in the opposite direction of the claimed trend!
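
To see why higher p-values and shrunken effect sizes are exactly what one would expect when marginally significant findings get published, consider the following minimal simulation sketch. It is my own illustration in Python, not the Open Science Collaboration’s analysis code, and the effect size, sample size, and two-group t-test design are assumptions chosen only for the demo:

```python
# Minimal sketch (not the Open Science Collaboration's code): simulate why a
# barely significant original finding tends to shrink or vanish on replication.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.2      # assumed small true effect (Cohen's d)
n = 50                 # assumed participants per group
trials = 10_000

def one_study():
    """Run one two-group study; return its p-value and observed effect size d."""
    a = rng.normal(true_effect, 1.0, n)   # treatment group
    b = rng.normal(0.0, 1.0, n)           # control group
    _, p = stats.ttest_ind(a, b)
    d = (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return p, d

# Keep only the "publishable" originals (p < .05), then replicate each one once.
originals = [one_study() for _ in range(trials)]
published = [(p, d) for p, d in originals if p < 0.05]
replications = [one_study() for _ in published]

print(f"originals reaching p<.05:       {len(published) / trials:.0%}")
print(f"mean published effect size d:   {np.mean([d for _, d in published]):.2f}")
print(f"mean replication effect size d: {np.mean([d for _, d in replications]):.2f}")
print(f"replications reaching p<.05:    {np.mean([p < 0.05 for p, _ in replications]):.0%}")
```

With these assumed numbers, the “published” originals overstate the true effect, and only a minority of replications reach p < 0.05, even though the underlying effect is real.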

Effects claimed in the majority of studies cannot be reproduced. Figure shows density plots of original and replication p-values and effect sizes (correlation coefficients).

Note that this meta-analysis has a few limitations and shortcomings. Some studies or analysis methods that are difficult to replicate involve research that pushes the limits or tests very new or little-studied questions; if scientists only asked easy questions, or questions to which they already knew the answer, their research would not do much to advance science. In addition, I could find no comment in the paper about situations in which scientists faced the prospect of replicating their own or their competitors’ previous papers; presumably they avoided such potential conflicts of interest.

These contentious conclusions could shake up the social sciences and subject more papers and experiments to scrutiny. This isn’t necessarily a bad thing; according to Oxford psychologist Dorothy Bishop in the Guardian, it could be “the starting point for the revitalization and improvement of science.”

In any case, scientists must acknowledge the publication of so many questionable results. Since scientists generally strive for honesty, integrity and transparency, and cases of outright fraud are extremely rare, we must investigate the causes of these problems. As pointed out by Ed Yong in the Atlantic, like many sciences, “psychology suffers from publication bias, where journals tend to only publish positive results (that is, those that confirm the researchers’ hypothesis), and negative results are left to linger in file drawers.” In addition, some social scientists have published what first appear to be startling discoveries but turn out to be cases of “p-hacking…attempts to torture positive results out of ambiguous data.”
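
To see how little “torturing” of ambiguous data it takes, here is a toy sketch in Python (my own illustration, not drawn from any of the studies discussed; the sample size and number of measured outcomes are arbitrary assumptions). With no real effect anywhere, testing several outcomes and reporting whichever one crosses p < 0.05 produces far more “positive” findings than the nominal 5% false-positive rate:

```python
# Toy illustration (not from any cited study): with no real effect anywhere,
# testing many outcomes and reporting whichever crosses p < .05 produces
# far more "positive" findings than the nominal 5% false-positive rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_outcomes, experiments = 40, 10, 5_000   # assumed sizes, chosen for the demo

false_positives = 0
for _ in range(experiments):
    group_a = rng.normal(size=(n, n_outcomes))         # pure noise
    group_b = rng.normal(size=(n, n_outcomes))         # pure noise
    pvalues = stats.ttest_ind(group_a, group_b).pvalue  # one t-test per outcome
    if pvalues.min() < 0.05:                            # "report" only the best outcome
        false_positives += 1

print(f"experiments with at least one p<.05 from noise: {false_positives / experiments:.0%}")
# Expected to be roughly 1 - 0.95**10, i.e. about 40%, not 5%.
```

Correcting for the number of tests, or pre-registering a single primary outcome, brings that rate back down toward 5%.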

Unfortunately, this could also provide more fuel for critics of science, who already seem to have enough ammunition, judging by overblown headlines pointing to the increasing number of scientists retracting papers, often due to misconduct such as plagiarism and image manipulation. In spite of this trend, as Christie Aschwanden argues in a FiveThirtyEight piece, science isn’t broken! Scientists should be cautious about unreliable statistical tools, though, and p-values fall into that category. The psychology meta-analysis shows that p<0.05 tests are too easy to pass, but scientists knew that already: the journal Basic and Applied Social Psychology banned p-values earlier this year.

Furthermore, larger trends may be driving the publication of such problematic science papers. Increasing competition among scientists for high-status jobs, federal grants, and speaking opportunities at high-profile conferences pressures them to publish more, and to publish provocative results in major journals. To quote the Open Science Collaboration’s paper, “the incentives for individual scientists prioritize novelty over replication.” Meanwhile, overextended peer reviewers and editors often lack the time to properly vet and examine submitted manuscripts, making it more likely that problematic papers slip through and carry much more weight upon publication. At that point, it can take a while to refute an influential published paper or reduce its impact on the field.

Source: American Society for Microbiology, Nature

When I worked as an astrophysics researcher, I carefully reviewed numerous papers for many different journals and considered that work an important part of my job. Utilizing multiple reviewers per manuscript and paying reviewers for their time might improve the situation. In any case, most scientists recognize that though peer review plays an important role in the process, it is no panacea.

I am proud of all of my research papers, but at times I wished I had more time for additional or more comprehensive analysis, to be more thorough and certain about some results. Such care can be prohibitively time-consuming for any scientist, whether theorist, observer or experimentalist, and scientists draw the line in different places when deciding whether or when to publish. I also feel that I have sometimes been too conservative in presenting my conclusions, while some scientists make claims that go far beyond the limited implications of uncertain results.

Some scientists jump on opportunities to publish the most provocative results they can find, and science journalists and editors love a great headline, but we should express skepticism when people announce unconvincing or improbable findings, as many of them turn out to be wrong. (Remember when OPERA physicists thought that neutrinos could travel faster than light?)

When conducting research and writing and reviewing papers, scientists should aim for as much transparency and openness as possible. The Open Science Framework demonstrates how such research could be done: the data are accessible to everyone, and individual scientists’ contributions can be tracked. With such a “GitHub-like version control system, it’s clear exactly who takes responsibility for what part of a research project, and when—helping resolve problems of ownership and first publication,” writes Katie Palmer in Wired. As Marcia McNutt, editor in chief of Science, says, “authors and journal editors should be wary of publishing marginally significant results, as those are the ones that are less likely to reproduce.”

If a newly published paper is going to attract the attention of the scientific community and news media, then it must be sufficiently interesting, novel or even contentious, so scientists and journalists must work harder to balance newsworthiness against reliability. We should also remember that, for better or worse, science rarely yields clear answers; it usually leads to more questions.

Does your social circle bias your view of the world?

You know who your friends are. You have common interests with them as well as some things you disagree about, and they’re the ones who respond to your texts, tweets and Facebook posts. You know how you compare to the Joneses next door, but what about to the rest of the neighborhood? It turns out that, based on extensive research by Dr. Mirta Galesic and other social psychologists, most people tend to be more similar to their social circle than to the general population, and this influences their views of others’ experiences. In other words, our limited social experiences affect how we perceive other people.

According to Dr. Mirta Galesic, one’s social circle affects one’s views and assessments of the general population.

Mirta Galesic, now the Cowan Chair in Human Social Dynamics at the Santa Fe Institute in New Mexico, previously worked at the Max Planck Institute for Human Development in Berlin, Germany, and earned her Ph.D. in Croatia. She has lived and worked in a variety of places and accrued experience working with researchers around the world.

Many psychologists pry into the human mind, while many social scientists ask, “What is in the environment?” Galesic’s approach seeks to combine these viewpoints, exploring both the mind and environmental influences on social behavior, as well as the complex interactions between them. She attempts to navigate the difficult path between nature and nurture.

Focusing only on the mind when studying human cognition only tells part of the story, according to Galesic.

Over decades of research on human cognition, social psychologists have identified and named more and more biases in how we interpret social interactions and the wider world. As Galesic put it, every year a researcher announces, “Oh, I’ve discovered a new bias!” Some of the biases even seem contradictory, such as false consensus and false uniqueness, in which one overestimates either how similar one’s views are to others’ or how unique they are.

Galesic’s recent research, which she presented to us in a fascinating lecture at the Santa Fe Institute on Monday, includes two opposing biases. She refers to the first one, self-enhancement, as the “Lake Wobegon effect,” after the amusingly optimistic motto of A Prairie Home Companion, where “all the women are strong, all the men are good-looking, and all the children are above average.” The glass is at least half full.

Steve Loughnan, a social psychologist at the University of Edinburgh, has observed this effect in his independent research as well. In 2011, he found greater self-enhancement “in societies with more income inequality, and income inequality predicted cross-cultural differences in self-enhancement better than did individualism/collectivism.” Sometimes, however, people exhibit the opposite, a self-depreciation bias, in which they pessimistically believe that they or their group are below average. This tends to happen when one imagines one is worse than others at apparently difficult tasks, where success is relatively rare. (How do your skills compare when it comes to understanding calculus or cooking a soufflé?) Moreover, some people appear to be “unskilled and unaware of it,” according to University of Michigan professor Katherine Burson.

In two recent studies, Galesic collaborated with Henrik Olsson, a colleague at her former institute in Berlin, and Jörg Rieskamp, a psychologist at the University of Basel, Switzerland. They published their research on Dutch, German, and US populations in Psychological Science and in the Cognitive Science Society Proceedings. They start with the well-known phenomenon of “homophily”: people tend to associate with others similar to themselves, for example with respect to socioeconomic status and ideology. Galesic and her co-workers performed a rigorous statistical analysis of thousands of randomly selected respondents with a “social sampling model,” in which people infer how others are doing by sampling from their own immediate social environments.
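
The flavor of that social sampling idea can be captured in a few lines of code. The following is a toy sketch, my own illustration rather than Galesic and colleagues’ actual model, data, or parameters: it builds homophilous friend circles over a synthetic wealth distribution and shows that respondents’ judgments of the wider population then track their own position, whereas randomly drawn circles would not produce that bias:

```python
# Toy sketch of the social-sampling idea (my illustration, not Galesic and
# colleagues' model or data): with homophilous circles, judgments of how
# society is doing track one's own position; with random circles they don't.
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
wealth = np.sort(rng.lognormal(mean=10.8, sigma=0.7, size=N))  # sorted: rank == index

def estimate_median(i, homophilous, k=8, spread=2_000):
    """A respondent at wealth rank i judges the population median from k friends."""
    if homophilous:                                  # friends have similar wealth
        lo, hi = max(0, i - spread), min(N, i + spread)
        friends = rng.integers(lo, hi, size=k)
    else:                                            # friends drawn from everyone
        friends = rng.integers(0, N, size=k)
    return np.median(wealth[friends])

respondents = rng.integers(0, N, size=2_000)
own = wealth[respondents]
est_homo = np.array([estimate_median(i, True) for i in respondents])
est_rand = np.array([estimate_median(i, False) for i in respondents])

print("correlation of population estimate with own wealth")
print(f"  homophilous circles: {np.corrcoef(own, est_homo)[0, 1]:.2f}")
print(f"  random circles     : {np.corrcoef(own, est_rand)[0, 1]:.2f}")
```

The particular numbers do not matter; the point is the mechanism, in which sampling from a skewed local environment yields systematically skewed views of the broader society.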

Galesic, Olsson & Rieskamp (2012): self-enhancement and self-depreciation in people’s estimates of household wealth, work stress, and number of friends in their social circles and the general population.

It turns out, as Galesic concludes, that “people are well attuned to their immediate social environments but not as well to broader society.” For example, people exhibit self-enhancement when it comes to work stress: they view their own position as better than it really is, especially those who experience relatively high levels of stress. On the other hand, people show an apparent self-depreciation with respect to household wealth, viewing their position as worse than it really is, especially those who are better off. Both effects can be explained by Galesic’s model, which suggests that a more complete picture of human cognition requires understanding both people’s inference processes and their environments.

Although knowing one’s social circle does not translate into accurate knowledge of the general population, with which one has less contact, that is not necessarily a problem, especially if one is aware of the effect. In addition, one can attempt to reduce the bias by enlarging and diversifying one’s social circle. Galesic herself described how, since moving to the US last year, she has been trying to immerse herself in a wide range of social and political environments and to expose herself to a variety of news sources, even going so far as to include Fox News and Sean Hannity.

These and related sociological and psychological effects continue to generate both scientific and public interest. Eytan Bakshy and collaborators recently found that Facebook and other social media tend to herd people into “filter bubbles,” where people selectively encounter news and views similar to their own, thus increasing political polarization. In addition, Shai Davidai and Thomas Gilovich polled 3,300 Americans and discovered that people overestimate upward economic mobility, especially those in poor or conservative groups; they continue to believe in the “American Dream.”

One can imagine important and interesting implications of this research, which Galesic outlined at the end of her presentation. For example, since beliefs travel through social networks, one might encourage support for, or awareness of, particular policies through them. One could also communicate important information, such as about medical screenings and vaccines, since “systematic peer-to-peer diffusion might be more effective.” Moreover, the differences between people’s immediate social circles and the larger society highlight the importance of encouraging diversity in neighborhoods and workplaces, of communicating with people who hold different views, and of immersing oneself in different communities.

COMPETES Act: The House Science Committee’s Controversial Bill

Two weeks ago, the United States House Science Committee, chaired by Rep. Lamar Smith (R-TX), passed the America COMPETES Reauthorization Act (H.R. 1806) along party lines. Originally authored by Bart Gordon (D-TN) in 2007 to improve the US’s competitiveness and innovation in science, technology, engineering and mathematics (STEM) fields, it contributed substantial funding to research and activities in federal agencies including the National Science Foundation (NSF), Department of Energy (DOE), and the National Institute of Standards and Technology (NIST). (In a previous post, I was hopeful about the passage of an earlier version of the bill.) Its current version, however, includes contentious cuts to NSF and DOE research programs, and it now proceeds to the House floor.

Although the President’s Budget Request for fiscal year 2016 includes small increases for the NSF, DOE Office of Science, and NIST, the new COMPETES Act, if passed in its current version, would shift funding away from research in the social sciences, geosciences, renewable energy, energy efficiency, and biological and environmental research. In other words, federally funded research in other science fields would gain support at the expense of these fields, whose funding would be cut by 10-50%. In particular, the bill would severely narrow the scope of NSF research and scientific facilities in the social, behavioral, and economic (SBE) and geoscience (GEO) directorates and would reduce the DOE’s basic and applied research programs in climate change and the Advanced Research Projects Agency-Energy (ARPA-E).

I suppose it could be worse. Lamar Smith’s earlier version included attacks on, and interference in, the NSF’s scientific peer-review process (which I discussed in this post in March), and he made a small concession by removing such language from the bill.

Clearly not happy with the COMPETES Act, scientists of all stripes continue to voice their opposition. While the House Science Committee’s Republican majority rejected one Democratic amendment after another, 32 scientific societies submitted official letters for the record describing their concerns. (These organizations include the American Physical Society and the American Institute of Physics, of which I am a member.) Moreover, the American Association for the Advancement of Science (AAAS), the US’s premier scientific society, submitted a letter as well, pointing out that H.R. 1806 violates its own Guiding Principles. The letter also states, “NSF is unique among federal agencies in that it supports a balanced portfolio of basic research in all disciplines, using the scientific peer review system as the foundation for awarding research grants based on merit.”

In my opinion, the COMPETES Reauthorization Act needs serious revision so that scientists in all fields, including the social sciences and geosciences, may continue their work at an internationally respected level. This would certainly make the US more competitive in science and would aid people seeking STEM careers. If the bill’s proponents will not allow these necessary improvements to be made, then the bill should be rejected.

For more information, check out this well-written article in Wired and detailed coverage in Science magazine and Inside Higher Ed.