Does Science Have a Reproducibility Problem?

It’s a common refrain on some of the science and technology websites I frequent – the idea that science has a reproducibility problem. The latest one I’ve seen is Aaron Carroll’s piece in The Upshot, a data-driven news site from The New York Times, in which he cites the failure of major pharmaceutical firms like Amgen and Bayer to reproduce the results of prior studies. But is it really a reproducibility problem?

No, that’s just how science works.

A study is a true reproducibility study only if it is conducted exactly as the original was (i.e., with the same subjects, conditions, and interventions). In many cases, conducting a reproducibility study is impossible, as the original participants are not the same as they were before, either because the study irreversibly changed them or because knowledge of its results influences their behavior. Moreover, it rarely makes sense to exactly replicate a study unless there’s reason to question its validity. Instead, we conduct similar studies to see if the original findings can be applied to a new set of participants or conditions, or to see if there are subgroups within the original pool to which the results do not apply. If the results of the new study are different, it doesn’t mean we failed to reproduce the findings of the original study – it just means the original study hypothesis has been refined.

Let’s use a practical example, in which a hypothetical group of patients at a Veterans Affairs (VA) hospital is randomized to receive either a drug to lower blood pressure or a matching placebo. Because the study is being conducted at a VA facility, a considerable majority of the patients enrolled in the study are men. Overall, patients who received the active drug experienced an average 10 mmHg decrease in blood pressure compared to those who received placebo, leading us to conclude that the drug is effective in lowering blood pressure.

Now, let’s repeat the study in a second group of patients at a suburban hospital, in which only half of the patients are men. In this study, the blood pressure in patients who received the active drug is about the same as in those who received placebo. Does this mean we’ve failed to reproduce the results of the original study (leading us to conclude that the drug doesn’t work)? No, it may just mean that the drug works better in men than it does in women, or maybe it doesn’t work in women at all. To conclusively answer this question, we would need to conduct a third study in which men and women are studied separately.
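
To put rough numbers on this, here is a minimal Python sketch of how the average effect in a mixed group depends on who is enrolled. Everything in it is made up for illustration: the pooled_effect helper, the 90% share of men at the VA hospital, and the per-group effects (11 mmHg in men, none in women) are all assumptions, not data from any real trial.

```python
def pooled_effect(share_men, effect_men, effect_women):
    """Average blood pressure reduction (mmHg) across a mixed group.

    The pooled effect is the average of the two subgroup effects,
    weighted by each subgroup's share of the enrolled patients.
    """
    return share_men * effect_men + (1 - share_men) * effect_women


# Hypothetical subgroup effects: the drug works in men but not in women.
EFFECT_MEN = 11.0    # assumed reduction in men, mmHg
EFFECT_WOMEN = 0.0   # assumed reduction in women, mmHg

# Hypothetical enrollment: 90% men at the VA hospital, 50% at the suburban one.
va_effect = pooled_effect(0.9, EFFECT_MEN, EFFECT_WOMEN)
suburban_effect = pooled_effect(0.5, EFFECT_MEN, EFFECT_WOMEN)

print(f"VA hospital:       {va_effect:.1f} mmHg")
print(f"Suburban hospital: {suburban_effect:.1f} mmHg")
```

Under these assumed numbers the same drug shows a much smaller average effect at the suburban hospital, simply because fewer of the patients belong to the group it helps. A smaller average effect is also harder to tell apart from placebo in a study of the same size, which is why the two studies can appear to disagree even though neither one is wrong.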

My point is that there are few things in science that are universally true. Even scientific laws are only true under certain circumstances and are occasionally modified or falsified when new data are discovered. There are countless hypotheses, however – the individual questions that researchers attempt to answer through scientific study. The failure of a new study to reproduce the results of an older one is no more an end to the argument than the original findings were. Instead, it should lead us to question why such a discrepancy exists.


I do agree with the other arguments Carroll makes in his piece, particularly the positive publication bias he alludes to in its title. The tendency for negative trials to get buried in obscurity (if they are published at all) is certainly a problem that plagues science.

Image credits: adapted from DNA Lab by University of Michigan SEAS (CC BY 2.0)