
Replication in Science

The internet is abuzz with the recent publication of an article in Science that attempted to replicate 100 studies in psychology, but found positive results for only about a third of them. A lot of people, both scientists and lay people, are freaking out over this, and I sure wouldn’t want to be one of those scientists whose study couldn’t be successfully replicated. But for those people, it isn’t the end of the world. Most of them didn’t screw up or do anything fraudulent. This is just how science works. A lot of people have talked about what this means about the state of science, or the state of psychology, or the state of whatever else, and I don’t have much to add to that conversation. But it does make me think about a few things about science in general, and those are what I want to say a few words about.

1) The publication of a study is not the end of the conversation. I think the media is guilty of misleading people about this. The truth is that you don’t sell many newspapers with headlines like, “scientists discover that X may be true.” What I see instead are headlines like, “scientists prove X is true,” or, “study disproves theory about X.” This isn’t how scientists think about things. If you have never had the pleasure of being in the presence of scientists when they hear the results of a new paper, it is a wonderful thing. More often than not, they will try to find problems with it. Maybe the author conceptualized their problem in the wrong way. Maybe they didn’t think of a confounding variable in their experiment. Maybe their statistics were misapplied. Maybe their mathematical model didn’t describe reality. To the lay person, this might make scientists sound petty for trying to take away a colleague’s accomplishment, but this is just part of the process.

Replication is not something you do to show that scientists are terrible; it is just a part of the process. When a new study comes out, scientists often say, “This is really interesting, but I’d like to see more work done on it.” What they are calling for is more studies to see whether the effect can be reproduced in a different way. That is replication, and it is part of the process.

A few years ago, one of my own studies was subjected to a replication by another research team. They believed that we had failed to take a particular variable into account, and may have made a type I error as a result. I will admit that I figuratively bit my nails when I heard that this study was coming out. Nobody wants their study to be wrong. But it happened to turn out that even with the new variable included in the analysis, our hypothesis was still supported. It could have gone the other way.

This is part of why science is great — it is self-correcting. If one scientist draws the wrong conclusion, either innocently or maliciously, other scientists will be there to fix it. A scientific publication is not the end of the conversation. It is the beginning. When a scientist publishes a paper, they are saying, “This is what we think is going on, and here is some evidence that we are right. What do you think?”

2) When drawing conclusions in science, there are two types of possible errors: type I errors and type II errors. A type I error is when you conclude that something is true, but it is not. Maybe your study concludes that watering plants with Brawndo (it’s got what plants crave) makes them grow better, when in reality it does not: type I error. A type II error is when you conclude that something is not true, but in reality it is. Maybe you conclude that monkeys are not actually primates, when they really are: type II error. A lot of the conversation around this replication study is about how scientists might be making too many type I errors. This could be true, but it is not my purpose here to comment on that. But I will point out that, for a given study with a given amount of data, when you make it harder to make a type I error, you necessarily make it easier to make a type II error. Conversely, when you make it harder to make a type II error, you make it easier to make a type I error. Both types of errors are still errors, and you don’t want to make either one.

Here’s how it works: In science, we often use p-values as a tool for figuring out what happened. When you analyze your data, the p-value tells you how likely you would be to see a trend at least as strong as the one you found if random chance were the only thing at work and there were really no relationship between your variables. A p-value close to 1.0 means that your data look just like what random noise would produce, and a p-value close to 0.0 means that random chance would almost never create numbers like yours (in practice, p-values are almost never exactly 0.0 or 1.0, but somewhere in between). This is not quite the same thing as the probability that your result is a fluke, although it is often described that way. Scientists typically use a cutoff of 0.05 for drawing conclusions. That is, you cannot tell people that you made a discovery unless meaningless noise would produce a result like yours 5% of the time or less. If the p-value is any higher than 0.05, you cannot make any claims. Ideally, most things you find with a p-value of 0.05 or less are true, and most things that you find with a p-value higher than 0.05 are not true. But sometimes you have a real effect with a p-value higher than 0.05, which leads you to think it’s false (type II error), and sometimes you have an effect that is not real with a p-value that is lower than 0.05, and you think it’s real (type I error). If you lower the cutoff value to, say, 0.01, you will get fewer false positives (type I errors), but more false negatives (type II errors). If you raise the cutoff value to 0.1, the opposite happens: type II errors become less common, but type I errors become more common.
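If you would rather see the trade-off than take my word for it, here is a rough simulation sketch in Python. Everything in it is made up for illustration: the function names, the group size of 30, the "real" effect of half a standard deviation, and the simplifying assumption that the spread of the data is known in advance (which lets a simple z-test stand in for whatever test a real study would use). It runs the same pretend experiment thousands of times in a world where there is no effect and in a world where there is one, and counts how often each cutoff gets fooled.

```python
import math
import random


def p_value(group_a, group_b, sd=1.0):
    """Two-sided p-value for a difference in means.

    Assumes equal group sizes and a known standard deviation `sd`
    (a simplification made for this sketch).
    """
    n = len(group_a)
    diff = sum(group_b) / n - sum(group_a) / n
    z = diff / (sd * math.sqrt(2.0 / n))
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


def error_rates(cutoff, true_effect=0.5, n=30, trials=2000, seed=1):
    """Return (type I rate, type II rate) for a given p-value cutoff.

    All default values are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)
    false_positives = 0
    false_negatives = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]

        # World 1: the treatment does nothing, so any "discovery" is a type I error.
        no_effect = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if p_value(control, no_effect) <= cutoff:
            false_positives += 1

        # World 2: the treatment really works, so any miss is a type II error.
        real_effect = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        if p_value(control, real_effect) > cutoff:
            false_negatives += 1

    return false_positives / trials, false_negatives / trials


if __name__ == "__main__":
    for cutoff in (0.01, 0.05, 0.10):
        type_1, type_2 = error_rates(cutoff)
        print(f"cutoff {cutoff:.2f}: type I rate {type_1:.3f}, type II rate {type_2:.3f}")
```

Because every cutoff is tested against the same simulated experiments, the type I rate can only go up as the cutoff is loosened and the type II rate can only go down. The exact numbers depend entirely on the assumptions above, so treat them as a cartoon of the trade-off rather than a statement about any real field.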

3) Science is hard, and we don’t know all of the answers. When we take science classes in school, there is always a right answer. You know that the substance is supposed to turn blue when you add the right chemicals, and if it doesn’t, you know you made a mistake. When I was a teacher, I would frequently have students come to me with a test tube full of orange fluid and ask, “Is this right?” I would resist giving a straight answer as much as I could, because the purpose of the experiment was not to teach them how to make the fluid change colors, but to teach them to think like a scientist. The domain of scientists is the edge of human knowledge. There is no one whom a scientist can ask, “Is this right?” If there were, they wouldn’t be doing science. No one knows the answers to the questions that scientists ask, which is why scientists are trying to figure them out. This means that we will sometimes get it wrong. It is for this reason that I praise the scientists who do replication studies, as well as the scientists who did the original research. It is all part of the process.

Have a topic you want me to cover? Let me know in the comments or on twitter @cgeppig
