I feel like I’ve seen news stories or blog posts about p-values every day this month. First, Andrew Gelman reported that the editor of the journal Psychological Science, famous to some for publishing dubious findings on the strength of p<0.05, will be getting serious about the replicability crisis. (The editorial he referenced came out last November, but Gelman tends to write posts a few months in advance.) Then the American Statistical Association released a statement about p-values, and a few days later, the reproducibility crisis in psychology led to some back-and-forthing between groups of researchers with different perspectives on the issue.
At the heart of much of the controversy is that much-maligned, often misunderstood p-value. The very existence of the ASA’s statement shows what a big issue understanding and using the p-value has become. The statement reads, “this was not a lightly taken step. The ASA has not previously taken positions on specific matters of statistical practice.” Retraction Watch has an interview with Ron Wasserstein, one of the people behind the ASA’s statement.
At 538, Christie Aschwanden tries to find an easy definition of p-value. Unfortunately, no such definition seems to exist. “You can get it right, or you can make it intuitive, but it’s all but impossible to do both,” she writes. Deborah Mayo, “frequentist in exile,” has two interesting posts about how exactly p-values should be interpreted and whether the “p-value police” always get it right. Mayo and Gelman were also two of the twenty people who contributed supplementary material for the ASA statement.
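To make the object of all this contention a bit more concrete, here is a minimal sketch (my own toy example, not drawn from any of the linked posts) of an exact two-sided p-value for a coin-flipping experiment: the probability, assuming the coin is fair, of an outcome at least as extreme as the one observed.

```python
from math import comb

def binomial_p_value(k, n, p=0.5):
    """Exact two-sided p-value for observing k successes in n trials,
    under the null hypothesis that the success probability is p.
    Sums the probabilities of every outcome at least as improbable
    as the observed one."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = pmf[k]
    return sum(prob for prob in pmf if prob <= observed)

# 60 heads in 100 flips of a supposedly fair coin:
print(round(binomial_p_value(60, 100), 4))  # ≈ 0.0569
```

Note that 60 heads in 100 flips gives a p-value just above the conventional 0.05 cutoff, while 61 heads slips below it, which is a nice illustration of how arbitrary that bright line is: the p-value measures compatibility of the data with the null hypothesis, not the probability that the hypothesis is true.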
Misuse and misinterpretation of p-values are part and parcel of the ongoing reproducibility crisis in psychology. (Though some say it isn’t a crisis at all.) Once again, Retraction Watch is on it with a response to a rebuttal of a response (once removed?) about replication studies. The post goes into some depth about a study that failed to replicate, and I found it fascinating to see how the replicating authors decided to try to adjust the original study, which was done in Israel, to make it relevant for the Virginians who were their test subjects. Gelman also has three posts about the replication crisis that I found helpful.
One of the underlying issues with replication is something a bit unfamiliar to me as a mathematician: inaccessible data. Not all research is published on the arXiv before showing up in a journal somewhere, so there are still paywalls around some articles. More troubling, though, is the fact that a lot of data never makes it out of the lab where it was gathered. This makes it hard for other researchers to verify computations, and it means a lot of negative results never see the light of day, leading to publication bias. The Neuroskeptic blog reports on a lab that has committed to sharing all its data, good, bad, and ugly.
So what’s the bottom line? It’s easy to be pessimistic, but in the end, I agree with another post by Aschwanden: science isn’t broken. We can’t expect one experiment or one number to give us a complete picture of scientific truth. She writes,
The uncertainty inherent in science doesn’t mean that we can’t use it to make important policies or decisions. It just means that we should remain cautious and adopt a mindset that’s open to changing courses if new data arises. We should make the best decisions we can with the current evidence and take care not to lose sight of its strength and degree of certainty. It’s no accident that every good paper includes the phrase ‘more study is needed’ — there is always more to learn.