I feel like I’ve seen news stories or blog posts about p-values every day this month. First, Andrew Gelman reported that the editor of Psychological Science, a journal famous to some for publishing dubious findings on the strength of p<0.05, will be getting serious about the replicability crisis. (The editorial he referenced came out last November, but Gelman tends to write his posts a few months before they appear.) Then the American Statistical Association released a statement about p-values, and a few days later the reproducibility crisis in psychology led to some back-and-forthing between groups of researchers with different perspectives on the issue.
At the heart of much of the controversy is that much-maligned, often misunderstood p-value. The fact that the ASA’s statement exists at all shows how big an issue the proper understanding and use of p-values has become. The statement reads, “this was not a lightly taken step. The ASA has not previously taken positions on specific matters of statistical practice.” Retraction Watch has an interview with Ron Wasserstein, one of the people behind the ASA’s statement.
At 538, Christie Aschwanden tries to find an easy definition of the p-value. Unfortunately, no such definition seems to exist. “You can get it right, or you can make it intuitive, but it’s all but impossible to do both,” she writes. Deborah Mayo, “frequentist in exile,” has two interesting posts about how exactly p-values should be interpreted and whether the “p-value police” always get it right. Mayo and Gelman were also two of the twenty people who contributed supplementary material for the ASA’s p-value statement.
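As an aside, the textbook definition is at least easy to state in code, even if it resists an intuitive gloss: the p-value is the probability, computed assuming the null hypothesis is true, of data at least as extreme as what was actually observed. Here is a minimal sketch of that idea; the data, the one-sample t-test, and all the numbers are made up for illustration and aren’t taken from any of the posts above.

```python
# Illustrative only: a made-up sample tested against the null hypothesis mean = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.3, scale=1.0, size=30)  # hypothetical data

# Observed test statistic and analytic p-value for H0: mean = 0
t_obs, p_analytic = stats.ttest_1samp(sample, popmean=0.0)

# The same p-value by simulation: generate many datasets *assuming H0 is true*
# and ask how often the statistic is at least as extreme as the one we observed.
null_t = np.array([
    stats.ttest_1samp(rng.normal(loc=0.0, scale=1.0, size=30), 0.0)[0]
    for _ in range(10_000)
])
p_simulated = np.mean(np.abs(null_t) >= abs(t_obs))

print(f"analytic p = {p_analytic:.3f}, simulated p = {p_simulated:.3f}")
# Neither number is "the probability that the null hypothesis is true."
```

The simulated and analytic values agree, and neither is the probability that the null hypothesis is true, which is exactly the misreading the ASA statement warns against.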
Misuse and misinterpretation of p-values are part and parcel of the ongoing reproducibility crisis in psychology. (Though some say it isn’t a crisis at all.) Once again, Retraction Watch is on it with a response to a rebuttal of a response (once removed?) about replication studies. The post goes into some depth about a study that failed to replicate, and I found it fascinating to see how the replicating authors decided to try to adjust the original study, which was done in Israel, to make it relevant for the Virginians who were their test subjects. Gelman also has three posts about the replication crisis that I found helpful.
One of the underlying issues with replication is something a bit unfamiliar to me as a mathematician: inaccessible data. Not all research is published on the arXiv before showing up in a journal somewhere, so there are still paywalls around some articles. More troubling, though, is the fact that a lot of data never makes it out of the lab where it was gathered. This makes it hard for other researchers to verify computations, and it means a lot of negative results never see the light of day, leading to publication bias. The Neuroskeptic blog reports on a lab that has committed to sharing all its data, good, bad, and ugly.
So what’s the bottom line? It’s easy to be pessimistic, but in the end, I agree with another post by Aschwanden: science isn’t broken. We can’t expect one experiment or one number to give us a complete picture of scientific truth. She writes,
The uncertainty inherent in science doesn’t mean that we can’t use it to make important policies or decisions. It just means that we should remain cautious and adopt a mindset that’s open to changing courses if new data arises. We should make the best decisions we can with the current evidence and take care not to lose sight of its strength and degree of certainty. It’s no accident that every good paper includes the phrase ‘more study is needed’ — there is always more to learn.
The ASA’s statement identifies the problems quite well, but doesn’t offer solutions. It doesn’t take much reading between the lines to see that, as usual, the internecine warfare between different schools of thought was not resolved.
Statistics and p-values are great and highly useful. It’s how we teach and learn about them that isn’t so great. Long live the p-value, effect sizes, and their appropriate interpretation.
Yes, but the problem is that there is little agreement about their interpretation. The problem with P values is that they give the right answer to the wrong question. What we really want to know is the probability that we’ll be wrong if we claim to have made a discovery. My take on that is at rsos.royalsocietypublishing.org/content/1/3/140216
And my suggestion about what should be done is at http://rsos.royalsocietypublishing.org/content/1/3/140216#comment-1889100957
Not everyone agrees with my conclusions, but most seem to accept them. Insofar as they are right, I think that statisticians must take some of the blame for not telling users more clearly.
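To put rough numbers on the distinction drawn in the comment above, between the p-value and the probability of being wrong when claiming a discovery, here is a back-of-the-envelope sketch. The prevalence and power figures are assumptions chosen purely for illustration, not figures taken from the linked paper.

```python
# Illustrative assumptions: even when every test is run correctly at p < 0.05,
# the fraction of "discoveries" that are false can be far larger than 5%
# if true effects are rare among the hypotheses being tested.
alpha = 0.05        # significance threshold
power = 0.8         # probability of detecting a real effect (assumed)
prevalence = 0.1    # fraction of tested hypotheses that are really true (assumed)

n = 100_000                          # hypothetical hypotheses tested
real = n * prevalence                # hypotheses with a real effect
null = n - real                      # hypotheses with no effect

true_positives = real * power        # real effects that reach p < alpha
false_positives = null * alpha       # null effects that reach p < alpha by chance

false_discovery_rate = false_positives / (true_positives + false_positives)
print(f"False discovery rate: {false_discovery_rate:.0%}")  # 36% with these numbers
```

With these assumptions, more than a third of the claimed discoveries are false even though each individual test cleared the conventional p < 0.05 bar, which is why the false discovery rate and the p-value answer different questions.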