The correlation between the divorce rate in Maine and the per capita consumption of margarine, though compelling, is totally spurious. This is just one of the many such correlations that Tyler Vigen explores on Spurious Correlations, and in his book of the same name.
I’ve been thinking a lot about fallacies in statistics this week, ever since I read Stephen Woodcock’s article Paradoxes of Probability and Other Statistical Strangeness in The Conversation. This article gives great examples and graphics to explain some of the weirdness of statistics, like Simpson’s paradox and my personal favorite, the base rate fallacy.
The extent to which two variables, X and Y, are related is most often measured using the Pearson correlation coefficient. Formally, this is just the covariance of X and Y divided by the product of their standard deviations. Practically, it is some number between -1 and 1, where a coefficient of 1 means a perfect positive linear relationship, 0 means no linear relationship, and -1 means a perfect negative linear relationship.
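To make the definition concrete, here is a minimal sketch in Python that computes the coefficient directly from the formula above. The function name `pearson` and the small sample lists are my own illustrative choices, not anything taken from Vigen’s data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / (std(X) * std(Y))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations. The 1/n factors in the covariance and
    # in the standard deviations cancel, so they are left out.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant series has zero variance, so this divides by zero.
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson(x, [2, 4, 6, 8, 10])    # y rises in lockstep with x
r_down = pearson(x, [10, 8, 6, 4, 2])  # y falls in lockstep with x
r_none = pearson(x, [2, 1, 3, 1, 2])   # no linear trend in y
print(r_up, r_down, r_none)
```

The three calls land at (or within rounding error of) 1, -1, and 0, the three landmark values described above. In practice you would probably reach for a library routine rather than rolling your own.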
For example, the number of people who tripped over their own two feet and died has a correlation coefficient of 0.9 with the number of lawyers in North Carolina. That means the two are very closely correlated, which in reality means absolutely nothing. Compare this to 0.8, the correlation coefficient between the number of people who tripped over their own two feet and died and Apple iPhone sales.
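One common way such numbers arise is that both quantities simply drift upward over the same years. The sketch below uses two completely made-up series (they are not Vigen’s data, and the names `series_a` and `series_b` are hypothetical) that each grow over time with their own unrelated wiggle, and still produce a high coefficient.

```python
import math

def pearson(xs, ys):
    """Pearson correlation computed from sums of deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

years = range(10)

# Made-up series A: grows by 5 per year, with an alternating wiggle of +/-3.
series_a = [100 + 5 * t + (3 if t % 2 else -3) for t in years]
# Made-up series B: grows by 2 per year, with a different wiggle of +/-1.
series_b = [20 + 2 * t + (1 if t % 3 == 0 else -1) for t in years]

# Nothing in B depends on A; the only thing they share is the passage of time.
r = pearson(series_a, series_b)
print(round(r, 2))
```

The coefficient comes out well above 0.9 even though neither series was built from the other, which is exactly the trap in the examples above.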
Recently, FiveThirtyEight also explored the prevalence of spurious correlations in nutritional studies in You Can’t Trust What You Read About Nutrition. Nutritional data, which is largely gathered through food diaries and eating questionnaires, leads to all sorts of crazy correlations, like cabbage and innies, or nuts and immortality.
If you’re teaching a course in statistics, Vigen’s website would be a really fun place to pick up data sets and cautionary examples for your students. Vigen includes links to all of the data he uses in his charts.