Here is a test for you. Let’s say 300 mathematicians were polled about how many hours of TV they watch per week. What does it mean to say that a 95% confidence interval for the average number of hours of television watched by a mathematician per week is the interval from 1 to 3 hours? Here are some reasonable-sounding answers:

- 95% of mathematicians watch from 1 to 3 hours of TV per week.
- There is a 95% probability that the average number of hours of TV watched by all mathematicians is between 1 and 3 hours.
- If 100 similar polls were conducted, the average number of hours of TV watched by a mathematician would lie within the interval from 1 to 3 approximately 95 times.

Whatever your answer to the question above, think about whether it is equivalent to the following correct answer: the PROCESS used to create the confidence interval has a 95% chance of success—that is, there is a 95% probability that whatever interval is created through this process will contain the true average. While it is conceivable (but unlikely) that I could find enough mathematicians to replicate my experiment 100 times, I’m still not sure what this tells me since I may get (possibly very) different upper and lower bounds for the confidence interval each time I perform the experiment.
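To make the "process" reading concrete, here is a minimal simulation sketch. Everything specific in it is an assumption for illustration, not data: I pretend TV hours follow a gamma distribution with a true mean of 2 hours, poll 300 made-up mathematicians at a time, and build the usual normal-approximation interval (sample mean plus or minus about 1.96 standard errors). The function name `simulate_coverage` and its parameters are hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev

def simulate_coverage(alpha=2.0, beta=1.0, n=300, trials=2000, level=0.95, seed=0):
    """Repeat the poll many times and count how often the interval captures the true mean."""
    rng = random.Random(seed)
    true_mean = alpha * beta  # mean of a gamma(alpha, beta) distribution (assumed population)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    intervals = []
    hits = 0
    for _ in range(trials):
        # One hypothetical poll of n mathematicians
        sample = [rng.gammavariate(alpha, beta) for _ in range(n)]
        m = fmean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        lo, hi = m - half_width, m + half_width
        intervals.append((lo, hi))
        hits += lo <= true_mean <= hi  # this particular interval either contains it or not
    return hits / trials, intervals

coverage, intervals = simulate_coverage()
print(f"fraction of intervals containing the true mean: {coverage:.3f}")
print("first three intervals:", [(round(lo, 2), round(hi, 2)) for lo, hi in intervals[:3]])
```

Under these assumptions the fraction should land close to 0.95, while the individual endpoints differ from poll to poll. The 95% describes the long run of intervals, not any single one of them.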

I probably sound kind of like a really annoying sophomore by now, but here is my honest question: what is the most reasonable way to practically use confidence intervals? Along these lines, it seems that psychologists are strongly considering alternatives to the currently accepted significance tests for reporting the results of their experiments. Under consideration is the reporting of confidence intervals, which do not rely on null hypothesis testing.

I guess one question is: is this mainly a problem with education, in that people don’t know what a confidence interval is, or is it that the measurement itself is not serving the purpose that most people have come to use it for?

So hopefully you have some ideas for me, and maybe now someone will be inspired to conduct a survey on the TV-watching habits of mathematicians at the next JMM.

These reflections are all inspired by:

1) Alex Etz, a UT graduate student, at The Etz-Files: Blogging About Science, Statistics, and Brains; Nov. 16th and Nov. 20th posts entitled *Can Confidence Intervals Save Psychology?* http://nicebrain.wordpress.com/2014/11/16/can-confidence-intervals-save-psychology-part-1/

2) From my friend Suz Ward at AIR; July post entitled *Confident or Credible? Two Metrics of Uncertainty for Decision Making* http://www.air-worldwide.com/Blog/Confident-or-Credible–Two-Metrics-of-Uncertainty-for-Decision-Making/

3) Christian Jarrett at the BPS Research Digest; Nov. 14th post entitled *Reformers say psychologists should change how they report their results, but does anyone understand the alternative?* http://digest.bps.org.uk/2012/08/phew-made-it-how-uncanny-proportion-of.html

Hi, nice post. I’m glad I could partially inspire you to write about this. I’d like to take a stab at your question, “Is it a problem with the researchers or the method?” (Paraphrased)

It is almost certainly the method, as is the case with all long-run frequency statistics (the so-called objective probabilities). As far as I can tell from your post, you understand that frequency stats are simple counting. If we imagine endlessly repeating the process of creating intervals, some relative count of them (C) will capture mu. But I want to know what _this_ interval that I laboriously constructed tells me about mu. Unfortunately, the properties of a process cannot be attributed to a single realization of that process (for one interval, C would be 1 or 0, and we will never know which). So it effectively tells me nothing.

I don’t blame researchers for rejecting that conclusion (it does sound absurd). It is not a researcher’s fault that the method they use cannot answer the kinds of questions they are asking. But once they understand that it can’t, they should really stop using that method to try to answer those questions, no? To continue using it wrongly is some form of perverse motivated reasoning that I can’t understand.

To answer your other question, “what is the reasonable way to use CIs?” I can tell you how not to use them. Don’t use them as replacement significance tests. When used for parameter estimation (which they don’t really do, but go with it for now), a 95% CI has a 5% error rate built in. If used as a null hypothesis significance test, it inherits the 50%+ error rate typically held by p-values.
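As a rough illustration of why a CI used this way is just a significance test in disguise, here is a small sketch. It assumes a normal-approximation (z) interval and the matching two-sided z-test, run on made-up gamma-distributed "TV hours"; the helper name `ci_and_p_value` is hypothetical. It only shows that the two decisions coincide and says nothing about how large the resulting error rate is.

```python
import random
from statistics import NormalDist, fmean, stdev

def ci_and_p_value(sample, null_mean, level=0.95):
    """Return a z-based confidence interval and the two-sided z-test p-value."""
    n = len(sample)
    m = fmean(sample)
    se = stdev(sample) / n ** 0.5  # estimated standard error of the mean
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    ci = (m - z_crit * se, m + z_crit * se)
    z_stat = (m - null_mean) / se
    p = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    return ci, p

rng = random.Random(1)
sample = [rng.gammavariate(2.0, 1.0) for _ in range(300)]  # simulated data, not a real poll

decisions = []
for null_mean in (1.5, 1.9, 2.0, 2.1, 2.5):
    (lo, hi), p = ci_and_p_value(sample, null_mean)
    excluded = not (lo <= null_mean <= hi)  # "the CI excludes the null value"
    decisions.append((excluded, p < 0.05))  # versus "the test rejects at the 5% level"
    print(f"null={null_mean}: CI=({lo:.2f}, {hi:.2f}), p={p:.4f}, "
          f"excluded={excluded}, rejected={p < 0.05}")
```

For every null value tried, "the 95% interval excludes it" and "p < 0.05" give the same answer, so checking whether a CI covers the null value adds nothing beyond the test it was meant to replace.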

So the reasonable way to use CIs is to stop using them if you want to make reasonable inferences and, failing that, certainly don’t use them as significance tests.