In a previous blog I said: “you compare the number of people getting the disease or getting severely ill in the vaccinated and placebo group, looking for a statistically significant difference,” and some people asked me whether that is just an expert’s way of saying “the results are obviously different.” Actually no, it is much more subtle than that. For example, in the early days of remdesivir testing, they found “a 14-day mortality rate of 7.1% for the group receiving remdesivir versus 11.9% for the placebo group”. However, they also said “the difference in mortality was not statistically significant” (italics mine). To a layperson it sure looks significant, so what is going on? Unfortunately, I can’t explain what is going on in this example yet, but if you keep reading this blog, I will slowly get you there.
Explaining what “statistically significant” means will take a lot more than a single blog, because a fair amount of statistics is needed to unpack this simple phrase; what’s more, the phrase itself is controversial among statisticians. In this blog I want to start you down that road, but I will stay away from anything really technical.
First off, the reason statistical significance is so confusing, and needs real math to explain, is that people aren’t wired to understand how powerful “randomness” can be in making rare events happen. For example, I once had a t-shirt that said “miracles happen to me” on the front, and on the back it said: “once a year on average”. The idea behind this t-shirt is that there are more than half a million minutes in a year (525,600, to be exact), so something weirdly good with roughly a one-in-half-a-million chance of happening in any given minute would likely happen to me about once a year. It’s a dumb statistical joke, I suppose, but the idea is real: random good and bad luck happen more often than you think.
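If you want to check the t-shirt math yourself, here is a small Python sketch (my own illustration, not part of the original joke; the one-in-half-a-million per-minute “miracle” probability is an assumption). It computes the expected number of miracles in a year and, via the complement rule, the chance of seeing at least one:

```python
# T-shirt math: minutes in a (non-leap) year, plus an assumed
# one-in-half-a-million chance of a "miracle" in any given minute.
minutes_per_year = 365 * 24 * 60       # 525,600
p_per_minute = 1 / 500_000             # assumed rarity of a "miracle"

# Expected number of miracles in a year is just n * p.
expected = minutes_per_year * p_per_minute
print(f"expected miracles per year: {expected:.2f}")              # ~1.05

# Complement rule: P(at least one) = 1 - P(none) = 1 - (1 - p)^n
p_at_least_one = 1 - (1 - p_per_minute) ** minutes_per_year
print(f"chance of at least one this year: {p_at_least_one:.2f}")  # ~0.65
```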
Let’s think about flipping a coin and looking at the results of a single run of tosses. We are trying to decide whether the coin is “loaded”, i.e. not a fair coin. We flip it four times and get all heads. Assuming a fair coin and using the “multiplication of probabilities” rule, this will happen only
(1/2) * (1/2) * (1/2) * (1/2) = 1/16 of the time
or 6.25% of the time. Is this unlikely enough for a statistician to say there is “statistically significant” evidence that the coin is loaded?
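For readers who like to see this computed rather than argued, here is a short Python sketch of my own: it applies the multiplication rule directly, then double-checks the answer with a quick simulation of many four-flip runs.

```python
import random

# Multiplication rule for independent flips of a fair coin:
# P(4 heads in a row) = (1/2)^4.
p_four_heads = 0.5 ** 4
print(f"exact: {p_four_heads}")            # 0.0625, i.e. 1/16

# Monte Carlo check: simulate many runs of 4 flips and count how
# often every flip in a run comes up heads.
trials = 100_000
hits = sum(
    all(random.random() < 0.5 for _ in range(4))
    for _ in range(trials)
)
print(f"simulated: {hits / trials:.4f}")   # should be close to 0.0625
```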
The answer is no. How unlikely does an event have to be before we call it statistically significant? Statisticians traditionally used a “5% threshold”, i.e. the chance of the event happening by luck alone had to be less than 1/20 before they would say “huh, something isn’t right here”. Here the chance was 1/16 (6.25%), which is more likely than 1/20, so we say the result “didn’t reach the 1/20 threshold (5%)”, and therefore there isn’t enough evidence to conclude that we have a loaded coin.
So, now suppose you had actually flipped it five times and got all heads. That would happen only 1/32 of the time (a little more than 3%) by chance. So some statisticians would say: “yep, you probably have a loaded coin, because you passed the 5% threshold; there is less than a 1 in 20 chance it could have occurred randomly”.
Personally, I (and many statisticians) think that is too low a bar to clear; I would want something to have less than a 1/100 chance (1%) of happening randomly before calling it statistically significant. So I would have flipped the coin 7 times to start with, and if I got 7 heads in a row, I would say “yes, I will bet it is loaded”, because that happens only 1/128 of the time, which is less than 1%!
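To tie the whole coin example together, here is one more small sketch (mine, just summarizing the arithmetic above): for each number of flips, it computes the chance of getting all heads from a fair coin and compares that chance to the 5% and 1% thresholds.

```python
# Chance of n heads in a row from a fair coin, versus the two
# thresholds discussed above.
for n in range(4, 8):
    p = 0.5 ** n
    if p < 0.01:
        verdict = "below the 1% threshold"
    elif p < 0.05:
        verdict = "below the 5% threshold"
    else:
        verdict = "not below the 5% threshold"
    print(f"{n} heads: p = 1/{2 ** n} = {p:.4f} -> {verdict}")
```

Running it shows exactly the story told above: 4 heads (1/16) clears neither threshold, 5 heads (1/32) clears 5% but not 1%, and 7 heads (1/128) clears even the stricter 1% bar.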
But while statistical significance is easily illustrated by tossing a coin, it is used most often in “hypothesis testing”, so I will take that up shortly. Stay tuned.