The Base Rate Fallacy: X% of new Covid cases are among the vaccinated is a BS statement

Suppose you see a headline that says something like: “50% of our 100,000 new Covid cases were among the vaccinated.” Should you be concerned that the vaccine isn’t working anymore? The answer is: absolutely not. Well, not without a lot more information. This statement is an example of using numbers to confuse rather than illuminate. And the best way to understand that this is almost certainly a totally meaningless statistic, perhaps even rising to the level of complete BS, is to use a technique I’ve explained before: think about what a statement would mean at extremes.

So here is an extreme situation to use to think about this statement. Imagine someone is publishing this “statistic” about a place where, say, 99% of a population of 20,000,000 were vaccinated and, yes, there were 100,000 new cases to report. The “statistic” in our headline is saying that, of the 100,000 new cases, 50,000 were in the vaccinated population (50% of 100,000 cases), and so the other 50,000 were in unvaccinated people.

So, first off, we can calculate that the total number of unvaccinated people is 200,000 (1% of 20 million) and that 19,800,000 people were vaccinated (99% of 20 million). Then the odds of getting sick if you are vaccinated are:

50,000/19,800,000 ≈ 0.0025, or about 1/4 of 1%

I.e. really low. But if you are unvaccinated the odds are:

50,000/200,000 = 25% 

or about 100X greater, and really, really high.
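
For readers who want to check the arithmetic, here is a minimal Python sketch of the calculation. The variable names are mine, and the numbers are the hypothetical ones from this example, not real data.

```python
# Hypothetical numbers from the extreme example (not real data).
population = 20_000_000
vaccinated_share = 0.99
new_cases = 100_000
case_share_vaccinated = 0.50

# Size of each group: these are the denominators that matter.
vaccinated = round(population * vaccinated_share)   # 19,800,000
unvaccinated = population - vaccinated              # 200,000

# Split the new cases the way the headline does.
cases_vaccinated = round(new_cases * case_share_vaccinated)  # 50,000
cases_unvaccinated = new_cases - cases_vaccinated            # 50,000

# Risk = cases in a group divided by the size of that group.
risk_vaccinated = cases_vaccinated / vaccinated
risk_unvaccinated = cases_unvaccinated / unvaccinated
ratio = risk_unvaccinated / risk_vaccinated

print(f"Risk if vaccinated:   {risk_vaccinated:.4%}")
print(f"Risk if unvaccinated: {risk_unvaccinated:.2%}")
print(f"Unvaccinated risk is about {ratio:.0f} times the vaccinated risk")
```

The exact ratio works out to 99, which is where the "about 100X" figure comes from.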

Thus, for this hypothetical example, you would know that someone is deliberately trying to confuse you or is simply unaware of the effect that choosing the wrong number for the bottom of the fraction (the denominator) has on percentages!

If the denominator you chose in a calculation is the wrong one, you have fallen victim to what is called “the base rate fallacy.” In this case, the “statistic” used the total number of cases of Covid (100,000) as the denominator, not the total number of vaccinated people (19,800,000). You simply cannot divide by 200,000 to find out the odds of getting the disease if you are vaccinated, because your “population” size of vaccinated people is 19,800,000, not 200,000. And when you divide by 200,000 when you are supposed to divide by 19,800,000, well, you saw the result above: you are off by about 100-fold!
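
To see why the headline number says so little when the group sizes are missing, here is a small sketch that holds "50% of cases were among the vaccinated" fixed and varies the vaccination rate. The function name `relative_risk` and the vaccination rates other than 99% are illustrative assumptions of mine, not figures from any real report.

```python
def relative_risk(vaccinated_share: float, case_share_vaccinated: float) -> float:
    """Return how many times more likely an unvaccinated person is to be
    a case than a vaccinated person.

    Total population and total case count cancel out of the ratio, so
    only the two shares are needed.
    """
    if not 0 < vaccinated_share < 1 or not 0 < case_share_vaccinated < 1:
        raise ValueError("shares must be strictly between 0 and 1")
    # Per-person risk in each group, up to the same constant factor
    # (total cases / total population), which cancels in the ratio.
    risk_vax = case_share_vaccinated / vaccinated_share
    risk_unvax = (1 - case_share_vaccinated) / (1 - vaccinated_share)
    return risk_unvax / risk_vax


# Same headline ("50% of cases were vaccinated"), different populations.
for share in (0.50, 0.80, 0.99):
    print(f"{share:.0%} vaccinated: unvaccinated risk is "
          f"{relative_risk(share, 0.50):.1f}x the vaccinated risk")
```

With the same headline, the ratio runs from no difference at all (50% vaccinated) to roughly 99 to 1 (99% vaccinated), which is why the headline alone cannot tell you whether the vaccine is working.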

Base rate fallacies come up all the time in thinking about medical statistics. They are, for example, at the root of the “paradox of the false positive,” which I talked about before (https://garycornell.com/2020/05/28/testing-4-i-tested-positive-do-i-really-have-the-disease/). Recall that having a positive test result for a disease isn’t enough data to make a decision: you need to know how rare the disease is in a population.
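
As a rough illustration of the false positive paradox, here is a short sketch using Bayes' rule. The prevalence, sensitivity, and specificity below are made-up numbers chosen only to show the effect, and the function name is my own.

```python
def probability_sick_given_positive(prevalence: float,
                                    sensitivity: float,
                                    specificity: float) -> float:
    """Chance that a person who tests positive actually has the disease."""
    # Fraction of the whole population that is sick AND tests positive.
    true_positives = prevalence * sensitivity
    # Fraction of the whole population that is healthy AND tests positive.
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Assumed values: 1 person in 1,000 has the disease, and the test is
# right 99% of the time for both sick and healthy people.
p = probability_sick_given_positive(prevalence=0.001,
                                    sensitivity=0.99,
                                    specificity=0.99)
print(f"Chance of being sick after a positive test: {p:.1%}")
```

With these made-up numbers, a positive result means only about a 9% chance of actually having the disease, because the few true positives are swamped by false positives from the much larger healthy group.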

To sum up:

Any statement about “odds” or “probability” is meaningful only when you know they have used the right size of the sample to divide by. Denominators matter!

9 thoughts on “The Base Rate Fallacy: X% of new Covid cases are among the vaccinated is a BS statement”

  1. Hello, I saw your piece in Slate and wanted to comment without the hassle of creating a Slate account. I understand the base rate fallacy, but I think it has also been employed in the service of downplaying the risk. Suppose someone says your chances of being infected if you are vaccinated is one in a hundred (1%). That sounds positive. But if your chance of infection is one in ten (10%) if you are not vaccinated, isn’t it more accurate to say that your chance of infection if vaccinated is only improved tenfold, not hundredfold?

    1. Yes, that is right, but that isn’t what the headlines do. They use the total number of cases as the denominator, and that is always wrong. You need to use the size of the populations to determine risk.

  2. I agree that denominators matter, but it’s the same fallacy here. The disease isn’t existing in a vacuum. The denominator you really need to use is exposure. You assume that every one of the 20M people was exposed (within the given timeframe of the argument), which isn’t the case. Maybe this example happened in a small town of 100,000 people that all got sick regardless of the vaccine (probably not, of course, hah). I know you understand this, but I think we all need to be more careful with our ‘headlines’, and people should fact check articles like the one mentioned as well as this one, because the story is a lot different when it says that ‘50% of cases stemming from the XYZ music festival were vaccinated individuals’, where you can assume near-constant exposure of a population (not perfect of course, but closer than assuming the entire population of a country).
    Keep up the good work, but let’s be more clear about the assumptions we all make in writing. 🙂

  3. Also here from Slate. You criticize using misleading numbers to confuse, but your “extreme” hypo does that itself. Yes, in a population where 99% are vaccinated, 75% of sick people coming from the vaccinated group actually shows the vaccine is working quite well. But in a population where 80% are vaccinated, 75% of sick people coming from the vaccinated group would show a vaccine that was only slightly effective (at a population vaccination rate of 75%, which is much higher than the national average right now, it shows a vaccine that is completely ineffective). I think most people quite reasonably think that the real number is closer to 80% than 99%, and at that point confidently asserting that they should “absolutely not” be worried is quite overstated. It does all depend on the vaccination rate in the population, and we have good reason to think it is significantly higher than the national average, but raising it to an unrealistically high 99% offers false assurance.

    1. I think you are missing the point of using an extreme example. It is to show the base rate fallacy at its starkest, i.e., that you have to choose the right denominator. Yes, a more realistic scenario might be something like 60% vaccinated, with 20% of the cases among the vaccinated and 80% of the cases among the unvaccinated. You could then do the calculation and see the same mistake, but the arithmetic would be a bit more painful. The key issue is that any statement that says “X% of the reported cases are among the vaccinated” (or “among the unvaccinated” for that matter) is falling victim to the base rate fallacy. Such statements are useless and provide no good information absent knowing how many people are in each group. Putting them as the lead in an article is just plain bad. That was the point of the article.

  4. Ugh. Articles like these are reckless and represent everything that’s wrong with American bioinformatics, let alone overall US communications around Covid.

    There’s nothing wrong with stating a fact: that 75% of the infected were vaccinated. But things go south in a hurry when an enforced behavioral intent gets attached to it.

    Yes, Americans can’t do stats (or math). But framing the facts toward intended outcomes, such as higher vaccination rates, is deceptive and (often wrongly) prescriptive. So we get framing to encourage vaccinations obscuring the truth of how vaccinations do not prevent infection or spread. BOTH are important details.

    Hence we get the CDC lifting mask mandates a month ago among “Freedom Day” declarations in part because they worried about reducing vaccination incentives: manipulative intentions over facts. Yet we knew the UK and India were raging with Delta and it was only a matter of time.

    But by discouraging masking, the CDC encouraged the false belief that vaccines were a magic amulet that prevented infection or transmission and the vaccinated could simply pretend like COVID no longer existed … resulting in unnecessary infections and deaths.

    1. The base rate fallacy comes up all the time and it’s never good. Every few years they survey doctors about what they would do if a patient had a positive result on an accurate test for a disease when the disease is rare. The results are always scary.
