Testing 5: Predictive Rates: How do you calculate if you really have that disease? – Looking at stuff through the eyes of a mathematician

I promised in my last blog to show you how to calculate whether or not you really have that disease when you test positive. The calculations aren’t hard, but they are always a bit tricky. So fair warning: if you really want to understand how to do them, read this blog with a pencil and paper beside you!

Anyway, to help you understand the ideas behind the calculations and also to show you how different the results can be, I’m going to show you the calculation in two different scenarios.

DO YOU HAVE THE DISEASE? SCENARIO 1

For this scenario, I want you to imagine a routine physical where they do lots of screening tests. One of the screening tests is highly accurate (99% specific and 99% sensitive), but the disease it tests for is rare. Let’s suppose only 1/1000 (.1%) of people have this disease, so 99.9% of people (.999) don’t have it. Also, and this will turn out to be a key point, you have no other information, such as symptoms or data about your genetic profile, to add to the picture.

Next, to make the arithmetic easier, and because this disease is rare, let’s suppose you test a hundred thousand people. Since the prevalence of the disease is 1/1000 (.001 of the population) and you’re looking at a hundred thousand people, you can expect only a hundred people (.001*100,000) to have the disease and 99,900 people (.999*100,000) not to have it.
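If you’d rather check this counting step with a computer than with pencil and paper, here is a minimal Python sketch. It isn’t from the original calculation; the function name `split_population` is just a hypothetical label, and it assumes we round to whole people.

```python
def split_population(sample_size, prevalence):
    """Split a hypothetical sample into (with disease, without disease) counts.

    prevalence is a fraction, e.g. 0.001 for 1 in 1,000.
    """
    with_disease = round(sample_size * prevalence)  # expected number who are sick
    without_disease = sample_size - with_disease    # everyone else is healthy
    return with_disease, without_disease


print(split_population(100_000, 0.001))  # (100, 99900)
```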

Now suppose you are one of the people who tested positive. We need to calculate the probability that you have the disease!

Now, I said the test is 99% sensitive and 99% specific. Because it’s 99% sensitive (very few false negatives), it’s going to find 99 of those 100 people who have the disease and miss just one! We say the test found 99 true positives.

What about false positives? Well, our test is really specific, but there are an awful lot of people in our 100,000-person sample who don’t have the disease. There are, after all, 99,900 of them, because only .1% of people have the disease. Since the test is 99% specific, it wrongly flags 1% of the healthy people, which means you are going to find:

1% of 99,900

false positives i.e. 999 false positives (.01*99,900).
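Here is a small Python sketch of how sensitivity and specificity turn those two groups into true positives, false negatives, false positives and true negatives. It’s only an illustration; `outcome_counts` is a hypothetical name, and rounding to whole people is an assumption.

```python
def outcome_counts(with_disease, without_disease, sensitivity, specificity):
    """Return (true_pos, false_neg, false_pos, true_neg) expected counts.

    sensitivity: fraction of sick people the test flags as positive.
    specificity: fraction of healthy people the test correctly calls negative.
    """
    true_pos = round(with_disease * sensitivity)
    false_neg = with_disease - true_pos         # sick people the test misses
    true_neg = round(without_disease * specificity)
    false_pos = without_disease - true_neg      # healthy people flagged positive
    return true_pos, false_neg, false_pos, true_neg


print(outcome_counts(100, 99_900, 0.99, 0.99))  # (99, 1, 999, 98901)
```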

So now to the punchline: what is the probability that you have the disease? It’s really, really low! This is so hard to believe and to understand that people actually call it the paradox of the false positive. But the math doesn’t lie: you most likely don’t have the disease, even though this test is incredibly accurate!

To calculate the probability, remember that on the top of the fraction you put the number of times you got what you were looking for, and on the bottom you put the total number of possible outcomes. For example, when you roll a die and ask for the probability of rolling a specific number, you put a “1” on the top of the fraction because there is only one way to roll that number, and a “6” on the bottom because there are six possible rolls.

In our case, you got what you were looking for (“a true positive”) 99 times, so that goes on the top of the fraction. On the bottom you have to put all the positives you found, both true positives and false positives, because you are trying to figure out what fraction of all the “positive” results those 99 true positives make up.

This means the probability of you having the disease with a positive test is:

99/(99+999) or about 9%

And so the probability of you not having the disease is about 91%!

The point is, because the disease is so rare, this really accurate, essentially gold-standard test still isn’t much help: you probably don’t have the disease.

So the moral is: if you are just doing routine testing and there is no additional information, then even with a gold-standard test, a positive result for a rare disease really isn’t going to tell you very much. So please don’t immediately jump into treatments with bad side effects; you really need to find out more.

Okay, so now we can define the positive predictive value and the negative predictive value of our screening test. By definition, the positive predictive value is what we just calculated: the probability that, if you tested positive, you truly have the disease.

Positive Predictive Value = True positives / (True positives + False positives)

So again in our case it’s

99/(99+999) = 99/1098

or about 9%.
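The definition above translates almost word for word into Python. This is just a sketch for checking the arithmetic; the function name `positive_predictive_value` is my own label, and the inputs are the counts from our 100,000-person example.

```python
def positive_predictive_value(true_pos, false_pos):
    """Probability of truly having the disease given a positive test."""
    return true_pos / (true_pos + false_pos)


ppv = positive_predictive_value(99, 999)
print(f"{ppv:.1%}")      # 9.0%  (you have the disease)
print(f"{1 - ppv:.1%}")  # 91.0% (you don't have the disease)
```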

Some people like to collect the information used in the calculation in a table. A table like this helps you avoid mistakes, because the bottom row summarizes everything and the first two rows contain the results of all your arithmetic with the sensitivity and specificity of the test.

	Have the disease	Don’t have the disease	Total
Test Positive	99	999	1,098
Test Negative	1	98,901	98,902
Totals	100	99,900	100,000

You can do the calculations by looking at the correct boxes and then, as you have seen, doing some elementary arithmetic. For example, the positive predictive value is always the value in the box in the first row, first column divided by the value in the first row, third column.
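If you want to build the whole table in one go, here is a rough Python sketch. The name `confusion_table` and the choice of a dictionary of rows are assumptions for illustration; each row is laid out as [have the disease, don’t have the disease, total], just like the table above.

```python
def confusion_table(sample_size, prevalence, sensitivity, specificity):
    """Build the 2x2 table (plus totals) as a dict of rows.

    Each row is [have the disease, don't have the disease, total].
    """
    sick = round(sample_size * prevalence)
    healthy = sample_size - sick
    true_pos = round(sick * sensitivity)
    true_neg = round(healthy * specificity)
    false_neg = sick - true_pos
    false_pos = healthy - true_neg
    return {
        "Test Positive": [true_pos, false_pos, true_pos + false_pos],
        "Test Negative": [false_neg, true_neg, false_neg + true_neg],
        "Totals": [sick, healthy, sample_size],
    }


table = confusion_table(100_000, 0.001, 0.99, 0.99)
for label, row in table.items():
    print(f"{label:<14}" + "".join(f"{n:>10,}" for n in row))

# PPV = first row, first column divided by first row, third column
positive_row = table["Test Positive"]
print(positive_row[0] / positive_row[2])
```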

Next, what is the negative predictive value? This is the probability that you don’t have the disease when you tested negative. It comes from the information in the second row and is defined as:

Negative Predictive Value = True negatives / (True negatives + False negatives)

Since we get that from the boxes in the second and third columns of the second row of the table, you can see it’s very close to 100%. More precisely, it’s:

98901/98902

which works out to 99.998988898%! Screening tests like this are really good at ruling out that you have a disease.
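The same kind of one-line function handles the negative predictive value. Again, this is only a sketch; `negative_predictive_value` is a name I made up, fed with the second-row counts from the table.

```python
def negative_predictive_value(true_neg, false_neg):
    """Probability of truly not having the disease given a negative test."""
    return true_neg / (true_neg + false_neg)


npv = negative_predictive_value(98_901, 1)
print(f"{npv:.6%}")  # 99.998989%
```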

DO YOU HAVE THE DISEASE? SCENARIO 2

In this case, you have some symptoms, and doctors know that people with your symptoms actually have the disease in roughly 25% of the cases they see. We are, in a sense, using the test to confirm your symptoms. This time we’re going to suppose our test isn’t anywhere near as accurate as the screening test: it’s only 95% sensitive and 95% specific.

So let’s suppose we test ten thousand people with your symptoms. Since 25% of the people who have your symptoms have the disease, we know that 2,500 people in our 10,000-person sample have the disease (.25*10,000) and 7,500 people (.75*10,000) don’t. Since our test is 95% sensitive, it finds 2,375 (.95*2,500) of the 2,500 people who have the disease. Because it is 95% specific, only 5% of the healthy people will test positive, but that still gives us (.05*7,500) = 375 false positives. Here’s our table for this situation:

	Have the disease	Don’t have the disease	Total
Test Positive	2,375	375	2,750
Test Negative	125	7,125	7,250
Totals	2,500	7,500	10,000

So what is the probability that you have the disease when you tested positive? Just as before, on the top of the fraction we put the number of true positives we found (2,375), which is again the number in the first row, first column. The bottom of the fraction is the total number of positives we found, both true positives and false positives; it’s the number in the first row, third column. So our fraction is:

2375/2750 = .864 or about 86.4%

Thus, the positive predictive value in this case, where you have some symptoms, is 2375/(2375+375), or about 86%, which is considered a very high predictive value.
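Here is a short Python sketch that goes straight from the sample size, the 25% rate among people with your symptoms, and the 95%/95% test to the positive predictive value. The function name `ppv_from_rates` is hypothetical, and it rounds to whole people just as we did by hand.

```python
def ppv_from_rates(sample_size, prevalence, sensitivity, specificity):
    """Return (true_pos, false_pos, ppv) for a hypothetical sample."""
    sick = round(sample_size * prevalence)
    healthy = sample_size - sick
    true_pos = round(sick * sensitivity)                 # sick people caught
    false_pos = healthy - round(healthy * specificity)   # healthy people flagged
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


tp, fp, ppv = ppv_from_rates(10_000, 0.25, 0.95, 0.95)
print(tp, fp, f"{ppv:.1%}")  # 2375 375 86.4%
```

Plugging in the screening-test numbers from scenario 1, ppv_from_rates(100_000, 0.001, 0.99, 0.99), gives back 99 true positives and 999 false positives, which is a handy check that nothing went wrong.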

There is one more concept you will see used, called the “false discovery rate”. It is what you get by subtracting the positive predictive value from 1. For scenario 2, our confirmatory test, the false discovery rate is about 1-.86, or 14%, which is again considered pretty good. For scenario 1, our screening test, the false discovery rate is (roughly) a horrific 91%!
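Since the false discovery rate is just 1 minus the positive predictive value, it is also the false positives divided by all the positives. A minimal sketch (the name `false_discovery_rate` is my own label) using the counts from both scenarios:

```python
def false_discovery_rate(true_pos, false_pos):
    """Fraction of positive results that are false alarms (equals 1 - PPV)."""
    return false_pos / (true_pos + false_pos)


print(f"Scenario 1: {false_discovery_rate(99, 999):.0%}")    # Scenario 1: 91%
print(f"Scenario 2: {false_discovery_rate(2375, 375):.0%}")  # Scenario 2: 14%
```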

*********

Finally, let me leave you with an exercise so you can do one of these calculations on your own. (I’ll give you the answer in my next blog, I promise.) Here is the background: in one of the most famous (and depressing) studies of how bad physicians can be at doing the calculations you just learned, 20 house officers, 20 fourth-year medical students and 20 attending physicians selected at four Harvard Medical School teaching hospitals were asked the following question:

The prevalence of the disease is 1/1000. Your test has a false positive rate of 5 per cent. It is 100% sensitive – no false negatives. What is the chance that a person found to have a positive result actually has the disease, assuming that you know nothing about the person’s symptoms or signs?

(Guess what? They failed miserably at getting the right answer. More next time.)

2 thoughts on “Testing 5: Predictive Rates: How do you calculate if you really have that disease?”

Harriet says:

September 1, 2020 at 9:44 pm

Hi Gary, I just found your blog! Did you post the answer compared to the Harvard medical students? I couldn’t find it.
Here’s my stab at it:

Out of 1000 people, there is one true positive and 50 false positives, so the chance you have the disease is 1/51 = 1.96%

1. admn says:
  
  September 2, 2020 at 7:18 pm
  
  Oh sheesh, I guess I got caught up in more current events. You are essentially right; that is the correct order of magnitude. I’ll do the actual calculation and post it…
