Testing 3: Accuracy and why you shouldn’t reduce complicated situations to a single number

Recall that a test’s sensitivity is the percentage of people who really have the disease that the test correctly flags as positive (in other words, how well it avoids false negatives). If you test 100 people known to have the disease and your test says 99 of them have the disease, that is a 99% sensitive test. Or, in other words, you take the number of truly positive people your test found (99) and divide it by the number of diseased people you tested (100). Similarly, specificity is the percentage of people who really don’t have the disease that the test correctly reports as negative (how well it avoids false positives). Here you should think about testing people who don’t have the disease. You test 100 of them and 5 test positive, so 95 are correctly reported as negative. You take the number of truly negative people the test found (95 in our case) and divide it by all the disease-free people you tested, which is still 100 even though you reported 5 of them as having the disease when they didn’t. We say the test is 95% specific. Specific tests are the mirror of sensitive tests, so in this case a negative result on a specific test isn’t necessarily informative. (This time think of a broken test which reports everyone as not having the disease: no false positives, but, as before, not much information.)
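
If you like seeing the arithmetic written out, here is a tiny sketch in Python of those two calculations (the function names are just mine for this example, not anything standard):

    def sensitivity(true_positives, people_with_disease):
        # Of the people who really have the disease, what fraction did the test catch?
        return true_positives / people_with_disease

    def specificity(true_negatives, people_without_disease):
        # Of the people who really don't have the disease, what fraction did the test clear?
        return true_negatives / people_without_disease

    print(sensitivity(99, 100))  # 0.99 -> the 99% sensitive test above
    print(specificity(95, 100))  # 0.95 -> the 95% specific test above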

O.K., in a previous blog I promised you the definition of what it means for a test to be accurate. I keep my promises. Still, before I do that, I want to stress that this can be, and usually is, a dangerously misleading number to be interested in. The “accuracy” number for a test may make for good advertising by the manufacturer, but it is almost always the wrong number to look at. (But of course I am writing this blog to help you make sense of all the statistical excrement that is thrown at you, which is why I will give you the definition shortly.) Let me reiterate:

Unless they can tell you a test is very close to 100% accurate, i.e., the so-called gold standard, they are using a single number to describe a situation that needs to be described by two numbers. Having an easy-to-administer test that meets the gold standard is awfully rare. Gold standard tests are usually expensive and not widely available.

What you want is for every test to report its sensitivity and specificity. I can’t stress enough that you want and need those two numbers, not a blended number, which is what accuracy is defined as! (This is a good example of how too often the news tries to reduce a situation that needs to be described by multiple pieces of information into a single number.)

Anyway, as you might expect, a test’s accuracy is a fraction expressed as a percentage. On the top of the fraction you put the sum of the correctly identified positive patients and the correctly identified negative patients. So let’s suppose we tested 100 people, and we know that 70 are truly positive and 30 are truly negative. We found 65 of the 70 truly positive people to be positive and 28 of the 30 truly negative people to be negative. The top of our fraction is then:

65+28

The bottom of the fraction is the number of people we tested – so 100 in our case.

The accuracy of this test is then (65+28)/100 = 93/100 = 93%.
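
In the same spirit as the little sketch above (again, my own illustrative Python, not anyone’s official formula), the accuracy of this example works out like this:

    def accuracy(true_positives, true_negatives, people_tested):
        # Everyone the test got right, positive or negative, over everyone tested.
        return (true_positives + true_negatives) / people_tested

    # 100 people tested: 65 of the 70 truly positive and 28 of the 30 truly negative
    # were identified correctly.
    print(accuracy(65, 28, 100))  # 0.93 -> 93%

Notice that this one number blends the two numbers you actually care about, weighted by how many truly positive and truly negative people happened to be in the group tested.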

Let me end by stressing again: sensitive tests are good for telling you that you don’t have the disease. A negative result on a sensitive test is a good thing to get. Specific tests are the opposite: they are good for telling you that you do have the disease. A positive result on a very specific test is depressing to get, but at least you are likely to be treated for a good reason. These two numbers are what you are really interested in, not a blended number! Always ask for them when you are given a diagnostic test. Don’t be distracted by having them tell you about the “accuracy” of the test.

For those who are interested, here is a list of common tests with their sensitivity and specificity:

  • AIDS testing meets the gold standard: it is both highly specific (99.5%) and highly sensitive (99.5%).
  • Mammograms are highly sensitive (97%) but not highly specific (64.5%).
  • A PSA test with 4.0 as the cutoff had a sensitivity of 21% (!) and a specificity of 91%. It is a pretty bad test, which is why it isn’t routinely used anymore.

So what about COVID-19 nasopharyngeal swab tests? Still the most common test, it involves sticking a really long swab up a patient’s nose. Its specificity and sensitivity are actually really hard information to dig out. If done by really experienced people in the lab under ideal conditions, they aren’t bad, but in the real world, not so much. Thankfully we are moving to saliva tests. I have not been able to find a definitive answer to how sensitive and specific saliva tests are, although all the papers I have looked at say that they should be at least as good as the nasopharyngeal swab tests done under perfect lab conditions, without needing those lab conditions! One paper out of Australia I found seems to indicate that the saliva test its authors developed was 98% specific and 84.6% sensitive.
