28 Days Earlier

To prevent an epidemic outbreak, all the inhabitants of a city are tested for a virus. The whole population can be divided into the subsets

$$\begin{aligned} I &= \{x \mid x \text{ is infected} \} \\ T &= \{x \mid x \text{ is tested positive} \} \end{aligned}$$

and the corresponding complements $\overline{I}$ and $\overline{T}$. The test has sensitivity (i.e. test positive if infected) $P(T|I) = 99\%$ and specificity (i.e. test negative if healthy) $P(\overline{T}|\overline{I}) = 98\%$. Of the whole population only a percentage of $P(I) = 0.1\%$ is sick. Paul has a positive test result. To the nearest integer percentage, what is the probability $P(I|T)$ that he is actually infected?

Bonus question: Paul tests positive again in a second test. How high is the probability that he is infected? (Both tests are statistically independent and have the same sensitivity and specificity.)

5% 10% 33% 45% 71% 98%

1 solution

Markus Michelmann
Oct 17, 2017

In a city with $N = 100{,}000$ inhabitants, $N P(I) = 100$ persons are infected. Of these, $N P(T|I) P(I) = 99$ persons test positive. Of the healthy population of $N P(\overline{I}) = 99{,}900$ people, a percentage of $P(T|\overline{I}) = 1 - P(\overline{T}|\overline{I}) = 2\%$ also test positive, so that we have $N P(T|\overline{I}) P(\overline{I}) = 1998$ false positive test results. The probability that a person who tested positive is infected is therefore

$$P(I|T) = \frac{P(T|I) P(I)}{P(T|I) P(I) + P(T|\overline{I}) P(\overline{I})} = \frac{99}{99 + 1998} = 4.721\% \approx 5\%$$

Although the test has a fairly high sensitivity and specificity, only $5\%$ of the positive tests are actually correct, because so few people in the whole population are infected.
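The calculation above can be checked numerically. The following is a minimal sketch of Bayes' theorem for a single positive test; the function name `posterior_infected` and its parameter names are chosen here for illustration and do not come from the original solution.

```python
def posterior_infected(prior, sensitivity, specificity):
    """Return P(I|T): probability of infection given one positive test."""
    # Probability mass of true positives: P(T|I) * P(I)
    true_pos = sensitivity * prior
    # Probability mass of false positives: P(T|not I) * P(not I)
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# Values from the problem statement
p_one = posterior_infected(prior=0.001, sensitivity=0.99, specificity=0.98)
print(f"P(I|T) = {p_one:.3%}")  # prints "P(I|T) = 4.721%"
```

Working with probabilities instead of head counts gives the same ratio, since the population size $N$ cancels in the fraction.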

For the second test, we start from the group of $N' = 2097$ people who tested positive, of whom $99$ are infected. Of these, about $98$ test positive a second time (true positives). Of the $1998$ healthy people, about $40$ still receive a false positive result. The probability of infection is therefore

$$P(I|T_1 \cap T_2) \approx \frac{98}{98 + 40} \approx 71\%$$

After the second positive test result, Paul is probably infected, but we are still far from certainty.
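Because the two tests are independent, the second result can be treated as a second Bayes update in which the posterior after the first test serves as the new prior. The sketch below illustrates this; the helper name `bayes_update` is hypothetical and used only for this example.

```python
def bayes_update(prior, sensitivity, specificity):
    """One Bayes update for a positive result of an independent test."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


p = 0.001  # P(I) before any test
posteriors = []
for test_number in (1, 2):
    # The posterior of the previous test becomes the prior of the next one
    p = bayes_update(p, sensitivity=0.99, specificity=0.98)
    posteriors.append(p)
    print(f"after positive test {test_number}: {p:.2%}")

p_two = posteriors[-1]
```

Without rounding the head counts to 98 and 40, the second posterior is $9801/13797 \approx 71.04\%$, which agrees with the $71\%$ obtained above.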

