A generalised Bayesian model for the probability of getting a false-positive PCR result

Just before the New Year, disaster struck: I got a positive result from my pre-departure SARS-CoV-2 RT-PCR test at a commercial test centre (Site 1) and had to cancel my international flights for a reunion with my wife. I was shocked by the result, as I had been self-isolating for more than 10 days before the test with limited outdoor activities (such as shopping for groceries) and I had no COVID-19 symptoms at all. After this moment of confusion and disappointment, I did a lateral-flow-device (LFD) antigen test at home and got a negative result. A second LFD test the next day also returned a negative result. That afternoon, I had my second PCR test at a different site (Site 2), where a professional swabbed my tonsils and nasal cavity so thoroughly that I even smelled a hint of blood. The result came quickly the next morning, and I immediately booked my third PCR test at another site (Site 3) for the same morning. All three sites are accredited by the UK Health Security Agency (UK HSA). All my results were reported to the NHS for test and trace.

Table 1. All the tests I have done so far.

| Sample date | Site | Method | Result | Result date |
|---|---|---|---|---|
| 28/12/2021 | 1 | RT-PCR (RNA/cDNA) | Detected | 29/12/2021 |
| 29/12/2021 | 4 | LFD (antigen) | Not detected | 29/12/2021 |
| 30/12/2021 | 4 | LFD (antigen) | Not detected | 30/12/2021 |
| 30/12/2021 | 2 | RT-PCR (RNA/cDNA) | Not detected | 31/12/2021 |
| 31/12/2021 | 3 | RT-PCR (RNA/cDNA) | Not detected | 01/01/2022 |

A respiratory swab sample was taken for each test.


Model

While I was waiting for the result of my third PCR test, I worked out the posterior probability of not carrying the virus given m positive PCR results from n tests. The model assumes that my respiratory viral RNA load (including its absence) was constant throughout the testing period. Furthermore, I made two assumptions when collecting the data:

  • On average, each COVID-19 patient or carrier is infectious for 10 days (that is, a 10-day average infectious period).
  • It takes an incubation period of 3–4 days for people to become infectious and have positive PCR results [1].

Since I could not find any data on asymptomatic patients or carriers, I consider only symptomatic patients in the rest of this post.

Let p denote the prevalence of infectious COVID-19 patients, based on the incidence over a specific period of time; f the false-positive rate of PCR tests; s the sensitivity of PCR tests; n the number of tests; and m the number of positive results. Furthermore, let V = 1 and V = 0 represent the presence and absence of viral RNA, respectively, and let T_{n,m} represent the event of getting m positive results from n tests. By Bayes' theorem, the posterior probability is

$$ P(V=0|T_{n,m})=\frac{P(V=0)P(T_{n,m}|V=0)}{P(T_{n,m})}=\frac{P(V=0)P(T_{n,m}|V=0)}{P(V=0)P(T_{n,m}|V=0)+P(V=1)P(T_{n,m}|V=1)} $$

The priors are P(V = 1) = p and P(V = 0) = 1 - p. Treating the n test results as independent given V, the likelihoods are binomial:

$$ P(T_{n,m}|V=0)=C(n, m)f^m(1-f)^{n-m} $$

and

$$ P(T_{n,m}|V=1)=C(n,m)s^m(1-s)^{n-m} $$

where the number of combinations

$$ C(n, m)=\frac{n!}{m!(n-m)!} $$

appears in both the numerator and the denominator and therefore cancels out of the expression for P(V = 0|T_{n,m}).

Then we have the posterior as a function g of the data (n, m) and the parameters (p, f, s):

$$ P(V=0|T_{n,m})=g(n, m; p, f, s)=\frac{f^m(1-f)^{n-m}(1-p)}{f^m(1-f)^{n-m}(1-p) + s^m(1-s)^{n-m}p} $$
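As a side derivation (not part of the original working, but following directly from the two likelihoods above), the same result can be read in odds form. Dividing the posterior for V = 0 by the posterior for V = 1 removes the common denominator:

$$ \frac{P(V=0|T_{n,m})}{P(V=1|T_{n,m})}=\frac{1-p}{p}\left(\frac{f}{s}\right)^{m}\left(\frac{1-f}{1-s}\right)^{n-m} $$

Read this way, the prior odds (1 - p)/p are multiplied by f/s for every positive result and by (1 - f)/(1 - s) for every negative result. As long as s > f, each positive result lowers the odds of being virus-free and each negative result raises them.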

Application

For my case, m = 1 and n = 3. I estimated the three parameters as follows.

  • p is calculated as the cumulative number of new cases in London from 15/12/2021 to 24/12/2021 (the latter being four days before my test date) divided by the London population (the mid-2020 estimate of 9.002 million [2], the only figure available at the time of writing). So, p = 258856 / 9002000 = 0.0288.
  • f is calculated as the mean of the lower and upper bounds of the false-positive rate of PCR tests [3]. Therefore, f = (0.008 + 0.043) / 2 = 0.0255.
  • s is calculated as the mean of the lower and upper bounds of the sensitivity of PCR tests [3]. Therefore, s = (0.85 + 0.95) / 2 = 0.9.
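The three estimates above can be reproduced with a few lines of R. This is only a sketch of the arithmetic; the variable names `cases_10d` and `population` are my own labels for the figures quoted in the bullets, not names from any data source.

```r
# Figures quoted in the text; variable names are illustrative only.
cases_10d  <- 258856    # cumulative new cases in London, 15/12/2021 to 24/12/2021
population <- 9002000   # mid-2020 London population

p <- cases_10d / population   # prevalence of infectious cases
f <- mean(c(0.008, 0.043))    # false-positive rate: midpoint of the reported bounds
s <- mean(c(0.85, 0.95))      # sensitivity: midpoint of the reported bounds

print(round(c(p = p, f = f, s = s), 4))
```

Keeping p unrounded (rather than typing 0.0288) matters slightly in later steps, because rounding the prior shifts the final percentages in the second decimal place.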

The model is implemented as an R function:

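# Posterior probability (in percent) of not carrying the virus, given m positive
# results from n PCR tests.
#   p: prevalence, f: false-positive rate, s: sensitivity
pMis <- function(p, f, s, n, m) {
    p0 <- 1 - p  # P(V = 0)
    p1 <- p  # P(V = 1)
    # The binomial coefficient C(n, m) cancels out, so it is omitted from a and b.
    a <- p0 * f^m * (1 - f)^(n - m)  # P(V = 0) * P[T(n, m)|V = 0]
    b <- p1 * s^m * (1 - s)^(n - m)  # P(V = 1) * P[T(n, m)|V = 1]
    return(round(a / (a + b), digits = 4) * 100)  # percentage, two decimal places
}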
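For readers who want to rerun the numbers, here is a self-contained sketch of how the function can be called. The helper `posterior_negative` is a hypothetical name of mine: it uses the same formula as `pMis` but returns a probability between 0 and 1 and leaves the rounding to the caller.

```r
# Hypothetical helper (not from the original post): same formula as pMis,
# but returns a probability in [0, 1] instead of a rounded percentage.
posterior_negative <- function(p, f, s, n, m) {
    stopifnot(m >= 0, m <= n)
    a <- (1 - p) * f^m * (1 - f)^(n - m)  # P(V = 0) * P[T(n, m)|V = 0], up to C(n, m)
    b <- p * s^m * (1 - s)^(n - m)        # P(V = 1) * P[T(n, m)|V = 1], up to C(n, m)
    a / (a + b)
}

p <- 258856 / 9002000  # prevalence, kept unrounded
f <- 0.0255            # false-positive rate
s <- 0.9               # sensitivity

post_3 <- posterior_negative(p, f, s, n = 3, m = 1)  # one positive out of three tests
post_2 <- posterior_negative(p, f, s, n = 2, m = 1)  # one positive out of two tests

cat(sprintf("n = 3, m = 1: %.2f%%\n", 100 * post_3))  # 98.91%
cat(sprintf("n = 2, m = 1: %.2f%%\n", 100 * post_2))  # 90.32%
```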

Therefore,

  • P(V = 0|T_{3,1}) = 98.91% for three tests, and P(V = 1|T_{3,1}) = 1 - P(V = 0|T_{3,1}) = 1.09%.
  • P(V = 0|T_{2,1}) = 90.32% for two tests, and P(V = 1|T_{2,1}) = 1 - P(V = 0|T_{2,1}) = 9.68%.

In conclusion, even with one positive and one negative result from two PCR tests, the probability that I was a true negative was about 90%; with one positive and two negative results, the probability that I was a true positive was only about 1%, not to mention the two negative LFD test results and the absence of any COVID-19 symptoms.
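Because f and s were taken as midpoints of reported ranges, it may be worth checking how much the conclusion moves across those ranges. The sketch below is my own addition rather than part of the original analysis. It evaluates the posterior at the four corner combinations of the bounds quoted earlier, using an algebraically equivalent odds form, with the hypothetical function name `posterior_negative_odds`.

```r
# Hypothetical helper: posterior P(V = 0 | T(n, m)) computed via posterior odds.
# Vectorised over f and s, so it can be applied to a whole grid at once.
posterior_negative_odds <- function(p, f, s, n, m) {
    odds <- (1 - p) / p * (f / s)^m * ((1 - f) / (1 - s))^(n - m)
    odds / (1 + odds)
}

p <- 258856 / 9002000  # prevalence estimate from the text

# Corner combinations of the reported bounds for f and s.
grid <- expand.grid(f = c(0.008, 0.043), s = c(0.85, 0.95))
grid$post_n3 <- posterior_negative_odds(p, grid$f, grid$s, n = 3, m = 1)
grid$post_n2 <- posterior_negative_odds(p, grid$f, grid$s, n = 2, m = 1)

print(round(grid, 4))
```

The prior p is held fixed here; varying it as well would be a natural extension, since it is probably the least certain of the three inputs.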


References

  1. False negative: How long does it take for coronavirus to become detectable by PCR? Gavi, the Vaccine Alliance.
  2. London’s Population. London Datastore.
  3. False positivity rate of the COVID-19 PCR test. Office for National Statistics.