AMS Feature Column banner

An epidemic is a sequence of random events

If a contact is made, then whether or not infection is transferred is much like tossing a (loaded) coin. How can a simulation take all this uncertainty into account?

Bill Casselman
University of British Columbia

Just recently, I started thinking about making my own epidemics. On a computer, of course, with a digital population.

What I had in mind was not very sophisticated. I would assemble a population of `people’, a few of which are infected with the virus, and then progress from one day to the next, letting the virus spread around through interactions in the population. At each moment, a person would be in one of several possible states:

  • S Uninfected but susceptible to infection
  • E Infected, but not yet infectious (exposed)
  • A Infectious, but not yet showing symptoms (asymptomatic)
  • Infectious and showing symptoms
  • Recovered, or otherwise incapable of becoming infected (say, through vaccination)

This is not quite a complete representation of reality. For example, in the current epidemic a very small number of people get reinfected. But it is not too far from being correct.

In general, as the simulation goes on, a person would progress from one state in this list to the next, except of course that being vaccinated is a shortcut from the first to the last state. Infections take place because susceptible persons interact with contagious ones. Even when an interaction takes place, whether or not infection is transmitted is a function of many accidental circumstances (for example, surrounding ventilation) as well as how contagious the infected person is.

There is some further internal detail to some of these states. The degree to which a person is infectious changes in time, usually rising to a peak after a few days, and then decreasing to zero. Hence in a simulation each person has attached to him in addition to (i) a designation of state but also in states A and I (ii) a number measuring infectiousness. A further datum is (iii) the frequency of contacts, especially close contacts, a person has with others. This can change with time. For example, when a person starts showing symptoms, he will presumably reduce the frequency of his contacts.

Where’s the mathematics? An epidemic is driven by random events. The moment at which a person moves from one state to the next is not fixed by circumstances, but is instead a matter of probabilities. The severity of a person’s infection is a matter of chance, as is the length of time from when he is infected to when he becomes infectious. Even if we know the average rate at which an infectious person makes contacts, the exact number of contacts made in one day is also a matter of chance. If a contact is made, then whether or not infection is transferred is much like tossing a (loaded) coin. How can a simulation take all this uncertainty into account?

Generating contacts

Take the matter of contacts. The most important parameter governing contacts is the average number $c$ of contacts made by a person in one day, but that does not mean that the number of contacts in one day is constant. It might well vary from day to day. Instead, it is reasonable to assume that personal interaction is a Poisson process, which means that the probability of making $k$ contacts during one day is $p_{k} = c^{k} e^{-c} / k!$. Note that the infinite sum of the $p_{k}$ is $1$, because of the well known formula

$$ e^{c} = 1 + c + {c^{2} \over 2!} + { c^{3} \over 3! } + \cdots \, . $$

For example, here are the graphs of some examples with a couple of values of $c$:

In a simulation, one will be dealing with a large number of people. Each of them will have his own regimen of interactions. Some of them will be more interactive than others. Thus, we are likely to find ourselves simulating a large number of independent Poisson processes, each one a sequence of random events. How to do this? In a program, this will involve a call to a routine, call it p_random(c) that returns on each call a random non-negative integer whose distribution matches the Poisson process with mean $c$.

Almost every programming language has built into it a routine random() that does something like this. On each call it returns a real number uniformly distributed in the open interval $[0,1)$. (David Austin’s previous FC gives some idea of how this works.) What we would like to do is use that routine to generate non-negative integers following a specified Poisson distribution. To give you some idea of how things go, we can see how this technique can generate integers uniformly distributed in any integral range $[0,n-1]$: get a random number $x$ in $[0,1)$ and then replace it by $\lfloor nx \rfloor$, the integral part of $nx$. If $n=2$ this offers a simulation of coin tossing, and if $n=6$ a simulation of throwing a die.

There is a reasonably well known procedure that does what we want, and very generally. This is explained in Knuth’s classic text. Suppose we are given an arbitrary probability distribution of integers with given probabilities $p_{k}$ for $k \ge 0$. That is to say, we are looking at some repeated event somewhat like coin tossing, in which a non-negative integer $k$ occurs with probability $p_{k}$. How can a program generate integers distributed according to these statistics?

Let $P(k)$ be the cumulative distribution

$$ P(k) = {\sum}_{i = 0}^{k} p(i) $$

Thus $P(k)$ is the probability that the integer $j$ occurring is $\le k$. The original distribution has the property that each $p_{i} \ge 0$ and ${\sum} p(i) = 1$, so $P(k)$ increases from $0$ to $1$. For example, if $c = 2.5$ and $p(k) = e^{-c} c^{k} / k!$ then the graph of $P$ looks like the figure below. Given a random number $t$ in $[0,1)$ we can determine an integer according to the recipe indicated—draw a line to the right from the point $(0,t)$ and select the $x$-coordinate of the point at which it hits this graph.

There is another suggestive way to see this. Make up a rectangle of total height $1$, partitioned into boxes, with the one labeled $k$ of height $p_{k}$. Given the number $x$, mark a point at height $x$ in the rectangle. Select the label of the box that contains it.

In the long run the number of times you hit the box labeled $k$ will be proportional to its area, hence to $p_{k}$. But how do you tell what that label is? There is one straightforward answer to this question:

def p_random():
	x = random()
	# this is the built-in random number generator
	s = 0
	i = 0
	while s <= x:
		i += 1
		s += p[i]
	# at exit p[0] + ... + p[i-1] <= x < p[i]
	return i-1

But this is somewhat inefficient, since each call will on average involve $n/2$ steps. Does there exist an algorithm that requires a number of steps independent of $n$? The answer is yes. A clever method whose basic idea is apparently due to Alastair Walker does this, at the small cost of building some preliminary structures.

Walker’s trick

As far as I know, Walker never explained how he discovered his method, but an elegant interpretation has been offered by Keith Schwartz. The basic idea is what we have already seen:

  1. Start with a box of some kind. Partition it into smaller labeled boxes in such a way that the area of box $k$ is proportional to $p_{k}$.
  2. To generate integers with a given probability distribution, choose points at random inside the box, and return the label of the region hit.
  3. Arrange a way to assign to every random $x$ in $[0,1)$ a point of the box.

The problem is to figure out how to make the partition in such a way that figuring out the label from the geometry of the partition can be done efficiently.

I’ll explain how Walker’s method works for a few simple cases, but first I’ll generalize the problem so that we are not restricted to the Poisson distribution. Suppose very generally that we are given probabilities $p_{i}$ for $i$ in $[0, n-1]$. We now want a method to generate random integers that follow the distribution assigned by $p_{i}$. That is to say, if we generate in this way a large number of integers, we want the proportion of occurrences of $i$ to be roughly $p_{i}$.

The case $n=2$ is like tossing a biased coin, and there is a simple solution. In this case, we are given two probabilities $p_{0}$, $p_{1}$ with $p_{0} + p_{1} = 1$. Partition the unit square in this fashion:

Choose a point $(x, y)$ randomly in the square. In fact, we do not have to pay any attention to $x$. If $y \le p_{0}$ we return $i = 0$ and otherwise we return $i = 1$.

But now, following Keith Schwartz and intending to show how Walker’s algorithm works in this very simple case, I will set up a rectangular region a bit differently. First of all, make its dimensions $2 \times 1$. Partition it twice: once into halves, each half a unit square …

… and then build in each half, say in the $i$-th half, a box of dimensions $1 \times p_{i}$. Label these boxes. Unless $p_{0} = p_{1}$, one of these will overflow at the top:

So then we cut off the overflow and paste it (with label) into the other box:

This shows the case $p_{0} \le p_{1}$. If $p_{1} < p_{0}$ things look like this:

How do we use these diagrams to generate the random integers we want? Choosing a random uniform number $x$ in $[0,1)$ amounts as before to choosing a point in the rectangle. But we do this differently, and we interpret it differently. Given $x$, set $X = 2x$. Let $m$ be the integer part of $X$, which will be either $0$ or $1$: $m = \lfloor X \rfloor$. Let $y = X – m$, the fractional part of $X$. Associate to $x$ a point in the $m$-th box with height $y$. If $y \lt p_{m}$, then we are in the box labeled by $m$, otherwise in the other one. In either case, the process will select that label $m$.

Now look at the case $n = 3$, and suppose that we are given probabilities $p_{0}, p_{1}, p_{2}$ with $\sum p_{i} = 1$. We start off with a rectangle of size $3 \times 1$, partitioned into $1 \times 1$ boxes:

There are again different cases, depending on the relative sizes of the $p_{i}$. The easiest case is that in which two of the values of $p$, say $p_{0}$ and $p_{1}$, are less than $1/3$, which implies that the third is greater than $1/3$. Draw the $i$ boxes of dimension $1 \times p_{i}$ in the $i$-th square, like this:

Now cut off two pieces from the large box and paste them into the smaller one, getting:

I’ll explain in a moment how to use this to generate random numbers.

There is a second case, in which two of the $p_{i}$ are larger than $1/3$:

Here, we want to cut off from the tops and fill in the third. It is tempting to cut off exactly the overflows in the large regions and paste them in, but this would give the third region three labels. which is not what we want. So we fill in from just one of the large regions. This will leave some space in it.

We fill in the new empty space from the other large region. We are now finished:

How to use what we have constructed? In each case, we have partitioned the rectangle of size $3 \times 1$. First, into three unit squares, and then each of these in turn into one or two labeled rectangles. Given a random $x$ in $[0,1)$, we want to come up with some integer in $[0,3)$. How? We first scale it to get $X = 3x$. This will lie in the interval $[m, m+1)$ for $m = \lfloor X \rfloor$. We now turn attention to the $m$-th unit square. The integer we return will be one of the labels found in that square. Let $y = X – m$, the fractional part of $X$, which will be in $[0,1)$. If $y \lt p_{m}$ (the height of the bottom rectangle), p_random returns $m$, otherwise the alternate label in that square.

In effect we are assigning principal and alternate labels to the boxes. Except that there won’t be an alternate label if the box is full.

In the literature, the array I call `alternate’ is called alias, and the method described here is called the alias method.

The full algorithm

This method generalizes nicely. The original version seems to be due to Alastair Walker. It became well known because Knuth called attention to it (although mostly in exercises). Michael Vose then came up with a more efficient version, and made it handle rounding errors more stably.

I quote below, almost verbatim, the algorithm found originally in Vose’s paper. It improves the running time of Walker’s program, and corrects its handling of rounding errors. It has two parts. One is an initialization that sets up arrays prob and alias from the probability array $p$. These are used in the function rand, which returns a random variable in the range $[0,n-1]$, whose probabilities are specified in p of length n. The call to random returns a variable uniformly distributed in $[0, 1)$.

There are several loops in the program.The first assigns integers in the range $[0,n-1]$ to one of two explicit arrays large and small. Those in small are the $i$ such that $p_{i} \le 1/n$. As the program proceeds, the integers in these arrays are those whose boxes have not yet been filled. Implicit is a third subset of $[0,n-1]$, which I’ll call finished. This contains all those indices for which no further processing is necessary—i. e. whose box is filled.

In the loop [0] the two arrays small and large are initialized, and the subset finished is left empty. In every run through this loop, an index is taken from small, its square is filled, and it is added to FIN. This happens by removing filling material form one of the boxes in large, which therefore becomes smaller. It is added to either small or large, according to how much is left. In each of these loops, the total size of large and small is decremented.

def init(p):
    l = 0
    s = 0
    [0] for i in range(n):
        if p[i] > 1/n:
            large[l] = i
            l += 1
            small[s] = i
            s += 1
    [1] while s > 0 and l > 0:
        s -= 1
        j = small[s]
        l -= 1
        k = large[l]
        prob[j] = n*p[j]
        alias[j] = k
        p[k] += (p[j]-b)
        if p[k] > b:
            large[l] = k
            l += 1
            small[s] = k
            s += 1
    [2] while s > 0:
        s -= 1
        prob[small[s]] = 1
    [3] while l > 0:
        l -= 1
        prob[large[l]] = 1

def p_random():
    x = n*random(0, 1)
    m = floor(x)
    if (x - m) < prob[m]: return m
    else: return alias[m]

The last loops [2] and [3] of Vose’s code are necessary to deal with rounding errors, which I include without further comment.

Here is a typical run for a Poisson process with mean $\mu = 4$.

The simulation

Let’s see how to use Walker’s method to simulate how a person just infected goes on to infect others. Suppose that he starts to be infectious on the fifth day, and that the probability that he infects a contact is specified in the following table:

$$ \matrix { i = \hbox{day after infection} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & \ge 10 \cr r_{i} = \hbox{probability of infection} & 0 & 0 & 0 & 0 & 0.1 & 0.3 & 0.4 & 0.4 & 0.2 & 0 \cr } $$

Suppose also that he makes and average of $4$ close contacts per day, and that these follow a Poisson distribution. Applying Walker’s algorithm, we get a sample run of contacts like this:

$$ \matrix { i = \hbox{day after infection} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\ \dots \cr c_{i} = \hbox{number of contacts} & 5 & 2 & 3 & 4 & 3 & 3 & 2 & 1 & 3 & 3\ \dots \cr } $$

In this run, how many people does he infect? There is no unique answer to this question. It depends on something like a coin toss at each contact. What we can calculate is an average. On the fifth day he infects an average of $0.1 \cdot 3$ people, on the sixth … etc. All in all:

$$ \hbox{average number of infections} = {\sum} c_{i} r_{i} = 0.1 \cdot 3 + 0.3 \cdot 3 + 0.4 \cdot 2 + 0.4 \cdot 1 + 0.2 \cdot 3 = 3.0 \, . $$

This is much lower than the average number of people he infects, which is called the $R_{0}$ value for this example. Here it is $(0.1 + 0.3 + 0.4 + 0.4 + 0.2) \cdot 4 = 5.6$.

Reading further

Am I really uninfected?

COVID-19 and rapid testing

What’s new is the appearance of a large number of rapid tests, for both professional and home use. They are relatively inexpensive, more convenient to administer, and capable of returning results quickly…

Bill Casselman
University of British Columbia, Vancouver, Canada

I discussed the accuracy of tests for COVID-19 in an earlier column, but I am going to take up the same topic in this one. What’s new is the appearance of a large number of rapid tests, for both professional and home use. They are relatively inexpensive, more convenient to administer, and capable of returning results quickly. A number of governments now use them to screen health care workers and travelers regularly. Considering how much government policies now depend on them, it is natural to wonder, how accurate are they? To what extent should they be relied on? These are especially important questions, given the natural urge to want certainty rather than nuanced probabilistic judgements.

There are basically two kinds of diagnostic tests for COVID-19. (I recall that a diagnostic test is one that detects whether the patient is currently infected, as opposed to those tests that detect whether the patient has been infected in the past.) These differ in which of the two viral components, genetic or structural, that they measure.

The genetic component of the coronavirus is made up of RNA, and quantities in a sample can be amplified by standard cloning techniques (in a process known as Polymerase Chain Reaction) so that even very small amounts of viral material become evident. Tests using these techniques are called PCR tests. They are extremely accurate, but processing the samples as well as the amplification process unavoidably take time, generally about 24 hours, and more with heavy work loads. Another problem with these is that samples have to be handled very carefully, and thus results are reliable only if the tests are given under strictly controlled conditions.

The structural component is protein, and is detected much more quickly by the same type of lateral flow test that is used in medical tests to detect other proteins, for example those that signify pregnancy. These are called antigen tests, because they rely on essentially the same protein detection techniques that the human body does to recognize foreign matter. The antigen tests have two advantages. They do not have to be given under medical supervision, and they give answers quickly—sometimes within several minutes. But they are less accurate.

A rapid antigen test for COVID-19 (Image by Falco Ermert, CC 2.0)

How accurate?

The tests for COVID have features in common with all medical screening procedures. The point is to tell whether the patient is afflicted with a certain ailment, and the answer is (almost always) a flat “yes” or “no”. Of course you can easily imagine this to be simplistic, but anything more subtle is probably impractical. The real problem is that no medical test is completely accurate. In most COVID tests a swab of the patient’s nasal passage or throat is taken, and then analyzed. These swabs are not reputed to be a pleasant experience, so it presumably takes some practice to get it right. In the case of COVID, the accuracy of the test depends very much on what point in the infection cycle the test is made. COVID has one really nasty feature which is probably largely responsible for rapid spread—several days often pass in which a person is does not show any symptoms and is yet highly infectious. In fact, some people never show symptoms but nevertheless pass through a highly infectious period. There does not seem to be a completely satisfactory way of dealing with this. Thus, even with fairly good tests, masks and social distancing are extremely important in preventing the spread of COVID.

There are two numbers that are used to characterize the accuracy of a medical testing procedure: sensitivity and specificity. The first tells you the proportion of people who are infected and for whom the test detects it. The second tells you the percentage of those who are not infected and whose test detects that. Ideally, a test would have both sensitivity 100% and specificity 100%, but that doesn’t happen. The PCR tests, at least, are very sensitive—up to 98% for some, at least in a clinical environment. But a clinical environment is necessary for these in any case. They are so sensitive, in fact, that they often detect very small fragments of viral RNA that come from an ancient infection rather a current one. This has the effect of lowering their specificity.

Generally, almost all tests have high specificity, because it is easy to say definitely that there is viral material in a sample—this means basically that the virus is distinct from other viruses, and confusion is unlikely. Sensitivity is more difficult to achieve. There seem to be two major causes of error: $\bullet$ the test was taken in the wrong part of the infection cycle, or $\bullet$ the swab might have been taken poorly.

On the other hand, high sensitivity is more important, because infected people who avoid detection (false negatives) are potential sources of further infection. High specificity is not so crucial, since tagging uninfected people as infected (making false positives) will not ordinarily have fatal consequences. Besides, a positive test result is usually checked by further tests.

The claimed sensitivity of the new rapid tests varies widely, from $70\%$ up, but it is not so clear under what circumstances these things have been measured. Have the tests destined for home use been tested in completely realistic circumstances? Can the sensitivity in practice be as low as $50\%$?

Significance of a test

Let’s look at a typical problem. Suppose you take one of the rapid tests, and it says you are not infected. How relieved should you be? I’ll rephrase this. Given a negative test result, what is the probability that you are actually not infected?

If you think about it, the meaning of the question might not be clear. On the one hand, whether or not you are infected by COVID depends on a series of random events—more particularly whom you have recently interacted with, in what circumstances, and whether any of them were infectious. So, in principle at least, it makes sense to talk about the probability that you are infected. But it is doubtful that you have even the least chance of calculating it. Furthermore, this probability is in some sense intrinsic to the pattern of your recent behaviour, It is not affected by the results of a test; it is the same before and after you take the test. So I ask, how do we interpret the question?

In practical terms, we can speak only of estimating this probability, and we can do this only by placing you, in imagination, in an environment of people whose exposure is similar to yours. This will involve geography, for example—at the time I write this, the infection rate in the state of Rhode Island is more than twenty times that in Minnesota. But it might include circumstances in which you have recently decreased dramatically your personal interactions with other people. In other words, if we want to look at things statistically, which is inevitable, we want to decide the proportion of people ‘like you’ who are infected. The problem is to interpret the phrase ‘like you’. I’ll therefore rephrase my question: how has the negative result of your test changed our estimate of the probability that you are infected? What is the effect of the new information on how we see your environment?

Let’s look at an example. We need to know three things before we calculate: (i) the estimated probability $x$ before you took the test, (ii) the sensitivity $s$ of the test, (iii) its specificity $t$. In the example, I’ll take $x = 0.03$, $s = 0.70$, and $t = 0.99$. And I’ll assume that the test is given to a population of $100,000$.

In the population, $0.03 \cdot 100,000 = 3,000$ will be infected, and $97,000$ will be uninfected. Of those who are infected, $0.70\cdot 3000 = 2100$ will test positive, and $900$ will not. (This test really isn’t very sensitive.) Of those who are uninfected, $970$ will test positive, and $96,030$ will test negative. We can put this in a diagram:

There are a couple of noteworthy features here. One is that even though the specificity is quite good, there are a lot of false positives. This is because the number of uninfected is high enough to compensate. The other feature offers an answer to our question. Before the test, we have no reason to classify anybody as special, and the probability of infection is $3\%$. But after the test someone who tests negative is one of the $900 + 96,030 = 96,930$ who tested negative, and the probability of being infected is $900/96,930 = 0.93\%$. It has dropped by a factor of slightly more than $3$.

I want now to find some formulas to replace this numerical computation. Suppose we start with a population of size $N$, a probability $x$ of being infected, and a test with sensitivity $s$ and specificity $t$. Among the population, $N x$ are infected and $N(1-x)$ are not. Of those who are infected, $Nxs$ will test $+$, and $Nx(1-s)$ will test $-$. Of those who are not infected, $N(1-x)(1-t)$ will test $-$ and $N(1-x)t$ will test $+$.

Let’s again ask, what does it mean that you test negative? The number of people who test negative is $Nx(1-s) + N(1-x)t$, and among those $Nx(1-s)$ are in fact infected. So even after a negative test result, your probability of being infected is

$$ { Nx(1-s) \over Nx(1-s) + N(1-x)t } = { x(1-s) \over x(1-s) + (1-x)t } = x \cdot { 1—s \over x(1-s) + (1-x)t } \, . $$

Now the specificity $t$ is very close to $1$, so this is very well approximated by

$$ x \cdot { 1—s \over (x—sx) + (1 -x) } = x \cdot { 1—s \over 1—sx } \, . $$

In the following image I display graphs of the factor $(1-s)/(1-sx)$.

For realistic (small) values of $x$ this factor is very close to $1-s$. In the example we looked at first, $s=0.7$, and this factor is about $0.3$. Sure enough, we found that the probability of infection dropped from $0.03$ to $0.0092 \sim 0.03\cdot(1-0.7)$. The improvement is significant, but not spectacular. The amount of relief one can get from a negative test, even a fairly good one, looks definitely limited.

The values of sensitivity $s$ and specificity $t$ are provided by the test manufacturer, so in order to apply these formulas one has to figure out what $x$ is. The CDC calls this the pretest probability, and advises:

To help estimate pretest probability, CDC recommends that laboratory and testing professionals who perform antigen testing determine infection prevalence based on a rolling average of the positivity rate of their own SARS-CoV-2 testing over the previous 7–10 days. If a specific testing site, such as a nursing home, has a test positivity rate near zero, the prevalence of disease in the community (e.g., cases among the population) should instead be used to help determine pretest probability. State health departments generally publish COVID-19 data on testing positivity rates and case rates for their communities.

You might take “your community” to be the state you live in. The probability varies enormously. Here are some examples in the low and high ranges (based on data taken December 27 from Worldometer):

$$ \matrix { \hbox{Minnesota} & 0.0032 \cr \hbox{Wyoming} & 0.0035 \cr \hbox{Wisconsin} & 0.0055 \cr \hbox{Arizona} & 0.0561 \cr \hbox{Montana} & 0.0691 \cr \hbox{Rhode Island} & 0.0711 \cr } $$

Final remarks

The rapid tests are not absolutely accurate, but are nonetheless a valuable tool in controlling COVID-19. Catching even a fraction of contagious people presumably has a dramatic and disproportionate effect on the spread of the disease. It would be valuable to have a clear explanation of how the use of rapid tests affects models of the pandemic. But it is important to realize that a negative result is not by itself a certificate that someone is in fact not contagious. This is especially important if there are other reasons to suspect infection.

Reading further

AMS Feature Column banner

Sensitivity, Specificity, and COVID-19 Testing

As every doctor should know, medical tests are rarely 100% accurate. Interpreting them is a matter of probability. And probability, of course, is a matter of mathematics…

Bill Casselman
University of British Columbia, Vancouver, Canada

Well, make up your mind. Does he have it or not?

I remarked in an earlier column that there were two major ways in which mathematics has contributed to our understanding of the disease COVID-19 and the coronavirus SARS-CoV-2 that causes it. One was by modeling the development of the epidemic in order to predict possible outcomes, and the other was by uncovering the sources of particular outbreaks by tracking the evolution of genomes. This column will be concerned with a third way in which mathematics helps to control the epidemic, and that is by interpreting procedures that test for the presence of the disease.

The basic problem was illustrated dramatically at the beginning of August. The President of the United States was about to visit the state of Ohio, and naturally it was expected that he and the governor of Ohio (Mike DeWine, a Republican) would meet. As a matter of routine, the governor was tested for coronavirus before the meeting, and the test returned a positive result. It was a definite possibility, even likely, that DeWine was afflicted with Covid-19. He put himself into quarantine immediately, and of course the meeting with the President had to be called off. A short time later a second, presumably more accurate, test was given and this time it came back negative. This second result was later confirmed by third and fourth tests. (You can read more about this incident in an article in The New York Times, or also another NYT article.)

It was suggested in the media (the Washington Post, among others) that this sequence of events would reduce the trust of the population at large in the validity of such tests, and—very, very unfortunately—discourage many from having them. The governor himself, one of the state governors to take the epidemic extremely seriously, stated very clearly that in his opinion avoiding tests was not at all a good idea. Medical experts agree—testing is in fact a crucial tool in controlling the epidemic, and the United States is almost certainly not doing enough of it. At the very least it is doing it awkwardly. It is therefore important to try to understand better what is going on.

The basic fact is very simple: as every doctor should know, medical tests are rarely 100% accurate. Interpreting them is a matter of probability. And probability, of course, is a matter of mathematics.

images of test tubes
COVID-19 test image from Wikimedia Commons.

Measuring a test’s accuracy

Many medical tests are abstractly similar: they give a yes/no answer to a question about the status of a patient. In the case of the governor of Ohio, the question is, “Does this person have COVID-19?” His first test answered “yes”, while the later three answered “no”. Clearly, they were not all correct. The tests are certainly inaccurate a certain percentage of the time.

How is the accuracy of such tests measured? There is no single number that does this. Instead, there are two: sensitivity and specificity. To quote one instructive web page, sensitivity and specificity “are the yin and yang of the testing world and convey critical information about what a test can and cannot tell us. Both are needed to fully understand a test’s strengths as well as its shortcomings.”

Sensitivity characterizes how a test deals with people who are infected. It is the percentage of those who test as infected when that they are in fact infected. A test with high sensitivity will catch almost every infected person, and will hence allow few of those infected to escape detection. This is the most important criterion. For example, it was reported recently that Spanish health authorities returned thousands of tests to one firm after finding that its tests had sensitivity equal to 30%. This means that 70% of infected people would not be diagnosed correctly by this test.

Specificity characterizes how a test deals with people who are not infected. It is the percentage of those who will test as uninfected when they are in fact uninfected. A test with high specificity will hence report a small number of false positives.

For detecting currently infected people, high sensitivity is important, because infected people who avoid detection (false negatives) are potential sources of further infection. High specificity is not so crucial. Tagging uninfected as infected (making false positives) can be unfortunate, but not with possibly fatal consequences.

It might not be immediately apparent, but sensitivity and specificity are independent features. For example, it could happen that a test lazily returns a “yes” no matter what the condition of the patient happens to be. In this case, it will have sensitivity 100%, but specificity 0%. Or it could behave equally lazily, but in the opposite way, and return a “no” in all cases: sensitivity 0% and specificity 100%. Both of these, would be extremely poor tests, but these simple examples demonstrate that the two measures are in some way complementary. (Hence the yin and yang.) We shall see later on another illustration of complementarity.

For COVID-19, the most commonly used tests are of two very different kinds – one attempts to tell whether the tested patient is currently infected, while the second detects antibodies against COVID-19. Tests in this second group do not detect whether the patient is currently infected, but only whether or not the patient has been infected in the past.

In the first group of tests, which detect current infection, there is again a division into two types. One, called a PCR test, looks for genetic fragments of the virus. The other tries to find bits of the virus’ protein component. This last is called an antigen test, because it is this protein component that a body’s immune system interacts with. (An antigen, according to Wikipedia, is a foreign substance invading the body and stimulating a response of the immune system.) Tests of the first type are generally more accurate than those of the second, but tests of the second type produce results much more rapidly – in minutes rather than hours or even days.

None of these tests is guaranteed to return correct results.

The first test taken by Governor DeWine was an antigen test, and his later tests were all PCR. In all these tests, samples were taken by nasal swabs. The antigen test was relatively new, and produced by the company Quidel. Initial versions were claimed to possess a sensitivity of around 80%, but more recent ones are claimed to have about 97% sensitivity, which is certainly comparable with PCR tests. They also claimed from the start a specificity of 100%.

The outcome of tests

What do these numbers mean? Given that Governor DeWine tested positive, what is the probability that he is infected with COVID-19?

One way to figure this out is to run some mental experiments. Suppose we give the tests to 1000 people chosen at random from a US population. What happens? It turns out that the answer to this depends on the infection level of the population, as we shall see. In other words, in order to carry out this experiment, I have to assume something about this level. Like many other things about COVID-19, this statistic is not known accurately, and in any case varies widely from place to place—perhaps, at this stage of the epidemic, a plausible range is between 1% and 15%. I’ll try various guesses, using the values of sensitivity and specificity given by Quidel.

1. Suppose at first 5% of the population to be infected.

  • In the sample of 1000, there will be around 50 who are currently infected.
  • Because the sensitivity of the test is 97, of these about 48 will be labeled as positive, and the remaining 2 will not be correctly detected.
  • But there remain 950 people in the sample who are not infected. For these it is the specificity that matters. I have assumed this to be 100%, which means that 100% of those who are not infected will be labeled as uninfected, and 0% will be found infected.
  • Therefore, with these values of sensitivity and specificity, a nominal 48 will be found to be infected, and $850 + 2$ uninfected.

Wait a minute! Governor DeWine turned out ultimately not to be infected, but he was nonetheless found to be infected by the test. This contradicts the claim of 100% specificity! Something has gone wrong.

It is important to realize that the assigned values of sensitivity and specificity are somewhat theoretical, valid only if the test is applied in ideal conditions. Often in the literature these are called clinical data. But in the real world there are no ideal conditions, and the clinical specifications are not necessarily correct. Many errors are related to the initial sample taking, and indeed it is easy enough to imagine how a swab could miss crucial material. However, it is hard to see how an error could turn a sample without any indication of COVID-19 to one with such an indication. In any case, following the inevitable Murphy’s Law, errors will certainly degrade performance.

2. This suggests trying out some slightly less favorable values for sensitivity and specificity, say 90% sensitivity and 95% specificity, but with the same 5% infection rate.

  • In this case, 45 out of the true 50 infected will be caught, and 5 who will not be tagged.
  • There will still be 950 who are not infected, but 5% = (100 – 95)% of these, i.e. about 48, will return positive.
  • That makes another 48, and a total of 93 positive test results.
  • In this experiment, Governor DeWine is one of 93, of whom 45 are infected, 48 not. Given the data in hand, it makes sense to say that the probability he is one of the infected is 45/93 = 0.49 or 49%. Definitely not to be ignored.

3. We might also try different guesses of the proportion of infection at large. Suppose it to be as low as 1%.

  • Then of our 1000, 10 will be infected. Of these, 95% = 9 will test positive.
  • In addition, there will be 990 who are not infected, and 5% or about 49 of these will test as positive, making a total of 58.
  • Now the probability that the Governor is infected is 9/58 = 15%, much lower than before. So in this case, when the proportion of the overall population who are infected is rather small, the test is swamped by false positives.

4. Suppose the proportion of infected at large to be as high as 20%.

  • Then of our 1000, 200 will be infected. Of these, 95% = 180 will test positive.
  • In addition, there will be 800 who are not infected, and 5% or about 40 of these will test as positive, making a total of 220.
  • Now the probability that the Governor is infected would be 180/220 = 82%, much higher than before. Looking at these three examples it seems that the test becomes more accurate as the overall probability of infection increases.

5. To get a better idea of how things go without rehearsing the same argument over and over, we need a formula. We can follow the reasoning in these calculations to find it. Let $a$ be the sensitivity (with $0 \le a \le 1$ instead of being measured in a percentage), and let $b$ be the specificity. Suppose the test is given to $N$ people, and suppose $P$ of them to be infected.

  • Then $aP$ of these will be infected and test positive.
  • Similarly, $(1-a)P$ will be infected but test negative.
  • There are $N – P$ who are not infected, and of these the tests of $b(N-P)$ will return negative.
  • That also means that the remainder of the $N-P$ uninfected people, or $(1-b)(N-P)$, will test positive (these are the false positives).
  • That makes $aP + (1-b)(N-P)$ in total who test positive. Of these, the fraction who are infected, which we can interpret as the probability that a given person who tests positive is actually one of the infected is $$ { aP \over aP + (1 – b)(N-P) } $$
  • We can express this formula in terms of probabilities instead of population sizes. The ratio $p = P/N$ is the proportion of infected in the general population. The ratio $q = (N-P)/N$ is the proportion of uninfected. Then the ratio in the formula above is $$ { ap \over ap + (1-b)q } $$

The advantage of the formula is that we don’t have to consider each new value of $P/N$ on its own. Given $a$ and $b$, we can graph the formula as a function of the proportion $p = P/N$ of infected. For $a = 0.90$, $b = 0.95$ we get:


graph of proportion infected vs. probability a positive sample is infected

A good test is one for which the number of those who test positive is close to the number of those who are infected. The graph shows that this fails when the proportion of infected in the population at large is small. As I said earlier, the test is swamped by false positives.

Likewise, the proportion of infected who test negative (the potentially dangerous false negatives) is

$$ { (1-a)p \over (1 – a)p + bq } $$

And in a graph:


graph of proportion infected vs. probability a negative sample is infected

This time, the test is better if the graph lies low—if the number of infected who escape is small. The two graphs again illustrate that sensitivity and specificity are complementary. As the proportion infected in the population at large goes up, one part of the test improves and the other degrades.

The moral is this: sensitivity and specificity are intrinsic features of the tests, but interpreting them depends strongly on the environment.

Repeating the tests

As the case of Governor DeWine showed, the way to get around unsatisfactory test results is to test again.

To explain succinctly how this works, I first formulate things in a slightly abstract manner. Suppose we apply a test with sensitivity $a$ and specificity $b$ to a population with a proportion of $p$ infected and $q = 1-p$ uninfected. We can interpret these as probabilities: $p$ is the probability in the population at large of being infected, $q$ that of not being infected.

The test has the effect of dividing the population into two groups, those who test positive and those who test negative. Since the tests are not perfect, each of those again should be again partitioned into two smaller groups, the infected and the uninfected. Those who test positive have [infected:uninfected[] equal to $[ap : (1-b)q]$.

Those who test negative have [infected: uninfected] equal to $[(1-a)p:bq]$.

So the effect of the test is that we have divided the original population into two new populations, each of which also has both infected and infected. For those who tested positive, the new values of $p$ and $q$ are

$$ p_{+} = { ap \over ap + (1-b)q }, \quad q_{+} = { (1-b)q \over ap + (1-b)q } , $$

while for those who tested negative they are

$$ p_{-} = { (1-a)p \over (1-a)p + bq }, \quad q_{-} = { bq \over (1-a)p + bq }. $$

These are the new infection data we feed into any subsequent test, possibly with new $a$, $b$.

Now we can interpret Governor DeWine’s experience. The first test was with the Quidel antigen test, say with $a = 0.90$ and $b = 0.95$. Suppose $p = 0.05$. Since the test was positive, the probability of the Governor being infected is therefore 0.49. But now he (and, in our imaginations, his cohort of those who tests were positive) is tested again, this time by the more accurate PCR test, with approximate accuracy $a = 0.80$ and $b = 0.99$. This produces a new probability $p = 0.16$ that he is infected, still not so good. But we repeat, getting successive values of $0.037$ and then $ 0.008$ that he is infected. The numbers are getting smaller, and the trend is plain. Governor DeWine has escaped (this time).


Estimating the number of infected

In effect, the test is a filter, dividing the original population into two smaller ones. One is an approximation to the set of infected, the other an approximation to the set of uninfected.

A second use of these tests is to try to figure out what proportion of the population is in fact infected. This is especially important with COVID-19, because many of the cases show no symptoms at all. The basic idea is pretty simple, and can be best explained by an example.

In fact, let’s go back to an earlier example, with $N = 1000$, sensitivity $0.90$, specificity $0.95$, 150 infected. Of these, 177 test positive. We run the same test on these, and get 123 testing positive. Repeat: we get 109 positives.

What can we do with these numbers? The positives are comprised of true positives and false positives. The number of true positives is cut down by a factor of $0.90$ in each test, so inevitably the number of true positives is going to decrease as tests go on. But they decrease by a known ratio $a$! After one test, by a ratio $0.90$, after two tests, a ratio $0.90^{2} = 0.81$, after three by $0.90^{3} = 0.729$. Taking this into account, in order to find an approximation to the original number of infected we divide the number of positives by this ratio. We get the following table:

$$ \matrix { \hbox{ test number } & \hbox{ number of positives $N_{+}$ } & \hbox{ ratio $r$ } & N_{+}/r \cr 1 & 177 & 0.9 & 197 \cr 2 & 124 & 0.81 & 152 \cr 3 & 109 & 0.729 & 150 \cr } $$

Already, just running the test twice gives a pretty good guess for the original number infected.

One of my colleagues pointed out to me a simple way to visualize what’s going on. Suppose that we are looking at a population of 1000, 20% of whom are infected:


Square with left portion colored red for infected

Then we run a test on it, with 20% sensitivity and 90% specificity.


Square showing infected patients and test results

The regions at the top are those whose tests are false.

Reading Further

AMS Feature Column banner

Mathematics and the Family Tree of SARS-Cov-2

It's a remarkable process and, if it weren't so dangerous to humanity, would be deserving of admiration. …

Bill Casselman
University of British Columbia, Vancouver, Canada


There are two major ways in which mathematics has contributed to our understanding of the disease CoVid-19 and the coronavirus SARS-Cov-2 that causes it. One that is very prominent is through mathematical modelling, which attempts to predict the development of the epidemic in various circumstances. In earlier Feature Columns I wrote about modelling measles epidemics, and much of what I wrote there remains valid in the new environment. With the appearance of the extremely dangerous CoVid-19, mathematical modelling has become even more important, but the complexity of this work has also become apparent, and maybe not suitable for discussion here.

Another relevant application of mathematics attempts to track the evolution of the virus responsible. The core of every virus is a small string of replicating genetic material, either DNA or RNA, which sits enclosed in a structure of some kind. The structure serves to keep the virus in one piece, but also, and most crucially, facilitates the insertion of the genetic material into target cells. (It is in this respect that SARS-Cov-2 outdoes SARS-Cov-1.) The genetic material then takes over the machinery of the invaded cell in order to reproduce itself. It's a remarkable process and, if it weren't so dangerous to humanity, would be deserving of admiration.

In larger animals, the basic genetic material amounts to double helices made up of two complementary strands of nucleotides—i.e., made up of the four nucleotides A (adenine), G (guanine), C (cytosine), and T (thymine). In the virus that produces CoVid-19 the genetic material is RNA, which in the larger cells plays little role in reproduction, but is involved in translating genetic instructions to the construction of proteins. It is a single strand of about 30,000 nucleotides, in which T is replaced by U (uracil). I'll ignore this distinction.

There is some variation in these strands among different populations of viruses, because strands that differ slightly can often still work well. The variation is caused by random mutations, which substitute one nucleotide for another, and a few other possibly damaging events such as deletions and insertions of genetic material. I do not know what, exactly, causes these, but some viruses are much more prone than others to mutate. For example, what makes the common influenza viruses so difficult to control is that they mutate rapidly, particularly in ways that make it more difficult to recognize them by antibodies from one year to the next. It's easy to believe that this "strategy" is itself the product of natural selection.

The good news is that coronaviruses do not apparently mutate rapidly. They do mutate enough, however, that RNA sequences of viruses from different parts of the world may be interpreted as a kind of geographical fingerprint. Keeping track of these different populations is important in understanding the spread of the disease. Similar techniques are also used to try to understand how SARS-Cov-2 originated in wild animal populations.

Where's the mathematics?

I'll try to give some idea of how things work by looking at a very small imaginary animal, with just four nucleotides in its genome. Every generation, it divides in two. In the course of time each of the progeny can suffer mutations at one site of the genome, and eventually each of these in turn divides. The process can be pictured in a structure that mathematicians call a tree. In the following example, which displays the process through three generations, time is the vertical axis. The animal starts off with genome ACGT. The colored edges mark mutations.


A structure like this is called a rooted tree by mathematicians and biologists. It is a special kind of graph.

And now I can formulate the basic problem of phylogenetics: Suppose we are given just the genetics of the current population. Can we deduce the likely history from only these data? The answer is that we cannot, but that there are tools that will at least approximate the history. They are all based on constructing a graph much like the one above in which genomes that are similar to each other are located close to one another. But how to measure similarity? How to translate a measure of similarity to a graph? You can already see one difficulty in the example above. The genome ACCT occurs in three of the final product, but although two of these are related in the sense that they are common descendants and therefore tied to the history, one is the consequence of independent mutations. It is difficult to see how this could be captured by any analysis of the final population.

It is this problem that the website NextStrain is devoted to. It is most distinguished for its wonderful graphical interpretation of the evolution of different populations of SARS-Cov-2. But, as far as I can see, it tells you next to nothing about underlying theory.

In the rest of this column, I'll try to give some rough idea of how things go. But before I continue, I want to come back to an earlier remark, and modify the basic problem a bit. I am no longer going to attempt to reconstruct the exact history. In reality, this is never the right question, anyway—for one thing, no process is a simple as the one described above. Reproduction is irregular, and one generation does not occupy steps of a fixed amount of time. Some organisms simply die without reproduction. Instead, we are going to interpret a tree as a graphical way to describe similarity, with the point being that the more similar two genomes are, the closer they are likely to be in our tree, and the more recent was their common ancestor. In particular, not all branches in our tree will be of the same length. The tree


illustrates only that A and B are more closely related than A and C or B and C. The actual lengths of edges in a tree will not be important, or even their orientation, as long as we know which node is the root. It is just its topological structure that matters.

Listing trees

All the phylogenetic methods I know of deal with trees that are rooted and binary. This means that (i) they all start off from a single node at the bottom and go up; (ii) whenever they branch, they do so in pairs. The endpoints of branches are called (naturally) leaves. We shall also be concerned with labeled trees in which a label of some kind is attached to every leaf. The labels we are really interested in are genomes, but if we number the genomes we might as well use integers as labels.

In order to understand some of the methods used to assemble relationship trees, we should know how to classify rooted binary trees (both unlabeled and labeled) with a given number of leaves.

There are two types of classification involved. First one lists all rooted binary trees of a given shape, and then one looks at how to label them. For a small number of leaves both these steps are very simple, if slightly tedious.

Two leaves. They all look like the figure on the left:


And there is essentially one way to label them, shown on the right. What do I mean, "essentially"? I'll call two trees essentially the same if there exists a transformation of the one of them into the other that takes edges to edges and the root to the root. I'll call two labelings the same if such a transformation changes one into the other. In the tree above the transformation just swaps branches.

Three leaves. There is only essentially one unlabeled tree. To start with, we plant a root. Going up from this are two branches. One of them must be a leaf, and the other must branch off into two leaves. So the tree looks essentially like this:


As for labelings, there are three. One must first choose a label for the isolated leaf, then the others are `essentially' determined:


(There are three other possible labelings, but each yields a tree essentially the same as one of those above.)

Four leaves. There are two types of unlabeled trees, one with branching to an isolated leaf and an unlabeled tree of three leaves, the other with two branches of two leaves each.


There are 15 labelings. There is a systematic way to list all rooted trees with $n$ leaves, given a list of all those with $n-1$, but I won't say anything about it, other than to hint at it in the following diagram:


I will say, however, that there are 105 labeled trees with 5 leaves, and that in general, there are $1\cdot 3 \cdot \, \ldots \, \cdot (2n-3)$ labeled trees with $n$ leaves. (I refer for this to Chapter 2 of the lecture notes by Allman and Rhodes.) This number grows extremely rapidly. It is feasible to make lists for small values of $n$, but not large ones. This impossibility plays a role in reconstructing the phylogeny of a large set of viruses.


Reconstructing the entire history in practical cases is for all practical purposes impossible. What one attempts is just to find an approximation to it. The structure of the candidate graph should reflect at least how close the items are to each other. As explained in the lecture notes by Allman and Rhodes, there are many different ways to measure closeness, and many different techniques of reconstruction. None is perfect, and indeed there is no perfect method possible.

The techniques come in two basic flavours. Some construct a candidate tree directly from the data, while others search through a set of trees looking for the one that is best, according to various criteria. The ones that construct the tree directly don't seem to do well, and the methods most often used examine a lot of trees to find good ones. I'll look at the one that Allman and Rhodes call parsimony. It is not actually very good, but it is easy to explain, and gives some idea of what happens in general. It tries to find the tree that is optimal in a very simple sense—it minimizes, roughly, the total number of mutations in terms of the connections of the tree. (The term, like many in modern biology, has taken on a technical meaning close to, but not quite the same as, that in common usage. According to Wikipedia, parsimony refers to the quality of economy or frugality in the use of resources.)

In this method, one scans through a number of rooted trees whose labels are the given genomes, and for each one determines a quantitative measure (which I'll call its defect) of how closely the genomes relate to each other. One then picks out that tree with the minimum defect as the one we want. In fact, in this method as in all of this type there may be several, more or less equivalent.

How is the defect of a candidate labeled tree computed? I'll illustrate how this goes for only one set of genomes and one tree:


Step 1. The defect of the tree is the sum of defects of each of its nucleotides (in our case, four). This is done by calculating at every node of the tree a certain subset of nucleotides and a certain defect, according to this rule: (i) if the node is a leaf, assign it just the nucleotide in question, and defect 0. Progress from leaves down. At an internal node whose branches have already been dealt with, assign a subset — either (i) the intersection of the subsets just above it if it is non-empty, or (ii) otherwise, their union. In the first case, assign as defect the sum of defects just above, but in the second assign the sum plus 1. These four diagrams show what happens for our example.



The total defect for this labelled tree is therefore 4.

Step 2. In principle, you now want to scan through all the trees with $n$ leaves, and for each one, calculate its defect. There will be one or more trees with minimal defect, and these will be the candidates for the phylogeny. In practice, there will be too many trees to look at all, so you examine a small collection of sample trees and then look at a few of their neighbours to see if you can make the defect go down. The trick here, as with many similar methods is to choose samples well. I refer to Chapter 9 of Allman-Rhodes

Dealing with SARS-Cov-2

You can test a method for reconstructing an evolution by running some evolutionary processes and then applying the method to see how it compares. In practice, the structure parsimony produces are not generally satisfactory. In practice, it is the method "Maximum Likelihood" used most often. Allman-Rhodes say something about that method, too. As with every one of the methods so far developed, there are both theoretical and practical difficulties with it. But in most situations its results seem to be satisfactory.

With SARS-Cov-2, there are at least very satisfactory sources of data — GenBank and GISAID. I say something about these in the reference list to follow. Both of these websites offer more than thousands of genomes for SARS-Cov-2 as well as other organisms. These have been contributed by medical researchers from all over the world, and they are well documented.

Reading further



  • University of Washington software list

    To a mathematician, the sheer volume of publications on medical topics is astonishing. I list this site to illustrate this even for the relatively arcane topic of phylogeny. Incidentally, the original reason for developing phylogenetic tools was to track the evolutionary development of life on Earth, which was a matter of interest even before Darwin's theory of evolution appeared. For many years, it was based on morphological aspects rather than genomes. But now that gene sequencing is so simple, it has also become an important part of medical investigation.

AMS Feature Column banner

More Measles

In this column, I shall say more about modeling the progress of an epidemic. …

Bill Casselman
University of British Columbia, Vancouver, Canada


This is the second Feature Column in which I discuss the mathematics involved in the transmission of measles. The first one made a few very brief remarks about how a mathematical model might track the progress of an epidemic, and then discussed the phenomenon of herd immunity, which enables a population as a whole to become immune to the disease even though not all individuals are immune. In this column, I shall say more about the first topic–i. e. say more about modeling the progress of an epidemic.

As in the earlier column, my intention here is not to come even close to a detailed analysis, but just to explain basic phenomena.

What's to be explained?

In late September 2019 a single case of measles was reported on one of the islands of the South Pacific nation of Samoa. It was followed by several more, the number of cases building slowly at first but then eventually growing rapidly:



( Most data from the Twitter feed of the government of Samoa, and some transmitted by David Wu. )

What possible theory can account for such phenomena? For example, could one have predicted the eventual size of the epidemic from early data? Could one have predicted its timing?

The answer to the first question is almost certainly "no". This epidemic seems to be somewhat unusual. When one infectious person appears in a population that is largely vulnerable to a disease he will infect a certain number of individuals that he comes in contact with. Each of them will infect roughly the same number, and so on. So at the beginning of an epidemic one expects exponential growth. That is not evident here. (Why not?)

Infected people will eventually recover from a disease like measles, and they cannot subsequently be reinfected. As time goes on, the number of immune people will grow, and infected people will come into contact with fewer and fewer susceptible ones. The epidemic will slow down. If no preventive measures are taken, such as patient isolation or vaccination, nearly all the population will have been eventually infected. In the Samoa epidemic, a mandatory vaccination program was begun in mid-November, and after the new vaccinations took effect (a period of roughly 10 days), the graph begins visibly to flatten.

The parameters of a disease

There is a simple and very basic model for such epidemics. It divides a population into a small number of subsets, and keeps track of the sizes of these subsets from one day to the next, starting with an initial partition. The process depends on a very small number of parameters: (1) the basic reproduction number ${\bf R}_{0}$–the number of people infected by a single case appearing in an unprotected population; (2) the length of time $P$ from when a person is infected to when he in turn becomes infectious (the pre-infectious period); (2) the length of time $D$ that he is infectious.

For measles, ${\bf R}_{0}$ is about $13$, $P$ is about $8$, and $D$ is also about $8$.

It is very important to realize that these data are not exact, but generally vary over a small range according to circumstances. Underlying them is some kind of probability distribution, and they should definitely be interpreted as rough, even poorly known, averages. For example, in this model the degree of contagion remains constant for a period of $D$ days, while in reality a person may well be more contagious at certain times than others.

Of these parameters, it is ${\bf R}_{0}$ that is at once most subtle and most significant. The following table displays approximate lower and upper limits for the value of ${\bf R}_{0}$ associated to several diseases. Measles is very contagious.


The basic SEIR model

There is a standard basic model of epidemics of general diseases similar to measles, in which a case of the disease confers immunity (unlike, say, malaria or HIV). It is unrealistically simple, but nonetheless suggestive. Its principal value is probably that if a real epidemic differs from this model one will want to understand why.

In this model the population under consideration is partitioned into five states:

  • S: susceptible to infection
  • E: exposed (and infected) but not yet infectious
  • I: $\kern 1pt$ infectious
  • R: recovered
  • V: those who have been vaccinated or had the disease previous to the current epidemic

Those in categories $R$ and $V$ are immune. Those who have been vaccinated are effectively in group $R$, through a kind of virtual infection. One can assume $V$ is contained in $R$ if people are not vaccinated in the epidemic, by specifying $R(0)$ to be $V(0)$. In addition, it is valuable to keep track of the time elapsed (say, in days) since a person's last state transition took place. But I'll not do that here.

For some diseases, and in particular measles, there is a possible further category:

  • A: asymptotic (not yet showing symptoms) but infectious

For measles, the period in which this happens last for several days, and it makes measles especially dangerous. But although the distinction between $I$ and $A$ is important in practice, since it is only patients with symptoms who are quarantined, I'll ignore this distinction in what is to come.

I'll simplify things quite a bit, and keep track of the state of an epidemic in five numbers–the sizes of each of the relevant categories listed above. But already in this characterization the lack of reality will be apparent, because these numbers will be floating point approximations to integer values. And we'll keep track of states at fixed intervals of time $n \, dt$. In fact, I'll assume that $dt = 1$ day. So we are tracking numbers $$ S(t), \quad E(t), \quad I(t), \quad R(t), \quad V(t) $$ at times $t = 0$, $1$, $2$, … days after an initial case.

How does the state at time $n+1$ change from that at time $n$?


It turns out that this change of state will be fairly simple, so that tracking a model epidemic will require just specifying an initial state. For example, a single infectious person appearing in a susceptible population of size $N$ will have $$ S(0) = N – 1, \quad E(0) = 0, \quad I(0) = 1, \quad R(0) = 0, \quad V(0) = 0 \, . $$

In all examples I'll look at, I'll take $V(t)$ to be a constant $V_{0}$.

More about parameters

The principal parameter of an epidemic is its basic reproduction number ${\bf R}_{0}$. (This is conventional notation. It is not my fault that there is no relation between the $R$ of ${\bf R}_{0}$ and the $R$ of $R(t)$.) This is defined to be the average number of people directly infected by one person appearing in a large population in which everyone is susceptible. This is necessarily a somewhat theoretical notion, since in the modern world it is hard to find such populations, but it is nonetheless an important concept. The number ${\bf R}_{0}$ doesn't have to be an integer, since it is just an average. It is dimensionless, but it can be expressed in an illuminating fashion as a product of ratios that do have dimensions. Explicitly $$ {\bf R}_{0} = (\hbox{number of infections per contact}) \cdot (\hbox{number of contacts per unit of time}) \cdot (\hbox{amount of infectious time per infection}) \, . $$

It is useful to have this factorization, since it helps you keep track of assumptions going into our model, and with luck might suggest how to measure ${\bf R}_{0}$ when a new disease appears. For example, it should be clear that it depends in social structure, since the rate of contacts varies. In particular, one expects ${\bf R}_{0}$ to be higher than average in an epidemic spreading inside a school, with a lot of contact. In one study the value of ${\bf R}_{0}$ in such a case was estimated to be around $30$.

There is a variant of ${\bf R}_{0}$ that is useful to be aware of. An infectious person is capable of infecting ${\bf R}_{0}$ others in a totally susceptible population, but as an epidemic develops, more and more people become immune to the disease, and the number of susceptibles decreases. In this situation, if an infectious person has ${\bf R}_{0}$ potentially infecting contacts, only a fraction of these can in fact become infected. The fraction is $S(t)/N$. Therefore he infects in one day $({{\bf R}_{0}/ D })\cdot ({ S(t) / N })$, if $N$ is the population size. The effective or net infection number is therefore $$ {\bf R}_{\rm net} = {\bf R}_{0} \cdot { S(t) \over N } \, . $$

Change of state

(S) If there are $I(t)$ infectious individuals then the number of new cases in one day is $$ {{\bf R}_{\rm net} \over D } \cdot I(t) = {{\bf R}_{0} \over D }\cdot { S(t) \over N } \cdot I(t) \, . $$ Hence $$ S(t + 1) = S(t) – {{\bf R}_{0} \over D } \cdot { S(t) \over N } \cdot I(t) = S(t) – {{\bf R}_{\rm net} \over D } \cdot I(t) \, . $$ If $$ \lambda(t) = {{\bf R}_{0} \over D } \cdot { I(t) \over N } $$ then this becomes $$ S(t + 1) = S(t) – \lambda(t) S(t) \, . $$

(E) The number of people who are infected but not contagious at time $t$ is increased by the susceptibles who become infected, and is decreased by those who transition to an infectious state. Let $F$ be the rate of transition. If $P$ is as above the number of pre-infectious days, then $F = 1/P$. We have $$ E(t + 1) = E(t) + \lambda(t) S(t) – F E(t) \, . $$

(I) The number of people who are contagious is increased by those who transition from the previous state, and decreased by those who recover. Let $\Omega$ be the rate of recovery. If $D$ is as above the number of infectious days, then $\Omega = 1/D$ and $$ I(t + 1) = I(t) + F E(t) – \Omega I(t) \, . $$

(R) Finally, on the assumption that the size of the population remains constant: $$ R(t + 1) = R(t) + \Omega I(t) \, . $$


With $$ \lambda(t) = {{\bf R}_{0} \over D } \cdot { I(t) \over N } $$ We have $$ \eqalign { S_{t + 1} &= S(t) – \lambda(t) S(t) \cr E_{t + 1} &= E(t) + \lambda(t) S(t) – F E(t) \cr I_{t + 1} &= I(t) + F E(t) – \Omega I(t) \cr R_{t+1} &= R(t) + \Omega I(t) \, . \cr } $$ It is curious, and of some purely mathematical interest, that if I define $$ \eqalign { s(t) = { S(t) \over N } \cr e(t) = { E(t) \over N } \cr i(t) = { I(t) \over N } \cr r(t) = { R(t) \over N } \cr } $$ these equations become ones in which $N$ does not occur.

The presence of the factor $I(t)$ in $\lambda(t)$ makes this what a mathematician calls a non-linear system of equations–the various terms do not scale in a linear fashion. In any case, if one starts with known conditions, one can compute approximate values of all these variables for as many days as one wants. In fact, it is very easy to set up a spreadsheet to do this once ${\bf R}_{0}$, $P$, and $D$ are known. To simulate an epidemic, choose some initial values for $S(0)$, $I(0)$, $E(0)$, and $R(0)$ as well as $V_{0}$ and then compute in steps the values of the variables for all variables.


The first thing one probably wants to do with the technique outlined above is to run a few examples in order to get some feel for what happens.

The following figure illustrates the progress over 100 days of a simulated measles epidemic (${\bf R}_{0} = 13$, $P = 8$, $D = 7$) in in a totally susceptible population ($A(t) \equiv 0$) of 1000 people, starting out from the introduction of one case. Qualitatively, it looks very reasonable. Note the lag between infection and infectiousness.



The next figure illustrates what happens when 50% of the the initial population is immune at the start. It looks like in the very long run, everybody will be either initially immune or will have had measles, but that is not the case–instead the number of susceptibles has a non-vanishing limit that will be extremely small if ${\bf R}_{0}$ is large. But for ${\bf R}_{0} = 2$, this part of the population will be about 20%.



In the next example, I take 95% of the population to be initially immune. In this case, the initial immunity is high enough to lead to herd immunity–that is to say, a definite percentage of the population is still susceptible. The initial infection dies out very quickly. Those who are susceptible at the beginning, for example those who cannot be vaccinated, remain susceptible without damage. This is true of young children, for whom the recommended age for vaccination, for various reasons, is about 15 months. Vaccination thus becomes a civic duty, not just a wise personal choice.


The Samoa measles epidemic of 2019

Early in 2019, as explained in a BBC news item, an epidemic of measles broke out in New Zealand. There is a fair amount of traffic between New Zealand and the small island nation of Samoa (population 196,000), partly because of the large number of Samoans working in New Zealand, and it is likely that sometime late in September 2019 someone who had been infected in New Zealand arrived in Samoa and started an epidemic in Samoa. I have been told that at least the genetic markers of the Samoa variety of measles are the same as those found in New Zealand. The first case in Samoa seems to have been recognized on September 28.

It took a while for the government of Samoa to realize exactly what was happening, or at least to understand what was going to happen. The number of cases increased slowly through September, but was quite sizeable by the middle of October

By October 20, 169 cases had been reported. and one death. The government issued its first press release concerning the epidemic on October 16. Another report came out on November 4, by when there had been 513 cases reported and 3 deaths. On November 13, a national emergency was declared. At this point there had been 1608 cases reported, and the number was growing rapidly. Roughly half of the afflicted were children.

All this should not have been much of a surprise. Measles vaccinations reached more than 90% of children one year old in 2013, but had declined steadily since:


The vaccination rate plummeted a couple of years ago, when an extremely unfortunate maladministration of vaccine had resulted in two deaths. This led to a gross misperception of risk. In any case, it was apparent to many that Samoa in September of 2019 was a kind of time bomb ready to explode. That measles can be infectious without symptoms means that it would not have been practical to keep out all infectious travelers.

From November 22 through December 8 the Samoan government's Twitter feed reported measles cases by age groups:


In this graph, the high vaccination rates of earlier years is perhaps evident in the low incidence for ages $10$ and higher, although it might not be so clear why the age group $10 – 19$ escaped so well. The unfortunate decline in vaccination rates is equally evident for recent years. Measles is a very dangerous disease for the very young, and there were many fatalities:


The government Twitter feed stopped posting age statistics on December 8, but continued to report the accumulated number of cases. In addition, David Wu supplied me with data prior to November 22. The plot below, repeated from the beginning of the column, summarizes all I know about the record of measles cases. Keep in mind that it is likely that not all cases were reported.


A well-publicized mass vaccination campaign began on November 20, and even though it takes time for a vaccination to take effect, this has clearly affected the development of the epidemic. Also, much international aid arrived. By now (I am writing on December 29) the epidemic is almost over. There have been (as of today) a total of 81 deaths associated to it. This is a shocking number, but not unusually high for fatality statistics. Measles is a dangerous disease! I find it curious that when I was young it wasn't widely known to be so, and certainly not to me. In those days there was no vaccine, and everybody just assumed without much anxiety that a case of measles was an inevitable feature of normal life.


The Samoa epidemic is distinguished by its size. One consequence is that random events have been smoothed out. In contrast is an epidemic of measles that occurred in an American boarding school in 1934. It was apparently begun by a single case, and then spread rapidly. The progress of the epidemic is shown in the following graph:


(Redrawn from W. L. Aycock, `Immunity to poliomyelitis',
American Journal of Medical Science 204, 1942)

A very different picture. Approaches to epidemics that take randomness into account are discussed in Chapter 6 of the text by Vynnycky and White.

Reading further

AMS Feature Column banner

Bill Casselman’s Column Archive

Here are Bill Casselman’s older Feature Columns. Bill’s newest columns may be found here.