AMS Feature Column banner

An epidemic is a sequence of random events

If a contact is made, then whether or not infection is transferred is much like tossing a (loaded) coin. How can a simulation take all this uncertainty into account?

Bill Casselman
University of British Columbia

Just recently, I started thinking about making my own epidemics. On a computer, of course, with a digital population.

What I had in mind was not very sophisticated. I would assemble a population of ‘people’, a few of whom are infected with the virus, and then progress from one day to the next, letting the virus spread through interactions in the population. At each moment, a person would be in one of several possible states:

  • S Uninfected but susceptible to infection
  • E Infected, but not yet infectious (exposed)
  • A Infectious, but not yet showing symptoms (asymptomatic)
  • I Infectious and showing symptoms
  • R Recovered, or otherwise incapable of becoming infected (say, through vaccination)

This is not quite a complete representation of reality. For example, in the current epidemic a very small number of people get reinfected. But it is not too far from being correct.

In general, as the simulation goes on, a person would progress from one state in this list to the next, except of course that being vaccinated is a shortcut from the first to the last state. Infections take place because susceptible persons interact with contagious ones. Even when an interaction takes place, whether or not infection is transmitted is a function of many accidental circumstances (for example, surrounding ventilation) as well as how contagious the infected person is.

There is some further internal detail to some of these states. The degree to which a person is infectious changes in time, usually rising to a peak after a few days, and then decreasing to zero. Hence in a simulation each person has attached to him (i) a designation of state and, in states A and I, (ii) a number measuring infectiousness. A further datum is (iii) the frequency of contacts, especially close contacts, that a person has with others. This can change with time. For example, when a person starts showing symptoms, he will presumably reduce the frequency of his contacts.

Where’s the mathematics? An epidemic is driven by random events. The moment at which a person moves from one state to the next is not fixed by circumstances, but is instead a matter of probabilities. The severity of a person’s infection is a matter of chance, as is the length of time from when he is infected to when he becomes infectious. Even if we know the average rate at which an infectious person makes contacts, the exact number of contacts made in one day is also a matter of chance. If a contact is made, then whether or not infection is transferred is much like tossing a (loaded) coin. How can a simulation take all this uncertainty into account?

Generating contacts

Take the matter of contacts. The most important parameter governing contacts is the average number $c$ of contacts made by a person in one day, but that does not mean that the number of contacts in one day is constant. It might well vary from day to day. Instead, it is reasonable to assume that personal interaction is a Poisson process, which means that the probability of making $k$ contacts during one day is $p_{k} = c^{k} e^{-c} / k!$. Note that the infinite sum of the $p_{k}$ is $1$, because of the well known formula

$$ e^{c} = 1 + c + {c^{2} \over 2!} + { c^{3} \over 3! } + \cdots \, . $$
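The normalization can be checked numerically. The following is a minimal Python sketch; the function name poisson_pmf and the cutoff at 60 terms are choices made here for illustration only.

```python
from math import exp, factorial

def poisson_pmf(c, k):
    """Probability p_k = c^k e^{-c} / k! of exactly k contacts in one day."""
    return c**k * exp(-c) / factorial(k)

c = 2.5
# Assumption: 60 terms are plenty for c = 2.5, since the tail is negligible.
probs = [poisson_pmf(c, k) for k in range(60)]
total = sum(probs)                                 # should be very close to 1
mean = sum(k * pk for k, pk in enumerate(probs))   # should be very close to c
print(total, mean)
```

The second sum illustrates that $c$ really is the average number of contacts per day.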

For example, here are the graphs of some examples with a couple of values of $c$:

In a simulation, one will be dealing with a large number of people. Each of them will have his own regimen of interactions. Some of them will be more interactive than others. Thus, we are likely to find ourselves simulating a large number of independent Poisson processes, each one a sequence of random events. How to do this? In a program, this will involve a call to a routine, call it p_random(c) that returns on each call a random non-negative integer whose distribution matches the Poisson process with mean $c$.

Almost every programming language has built into it a routine random() that does something like this. On each call it returns a real number uniformly distributed in the half-open interval $[0,1)$. (David Austin’s previous FC gives some idea of how this works.) What we would like to do is use that routine to generate non-negative integers following a specified Poisson distribution. To give you some idea of how things go, we can see how this technique can generate integers uniformly distributed in any integral range $[0,n-1]$: get a random number $x$ in $[0,1)$ and then replace it by $\lfloor nx \rfloor$, the integral part of $nx$. If $n=2$ this offers a simulation of coin tossing, and if $n=6$ a simulation of throwing a die.
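In Python, for instance, the scaling trick might be sketched as follows. The name uniform_int is made up here for illustration; random() is the standard library generator.

```python
from math import floor
from random import random

def uniform_int(n):
    """Random integer uniformly distributed in 0, 1, ..., n-1."""
    return floor(n * random())

coin = uniform_int(2)       # 0 or 1, like tossing a fair coin
die = uniform_int(6) + 1    # 1, ..., 6, like throwing a die
```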

There is a reasonably well known procedure that does what we want, and very generally. This is explained in Knuth’s classic text. Suppose we are given an arbitrary probability distribution of integers with given probabilities $p_{k}$ for $k \ge 0$. That is to say, we are looking at some repeated event somewhat like coin tossing, in which a non-negative integer $k$ occurs with probability $p_{k}$. How can a program generate integers distributed according to these statistics?

Let $P(k)$ be the cumulative distribution

$$ P(k) = {\sum}_{i = 0}^{k} p_{i} $$

Thus $P(k)$ is the probability that the integer $j$ occurring is $\le k$. The original distribution has the property that each $p_{i} \ge 0$ and ${\sum} p_{i} = 1$, so $P(k)$ is non-decreasing in $k$ and tends to $1$. For example, if $c = 2.5$ and $p_{k} = e^{-c} c^{k} / k!$ then the graph of $P$ looks like the figure below. Given a random number $t$ in $[0,1)$ we can determine an integer according to the recipe indicated: draw a line to the right from the point $(0,t)$ and select the $x$-coordinate of the point at which it hits this graph.

There is another suggestive way to see this. Make up a rectangle of total height $1$, partitioned into boxes, with the one labeled $k$ of height $p_{k}$. Given the number $x$, mark a point at height $x$ in the rectangle. Select the label of the box that contains it.

In the long run the number of times you hit the box labeled $k$ will be proportional to its area, hence to $p_{k}$. But how do you tell what that label is? There is one straightforward answer to this question:

from random import random

def p_random():
    # p is the array of probabilities p[0], p[1], ...
    x = random()
    # this is the built-in random number generator
    s = 0.0
    i = 0
    while s <= x:
        s += p[i]
        i += 1
    # at exit p[0] + ... + p[i-2] <= x < p[0] + ... + p[i-1]
    return i-1

But this is somewhat inefficient. If there are $n$ possible values, the number of steps in each call grows with $n$ (about $n/2$ on average when the probabilities are roughly equal). Does there exist an algorithm that requires a number of steps independent of $n$? The answer is yes. A clever method whose basic idea is apparently due to Alastair Walker does this, at the small cost of building some preliminary structures.

Walker’s trick

As far as I know, Walker never explained how he discovered his method, but an elegant interpretation has been offered by Keith Schwarz. The basic idea is what we have already seen:

  1. Start with a box of some kind. Partition it into smaller labeled boxes in such a way that the area of box $k$ is proportional to $p_{k}$.
  2. To generate integers with a given probability distribution, choose points at random inside the box, and return the label of the region hit.
  3. Arrange a way to assign to every random $x$ in $[0,1)$ a point of the box.

The problem is to figure out how to make the partition in such a way that figuring out the label from the geometry of the partition can be done efficiently.

I’ll explain how Walker’s method works for a few simple cases, but first I’ll generalize the problem so that we are not restricted to the Poisson distribution. Suppose very generally that we are given probabilities $p_{i}$ for $i$ in $[0, n-1]$. We now want a method to generate random integers that follow the distribution assigned by $p_{i}$. That is to say, if we generate in this way a large number of integers, we want the proportion of occurrences of $i$ to be roughly $p_{i}$.

The case $n=2$ is like tossing a biased coin, and there is a simple solution. In this case, we are given two probabilities $p_{0}$, $p_{1}$ with $p_{0} + p_{1} = 1$. Partition the unit square in this fashion:

Choose a point $(x, y)$ randomly in the square. In fact, we do not have to pay any attention to $x$. If $y \le p_{0}$ we return $i = 0$ and otherwise we return $i = 1$.

But now, following Keith Schwarz and intending to show how Walker’s algorithm works in this very simple case, I will set up a rectangular region a bit differently. First of all, make its dimensions $2 \times 1$. Partition it twice: once into halves, each half a unit square …

… and then build in each half, say in the $i$-th half, a box of dimensions $1 \times 2p_{i}$, so that the two boxes have total area $2$ and areas proportional to the $p_{i}$. Label these boxes. Unless $p_{0} = p_{1}$, one of these will overflow at the top:

So then we cut off the overflow and paste it (with label) into the other box:

This shows the case $p_{0} \le p_{1}$. If $p_{1} < p_{0}$ things look like this:

How do we use these diagrams to generate the random integers we want? Choosing a random uniform number $x$ in $[0,1)$ amounts as before to choosing a point in the rectangle. But we do this differently, and we interpret it differently. Given $x$, set $X = 2x$. Let $m$ be the integer part of $X$, which will be either $0$ or $1$: $m = \lfloor X \rfloor$. Let $y = X - m$, the fractional part of $X$. Associate to $x$ the point in the $m$-th square at height $y$. If $y \lt 2p_{m}$, then we are in the box labeled by $m$, otherwise in the other one. In either case, the process returns the label of the box containing the point.
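Here is a minimal Python sketch of the two-square recipe. It assumes that the box in square $m$ has height $2p_{m}$ before trimming, so that areas are proportional to probabilities and a full square has height $1$. The function name biased_coin is invented for this illustration.

```python
from random import random

def biased_coin(p0):
    """Return 0 with probability p0 and 1 with probability 1 - p0,
    using the 2 x 1 rectangle split into two unit squares."""
    p = [p0, 1.0 - p0]
    X = 2 * random()
    m = int(X)                    # which unit square: 0 or 1
    y = X - m                     # height of the point inside that square
    bottom = min(1.0, 2 * p[m])   # height of the bottom box after trimming
    return m if y < bottom else 1 - m
```

With $p_{0} = 0.3$, for example, square $0$ holds a box of height $0.6$ labeled $0$ topped by a piece labeled $1$, and square $1$ is entirely labeled $1$, so label $0$ is returned with probability $0.5 \cdot 0.6 = 0.3$.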

Now look at the case $n = 3$, and suppose that we are given probabilities $p_{0}, p_{1}, p_{2}$ with $\sum p_{i} = 1$. We start off with a rectangle of size $3 \times 1$, partitioned into $1 \times 1$ boxes:

There are again different cases, depending on the relative sizes of the $p_{i}$. The easiest case is that in which two of the values of $p$, say $p_{0}$ and $p_{1}$, are less than $1/3$, which implies that the third is greater than $1/3$. Draw boxes of dimension $1 \times 3p_{i}$ in the $i$-th square, like this:

Now cut off two pieces from the large box and paste them into the smaller one, getting:

I’ll explain in a moment how to use this to generate random numbers.

There is a second case, in which two of the $p_{i}$ are larger than $1/3$:

Here, we want to cut off from the tops and fill in the third. It is tempting to cut off exactly the overflows in the large regions and paste them in, but this would give the third region three labels, which is not what we want. So we fill in from just one of the large regions. This will leave some space in it.

We fill in the new empty space from the other large region. We are now finished:

How to use what we have constructed? In each case, we have partitioned the rectangle of size $3 \times 1$, first into three unit squares, and then each of these in turn into one or two labeled rectangles. Given a random $x$ in $[0,1)$, we want to come up with some integer in $[0,3)$. How? We first scale it to get $X = 3x$. This will lie in the interval $[m, m+1)$ for $m = \lfloor X \rfloor$. We now turn attention to the $m$-th unit square. The integer we return will be one of the labels found in that square. Let $y = X - m$, the fractional part of $X$, which will be in $[0,1)$. If $y$ is less than the height of the bottom rectangle in that square, p_random returns $m$, otherwise the alternate label in that square.

In effect we are assigning principal and alternate labels to the boxes. Except that there won’t be an alternate label if the box is full.

In the literature, the array I call ‘alternate’ is called alias, and the method described here is called the alias method.
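To see that the cut-and-paste constructions for $n = 3$ really reproduce the original probabilities, one can compute exactly the chance of landing on each label. The sketch below is only an illustration. The two sets of tables were worked out by hand for sample probabilities chosen here, with box heights scaled by $3$ so that a full square has height $1$, and the names prob and implied_distribution are assumptions of this example.

```python
from fractions import Fraction as F

def implied_distribution(prob, alias):
    """Exact probability of each label when a point is chosen uniformly in
    the n x 1 rectangle. prob[m] is the height of the bottom box in square m
    (labeled m), and alias[m] is the label of the box on top of it."""
    n = len(prob)
    result = [F(0)] * n
    for m in range(n):
        result[m] += prob[m] / n                 # bottom box
        result[alias[m]] += (1 - prob[m]) / n    # top box, empty if prob[m] = 1
    return result

# First case: p = (1/5, 1/4, 11/20), two of them below 1/3.
# Squares 0 and 1 are both topped up from box 2.
case1 = implied_distribution([F(3, 5), F(3, 4), F(1)], [2, 2, 2])

# Second case: p = (2/5, 1/2, 1/10), two of them above 1/3.
# Square 2 is topped up from box 0, then square 0 from box 1.
case2 = implied_distribution([F(1, 2), F(1), F(3, 10)], [1, 1, 0])
```

In both cases the result agrees exactly with the probabilities we started from.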

The full algorithm

This method generalizes nicely. The original version seems to be due to Alastair Walker. It became well known because Knuth called attention to it (although mostly in exercises). Michael Vose then came up with a more efficient version, and made it handle rounding errors more stably.

I quote below, almost verbatim, the algorithm found originally in Vose’s paper. It improves the running time of Walker’s program, and corrects its handling of rounding errors. It has two parts. One is an initialization that sets up arrays prob and alias from the probability array p of length n. These are used in the function p_random, which returns a random integer in the range $[0,n-1]$ with the probabilities specified in p. The call to random returns a variable uniformly distributed in $[0, 1)$.

There are several loops in the program. The first assigns each integer in the range $[0,n-1]$ to one of two explicit arrays large and small. Those in small are the $i$ such that $p_{i} \le 1/n$. As the program proceeds, the integers in these arrays are those whose boxes have not yet been filled. Implicit is a third subset of $[0,n-1]$, which I’ll call finished. This contains all those indices for which no further processing is necessary, i.e. whose box is filled.

In the loop [0] the two arrays small and large are initialized, and the subset finished is left empty. In every run through the loop [1], an index is taken from small, its square is filled, and it is added to finished. This happens by removing filling material from one of the boxes in large, which therefore becomes smaller. The index of that box is then added to either small or large, according to how much is left. In each run through this loop, the total size of large and small is decremented.

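from math import floor
from random import random

def init(p):
    # sets up the global tables prob and alias from the probabilities p
    global n, prob, alias
    n = len(p)
    p = list(p)        # work on a copy, since p is modified below
    b = 1.0/n          # a box with p[i] = b exactly fills its square
    prob = [0.0]*n
    alias = [0]*n
    large = [0]*n
    small = [0]*n
    l = 0
    s = 0
    # [0] sort the indices into large and small
    for i in range(n):
        if p[i] > b:
            large[l] = i
            l += 1
        else:
            small[s] = i
            s += 1
    # [1] fill the square of a small j with material from a large k
    while s > 0 and l > 0:
        s -= 1
        j = small[s]
        l -= 1
        k = large[l]
        prob[j] = n*p[j]
        alias[j] = k
        p[k] += (p[j]-b)
        if p[k] > b:
            large[l] = k
            l += 1
        else:
            small[s] = k
            s += 1
    # [2] whatever is left in small is treated as a full box
    while s > 0:
        s -= 1
        prob[small[s]] = 1
    # [3] whatever is left in large is treated as a full box
    while l > 0:
        l -= 1
        prob[large[l]] = 1

def p_random():
    x = n*random()
    m = floor(x)
    if (x - m) < prob[m]:
        return m
    else:
        return alias[m]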

The last loops [2] and [3] of Vose’s code, which I include without further comment, are necessary to deal with rounding errors.

Here is a typical run for a Poisson process with mean $\mu = 4$.
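A run of this kind can be reproduced with the following self-contained Python sketch. It is a compact variant of the initialization, not Vose’s code verbatim: it uses Python lists as stacks and works with the scaled heights $n p_{i}$ directly. Truncating the Poisson distribution at $20$ values and renormalizing is an assumption made here so that the table is finite, and the names build_tables and sample are invented for the example.

```python
from math import exp, factorial, floor
from random import random

def build_tables(p):
    """Set up prob and alias for the list of probabilities p."""
    n = len(p)
    scaled = [n * q for q in p]          # box heights; a full box has height 1
    prob = [1.0] * n                     # boxes never revisited count as full
    alias = list(range(n))
    small = [i for i in range(n) if scaled[i] <= 1.0]
    large = [i for i in range(n) if scaled[i] > 1.0]
    while small and large:
        j = small.pop()
        k = large.pop()
        prob[j] = scaled[j]
        alias[j] = k
        scaled[k] -= 1.0 - scaled[j]     # material moved from box k to square j
        (large if scaled[k] > 1.0 else small).append(k)
    return prob, alias

def sample(prob, alias):
    """One random integer drawn by the alias method."""
    x = len(prob) * random()
    m = floor(x)
    return m if x - m < prob[m] else alias[m]

mu, n = 4.0, 20                          # assumed cutoff: values 0..19 only
p = [mu**k * exp(-mu) / factorial(k) for k in range(n)]
total = sum(p)
p = [q / total for q in p]               # renormalize after truncation
prob, alias = build_tables(p)
counts = [0] * n
for _ in range(100_000):
    counts[sample(prob, alias)] += 1
```

Dividing the entries of counts by the number of samples gives frequencies that should be close to the $p_{k}$.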

The simulation

Let’s see how to use Walker’s method to simulate how a person just infected goes on to infect others. Suppose that he starts to be infectious on the fifth day, and that the probability that he infects a contact is specified in the following table:

$$ \matrix { i = \hbox{day after infection} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & \ge 10 \cr r_{i} = \hbox{probability of infection} & 0 & 0 & 0 & 0 & 0.1 & 0.3 & 0.4 & 0.4 & 0.2 & 0 \cr } $$

Suppose also that he makes an average of $4$ close contacts per day, and that these follow a Poisson distribution. Applying Walker’s algorithm, we get a sample run of contacts like this:

$$ \matrix { i = \hbox{day after infection} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\ \dots \cr c_{i} = \hbox{number of contacts} & 5 & 2 & 3 & 4 & 3 & 3 & 2 & 1 & 3 & 3\ \dots \cr } $$

In this run, how many people does he infect? There is no unique answer to this question. It depends on something like a coin toss at each contact. What we can calculate is an average. On the fifth day he infects an average of $0.1 \cdot 3$ people, on the sixth … etc. All in all:

$$ \hbox{average number of infections} = {\sum} c_{i} r_{i} = 0.1 \cdot 3 + 0.3 \cdot 3 + 0.4 \cdot 2 + 0.4 \cdot 1 + 0.2 \cdot 3 = 3.0 \, . $$

This is much lower than the average number of people he infects when the average is taken over all possible runs of contacts, which is called the $R_{0}$ value for this example. Here it is $(0.1 + 0.3 + 0.4 + 0.4 + 0.2) \cdot 4 = 5.6$.
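The two averages, and one random outcome of the coin tosses, can be computed in a few lines of Python. This is only a sketch using the numbers from the tables above. The variable and function names are invented here, and day $10$ stands in for all later days, on which the infection probability is $0$.

```python
from random import random

r = [0, 0, 0, 0, 0.1, 0.3, 0.4, 0.4, 0.2, 0]   # r_i for days 1..10
c = [5, 2, 3, 4, 3, 3, 2, 1, 3, 3]             # sampled contacts, days 1..10

# Average number of infections for this particular run of contacts.
expected_this_run = sum(ci * ri for ci, ri in zip(c, r))

# Average over all runs, with 4 contacts per day on average.
r0 = 4 * sum(r)

def infections_in_run(contacts, rates):
    """One random outcome: toss a loaded coin for every contact."""
    return sum(1 for ci, ri in zip(contacts, rates)
                 for _ in range(ci) if random() < ri)

print(expected_this_run, r0, infections_in_run(c, r))
```

Repeated calls to infections_in_run(c, r) give different integers between $0$ and $12$, whose long-run average is the first of the two numbers.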

Reading further


More Measles

In this column, I shall say more about modeling the progress of an epidemic. …

Bill Casselman
University of British Columbia, Vancouver, Canada


This is the second Feature Column in which I discuss the mathematics involved in the transmission of measles. The first one made a few very brief remarks about how a mathematical model might track the progress of an epidemic, and then discussed the phenomenon of herd immunity, which enables a population as a whole to become immune to the disease even though not all individuals are immune. In this column, I shall say more about the first topic, i.e. modeling the progress of an epidemic.

As in the earlier column, my intention here is not to come even close to a detailed analysis, but just to explain basic phenomena.

What's to be explained?

In late September 2019 a single case of measles was reported on one of the islands of the South Pacific nation of Samoa. It was followed by several more, the number of cases building slowly at first but then eventually growing rapidly:



(Most data from the Twitter feed of the government of Samoa, and some transmitted by David Wu.)

What possible theory can account for such phenomena? For example, could one have predicted the eventual size of the epidemic from early data? Could one have predicted its timing?

The answer to the first question is almost certainly "no". This epidemic seems to be somewhat unusual. When one infectious person appears in a population that is largely vulnerable to a disease he will infect a certain number of individuals that he comes in contact with. Each of them will infect roughly the same number, and so on. So at the beginning of an epidemic one expects exponential growth. That is not evident here. (Why not?)

Infected people will eventually recover from a disease like measles, and they cannot subsequently be reinfected. As time goes on, the number of immune people will grow, and infected people will come into contact with fewer and fewer susceptible ones. The epidemic will slow down. If no preventive measures are taken, such as patient isolation or vaccination, nearly all the population will eventually have been infected. In the Samoa epidemic, a mandatory vaccination program was begun in mid-November, and after the new vaccinations took effect (a period of roughly 10 days), the graph begins visibly to flatten.

The parameters of a disease

There is a simple and very basic model for such epidemics. It divides a population into a small number of subsets, and keeps track of the sizes of these subsets from one day to the next, starting with an initial partition. The process depends on a very small number of parameters: (1) the basic reproduction number ${\bf R}_{0}$, the number of people infected by a single case appearing in an unprotected population; (2) the length of time $P$ from when a person is infected to when he in turn becomes infectious (the pre-infectious period); (3) the length of time $D$ that he is infectious.

For measles, ${\bf R}_{0}$ is about $13$, $P$ is about $8$, and $D$ is also about $8$.

It is very important to realize that these data are not exact, but generally vary over a small range according to circumstances. Underlying them is some kind of probability distribution, and they should definitely be interpreted as rough, even poorly known, averages. For example, in this model the degree of contagion remains constant for a period of $D$ days, while in reality a person may well be more contagious at certain times than others.

Of these parameters, it is ${\bf R}_{0}$ that is at once most subtle and most significant. The following table displays approximate lower and upper limits for the value of ${\bf R}_{0}$ associated to several diseases. Measles is very contagious.


The basic SEIR model

There is a standard basic model of epidemics of general diseases similar to measles, in which a case of the disease confers immunity (unlike, say, malaria or HIV). It is unrealistically simple, but nonetheless suggestive. Its principal value is probably that if a real epidemic differs from this model one will want to understand why.

In this model the population under consideration is partitioned into five states:

  • S: susceptible to infection
  • E: exposed (and infected) but not yet infectious
  • I: infectious
  • R: recovered
  • V: those who have been vaccinated or had the disease previous to the current epidemic

Those in categories $R$ and $V$ are immune. Those who have been vaccinated are effectively in group $R$, through a kind of virtual infection. One can assume $V$ is contained in $R$ if people are not vaccinated in the epidemic, by specifying $R(0)$ to be $V(0)$. In addition, it is valuable to keep track of the time elapsed (say, in days) since a person's last state transition took place. But I'll not do that here.

For some diseases, and in particular measles, there is a possible further category:

  • A: asymptomatic (not yet showing symptoms) but infectious

For measles, the period in which this happens lasts for several days, and it makes measles especially dangerous. But although the distinction between $I$ and $A$ is important in practice, since it is only patients with symptoms who are quarantined, I'll ignore this distinction in what is to come.

I'll simplify things quite a bit, and keep track of the state of an epidemic in five numbers: the sizes of each of the relevant categories listed above. But already in this characterization the lack of reality will be apparent, because these numbers will be floating point approximations to integer values. And we'll keep track of states at fixed intervals of time $n \, dt$. In fact, I'll assume that $dt = 1$ day. So we are tracking numbers $$ S(t), \quad E(t), \quad I(t), \quad R(t), \quad V(t) $$ at times $t = 0$, $1$, $2$, … days after an initial case.

How does the state at time $n+1$ change from that at time $n$?


It turns out that this change of state will be fairly simple, so that tracking a model epidemic will require just specifying an initial state. For example, a single infectious person appearing in a susceptible population of size $N$ will have $$ S(0) = N - 1, \quad E(0) = 0, \quad I(0) = 1, \quad R(0) = 0, \quad V(0) = 0 \, . $$

In all examples I'll look at, I'll take $V(t)$ to be a constant $V_{0}$.

More about parameters

The principal parameter of an epidemic is its basic reproduction number ${\bf R}_{0}$. (This is conventional notation. It is not my fault that there is no relation between the $R$ of ${\bf R}_{0}$ and the $R$ of $R(t)$.) This is defined to be the average number of people directly infected by one person appearing in a large population in which everyone is susceptible. This is necessarily a somewhat theoretical notion, since in the modern world it is hard to find such populations, but it is nonetheless an important concept. The number ${\bf R}_{0}$ doesn't have to be an integer, since it is just an average. It is dimensionless, but it can be expressed in an illuminating fashion as a product of ratios that do have dimensions. Explicitly $$ {\bf R}_{0} = (\hbox{number of infections per contact}) \cdot (\hbox{number of contacts per unit of time}) \cdot (\hbox{amount of infectious time per infection}) \, . $$

It is useful to have this factorization, since it helps you keep track of assumptions going into our model, and with luck might suggest how to measure ${\bf R}_{0}$ when a new disease appears. For example, it should be clear that it depends on social structure, since the rate of contacts varies. In particular, one expects ${\bf R}_{0}$ to be higher than average in an epidemic spreading inside a school, with a lot of contact. In one study the value of ${\bf R}_{0}$ in such a case was estimated to be around $30$.

There is a variant of ${\bf R}_{0}$ that is useful to be aware of. An infectious person is capable of infecting ${\bf R}_{0}$ others in a totally susceptible population, but as an epidemic develops, more and more people become immune to the disease, and the number of susceptibles decreases. In this situation, if an infectious person has ${\bf R}_{0}$ potentially infecting contacts, only a fraction of these can in fact become infected. The fraction is $S(t)/N$, where $N$ is the population size. Therefore he infects in one day, on average, $({{\bf R}_{0}/ D })\cdot ({ S(t) / N })$ people. The effective or net infection number is therefore $$ {\bf R}_{\rm net} = {\bf R}_{0} \cdot { S(t) \over N } \, . $$
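As a small worked example, here is the net infection number for measles (${\bf R}_{0} = 13$) in a population of $1000$ with half, and then with $95\%$, of the people immune. The function name r_net is invented for this sketch. The remark that an epidemic cannot grow once ${\bf R}_{\rm net}$ drops below $1$ is the standard interpretation of this number rather than something derived here.

```python
def r_net(r0, susceptible, population):
    """Effective reproduction number R_net = R_0 * S(t) / N."""
    return r0 * susceptible / population

half_immune = r_net(13, 500, 1000)     # 6.5: each case still infects several others
mostly_immune = r_net(13, 50, 1000)    # about 0.65: fewer than one new case per case
print(half_immune, mostly_immune)
```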

Change of state

(S) If there are $I(t)$ infectious individuals then the number of new cases in one day is $$ {{\bf R}_{\rm net} \over D } \cdot I(t) = {{\bf R}_{0} \over D }\cdot { S(t) \over N } \cdot I(t) \, . $$ Hence $$ S(t + 1) = S(t) - {{\bf R}_{0} \over D } \cdot { S(t) \over N } \cdot I(t) = S(t) - {{\bf R}_{\rm net} \over D } \cdot I(t) \, . $$ If $$ \lambda(t) = {{\bf R}_{0} \over D } \cdot { I(t) \over N } $$ then this becomes $$ S(t + 1) = S(t) - \lambda(t) S(t) \, . $$

(E) The number of people who are infected but not contagious at time $t$ is increased by the susceptibles who become infected, and is decreased by those who transition to an infectious state. Let $F$ be the rate of transition. If $P$ is as above the number of pre-infectious days, then $F = 1/P$. We have $$ E(t + 1) = E(t) + \lambda(t) S(t) - F E(t) \, . $$

(I) The number of people who are contagious is increased by those who transition from the previous state, and decreased by those who recover. Let $\Omega$ be the rate of recovery. If $D$ is as above the number of infectious days, then $\Omega = 1/D$ and $$ I(t + 1) = I(t) + F E(t) - \Omega I(t) \, . $$

(R) Finally, on the assumption that the size of the population remains constant: $$ R(t + 1) = R(t) + \Omega I(t) \, . $$


With $$ \lambda(t) = {{\bf R}_{0} \over D } \cdot { I(t) \over N } $$ we have $$ \eqalign { S(t + 1) &= S(t) - \lambda(t) S(t) \cr E(t + 1) &= E(t) + \lambda(t) S(t) - F E(t) \cr I(t + 1) &= I(t) + F E(t) - \Omega I(t) \cr R(t + 1) &= R(t) + \Omega I(t) \, . \cr } $$ It is curious, and of some purely mathematical interest, that if I define $$ \eqalign { s(t) &= { S(t) \over N } \cr e(t) &= { E(t) \over N } \cr i(t) &= { I(t) \over N } \cr r(t) &= { R(t) \over N } \cr } $$ these equations become ones in which $N$ does not occur.
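Explicitly, dividing each of the four update equations by $N$ and noting that $\lambda(t) = ({\bf R}_{0}/D) \cdot i(t)$ gives $$ \eqalign { s(t + 1) &= s(t) - \lambda(t) s(t) \cr e(t + 1) &= e(t) + \lambda(t) s(t) - F e(t) \cr i(t + 1) &= i(t) + F e(t) - \Omega i(t) \cr r(t + 1) &= r(t) + \Omega i(t) \, , \cr } \qquad \lambda(t) = {{\bf R}_{0} \over D } \cdot i(t) \, . $$ So the course of the model epidemic, measured in fractions of the population, depends only on ${\bf R}_{0}$, $P$, $D$ and the initial fractions.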

The presence of the factor $I(t)$ in $\lambda(t)$ makes this what a mathematician calls a non-linear system of equations: the various terms do not scale in a linear fashion. In any case, if one starts with known conditions, one can compute approximate values of all these variables for as many days as one wants. In fact, it is very easy to set up a spreadsheet to do this once ${\bf R}_{0}$, $P$, and $D$ are known. To simulate an epidemic, choose some initial values for $S(0)$, $I(0)$, $E(0)$, and $R(0)$ as well as $V_{0}$ and then compute in steps the values of all the variables on each succeeding day.
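Instead of a spreadsheet, one can use a few lines of Python. The following is a minimal sketch of the update rules, not a definitive implementation. The function names, the default parameter values (${\bf R}_{0} = 13$, $P = 8$, $D = 7$) and the choice to fold the initially immune into $R(0)$ are assumptions made for this example.

```python
def seir_step(state, r0, p_days, d_days, n):
    """Advance (S, E, I, R) by one day using the update rules above."""
    s, e, i, r = state
    lam = (r0 / d_days) * (i / n)   # lambda(t)
    f = 1.0 / p_days                # F, rate of becoming infectious
    omega = 1.0 / d_days            # Omega, rate of recovery
    return (s - lam * s,
            e + lam * s - f * e,
            i + f * e - omega * i,
            r + omega * i)

def simulate(n, days, r0=13.0, p_days=8.0, d_days=7.0, immune=0.0):
    """Daily states starting from one infectious case; `immune` people
    start in R, so that V is folded into R."""
    state = (n - 1.0 - immune, 0.0, 1.0, immune)
    history = [state]
    for _ in range(days):
        state = seir_step(state, r0, p_days, d_days, n)
        history.append(state)
    return history

history = simulate(n=1000, days=100)
for day in range(0, 101, 10):
    s, e, i, r = history[day]
    print(f"{day:3d} {s:8.1f} {e:8.1f} {i:8.1f} {r:8.1f}")
```

Calling simulate(n=1000, days=100, immune=500) or simulate(n=1000, days=100, immune=950) changes only the starting state, which is all that distinguishes one model epidemic from another once the parameters are fixed.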


The first thing one probably wants to do with the technique outlined above is to run a few examples in order to get some feel for what happens.

The following figure illustrates the progress over 100 days of a simulated measles epidemic (${\bf R}_{0} = 13$, $P = 8$, $D = 7$) in a totally susceptible population ($V(t) \equiv 0$) of 1000 people, starting out from the introduction of one case. Qualitatively, it looks very reasonable. Note the lag between infection and infectiousness.



The next figure illustrates what happens when 50% of the initial population is immune at the start. It looks like in the very long run, everybody will be either initially immune or will have had measles, but that is not the case. Instead, the number of susceptibles has a non-vanishing limit that will be extremely small if ${\bf R}_{0}$ is large. But for ${\bf R}_{0} = 2$, this part of the population will be about 20%.



In the next example, I take 95% of the population to be initially immune. In this case, the initial immunity is high enough to lead to herd immunity. That is to say, a definite percentage of the population is still susceptible, but the initial infection dies out very quickly. Those who are susceptible at the beginning, for example those who cannot be vaccinated, remain susceptible without damage. This is true of young children, for whom the recommended age for vaccination, for various reasons, is about 15 months. Vaccination thus becomes a civic duty, not just a wise personal choice.


The Samoa measles epidemic of 2019

Early in 2019, as explained in a BBC news item, an epidemic of measles broke out in New Zealand. There is a fair amount of traffic between New Zealand and the small island nation of Samoa (population 196,000), partly because of the large number of Samoans working in New Zealand, and it is likely that sometime late in September 2019 someone who had been infected in New Zealand arrived in Samoa and started an epidemic there. I have been told that at least the genetic markers of the Samoa variety of measles are the same as those found in New Zealand. The first case in Samoa seems to have been recognized on September 28.

It took a while for the government of Samoa to realize exactly what was happening, or at least to understand what was going to happen. The number of cases increased slowly through September, but was quite sizeable by the middle of October.

By October 20, 169 cases had been reported, and one death. The government issued its first press release concerning the epidemic on October 16. Another report came out on November 4, by which time there had been 513 cases reported and 3 deaths. On November 13, a national emergency was declared. At this point there had been 1608 cases reported, and the number was growing rapidly. Roughly half of the afflicted were children.

All this should not have been much of a surprise. Measles vaccinations reached more than 90% of children one year old in 2013, but had declined steadily since:


The vaccination rate plummeted a couple of years ago, when an extremely unfortunate maladministration of vaccine had resulted in two deaths. This led to a gross misperception of risk. In any case, it was apparent to many that Samoa in September of 2019 was a kind of time bomb ready to explode. That measles can be infectious without symptoms means that it would not have been practical to keep out all infectious travelers.

From November 22 through December 8 the Samoan government's Twitter feed reported measles cases by age groups:


In this graph, the high vaccination rates of earlier years are perhaps evident in the low incidence for ages $10$ and higher, although it might not be so clear why the age group $10$ to $19$ escaped so well. The unfortunate decline in vaccination rates is equally evident for recent years. Measles is a very dangerous disease for the very young, and there were many fatalities:


The government Twitter feed stopped posting age statistics on December 8, but continued to report the accumulated number of cases. In addition, David Wu supplied me with data prior to November 22. The plot below, repeated from the beginning of the column, summarizes all I know about the record of measles cases. Keep in mind that it is likely that not all cases were reported.


A well-publicized mass vaccination campaign began on November 20, and even though it takes time for a vaccination to take effect, this has clearly affected the development of the epidemic. Also, much international aid arrived. By now (I am writing on December 29) the epidemic is almost over. There have been (as of today) a total of 81 deaths associated with it. This is a shocking number, but not unusually high for fatality statistics. Measles is a dangerous disease! I find it curious that when I was young it wasn't widely known to be so, and certainly not to me. In those days there was no vaccine, and everybody just assumed without much anxiety that a case of measles was an inevitable feature of normal life.


The Samoa epidemic is distinguished by its size. One consequence is that random events have been smoothed out. In contrast is an epidemic of measles that occurred in an American boarding school in 1934. It was apparently begun by a single case, and then spread rapidly. The progress of the epidemic is shown in the following graph:


(Redrawn from W. L. Aycock, `Immunity to poliomyelitis',
American Journal of Medical Science 204, 1942)

A very different picture. Approaches to epidemics that take randomness into account are discussed in Chapter 6 of the text by Vynnycky and White.

Reading further