Alan Turing and the Countability of Computable Numbers

Turing’s methodology was unique: he imagined hypothetical machines that could perform complicated mathematical tasks in a deterministic manner, in the way computers do today. In this way, he inadvertently kickstarted the entire field of modern computer science…

Adam A. Smith
University of Puget Sound

Alan Turing may be the most-influential-yet-least-read figure in 20th-century mathematics. His landmark paper, “On computable numbers, with an application to the Entscheidungsproblem”, was published in the Proceedings of the London Mathematical Society in late 1936. This is the paper that spun computer science off from mathematics, into the distinct field that we know today. Yet, if you ask mathematicians and computer scientists alike whether they have read this paper, the answer is frequently “No.” It is so long and amazingly dense that even experts often have a very hard time parsing his arguments. This column aims to rectify this slightly, by explaining one small part of Turing’s paper: the set of computable numbers, and its place within the real numbers.

Statue of Alan Turing

Statue of Alan Turing by Stephen Kettle. Photo by Antoine Taveneaux (CC BY-SA 3.0).

The Entscheidungsproblem (German for “decision problem”) is the problem of developing an algorithm that can determine whether a statement of first-order logic is universally valid. However, Turing did not know that the American mathematician Alonzo Church had already proven, a few months previously, that no such algorithm can exist. Even so, Turing’s methodology was unique: he imagined hypothetical machines that could perform complicated mathematical tasks in a deterministic manner, in the way computers do today. In this way, he inadvertently kickstarted the entire field of modern computer science (which has made some modest improvements since 1936). As part of his proof, he developed the concept of computable numbers. Turing used his imaginary machines (which have since come to be called “Turing machines” in his honor) for the process of calculating numbers in this set. It contains any number that can be calculated to within an arbitrary precision in a finite amount of time. This set includes all rational numbers and all algebraic numbers (e.g. \(\sqrt{2}\)), as well as many transcendental numbers such as \(\pi\) and \(e\). Interestingly, Turing created a very natural extension to Georg Cantor’s set theory, when he proved that the set of computable numbers is countably infinite!

Most mathematicians are familiar with the idea of countability: the notion, developed by Cantor in the 1870s, that not all infinite sets have the same cardinality. A set that is countably infinite is one for which there exists some one-to-one correspondence between its elements and the set of natural numbers \(\mathbb{N}\). For example, the set of integers \(\mathbb{Z}\) (“Z” for “Zahlen”, meaning “numbers” in German) can easily be shown to be countably infinite. Cantor himself developed a well-known proof that the set of rational numbers \(\mathbb{Q}\) (for “quotient”) is also countably infinite. Thus, in a very real sense \(\mathbb{N}\), \(\mathbb{Z}\), and \(\mathbb{Q}\) all have the same cardinality, even though \(\mathbb{N} \subset \mathbb{Z} \subset \mathbb{Q}\).

Conversely, an infinite set for which there is no one-to-one correspondence with \(\mathbb{N}\) is said to be “uncountably infinite”, or just “uncountable”. \(\mathbb{R}\), the set of real numbers, is one such set. Cantor’s “diagonalization proof” showed that no infinite enumeration of real numbers could possibly contain them all. Of course, there are many irrational real numbers that are useful in common computations (\(\pi\), \(e\), \(\phi\), \(\sqrt{2}\), and \(\log~2\), just to name a few). A common misconception is that a set of all such useful numbers (including infinitely many transcendental numbers) is somehow “too complicated” to be merely countably infinite. Turing’s paper proved this intuition to be incorrect.

The rest of this column is laid out as follows. First, we repeat Cantor’s proofs showing that \(\mathbb{Z}\) and \(\mathbb{Q}\) are countable and \(\mathbb{R}\) is uncountable. Then we will show how Turing extended Cantor’s work, by proving the countability of the set of computable numbers. We will call this set \(\mathbb{K}\), to better fit in with the other sets of numbers. However, we will reprove Turing’s ideas using Python rather than his original Turing machines. The ideas behind the proofs remain unchanged, but they become much easier for a modern audience to follow.

The sets of integers and rationals are both countably infinite

Intuitively, one might think of the set of integers \(\mathbb{Z}\) as being “bigger” than the set of natural numbers \(\mathbb{N}\). After all, \(\mathbb{Z}\) is a proper superset of \(\mathbb{N}\)! However, by rearranging the integers to start with \(0\) and count up in magnitude, alternating between positive and negative, we can create an infinite list of all the integers with a definite starting point. This rearrangement allows a bijective function between \(\mathbb{N}\) and \(\mathbb{Z}\). This means the function gives a one-to-one and onto correspondence between the two sets, so the function is invertible.

Here’s a proof. Rearrange \(\mathbb{Z}\) to the order \(\{0, 1, -1, 2, -2, 3, -3, …\}\). Then map them to the elements of \(\mathbb{N}\) in their standard order, so that \(0\rightarrow 0\), \(1\rightarrow 1\), \(-1\rightarrow 2\), and so on. This is injective (“one-to-one”), since no two different integers will ever map to the same natural number. And it is surjective (“onto”), because there exists an integer to map to every natural number. Therefore the relationship is a bijection, and thus the integers are a countable set.
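As a concrete illustration (not part of the original proofs), here is a minimal Python sketch of this correspondence and its inverse; the function names are my own.

def nat_to_int(n):
    # Map the natural number n to the n-th entry of the list 0, 1, -1, 2, -2, ...
    return (n + 1) // 2 if n % 2 == 1 else -(n // 2)

def int_to_nat(z):
    # Inverse map: recover the position of the integer z in that list.
    return 2 * z - 1 if z > 0 else -2 * z

# Quick check that the two maps are inverses on a small range.
assert all(int_to_nat(nat_to_int(n)) == n for n in range(100))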

Proving that the set of rational numbers is countable is more difficult, given that there are two “degrees of freedom” in a rational number: the numerator and the denominator. It seems difficult to rearrange \(\mathbb{Q}\) into a list the same way we did with \(\mathbb{Z}\). (That is, one with a definite starting point, that extends infinitely forward in a single dimension.) One cannot simply list all the rational numbers with numerator 1, then all with numerator 2, etc., because there are an infinite number of them in each subset. If we started out by listing all the rationals with numerator 1 (for example), we’d never get to the others!

Instead, Cantor thought to traverse the set in an alternating “zig-zag” fashion, as shown here. We start by only considering the positive rational numbers \(\mathbb{Q}^+\). We’ll extend to the complete set later. This zig-zag pattern lets us get to any given positive rational number, eventually.

Diagram with fractions with numerator 1, then 2, then 3, etc.

Ordering of the positive rational numbers \(\mathbb{Q}^+\), in order to create a correspondence with the natural numbers \(\mathbb{N}\). Start with \(\frac{1}{1}\), then proceed by diagonal rows, skipping any fraction that isn’t in lowest terms.

Let’s prove that the rational numbers are countable. We start by showing a bijection between the positive rationals \(\mathbb{Q}^+\) and \(\mathbb{N}\). Lay out the elements of \(\mathbb{Q}^+\) as shown in the figure above, so that the first row consists of all numbers with a numerator of \(1\) in order of increasing denominator (\(\frac{1}{1}\), \(\frac{1}{2}\), \(\frac{1}{3}\), \(\ldots\)), the second row has all those with a numerator of \(2\) in the same order, and so on. Then, traverse them as shown in the figure, moving diagonally while weaving back and forth. We skip any fractions not in lowest terms, so that no rational number appears twice. In this way, we can arrange the set \(\mathbb{Q}^+\) into the order \(\{\frac{1}{1}, \frac{2}{1}, \frac{1}{2}, \frac{1}{3}, \frac{3}{1}, \ldots\}\). When mapping them to the natural numbers in this order, the mapping function is an injection because no two different elements of \(\mathbb{Q}^+\) map to the same member of \(\mathbb{N}\), and it is a surjection because there exists a member of \(\mathbb{Q}^+\) for every member of \(\mathbb{N}\). Thus it is a bijection, and \(\mathbb{Q}^+\) is countable.
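Here is a small Python sketch of this enumeration (my own, not Cantor’s or Turing’s); the weaving direction on each diagonal is chosen so that the output matches the ordering quoted above.

from math import gcd

def positive_rationals():
    # Yield the positive rationals in Cantor's zig-zag order, as (numerator, denominator) pairs.
    d = 2                                # numerator + denominator is constant along each diagonal
    while True:
        pairs = [(n, d - n) for n in range(1, d)]
        if d % 2 == 1:
            pairs.reverse()              # weave back and forth on alternating diagonals
        for num, den in pairs:
            if gcd(num, den) == 1:       # skip fractions that are not in lowest terms
                yield num, den
        d += 1

gen = positive_rationals()
print([next(gen) for _ in range(5)])     # [(1, 1), (2, 1), (1, 2), (1, 3), (3, 1)]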

To show that the set of all rational numbers \(\mathbb{Q}\) is countable, use the reordering strategy employed to prove that \(\mathbb{Z}\) is countable. Reorder \(\mathbb{Q}\) to start with \(0\), and then proceed through the rationals in the order shown in the figure, alternating positive and negative. Thus \(\mathbb{Q}\) will be ordered \(\{\frac{0}{1}, \frac{1}{1}, -\frac{1}{1}, \frac{2}{1}, -\frac{2}{1}, \frac{1}{2}, -\frac{1}{2}, \frac{1}{3}, -\frac{1}{3}, \frac{3}{1}, -\frac{3}{1}, \ldots\}\). These elements can be mapped to the elements of \(\mathbb{N}\) in its usual order, and this is a bijection for the same reasons as given above.

The set of reals is uncountably infinite

However, real numbers are inherently uncountable. A rephrasing of Cantor’s original proof follows, using a trick that has come to be known as “diagonalization.” No matter what infinite list of real numbers is given, we can generate a new number \(x\) that cannot possibly be in that list.

To prove the real numbers are uncountable, let’s assume they are not. Then there would exist some one-to-one correspondence between the real numbers \(\mathbb{R}\) and the natural numbers \(\mathbb{N}\). If that were the case, then it would be possible to enumerate all the real numbers in some order, such as that illustrated in the table shown. However, we can construct a real number \(x\) that cannot possibly be in the list, which means that it’s not a complete list. We do this by taking one digit from each number in the list, and changing it. (In the table shown, the change is effected by adding 1 to the digit, but skipping 9 and 0 so as to avoid numbers such as \(0.0999\ldots\) and \(0.1000\ldots\) that are equal, even though their digits are different.) This way, \(x\) differs from every single number in the list in at least one of its digits. The real number \(x\) has no corresponding element in the natural numbers, and therefore the original assumption must be false. \(\mathbb{R}\) must be uncountable.

A table of real numbers with a digit to change highlighted
Generating a real number that cannot possibly be in a countably infinite list of other real numbers. Each digit is derived by taking the corresponding digit of one of the numbers in the list, and adding 1 (skipping 9 and 0).
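The digit-changing rule is easy to express in code. The sketch below is mine, and it assumes a hypothetical function nth_digit(i, j) that returns the \(j\)th decimal digit of the \(i\)th number in the alleged enumeration; everything else follows the rule in the caption above.

def changed(d):
    # Add 1 to a decimal digit, skipping 9 and 0, so the result lies in 1..8 and never equals d.
    return 1 if d >= 8 else d + 1

def diagonal_digits(nth_digit, how_many):
    # The first digits of a number x that differs from every number in the enumeration.
    return [changed(nth_digit(i, i)) for i in range(how_many)]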

Thus, integers and rational numbers are both countable sets, while real numbers are uncountable. Cantor also proved that the algebraic numbers (\(\mathbb{A}\)) form a countable set. (These are the roots of polynomials with rational coefficients.) For his insight, Cantor was allegedly called an “apostate” (“Abtrünnigen” in German) and a “corrupter of youth” (“Verderber der Jugend”), among other colorful names. Many of his contemporaries felt that the nature of infinity was beyond human mathematics, being the realm of God. It was not until David Hilbert began to champion his ideas in the early 1900s that they became more widely accepted. Today, they are part of a standard undergraduate curriculum.

Defining computable numbers

We will now move on to Turing’s work on computable numbers, which he defined in his famous paper. We use \(\mathbb{K}\) to represent this set. (\(\mathbb{C}\) is taken, as it usually refers to the set of complex numbers.) Turing defined a computable number as one that can be calculated to within an arbitrary precision, within a finite amount of time. These days, the easiest way to do that is often with a computer program—though any algorithmic approach will do.

First, let us show that all rational numbers are computable. When represented as a decimal, a rational number either terminates or eventually repeats the same block of digits forever. Thus a program can be written that, given enough time, displays any rational number to the desired precision.

The figure here shows a non-terminating Python program, similar in spirit to the machines Turing laid out in his paper. This one outputs the never-ending decimal representation of one third. It first prints out “0.”, and then loops forever printing the digit “3” until the user gets tired and pulls the plug. In this way, it “calculates” the number to as accurate a degree as the user wishes.


print("0.", end="")
while True: print("3", end="")
Simple Python program that generates the number one third.
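The same idea extends to any positive rational number. The sketch below is my own generalization (not from Turing’s paper): it uses long division to print the decimal digits of p/q indefinitely.

def print_rational(p, q):
    # Print the decimal expansion of the positive rational p/q, one digit at a time, forever.
    print(str(p // q) + ".", end="")
    remainder = p % q
    while True:
        remainder *= 10
        print(remainder // q, end="")    # the next digit, by long division
        remainder %= q

# print_rational(1, 3) behaves like the program above, printing 0.333...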

It may come as a surprise that in his seminal paper, Turing preferred Turing machines that never terminated their calculations. Today, Turing machines are often associated with the “halting problem”, which is the impossible task of proving whether or not an arbitrary program will eventually terminate. However, Turing used his machines to calculate numbers to ever-increasing precisions. In fact, he referred to his machines as “unsatisfactory” if they ever stopped, and “satisfactory” if they kept on going forever. Although Turing’s paper touches on concepts similar to the halting problem, that problem would not be posed in its modern form until the 1950s.

The procedure to calculate a computable number does not need to be in code form (though any of the below approaches may be programmed, if needed). For example, \(\pi\) is also a computable number, and to show this we only need to express it as an infinite sum, like this one:

$$\pi = \frac{4}{1} - \frac{4}{3} + \frac{4}{5} - \frac{4}{7} + \frac{4}{9} - \ldots$$

This is not as “computery” as the above code, but it is algorithmic all the same. One may continue adding further terms until the approximation is as accurate as needed. There are other well-known infinite series to calculate other irrational numbers like \(e\) and \(\phi\). So long as there exists a mechanical way to approximate a number to ever-increasing precision, within finite time, the number is computable.
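For instance, here is a direct (if slow) Python rendering of that series, a sketch of my own that keeps adding terms until they drop below a requested tolerance.

def leibniz_pi(tolerance):
    # Sum 4/1 - 4/3 + 4/5 - ... until the terms become smaller than the tolerance.
    total, k, sign = 0.0, 1, 1
    while 4 / k >= tolerance:
        total += sign * 4 / k
        k += 2
        sign = -sign
    return total

print(leibniz_pi(1e-6))    # within about 1e-6 of pi, but only after two million terms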

Writing an efficient Python program to calculate \(\pi\) is a little more difficult. The program shown in this figure uses a “streaming” method developed by J. Gibbons and implemented by D. Bau. It is not as easily comprehensible as the above infinite series, but is more in the spirit of Turing’s paper. It is able to continue printing further digits of \(\pi\), only needing to maintain six variables between each print statement (rather than reviewing all the digits it has already output), making for a very efficient calculation.


def generate_pi_digits():
    # Spigot state: only these few integers are carried from one digit to the next.
    q = 1
    r = 180
    t = 60
    i = 2
    while True:
        # Compute the next decimal digit y of pi, then update the state.
        u,y = 3*(3*i+1)*(3*i+2), (q*(27*i-12) + 5*r) // (5*t)
        q,r,t,i = 10*q*i*(2*i-1), 10*u*(q*(5*i-2)+r - y*t), t*u, i+1
        yield y

pi_digits = generate_pi_digits()
print(str(next(pi_digits)) + ".", end="")    # the leading "3."
for d in pi_digits: print(d, end="")         # then 1, 4, 1, 5, 9, ... forever

Python program that generates the number \(\pi\).

Some other numbers such as \(\sqrt{2}\) and \(\log~2\) also have infinite sums, products, or fractions that can be used to calculate them. However, it’s often more efficient to compute them via successive approximations. For example, let us approximate \(\sqrt{2}\) with a number \(r\). We know that \(r\) is a positive number such that \(r^2 = 2\). Since \(1 < r^2 < 4\), that means that \(1 < r < 2\). We can then successively approximate \(r\) as \(1.5\), \(1.25\), \(1.375\), \(1.4375\), \(1.40625\), and so on, each time adding or subtracting the next smaller power of two, depending on whether the current approximation’s square is greater than or less than 2. We stop when the power added or subtracted is less than half the desired maximum error.
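Here is a small Python sketch (my own rendering of the procedure just described), which halves the step size at each pass and stops once the step is less than half the requested maximum error.

def approximate_sqrt2(max_error):
    # Approximate sqrt(2) by adding or subtracting ever-smaller powers of two.
    r, step = 1.0, 0.5
    while step >= max_error / 2:
        r = r + step if r * r < 2 else r - step
        step /= 2
    return r

print(approximate_sqrt2(1e-10))    # 1.41421356..., the successive values being 1.5, 1.25, 1.375, ...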

There is one flaw with calculating irrational numbers like these on a modern computer: any computer has only a finite amount of memory, and eventually that memory will be exhausted. This is one key difference between Turing machines and real computers: Turing assumed that his machines had inexhaustible memory. Thus, it is fair to do likewise when proving that the set of computable numbers is countable. However, one could also slightly redefine a computable number to be one that can be calculated to within an arbitrary precision in a finite amount of time and with a finite amount of resources. This is equivalent to Turing’s original definition, since if a calculation required unbounded memory, it would need unbounded time to access that memory.

In general, any number with a well-defined value, or well-defined method to calculate its value, may be said to be an element of \(\mathbb{K}\). If you can write a program to calculate a number (to within some arbitrary precision, in a finite amount of time), it’s computable. This includes many well-known non-algebraic real numbers. In fact, it’s very difficult to come up with a number that’s not computable.

Gödel numbers

Now that we’ve defined our set \(\mathbb{K}\), we need to show that it is countable. The proof relies on the concept of a Gödel number. Gödel numbers were developed by Kurt Gödel himself, in order to prove his “incompleteness theorems.” (If you are new to German, Gödel’s name is pronounced similarly to the English word “girdle”, but without the “r” sound.)

Definition: Gödel Number
A Gödel number is a natural number that is a unique representation of a particular member of a set \(S\). The Gödel number is calculated with a function \(\mathrm{g}\) whose domain is \(S\) and whose codomain is \(\mathbb{N}\).

Note that \(\mathrm{g}\) is an injective function, so that if \(\mathrm{g}(s) = \mathrm{g}(s')\), it must be the case that \(s = s'\). However, some integers may not be valid Gödel numbers for members of \(S\), so the function \(\mathrm{g}\) is frequently not a bijection between \(S\) and \(\mathbb{N}\).

If a set \(S\) is infinite, and there exists a function \(\mathrm{g}\) to calculate a valid Gödel number for every member \(s\), then \(S\) must be countably infinite. To prove this, we must show that \(S\) has a one-to-one correspondence with \(\mathbb{N}\). Since the Gödel number of each set element \(s\) is unique, one of them must have the least value. Let this element of \(S\) correspond to \(0\). Then, the one with the second-least Gödel number will correspond to \(1\), the next one to \(2\), and so on. Since both sets are infinite, the correspondence can continue infinitely. Hence there exists a bijection between \(S\) and \(\mathbb{N}\), and therefore \(S\) is countable.

Bringing it together

It remains to show that a Gödel number function \(\mathrm{g}\) exists for \(\mathbb{K}\). The trick is to calculate a Gödel number from the program that generates a computable number, rather than from the number itself. Recall that a program consists of a finite block of text. Most computers use the ASCII code or Unicode to represent a text file, in which every valid character is encoded as a number between 1 and 127. (A 0 would represent a null character, which isn’t used in text files.) Thus, the source code itself may be represented as a base-128 number with no 0s, and as many digits as there are characters in the file. We will simply translate this base-128 number into a common base-10 one. For example, in the case of the tiny program above that prints one third, the computation is:
$$112\cdot128^0 + 114\cdot128^1 + 105\cdot128^2 + \ldots + 10\cdot128^{50}$$
Here, the \(112\), \(114\), and \(105\) represent the characters p, r, and i respectively: the first three characters in the program. The fifty-first character at the end is a newline, which is represented by \(10\). The resulting sum is the number:

23,674,419,604,088,055,738,162,829,936,727,249,274,729,959,756,590,023,485,007,031,946,824,166,916,359,320,099,837,891,922,699,574,436,329,840.

The other program, to calculate \(\pi\), results in a Gödel number that is approximately \(1.77\times10^{654}\). These numbers are unwieldy, but remember that they encode entire programs. Because one can recover the complete program from its number, it must be the case that the function \(\mathrm{g}\) is injective.
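A hedged sketch of this encoding, and of the decoding that makes \(\mathrm{g}\) injective, treating the source text as a base-128 number exactly as described above (the function names are mine, and the exact value produced depends on the precise characters in the file):

def godel_number(source):
    # Encode a program's text as a base-128 integer, least significant "digit" first.
    return sum(ord(ch) * 128**i for i, ch in enumerate(source))

def decode(number):
    # Recover the program's text from its Gödel number.
    chars = []
    while number > 0:
        number, code = divmod(number, 128)
        chars.append(chr(code))
    return "".join(chars)

# Round-trip check on the one-third program from earlier.
program = 'print("0.", end="")\nwhile True: print("3", end="")\n'
assert decode(godel_number(program)) == program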

This is similar to Turing’s approach: he created Turing machines to compute numbers, and then represented each machine as a large table of states. This let him systematically create a unique “description number” for each possible machine. (The term “Gödel number” was not yet in common use.) The number was a large positive integer that described every aspect of the machine, and no two different machines could ever have the same number. Thus, as in our case, Turing showed that there exists a valid Gödel function \(\mathrm{g}\), and used this to show that \(\mathbb{K}\) is countable.

It may be argued that any given computable number has an infinite number of programs that can calculate it, and hence infinitely many Gödel numbers associated with it. We need the function \(\mathrm{g}\) to map each computable number to one Gödel number. Therefore, we choose the number with the least magnitude, which corresponds to the shortest program that produces the computable number in question. A critic may argue that finding this shortest program is not always possible, and so the Gödel function \(\mathrm{g}\) cannot be executed. However, for our purposes it does not matter if the function is practical. All that matters, to prove that the set of computable numbers is countable, is that each one can be shown to have an associated Gödel number.

Here’s the final proof, bringing this all together. The computable numbers are an infinite set. We have provided an injective function \(\mathrm{g}\) that maps every computable number to a single natural number: a Gödel number. Any set with such a function is countable, and therefore computable numbers are countable.

A counter proof?

Some readers may wonder why we cannot use the same diagonalization trick that Cantor used, and so prove that the computable numbers are uncountable. Let’s try to do it, and we will see where it fails.

Let us say that we have an allegedly complete (though infinite) enumeration of every computable number. We will now attempt to use diagonalization in order to produce a new computable number \(x\), that could not possibly be in our list. First we calculate the tenths place digit of the first computable number, and assign \(x\)’s tenths place digit to be something else. Then we calculate the hundredths place digit of the second computable number, and assign \(x\)’s corresponding digit to be another one. We proceed as before, until we have computed a number that could not possibly be in the enumeration. Thus, it appears that we have a contradiction, and so \(\mathbb{K}\) appears uncountable.

The problem with this “proof” is that we never showed that the number \(x\) is itself computable. When Cantor used diagonalization to prove \(\mathbb{R}\) uncountable, the resulting \(x\) was obviously a real number, even though it’s not in the “complete” list. Hence he had a contradiction. But proving that a number is computable is more difficult, because we would need to show that the computation to some arbitrary accuracy can be done in finite time. Since we haven’t done this (and in fact we cannot do it), the alleged proof falls apart.

It might seem straightforward at first to prove that \(x\) is computable, since the diagonalization technique appears to give an algorithm to calculate it. However the algorithm is flawed, and will enter an infinite loop from which it cannot recover. Remember that it calculates \(x\) by using each computable number in turn, which would include \(x\) itself. Let us say that \(x\) happens to be the \(i\)th number in the list. Then, in order to calculate \(x\) to the \(i\)th digit, one must first calculate \(x\) to the \(i\)th digit. The algorithm enters an infinite regress from which it cannot recover. Thus the computation fails: \(x\) is not computable, there is no contradiction, and the diagonalization technique did not work.

Conclusions

What does it mean that the set of computable numbers is countable, but the set of real numbers is not? In a very true sense, the set \(\mathbb{K}\) contains all the numbers that are knowable. Real numbers that aren’t computable can’t be calculated, or even thought about except in the most abstract manner (such as \(x\) from the last section, the uncalculable result of using the diagonalization process on \(\mathbb{K}\)). Yet, since \(\mathbb{R}\) is uncountable, the vast majority of real numbers must be unknowable. They are the infinite streams of digits, with absolutely no rhyme or reason to their sequences. Any real number used in day-to-day life is part of the computable set \(\mathbb{K}\).

Nested sets showing natural numbers contained in the integers contained in the rationals contained in the algebraic real numbers contained in the computable numbers contained in the reals
Venn diagram showing the relationships between subsets of real numbers. All sets shown are countably infinite except the reals. Noncomputable real numbers exist, but they are inherently unknowable.

Others have gone on to describe other noncomputable numbers. For example, in 1949 the Swiss mathematician Ernst Specker found a monotonically increasing, bounded, computable sequence of rational numbers whose least upper bound is not computable. Computability theory, which originated with Turing’s paper, has become a branch of mathematics in its own right. It attempts to classify noncomputable functions into a hierarchy, based on their degrees of noncomputability. It is of interest to both mathematicians and computer scientists.

Fortunately for Alan Turing, his Turing machines were thought to be innovative enough that his paper found publication, despite not being the first to prove that the Entscheidungsproblem is not solvable. In fact, his paper served as his introduction to Alonzo Church, the very man who had scooped him. Church would go on to supervise Turing’s doctoral work at Princeton, from which he graduated in 1938 at the age of 26! The two would go on to lay much of the groundwork for modern computer science, including the Church-Turing thesis (which roughly states that human computation, machine computation, and Church’s lambda calculus are all equivalent).

Turing’s paper doesn’t mention complex numbers, but an extension of his work is straightforward. A complex computable number would be a complex number whose real portion is a computable number, and whose imaginary portion is another computable number times \(i\) \((\sqrt{-1})\). Such a number is computable to within an arbitrary precision in a finite amount of time, so long as the program that calculates it alternates regularly between calculating the real portion and the imaginary portion (perhaps outputting the two numbers via separate streams or into separate files). Since such a complex number can be calculated using a program, the same proof of countability applies.
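One way to make that alternation concrete is sketched below, under the assumption that we already have digit generators for the real and imaginary parts (such as generate_pi_digits above).

def complex_digits(real_part, imag_part):
    # Alternate between the digit streams of the real and imaginary parts,
    # so that both are refined to ever-greater precision.
    while True:
        yield "real", next(real_part)
        yield "imag", next(imag_part)

# For example, complex_digits(generate_pi_digits(), generate_pi_digits())
# would interleave the digits of the complex number pi + pi*i.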

Charles Petzold’s “The Annotated Turing” is an excellent source for those who wish to know more about Turing and his proof that the Entscheidungsproblem is impossible. It reproduces Turing’s entire paper one small piece at a time, interspersed with copious explanatory detail and context. If you wish to know more about Turing’s seminal work, this is without a doubt the best place to start. Those more interested in Turing’s personal life should read Andrew Hodges’ “Alan Turing: the Enigma”. This covers his whole life in great detail, from his early childhood, through his codebreaking efforts during the war years, and ending with his tragic persecution and death just shy of his 42nd birthday.

Turing’s idea of computable numbers forms a logical and intuitive extension of Cantor’s original work on countability. His work is a unique bridge between mathematics and computer science that is still relevant today.

Acknowledgements

Thank you to David Bau for permission to use his implementation of the \(\pi\) spigot algorithm. Thank you also to Ursula Whitcher for encouragement and assistance with one of the proofs.

References

  • D. Bau. A dabbler’s weblog: Python pi.py spigot. http://davidbau.com/archives/2010/03/14/python_pipy_spigot.html, March 2010.
  • G. Cantor. Über eine Eigenschaft des Inbegriffs aller reellen algebraischen Zahlen (On a property of the collection of all real algebraic numbers). Journal für die reine und angewandte Mathematik, 77:258–262, 1874.
  • G. Cantor. Über eine elementare Frage der Mannigfaltigkeitslehre (On an elementary question concerning the theory of manifolds). Jahresbericht der Deutschen Mathematiker-Vereinigung, 1, 1892.
  • A. Church. A note on the Entscheidungsproblem. The Journal of Symbolic Logic, 1(01):40–41, 1936.
  • J. Gibbons. Unbounded spigot algorithms for the digits of pi. The American Mathematical Monthly, 113(4):318–328, 2006.
  • K. Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I (On formally undecidable propositions of Principia Mathematica and related systems I). Monatshefte für Mathematik und Physik, 38(1):173–198, 1931.
  • A. Hodges. Alan Turing: the Enigma. Burnett Books, London, 1983.
  • C. Petzold. The Annotated Turing: A Guided Tour Through Alan Turing’s Historic Paper on Computability and the Turing Machine. Wiley Publishing, 2008.
  • E. Specker. Nicht konstruktiv beweisbare Sätze der Analysis (Theorems of analysis that cannot be proven constructively). The Journal of Symbolic Logic, 14:145–158, 1949.
  • A. M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42, 1936.
  • A. M. Turing. Systems of logic based on ordinals. PhD thesis, Princeton University, 1938.

What is a prime, and who decides?

Some people view mathematics as a purely platonic realm of ideas independent of the humans who dream about those ideas. If that’s true, why can’t we agree on the definition of something as universal as a prime number?

Courtney R. Gibbons
Hamilton College

Introduction

Scene: It’s a dark and stormy night at SETI. You’re sitting alone, listening to static on the headphones, when all of a sudden you hear something: two distinct pulses in the static. Now three. Now five. Then seven, eleven, thirteen — it’s the sequence of prime numbers! A sequence unlikely to be generated by any astrophysical phenomenon (at least, so says Carl Sagan in Contact, the novel from which I’ve lifted this scene) — in short, proof of alien intelligence via the most fundamental mathematical objects in the universe…

SETI's very big array
bzz bzz, bzz bzz bzz, bzz bzz bzz bzz bzz, …

Hi! I’m Courtney, and I’m new to this column. I’ve been enjoying reading my counterparts’ posts, including Joe Malkevitch’s column Decomposition and David Austin’s column Meet Me Up in Space. I’d like to riff on those columns a bit, both to get to some fun algebra (atoms and ideals!) and to poke at the idea that math is independent of our humanity.

Introduction, Take 2

Scene: It’s a dark and stormy afternoon in Clinton, NY. I’m sitting alone at my desk with two undergraduate abstract algebra books in front of me, both propped open to their definitions of a prime number…

  • Book A says that an integer $p$ (of absolute value at least 2) is prime provided it has exactly two positive integer factors. Otherwise, Book A says $p$ is composite.
  • Book B says that an integer $p$ (of absolute value at least 2) is prime provided whenever it divides a product of integers, it divides one of the factors (in any possible factorization). Otherwise, Book B says $p$ is composite.

Note: Book A is Bob Redfield’s Abstract Algebra: A Concrete Introduction (Bob is my predecessor at Hamilton College). Book B is Abstract Algebra: Rings, Groups, and Fields by Marlow Anderson (Colorado College; Marlow was my undergraduate algebra professor) and Todd Feil (Denison University).

I reached for the nearest algebra textbook to use as a tie-breaker, which happened to be Dummit and Foote’s Abstract Algebra, only to find that the authors hedge their bets by providing Book A’s definition and then saying, well, actually, Book B’s definition can be used to define prime too. Yes, it’s a nice exercise to show these definitions are equivalent. I can’t help but wonder, though: which one captures what it really is to be prime, and which is merely a consequence of that definition?

Who Decides?

Some folks take the view that math is a true and beautiful thing and we humans merely discover it. This seems to me to be a way of saying that math is independent of our humanity. Who we are, what communities we belong to — these don’t have any effect on Mathematics, Platonic Realm of Pure Ideas. To quantify this position as one might for an intro to proofs class: For each mathematical idea $x$, $x$ has a truth independent of humanity. And yet, two textbooks fundamental to the undergraduate math curriculum are sitting here on my desk with the audacity to disagree about the very definition of arguably the most pure, most platonic, most absolutely mathematical phenomenon you could hope to encounter: prime numbers!

This isn’t a perfect counterexample to the universally quantified statement above (maybe one of these books is wrong?). But in my informal survey of undergraduate algebra textbooks (the librarians at Hamilton really love me and the havoc I wreak in the stacks!), there’s not exactly a consensus on the definition of a prime!

Two stacks of books
On the left, books with definitions that agree with Book A. On the right, books with definitions that agree with Book B. On top, a Polish mood cube showing dejection in Springer yellow. Not pictured, books that fell on the floor while surveying my office shelves.

As far as I can tell, the only consensus is that we shouldn’t consider $-1$, $0$, or $1$ to be prime numbers.

A researcher says "Good news, there's intelligent life in the universe. Wait, bad news, they think 1 is prime."

But, uh, why not?! In the case of $0$, it breaks both definitions. You can’t divide by zero (footnote: well, you shouldn’t divide by zero if you want numbers to be meaningful, which is, of course, a decision that someone made and that we continue to make when we assert “you can’t divide by zero”), and zero has infinitely many positive integer factors. But when $\pm 1$ divides a product, it divides one (all!) of the factors. And what’s so special about exactly two positive divisors anyway? Why not “at most two” positive divisors?

Well, if you’re reading this, you probably have had a course in algebra, and so you know (or can be easily persuaded, I hope!) that the integers have a natural (what’s natural is a matter of opinion, of course) algebraic analog in a ring of polynomials in a single variable with coefficients from a field $F$. The resemblance is so strong, algebraically, that we call $F[x]$ an integral domain (“a place where things are like integers” is my personal translation). The idea of prime, or “un-break-down-able”, comes back in the realm of polynomials, and Book A and Book B provide definitions as follows:

  • Book B says that a nonconstant polynomial $p(x)$ is irreducible provided the only way it factors is into a product in which one of the factors must have degree 0 (and the other necessarily has the same degree as $p(x)$). Otherwise, Book B says $p(x)$ is reducible.
  • Book A says that a nonconstant polynomial $p(x)$ is irreducible provided whenever $p(x)$ divides a product of polynomials in $F[x]$, it divides one of the factors. Otherwise, Book A says $p(x)$ is reducible.

Both books agree, however, that a polynomial is reducible if and only if it has a factorization that includes more than one irreducible factor (and thus a polynomial cannot be both reducible and irreducible).

Notice here that we have a similar restriction: the zero polynomial is excluded from the reducible/irreducible conversation, just as the integer 0 was excluded from the prime/composite conversation. But what about the other constant polynomials? They satisfy both definitions aside from the seemingly artificial caveat that they’re not allowed to be irreducible!

Well, folks, it turns out that in the integers and in $F[x]$, if you’re hoping to have meaningful theorems (like the Fundamental Theorem of Arithmetic or an analog for polynomials, both of which say that factorization into primes/irreducibles is unique up to a mild condition), you don’t want to allow things with multiplicative inverses to be among your un-break-down-ables! We call elements with multiplicative inverses units, and in the integers, $(-1)\cdot(-1) = 1$ and $1\cdot 1 = 1$, so both $-1$ and $1$ are units (they’re the only units in the integers).

In the integers, we want $6$ to factor uniquely into $2\cdot 3$, or, perhaps (if we’re being generous and allowing negative numbers to be prime, too) into $(-2)\cdot(-3)$. This generosity is pretty mild: $2$ and $-2$ are associates, meaning that they are the same up to multiplication by a unit. One statement of the Fundamental Theorem of Arithmetic is that every integer (of absolute value at least two) is prime or factors uniquely into a product of primes up to the order of the factors and up to associates. That means that the list of prime factors (up to associates) that appear in the factorization of $6$ is an invariant of $6$, and the number of prime factors (allowing for repetition) in any factorization of $6$ is another invariant (and it’s well-defined). Let’s call it the length of $6$.

But if we were to let $1$ or $-1$ be prime? Goodbye, fundamental theorem! We could write $6 = 2\cdot 3$, or $6 = 1\cdot 1\cdot 1 \cdots 1 \cdot 2 \cdot 3$, or $6 = (-1)\cdot (-2) \cdot 3$. We have cursed ourselves with the bounty of infinitely many distinct possible factorizations of $6$ into a product of primes (even accounting for the order of the factors or associates), and we can’t even agree on the length of $6$. Or $2$. Or $1$.

The skeptical, critical-thinking reader has already been working on workarounds. Take the minimum number of factors as the length. Write down the list of prime factors without their powers. Keep the associates in the list (or throw them out, but at that point, just agree that $1$ and $-1$ shouldn’t be prime!). But in the polynomial ring $F[x]$, dear reader, every nonzero constant polynomial is a unit: given $p(x) = a$ for some nonzero $a \in F$, the polynomial $d(x) = a^{-1}$ is also in $F[x]$ since $a^{-1}$ is in the field $F$, and $p(x)d(x) = 1$, the multiplicative identity in $F[x]$. So, if you allow units to be irreducible in $F[x]$, now even an innocent (and formerly irreducible) polynomial like $x$ has infinitely many factorizations into things like $(a)(1/a)(b)(1/b)\cdots x$. So much for those workarounds!

So, since we like our Fundamental Theorems to be neat, tidy, and useful, we agree to exclude units from our definitions of prime and composite (or irreducible and reducible, or indecomposable and decomposable, or…).

More Consequences (or Precursors)

Lately I’ve been working on problems related to semigroups, by which I mean nonempty sets equipped with an associative binary operation — and I also insist that my semigroups be commutative and have a unit element. In the study of factorization in semigroups, the Fundamental Theorem of Arithmetic leads to the idea of counting the distinct factors an element can have in any factorization into atoms (the semigroup equivalent of irreducible/prime elements; these are elements $p$ that factor only into products involving units and associates of $p$).

One of my favorite (multiplicative) semigroups is $\mathbb{Z}[\sqrt{-5}] = \{a + b \sqrt{-5} \, : \, a,b \in \mathbb{Z}\}$, favored because the element $6$ factors distinctly into two different products of irreducibles! In this semigroup, $6 = 2\cdot 3$ and $6 = (1+\sqrt{-5})(1-\sqrt{-5})$. It’s a nice exercise to show that $1\pm \sqrt{-5}$ are not associates of $2$ or $3$, yielding two distinct factorizations into atoms!
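Here is a quick sanity check of those two factorizations in Python, a sketch of my own that represents $a + b\sqrt{-5}$ as the pair $(a, b)$ and uses the norm $a^2 + 5b^2$.

def multiply(x, y):
    # (a + b*sqrt(-5)) * (c + d*sqrt(-5)) = (ac - 5bd) + (ad + bc)*sqrt(-5)
    (a, b), (c, d) = x, y
    return (a * c - 5 * b * d, a * d + b * c)

def norm(x):
    # The norm a^2 + 5b^2 is multiplicative, and only units have norm 1.
    a, b = x
    return a * a + 5 * b * b

print(multiply((2, 0), (3, 0)))     # (6, 0)
print(multiply((1, 1), (1, -1)))    # (6, 0) as well: the second factorization of 6
print(norm((2, 0)), norm((3, 0)), norm((1, 1)), norm((1, -1)))    # 4 9 6 6

Since no element of $\mathbb{Z}[\sqrt{-5}]$ has norm 2 or 3, none of these four factors can be broken down any further.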

While we aren’t lucky enough to have unique factorization, at least we have that the number of irreducible factors in any factorization of $6$ is always two. That is, excluding units from our list of atoms leads to an invariant of $6$ in the semigroup $\mathbb{Z}[\sqrt{-5}]$.

 

On planet Blargesnort, creatures have 1+sqrt(-5) digits.

Anyway, without the context of more general situations like this semigroup (and I don’t know, is $\mathbb{Z}[\sqrt{-5}]$ one of those platonically true things, or were Gauss et al. just really imaginative weirdos?), would we feel so strongly that $1$ is not a prime integer?

Still More Consequences (or precursors)

Reminding ourselves yet again that the integers form a ring under addition and multiplication, we might be interested in the ideals generated by prime numbers. (What’s an ideal? It’s a nonempty subset of the ring closed under addition, additive inverses, and scalar multiplication from the ring.) We might even call those ideals prime ideals, and then generalize to other rings! The thing is, if we do that, we end up with this definition:

(Book A and B agree here:) An ideal $P$ is prime provided $xy \in P$ implies $x$ or $y$ belongs to $P$.

But in the case of the integers — a principal ideal domain! — that means that a product $ab$ belongs to the principal ideal generated by the prime $p$ precisely when $p$ divides one of the factors.

From the perspective of rings, every (nonzero) ring has two trivial ideals: the ring $R$ itself (and if $R$ has unity, then that ideal is generated by $1$, or any other unit in $R$) and the zero ideal (generated by $0$). If we want the study of prime ideals to be the study of interesting ideals, then we want to exclude units from our list of potential primes. And once we do, we recover nice results like an ideal $P$ is prime in a commutative ring with unity if and only if $R/P$ is an integral domain.

Conclusions

I still have two books propped open on my desk, and after thinking about semigroups and ideals, I’m no closer to answering the question “But what is a prime, really?” than I was at the start of this column! All I have is some pretty good evidence that we, as mathematicians, might find it useful to exclude units from the prime-or-composite dichotomy (I haven’t consulted with the mathematicians on other planets, though). To me, that evidence is a reminder that we are constantly updating our mathematics framework in reference to what we learn as we do more math.  We look back at these ideas that seemed so solid when we started — something fundamentally indivisible in some way — and realize that we’re making it up as we go along. (And ignoring a lot of what other humans consider math, too, as we insist on our axioms and law of the excluded middle and the rest of the apparatus of “modern mathematics” while we’re making it up…)

And the math that gets done, the math that allows us to update our framework… Well, that depends on what is trendy/fundable/publishable, who is trendy/fundable/publishable, and who is making all of those decisions. Perhaps, on planet Blargesnort, math looks very different.

References

Anderson, Marlow; Feil, Todd. A first course in abstract algebra. Rings, groups, and fields. Third edition. ISBN: 9781482245523.

Dummit, David S.; Foote, Richard M. Abstract algebra. Third edition. ISBN: 0471433349.

Redfield, Robert. Abstract algebra. A concrete introduction. First edition. ISBN: 9780201437218.

Geroldinger, Alfred; Halter-Koch, Franz. Non-unique factorizations. Algebraic, combinatorial and analytic theory. ISBN: 9781584885764.

Decomposition

Mathematics too has profited from the idea that sometimes things of interest might have a structure which allowed them to be decomposed into simpler parts…

Joe Malkevitch
York College (CUNY)

Introduction

One way to get insights into something one is trying to understand better is to break the thing down into its component parts, something simpler. Physicists and chemists found this approach very productive—to understand water or salt, it was realized that common table salt is sodium chloride, a compound made of two elements, sodium and chlorine, and that water is made from hydrogen and oxygen. Eventually, many elements (not the Elements of Euclid!) were discovered. The patterns noticed in these building-block elements led to the theoretical construct called the periodic table, which showed that various elements seemed to be related to each other. The table suggested that there might be elements which existed but had not been noticed; the "holes" in the table were filled when these elements were discovered, sometimes because missing entries were sought out. The table also suggested "trans-uranium" elements, which did not seem to exist in the physical world but could be created, and were created, in the laboratory. These new elements were created in part because the periodic table suggested approaches to how to manufacture them. The work done to understand the structure of the periodic table suggested that elements were themselves made up of even smaller pieces. This progression of insight led to the idea of atoms, and the realization that atoms too might have structure led to the idea of subatomic particles. But some of these "fundamental" particles could be decomposed into smaller "parts." We now have a zoo of quarks and other "pieces" to help us understand the complexities of the matter we see in the world around us.

Shiny crystals of the element gallium



Crystals of gallium, an element whose existence was predicted using the periodic table. Photo by Wikipedia user Foobar, CC BY-SA 3.0.

Prime patterns

Mathematics too has profited from the idea that sometimes things of interest might have a structure which allowed them to be decomposed into simpler parts. A good example is the number 111111111. It is an interesting pattern already, because all of its digits are 1’s when written in base 10. We could compare 111111111 with the number that it represents when it is interpreted in base 2 (binary)—here it represents 511. But it might be interesting to study numbers whose digits are all 1’s when they are interpreted in other bases, not only base 2! Mathematics grows when someone, perhaps a computer, identifies a pattern which can be shown to hold in general, rather than for the specific example that inspired the investigation. A number of the form 111…1 is called a repunit. Can we find interesting patterns involving repunits?

One approach to decomposing a number (here strings of digits are to be interpreted as being written in base 10) is to see if the number can be written as the product of two other numbers different from 1 and itself. For example, 17 can only be written as the product of the two numbers 17 and 1. On the other hand, 16 can be written as something simpler, as $2 \times 8$, but there are "simpler" ways to write 16, as $4 \times 4$, and since 4 can be decomposed as $2 \times 2$ we realize that 16 can be written as $2 \times 2 \times 2 \times 2$. Seeing this pattern, mathematicians are trained to ask questions, such as whether numbers that are products of a single repeated factor which cannot itself be broken down have any special interest. But we are getting ahead of ourselves here. What are the "atoms" of the multiplicative aspect of integers? These are the numbers called the primes, 2, 3, 5, 7, 11, … Notice that 11 is also a repunit. This takes us back to the idea that there might be many numbers of the form 111…111 that are prime! Are there infinitely many primes all of whose digits in their base 10 representation are one? Answer!! No one knows. But it has been conjectured that there might be infinitely many repunit primes. Note that numbers like 111, 111111, 111111111, …, where the number of digits is a multiple of 3, can’t be prime: their digit sum is a multiple of 3, so they are divisible by 3.

Note that 11 read in base 2 is 3, which is also prime. Similarly, the repunit 1111111111111111111 (nineteen 1’s) is prime, and read in base 2 it represents the number 524287, which is also prime. If a repunit is prime, must the number we get by reading its digits in base 2 be prime?
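A small experiment along these lines is easy to set up; the sketch below assumes the sympy library is available for primality testing.

from sympy import isprime    # assumes sympy is installed

for n in range(2, 25):
    repunit = "1" * n
    base10 = int(repunit)       # the repunit read as a base-10 number
    base2 = int(repunit, 2)     # the same string of 1's read in base 2, i.e. 2**n - 1
    if isprime(base10):
        print(n, isprime(base2))    # for each prime repunit length, is 2**n - 1 also prime?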

By looking for "parts" that were building blocks for the integers, mathematics has opened a rich array of questions and ideas many of which have spawned major new mathematical ideas both theoretical and applied. Having found the notion of prime number as a building block of the positive number system, there are natural and "unnatural" questions to ask:

1. Are there infinitely many different primes?

2. Is there a "simple" function (formula) which generates all of the primes, or if not all primes, only primes?

While the fact that there are infinitely many primes was already known by the time of Euclid, the irregularity of the primes continues to be a source of investigations to this day. Thus, the early observed pattern that there seem to be many pairs of primes differing by two (e.g. 11 and 13, 41 and 43, 137 and 139), which led to the "guess" that perhaps there are infinitely many pairs of numbers $p$ and $p+2$ that are both prime (known as twin primes), is still unresolved today. While more and more powerful computers made possible finding larger and larger twin prime pairs, no one could prove that there are infinitely many such pairs. There were attempts to approach this issue via a more general question: are there infinitely many pairs of primes that differ by some fixed bounded amount? Little progress was made on this problem until 2013, when a mathematician whose name was not widely known in the community, Yitang Zhang, showed that there is some finite gap size (his original bound was about 70 million) that occurs between pairs of primes infinitely often. This work set off a concerted search to improve his methods and alter them in a way to get better bounds for the size of this gap. While Zhang’s breakthrough has been improved greatly, the current state of affairs is still far from proving that the twin-prime conjecture is true.

Photo of Yitang Zhang

Photo of Yitang Zhang. Courtesy of Wikipedia.
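For the curious, a short sieve is enough to list the small twin prime pairs mentioned above; here is a sketch.

def primes_up_to(limit):
    # Sieve of Eratosthenes: all primes less than or equal to limit.
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]

primes = set(primes_up_to(200))
twins = [(p, p + 2) for p in sorted(primes) if p + 2 in primes]
print(twins)    # [(3, 5), (5, 7), (11, 13), (17, 19), ..., (191, 193), (197, 199)]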

Mathematical ideas are important as a playground in which to discover more mathematical ideas, thus enriching our understanding of mathematics as an academic subject and sometimes making connections between mathematics and other academic subjects. Today there are strong ties between mathematics and computer science, an academic subject that did not even exist when I was a public school student. Mathematics can be applied in ways that not long ago could not even be imagined, let alone carried out. Who would have thought that the primes would help make possible communication that prevents snooping by others as well as protecting the security of digital transactions? From ancient times, codes and ciphers were used to make it possible to communicate, often in military situations, so that should a communication fall into enemy hands it would not assist them. (Codes involve the replacement of words or strings of words with some replacement symbol(s), while ciphers refer to replacing each letter in an alphabet with some other letter in order to disguise the meaning of the original text.) Human ingenuity has been remarkable in developing clever systems that encrypt text rapidly, allow the intended receiver to decrypt the message in a reasonable amount of time, and, at the very least, slow down an enemy who comes into possession of the message. But the development of calculators and digital computers made it harder to protect encrypted messages, because many systems could be attacked by a combination of brute force (try all possible cases) together with using ideas about how the design of the code worked. There was also the development of statistical methods based on frequency of letters and/or words used in particular languages that were employed to break codes. You can find more about the interactions between mathematics, ciphers, and internet security in the April 2006 Feature Column!



Earlier we looked at "decomposing" numbers into their prime parts in a multiplicative setting. Remarkably, a problem about decomposing numbers under addition has stymied mathematics for many years, despite the simplicity of stating the problem. The problem is named for Christian Goldbach (1690-1764).

Letter from Goldbach to Euler

Letter from Goldbach to Euler asking about what is now known as Goldbach’s Conjecture. Image courtesy of Wikipedia.

Goldbach’s Conjecture (1742)

Given an even positive integer $n$ greater than 2, $n$ can be written as the sum of two primes.

For example, 10 = 3 + 7 (or 5 + 5), 20 = 3 + 17, and 30 = 11 + 19. We allow the primes to be either the same or different in the decomposition. While computers have churned out larger and larger even numbers for which the conjecture is true, the problem is still open after hundreds of years.
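Checking the conjecture for small even numbers takes only a few lines of Python; here is a sketch using simple trial division.

def is_prime(n):
    # Trial division, fine for small n.
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

def goldbach_pairs(n):
    # All ways to write the even number n as p + q with p and q prime and p <= q.
    return [(p, n - p) for p in range(2, n // 2 + 1) if is_prime(p) and is_prime(n - p)]

for n in range(4, 32, 2):
    print(n, goldbach_pairs(n))    # every even number in this range has at least one pair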

What importance should one attach to answering a particular mathematical question? This is not an easy issue to address. Some mathematical questions seem to be "roadblocks" to getting insights into what seem to be important questions in one area of mathematics, and in some cases answering a mathematical question seems to open doors on many mathematical issues. Another measure of importance might be in terms of aesthetic properties of a particular mathematical result. The aesthetic may be that something seems surprising or unexpected, or it may be that a result seems to have "beauty"—a trait that, whether one is talking about beautiful music, fabrics, or poems, seems to differ greatly from one person to the next. It is hard to devise an objective yardstick for beauty. Another scale of importance is the "value" of a mathematical result to areas of knowledge outside of mathematics. Some results in mathematics have proved to be insightful in many academic disciplines like physics, chemistry, and biology, but other mathematics seems only to be relevant to mathematics itself. What seems remarkable is that over and over again mathematics that seemed only to have value within mathematics itself, or to be only of "theoretical" importance, has found use outside of mathematics. Earlier I mentioned some applications of mathematical ideas to constructing ciphers to hide information. There are also codes designed to correct errors in binary strings and to compress binary strings. Cell phones and streaming video use these kinds of ideas: it would not be possible to have the technologies we now have without the mathematical ideas behind error correction and data compression.

Partitions

The word decompose has some connotations in common with the word partition. Each of these words suggests breaking up something into pieces. Often common parlance guides the use of the technical vocabulary that we use in mathematics, but in mathematics one often tries to be very careful to be precise about what meaning one wants a word to have. Sometimes in popularizing mathematics this attempt to be precise is the enemy of the public’s understanding of the mathematics involved. Sometimes a concept is defined in words that are mathematically precise but that obscure the big picture of what the collection of ideas being defined is getting at. Here I try to use "mathematical terminology" in a way that shows the bigger picture of the ideas involved.

Given a positive integer $n$, we can write $n$ as a sum of positive integers in different ways. For example, $3 = 3$, $3 = 2+1$, and $3 = 1 + 1 + 1$. In counting the number of decompositions possible, I will not take order of the summands into account—thus, $1+2$ and $2+1$ will be considered the same decomposition. Each of these decompositions is considered to be a partition of 3. In listing the partitions of a particular number $n$, it is common to use a variant of set theory notation where the entries in set brackets below can be repeated. Sometimes the word multiset is used to generalize the idea of set, so that we can repeat the same element in a set. Thus we can write the partitions of three as $\{3\}$, $\{2,1\}$, $\{1,1,1\}$.

A very natural question is to count how many different partitions there are of $n$ for a given positive integer. You can verify that there are 5 partitions of the number 4, 7 partitions of the number 5, and 11 partitions of the number 6. Although for very large values of $n$ the number of partitions of $n$ has been computed, there is no known formula which computes the number of partitions of $n$ for a given positive integer $n$. Sometimes the definition of partition insists that the parts making up the partition be listed in a particular order. It is usual to require the numbers in the partition not to increase as they are written out. I will use this notational convention here: The partitions of 4 are: $\{4\}$, $\{3,1\}$, $\{2, 2\}$, $\{2, 1, 1\}$, $\{1,1,1,1\}$. Sometimes in denoting partitions with this convention exponents are used to indicate runs of parts: $4$; $3,1$; $2^2$; $2, 1^2$; $1^4$. The notation for representing partitions varies a lot from one place to another. In some places for the partition of 4 consisting of $2 + 1 + 1$ one sees $\{2,1,1\}$, $2+1+1$, $211$ or $2 1^2$ and other variants as well!

It may be worth noting before continuing on that we have looked at partitions of $n$ in terms of the sum of smaller positive integers, but there is another variant that leads in a very different direction. This involves the partition of the set $\{1,2,3,\dots,n\}$ rather than the partition of the number $n$. In this framework a partition of a set $S$ consists of a set of non-empty subsets of $S$ whose union is $S$. (Remember that the union of two sets $U$ and $V$ lumps together the elements of $U$ and $V$ and throws away the duplicates.)

Example: Partition the set $\{1,2,3\}$:

Solution:

$$\{1,2,3\},\quad \{1,2\} \cup \{3\},\quad \{1, 3\} \cup \{2\},\quad \{2,3\} \cup \{1\},\quad \{1\} \cup \{2\} \cup \{3\}$$

While there are 3 partitions of the number 3, there are 5 partitions of the set $\{1,2,3\}$. The partitions of the set $\{1,2,3,\dots,n\}$ are counted by the Bell numbers, named for Eric Temple Bell (1883-1960). While the "standard" name for these numbers now honors Bell, other scholars studied what today are known as the Bell numbers before him, including the Indian mathematician Srinivasa Ramanujan (1887-1920).
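For readers who want to generate these values, here is a short Python sketch (my own, not from the column) using the standard Bell triangle recurrence; note that $B_3 = 5$, matching the count of partitions of $\{1,2,3\}$ above.

```python
def bell_numbers(count):
    """First `count` Bell numbers, computed with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(count - 1):
        new_row = [row[-1]]          # each row starts with the last entry of the previous row
        for entry in row:
            new_row.append(new_row[-1] + entry)
        bells.append(new_row[0])
        row = new_row
    return bells

print(bell_numbers(6))  # [1, 1, 2, 5, 15, 52]
```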

Sketch of E.T. Bell



A sketch of Eric Temple Bell. Courtesy of Wikipedia.



Partitions have proved to be a particularly intriguing playground for studying patterns related to numbers, and they have been used to frame new questions related to other parts of mathematics. When considering a partition of a particular number $n$, one can think about different properties of its parts:

  • How many parts are there?
  • How many of the parts are odd?
  • How many of the parts are even?
  • How many distinct parts are there?

For example, the partition $\{3, 2, 1, 1\}$ has 4 parts; the number of odd parts is 3, the number of even parts is 1, and the number of distinct parts is 3.

Closely related to partitions are the diagrams used to represent them. There are various versions of these diagrams, some with dots for the entries in the partition and others with cells, where the cell counts in the rows are the key to the numbers making up the partition.

Thus for the partition $3+2+1$ of 6 one could display this partition in a visual way:

X X X

X X

X

There are various conventions about how to draw such diagrams. One might use X’s as above but traditionally dots are used or square cells that abut one another.

These are known as Ferrers’s diagrams (for Norman Macleod Ferrers, 1829-1903), or sometimes as tableaux or Young’s tableaux. The latter name honors Alfred Young (1873-1940), a British mathematician who introduced the notion that bears his name in 1900.

Norman Ferrers



Norman Ferrers. Image courtesy of Wikipedia.

The term Young’s Tableau is also used for diagrams such as the one below where numbers chosen in various ways are placed inside the cells of the diagram.

Ferrers diagram, cells

A representation of the partition 10 with parts 5, 4, 1



Ferrers diagram, dots

A representation of the partition of 11 ($\{5,3,2,1\}$) using rows of dots.



While these diagrams show partitions of 10 and 11 by reading across the rows, one also sees that they display partitions of 10 and 11, namely $3,2,2,2,1$ and $4,3,2,1,1$ respectively, by reading in the vertical direction rather than the horizontal direction. Thus, each Ferrers’s diagram gives rise to two partitions, which are called conjugate partitions. Some diagrams read the same in both the horizontal and vertical directions; such partitions are called self-conjugate. Experiment to see if you can convince yourself that the number of self-conjugate partitions of $n$ is the same as the number of partitions of $n$ into odd parts that are all different!
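Conjugation is easy to compute: the $j$-th part of the conjugate counts how many parts of the original partition are at least $j$ (that is, how many rows of the diagram reach column $j$). Here is a tiny Python sketch of my own that computes conjugates, checks the two examples above, and exhibits one self-conjugate partition of 11.

```python
def conjugate(partition):
    """Conjugate of a partition given as a non-increasing list of parts."""
    largest = partition[0]
    # The j-th part of the conjugate counts the parts that are >= j.
    return [sum(1 for part in partition if part >= j)
            for j in range(1, largest + 1)]

print(conjugate([5, 4, 1]))                      # [3, 2, 2, 2, 1], a partition of 10
print(conjugate([5, 3, 2, 1]))                   # [4, 3, 2, 1, 1], a partition of 11
print(conjugate([4, 3, 3, 1]) == [4, 3, 3, 1])   # True: a self-conjugate partition of 11
```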

The next figure collects Ferrers’s diagrams for the partitions of small integers.



Diagram of partitions of 1 to 8



Ferrers’s diagrams of partitions of the integers starting with 1, lower right, and increasing to partitions of 7. Courtesy of Wikipedia.



Often to get insights into mathematical phenomena one needs data. Here for example, complementing the previous figure, is a table of the ways to write the number $n$ as the sum of $k$ parts. For example, 8 can be written as a sum of two parts in 4 ways. These are the partitions of 8 which have two parts: $7+1$, $6+2$, $5+3$, and $4+4$.

Number of partitions of $n$ into exactly $k$ parts, for $k = 1, 2, \dots, n$:

$n = 1$: 1
$n = 2$: 1, 1
$n = 3$: 1, 1, 1
$n = 4$: 1, 2, 1, 1
$n = 5$: 1, 2, 2, 1, 1
$n = 6$: 1, 3, 3, 2, 1, 1
$n = 7$: 1, 3, 4, 3, 2, 1, 1
$n = 8$: 1, 4, 5, 5, 3, 2, 1, 1

Fill in the next row!!

Table: Partitions of the number $n$ into exactly $k$ parts
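One standard way to extend the table is the recurrence $p(n,k) = p(n-1,k-1) + p(n-k,k)$: a partition of $n$ into exactly $k$ parts either has a part equal to 1 (remove it, leaving a partition of $n-1$ into $k-1$ parts) or has all parts at least 2 (subtract 1 from every part, leaving a partition of $n-k$ into $k$ parts). A short Python sketch of my own along these lines reproduces the row for $n = 8$ and computes the next row:

```python
def partitions_into_k_parts(n, k):
    """Number of partitions of n into exactly k parts,
    using the recurrence p(n, k) = p(n-1, k-1) + p(n-k, k)."""
    if k == 0:
        return 1 if n == 0 else 0
    if n < k:
        return 0
    return partitions_into_k_parts(n - 1, k - 1) + partitions_into_k_parts(n - k, k)

print([partitions_into_k_parts(8, k) for k in range(1, 9)])   # [1, 4, 5, 5, 3, 2, 1, 1]
print([partitions_into_k_parts(9, k) for k in range(1, 10)])  # [1, 4, 7, 6, 5, 3, 2, 1, 1]
```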

While many people have contributed to the development of the theory of partitions, the prolific Leonhard Euler (1707-1783) was one of the first.

Portrait of Euler



Leonhard Euler. Image courtesy of Wikipedia.

Euler was one of the most profound contributors to mathematics over a wide range of domains, including number theory, the area to which ideas related to partitions in part belong. Euler proved a surprising result related to what today are called figurate numbers; in particular, he discovered a result involving the pentagonal numbers. Euler was fond of using power series (generalized polynomials with infinitely many terms), which, in combinatorics, the area of mathematics dealing with counting problems, are related to generating functions. If one draws square arrays of dots, one sees $1, 4, 9, 16, \dots$ dots in the patterns that one draws. What happens when one draws triangular, pentagonal, or hexagonal arrays of dots? In the next two figures, we see a sample of the many ways one can visualize the pentagonal numbers: $1, 5, 12, 22, \dots$
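For reference, the pentagonal numbers pictured in the next figures satisfy the closed form $P_k = k(3k-1)/2$; here is a tiny Python check (my own, purely illustrative):

```python
def pentagonal(k):
    """The k-th pentagonal number, k(3k - 1)/2."""
    return k * (3 * k - 1) // 2

print([pentagonal(k) for k in range(1, 7)])  # [1, 5, 12, 22, 35, 51]
```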

Diagram of pentagonal numbers Diagram of pentagonal numbers

(a)                                             (b)



Two ways of coding the pentagonal numbers. Courtesy of Wikipedia.



Diagram of pentagonal numbers



The pentagonal numbers for side lengths of the pentagon from 2 to 6. Courtesy of Wikipedia.



Godfrey Harold Hardy (1877-1947), Ramanujan, and, in more modern times, George Andrews and Richard Stanley have been important contributors to a deeper understanding of the patterns implicit in partitions and to ways of proving that the patterns that are observed are universally correct.

Photo of G.H. Hardy





Photo of G. H. Hardy. Photo courtesy of Wikipedia.



Ramanujan



Srinivasa Ramanujan. Image courtesy of Wikipedia.

Photo of George Andrews

Photo of George Andrews. Courtesy of Wikipedia.

Photo of Richard Stanley



Photo of Richard Stanley. Courtesy of Wikipedia.

What is worth noting here is that the methodology of mathematical investigation is both local and global. When one hits upon the idea of learning more by "decomposing" something one already understands, in the hope of getting a deeper understanding, the idea also has implications in other environments where the broader notion (decomposition) applies. So understanding primes as building blocks encourages one to investigate primes locally, in the narrow arena of the integers, but it also makes one think about other kinds of decompositions that might apply to integers. We are interested not only in decompositions of integers from a multiplicative point of view but also in decompositions of integers from an additive point of view. In a narrow sense one sees problems like the Goldbach Conjecture here, but in a broader sense this relates to the much larger playground of the partitions of integers. When one develops a new approach to looking at a situation (e.g. decomposing something into parts), mathematicians invariably try to "export" the ideas discovered in a narrow setting to something more global, including areas of mathematics that are far from where the original results were obtained. So if decomposition is useful in number theory, why not try to understand decompositions in geometry as well? Thus, there is a whole field of decompositions dealing with plane polygons, where the decompositions are usually called dissections.

As an example of a pattern which has been discovered relatively recently and which illustrates that there are intriguing mathematical ideas still to be discovered and explored, consider this table:



Partition    | Distinct elements | Number of 1's
4            | 1                 | 0
3+1          | 2                 | 1
2+2          | 1                 | 0
2+1+1        | 2                 | 2
1+1+1+1      | 1                 | 4
Total        | 7                 | 7





Do you see anything interesting, some pattern? The entries in the second and third columns are neither all odd nor all even. However, Richard Stanley noted that the sums of the second and third columns are equal: both add to 7. And this is true for all values of $n$; the total number of distinct parts, taken over all partitions of $n$, equals the total number of 1's appearing in those partitions. How might one prove such a result? One approach would be to find a formula (a function of $n$) for the number of distinct elements in the partitions of $n$, and also a formula for the number of 1's in the partitions of $n$; if these two formulas agreed for each value of $n$, we would have a proof of the general fact illustrated for $n = 4$ in the table above. However, it seems unlikely that there is a way to write down closed-form formulas for either of these two quantities. Instead, there is a clever approach to the observation above that is also related to the discovery of Georg Cantor (1845-1918) that there are sets with different sizes of infinity.
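Before describing that approach, one can at least gather evidence by brute force. The following Python sketch (my own, not from the references) enumerates the partitions of $n$ and compares the total number of distinct parts with the total number of 1's; the two totals agree for every $n$ tried.

```python
def partitions(n, largest=None):
    """Generate the partitions of n as non-increasing lists of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield [part] + rest

def check_stanley(n):
    """Compare the total number of distinct parts with the total number of 1's."""
    distinct = sum(len(set(p)) for p in partitions(n))
    ones = sum(p.count(1) for p in partitions(n))
    return distinct, ones

for n in range(1, 9):
    print(n, check_stanley(n))   # the two totals agree for every n
```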

Consider the two sets: $\mathbb{Z}^+$, the set of all positive integers, and $\mathbb{E}$, the set of all even positive integers. $\mathbb{Z}^+ = \{1,2,3,4,5, \dots\}$ and $\mathbb{E}=\{2,4,6,8,10,\dots\}$. Both of these are infinite sets. Now consider the table:

1 paired with 2

2 paired with 4

3 paired with 6

4 paired with 8

$\vdots$

Note that each of the entries in $\mathbb{Z}^+$ will have an entry on the left in this "count," and each even number, that is, each number in $\mathbb{E}$, will have an entry on the right in this "count." This shows that there is a one-to-one and onto way to pair these two sets, even though $\mathbb{E}$ is a proper subset of $\mathbb{Z}^+$ in the sense that every element of $\mathbb{E}$ appears in $\mathbb{Z}^+$ and there are elements in $\mathbb{Z}^+$ that don’t appear in $\mathbb{E}$. There seems to be a sense in which $\mathbb{E}$ and $\mathbb{Z}^+$ have the same "size." This strange property of being able to pair the elements of a set with those of a proper subset of itself can only happen for an infinite collection of things. Cantor showed that in this sense of size, often referred to as the cardinality of a set, some pairs of sets which seemed very different in size had the same cardinality (size). Thus, Cantor showed that the set of positive integers has the same cardinality as the set of positive rational numbers (numbers of the form $a/b$ where $a$ and $b$ are positive integers with no common factor). Remarkably, he was also able to show that the set of positive integers has a different cardinality from the set of real numbers. To this day, some questions dating back to Cantor’s attempts to understand the different sizes that infinite sets can have remain unresolved.

What many researchers are doing, for old and new results about partitions, is to show that when two collections, each defined differently, yield the same counts, the equality of the counts can be established by constructing a bijection between the two collections. When such a one-to-one and onto correspondence (function) can be exhibited for every value of a positive integer $n$, the two collections of things must have the same size. Such bijective proofs often show the connection between seemingly unrelated things more clearly than showing that the two different concepts can be counted with the same formula. These bijective proofs often help generate new concepts and conjectures.

Try investigating ways that decompositions might give one new insights into ideas you find intriguing.

References

Those who can access JSTOR can find some of the papers mentioned above there. For those with access, the American Mathematical Society’s MathSciNet can be used to get additional bibliographic information and reviews of some of these materials. Some of the items above can be found via the ACM Digital Library, which also provides bibliographic services.

Andrews, G.E., 1998. The theory of partitions (No. 2). Cambridge University Press.

Andrews, G.E. and Eriksson, K., 2004. Integer partitions. Cambridge University Press.

Atallah, Mikhail J., and Marina Blanton, eds. Algorithms and theory of computation handbook, volume 2: special topics and techniques. CRC Press, 2009.

Bóna, M. ed., 2015. Handbook of enumerative combinatorics (Vol. 87). CRC Press.

Fulton, William. Young tableaux: with applications to representation theory and geometry. No. 35. Cambridge University Press, 1997.

Graham, Ronald L. Handbook of combinatorics. Elsevier, 1995.

Gupta H. Partitions – a survey. Journal of Res. of Nat. Bur. Standards-B Math. Sciences B. 1970 Jan 1;74:1-29.

Lovász, László, József Pelikán, and Katalin Vesztergombi. Discrete mathematics: elementary and beyond. Springer Science & Business Media, 2003.

Martin, George E. Counting: The art of enumerative combinatorics. Springer Science & Business Media, 2001.

Matousek, Jiri. Lectures on discrete geometry. Vol. 212. Springer Science & Business Media, 2013.

Menezes, Alfred J., Paul C. Van Oorschot, and Scott A. Vanstone. Handbook of applied cryptography. CRC Press, 2018.

Pak, Igor. "Partition bijections, a survey." The Ramanujan Journal 12.1 (2006): 5-75.

Rosen, Kenneth H., ed. Handbook of discrete and combinatorial mathematics. CRC Press, 2017.

Rosen, Kenneth H., and Kamala Krithivasan. Discrete mathematics and its applications: with combinatorics and graph theory. Tata McGraw-Hill Education, 2012.

Sjöstrand, Jonas. Enumerative combinatorics related to partition shapes. Diss. KTH, 2007.

Stanley, Richard P. Ordered structures and partitions. Vol. 119. American Mathematical Soc., 1972.

Stanley, Richard P. "What is enumerative combinatorics?." Enumerative combinatorics. Springer, Boston, MA, 1986. 1-63.

Stanley, Richard P. "Enumerative Combinatorics Volume 1 second edition." Cambridge studies in advanced mathematics (2011).

Stanley, Richard P. Enumerative combinatorics, Volume 2 (with an appendix by Sergey Fomin). Cambridge Studies in Advanced Mathematics 62. Cambridge University Press, 1999.

Stanley, Richard P. Catalan numbers. Cambridge University Press, 2015.

Toth, Csaba D., Joseph O’Rourke, and Jacob E. Goodman, eds. Handbook of discrete and computational geometry. CRC Press, 2017.

Meet me up in space!

Rather than closing the distance, however, the target seemed to move down and away in defiance of everyday intuition…

David Austin
Grand Valley State University

Complex space missions rely on the ability to bring two spacecraft together, a procedure called orbital rendezvous. A spacecraft docking at the International Space Station is a typical example.

Historically, rendezvous was a vital component of the Apollo lunar missions. While three astronauts traveled to the moon in a single craft, the command module, two of them descended to the lunar surface in a second craft, the lunar excursion module (LEM). A successful mission required the ascending LEM to rendezvous with the command module before returning to Earth.

Gemini 4 Capsule at the Smithsonian National Air and Space Museum

Gemini 4 Capsule at the Smithsonian National Air and Space Museum.

The first attempt at orbital rendezvous was one component of the Gemini IV mission in 1965 when the pilot James McDivitt tried to bring his Gemini capsule close to the spent booster that lifted them into orbit. Upon first seeing the booster at an estimated 120 meters, McDivitt aimed the capsule at the booster and thrusted toward it. Rather than closing the distance, however, the target seemed to move down and away in defiance of everyday intuition. He repeated this procedure several times with similar results.

NASA engineer André Meyer later recalled, “There is a good explanation [for] what went wrong with rendezvous.” Mission planners “just didn’t understand or reason out the orbital mechanics involved” due to competing mission priorities. Reasoning out those orbital mechanics is what we’re going to do here: this column will explain the relative motion of two spacecraft flying closely in Earth orbit and why that motion behaves in such a counter-intuitive way.

Science fiction films often depict spacecraft that continually alter their trajectories by firing their engines. In the real world, however, fuel is a scarce resource due to the energy required to lift the fuel into space. As a result, spacecraft spend most of their time coasting and only apply short bursts of thrust at opportune moments to adjust their trajectory. This means that a spacecraft operating near, say, Earth or the Moon will spend most of its time traveling in an elliptical orbit, as dictated by Kepler’s first law.

We therefore imagine a target spacecraft $T$ traveling in a circular orbit, and an interceptor spacecraft $I$ attempting to rendezvous with the target and traveling in another elliptical orbit. It’s not essential that the target’s orbit be circular, but it’s a convenient assumption for our purposes. The object of orbital rendezvous is to guide the interceptor so that it comes together with the target.

Interceptor and target into Earth orbit

Equations of relative motion

The interceptor perceives the target as a fixed point that it wants to move toward. For this reason, we are interested in describing the motion of the interceptor in a coordinate system that rotates with the target, which means we want to understand the evolution of the vector joining the spacecrafts.

As a note to the reader, we’ll walk through some calculations that explain the evolution of this vector and eventually find a rather elegant description. I’d like to give some intuition for what controls the motion, but I won’t give every detail and trust that the reader can fill in as much or as little as they wish. This exposition follows a 1962 technical report by NASA engineer Donald Mueller and a more recent survey by Bradley Carroll. Apollo 11 astronaut Edwin “Buzz” Aldrin’s M.I.T. Ph.D. thesis covers similar ground.

We’ll denote the interceptor’s location by $I$ and the target’s by $T$. The vector ${\mathbf r}$ measures the displacement of the interceptor from the target. The vectors ${\mathbf R}^*$ and ${\mathbf r}^*$ measure the displacement of the target and interceptor from Earth’s center.

The displacement vector between the inteceptor and target

Since our goal is to understand the motion of the interceptor $I$ relative to the target $T$, we will focus our attention on ${\mathbf r}$. Moreover, we will consider a coordinate system that rotates with the target $T$ as shown below.

Fixed and rotating coordinate frames

For convenience, we will assume that the target is moving in a circular orbit of radius $R_0$ in the $xy$-plane. We see that ${\mathbf i}$ is a unit vector pointing in the direction of the target’s motion, ${\mathbf j}$ points radially from the center of Earth, and ${\mathbf k}$ is a unit vector pointing out of the page.

The vectors ${\mathbf i}^*$, ${\mathbf j}^*$, and ${\mathbf k}^* = {\mathbf k}$ are fixed. In general, quantities denoted with a $*$ are with respect to this fixed, Earth-centered coordinate system and quantities without a $*$ are with respect to the rotating, target-centered coordinate system.

We can describe $$ {\mathbf R}^* = R_0\left[\cos(\omega_0t){\mathbf i}^* -\sin(\omega_0t){\mathbf j}^*\right], $$ where $\omega_0$ is the angular velocity of the target. Differentiating twice gives the acceleration $$ \frac{d^2}{dt^2}{\mathbf R}^* = R_0\omega_0^2\left[-\cos(\omega_0t){\mathbf i}^* +\sin(\omega_0t){\mathbf j}^*\right]. $$

Since gravity is the only force acting on the target, the magnitude of the acceleration vector is $GM/R_0^2$, where $G$ is the gravitational constant and $M$ is Earth’s mass. This tells us that $R_0\omega_0^2 = GM/R_0^2$ so we have $$ \omega_0=\sqrt{\frac{GM}{R_0^3}}. $$ Notice that the angular velocity decreases as the target’s altitude increases.

This is really an expression of Kepler’s Third Law for circular orbits. If $P$ is the period of the orbit, then $\omega_0P = 2\pi$, which says that $$ P^2 = \frac{4\pi^2}{\omega_0^2} = \frac{4\pi^2}{GM}R_0^3 $$ so that the square of the period is proportional to the cube of the radius.
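To get a feel for the numbers, here is a small Python computation of $\omega_0$ and the period for a roughly ISS-like circular orbit. The gravitational parameter, Earth radius, and 400 km altitude are approximate values I have supplied for illustration; they are not taken from the column.

```python
import math

GM_EARTH = 3.986e14     # m^3/s^2, Earth's gravitational parameter (approximate)
R_EARTH = 6.371e6       # m, mean Earth radius (approximate)

R0 = R_EARTH + 400e3    # a roughly ISS-like circular orbit, 400 km up
omega0 = math.sqrt(GM_EARTH / R0**3)   # rad/s
period = 2 * math.pi / omega0          # seconds

print(f"omega_0 = {omega0:.6f} rad/s, period = {period/60:.1f} minutes")
# roughly 92-93 minutes, as expected for low Earth orbit
```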

We would like to relate rates of change measured in the fixed, Earth-centered coordinate system to those measured in the rotating, target-centered system. In particular, we are interested in the derivatives $$\frac{d^*}{dt}{\mathbf r},\hspace{24pt} \frac{d}{dt}{\mathbf r}, $$ the first of which measures the rate of change of ${\mathbf r}$ in the fixed coordinate system while the second measures the rate of change in the rotating coordinate system.

Writing $$ {\mathbf r} = x{\mathbf i} + y{\mathbf j} + z{\mathbf k} $$ shows us that $$ \frac{d^*}{dt}{\mathbf r} = \left(\frac{d}{dt}x\right){\mathbf i} + \left(\frac{d}{dt}y\right){\mathbf j} + \left(\frac{d}{dt}z\right){\mathbf k} + x\frac{d^*}{dt}{\mathbf i} + y\frac{d^*}{dt}{\mathbf j} $$ (there is no corresponding ${\mathbf k}$ term because ${\mathbf k}$ is fixed).

Notice that there are some additional terms that come from differentiating the coordinate vectors. Introducing the angular velocity vector ${\mathbf \omega} = -\omega_0{\mathbf k}$ allows us to see, with a little calculation, that $$ \frac{d^*}{dt}{\mathbf r} = \frac{d}{dt}{\mathbf r} + \omega\times{\mathbf r}. $$

After differentiating again, one can show that $$ \frac{d^2}{dt^2}{\mathbf r} = \frac{{d^*}^2}{dt^2}{\mathbf r} - \omega\times(\omega\times{\mathbf r}) - 2\omega\times\frac{d}{dt}{\mathbf r}. $$

Remember that ${\mathbf r} = {\mathbf r}^* - {\mathbf R}^*$ so that $$ \frac{{d^*}^2}{dt^2}{\mathbf r} = \frac{{d^*}^2}{dt^2}{\mathbf r}^* - \frac{{d^*}^2}{dt^2}{\mathbf R}^* = -\frac{GM}{|{\mathbf r}^*|^3}{\mathbf r}^* +\frac{GM}{|{\mathbf R}^*|^3}{\mathbf R}^*. $$ Since the target is moving in a circular orbit of radius $R_0$, we see that ${\mathbf R}^* = R_0{\mathbf j}$. Also, because ${\mathbf r}^* = {\mathbf R}^* + {\mathbf r}$, we have $$ {\mathbf r}^* = x{\mathbf i} + (R_0+y){\mathbf j} + z{\mathbf k}. $$

Since we are interested in guiding the interceptor to rendezvous with the target, we will assume that the distance between the target and interceptor is much less than the radius of the target’s orbit. For instance, ${\mathbf r} = x{\mathbf i} + y{\mathbf j} + z{\mathbf k}$ may represent a length of a few kilometers, while ${\mathbf R}^* = R_0{\mathbf j}$ represents a length of several thousands of kilometers.

This leads us to approximate $|{\mathbf r}^*| \approx R_0 + y$ and to apply the linear approximation $$ \frac{1}{|{\mathbf r}^*|^3} \approx \frac{1}{R_0^3} -\frac{3y}{R_0^4}. $$ With a little algebra, we can write the components of the vector $\frac{d^2}{dt^2}{\mathbf r}$ as $$ \begin{aligned} \frac{d^2x}{dt^2} & = -2\omega_0 \frac{dy}{dt} \\ \frac{d^2y}{dt^2} & = 3\omega_0^2y + 2\omega_0\frac{dx}{dt} \\ \frac{d^2z}{dt^2} & = -\omega_0^2z \end{aligned} $$ These are known as the Hill equations that describe the interceptor’s path in the rotating coordinate frame, and we will see how to find explicit solutions to them.
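The Hill equations are also easy to explore numerically. The sketch below is my own (not from Mueller's report or Carroll's paper); it integrates the in-plane equations with a simple fixed-step Runge-Kutta scheme, omitting the $z$ equation since it decouples, and the initial state and value of $\omega_0$ are arbitrary illustrative choices.

```python
import numpy as np

def hill_derivative(state, omega0):
    """Right-hand side of the planar Hill equations for state (x, y, xdot, ydot)."""
    x, y, xdot, ydot = state
    return np.array([xdot,
                     ydot,
                     -2 * omega0 * ydot,
                     3 * omega0**2 * y + 2 * omega0 * xdot])

def integrate(state, omega0, dt, steps):
    """Crude fixed-step RK4 integration of the planar Hill equations."""
    for _ in range(steps):
        k1 = hill_derivative(state, omega0)
        k2 = hill_derivative(state + 0.5 * dt * k1, omega0)
        k3 = hill_derivative(state + 0.5 * dt * k2, omega0)
        k4 = hill_derivative(state + dt * k3, omega0)
        state = state + dt * (k1 + 2*k2 + 2*k3 + k4) / 6
    return state

omega0 = 1.13e-3                           # rad/s, a low-Earth-orbit value
state0 = np.array([100.0, 0.0, 0.0, 0.5])  # 100 m ahead of the target, thrusting "up"
print(integrate(state0, omega0, dt=1.0, steps=600))  # relative state after 10 minutes
```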

Ellipses everywhere

The equation for the $z$-coordinate is a simple harmonic oscillator whose solution is $$z(t) = A\cos(\omega_0t) + B\sin(\omega_0t). $$

Remember that the $z$-axis is fixed in both coordinate systems. This form for $z(t)$ simply expresses the fact that in the fixed, Earth-centric coordinate system, the interceptor is traveling in an elliptical orbit about Earth’s center. More specifically, this expression describes the motion of the interceptor out of the plane of the target’s orbit. To achieve orbital rendezvous, we need to align the plane of the interceptor with that of the target. In practice, this incurs a considerable cost in fuel so care is taken to launch the interceptor in an orbital plane that’s close to that of the target.

We will assume that the interceptor has performed such a maneuver so that its orbital plane agrees with that of the target. In other words, we’ll assume that $z(t) = 0$ constantly and focus on the other two equations: $$ \begin{aligned} \frac{d^2x}{dt^2} & = -2\omega_0 \frac{dy}{dt} \\ \frac{d^2y}{dt^2} & = 3\omega_0^2y + 2\omega_0\frac{dx}{dt} \\ \end{aligned} $$

We can develop some intuition by thinking about these equations more carefully. Remember that gravity is the only force acting on the interceptor. These equations tell us, however, that in the rotating coordinate system, the interceptor behaves as if it were subjected to two forces per unit mass: $$ \begin{aligned} {\mathbf F}_{\mbox{Tid}} & = 3\omega_0^2y{\mathbf j} \\ {\mathbf F}_{\mbox{Cor}} & = -2\omega_0\frac{dy}{dt}{\mathbf i}+2\omega_0\frac{dx}{dt}{\mathbf j} = -2\omega\times\frac{d}{dt}{\mathbf r}. \\ \end{aligned} $$

The first force, ${\mathbf F}_{\mbox{Tid}}$, known as the tidal force, tends to push the interceptor away from the target’s circular orbit and makes $y=0$ an unstable equilibrium.

The tidal force (left) and the Coriolis force (right).

The second force, the Coriolis force, causes the interceptor to rotate counterclockwise as it moves in the rotating coordinate system.

Imagine what happens if we start just a little above the $x$-axis. The tidal force pushes us upward further away from the axis and causes us to pick up speed. The Coriolis force then begins to pull us to the left.

One simple solution to the equations of motion in the rotating frame, which can be verified directly from the equations, is given by $$ \begin{aligned} x(t) & = a\cos(\omega_0 t) \\ y(t) & = \frac a2\sin(\omega_0 t). \end{aligned} $$ Of course, we know by Kepler’s first law that the interceptor’s trajectory is elliptical in the fixed, Earth-centered frame. What’s remarkable about this solution is that it says that the interceptor’s path in the rotating frame is also elliptical.

The following figures illustrate this solution in both frames along with the positions at equal time intervals. It’s worth taking a moment to study the relationship between them.

The elliptical path in the rotating frame (left) and in the fixed frame (right).

For this special solution, one can check that $$ \frac{d^2}{dt^2}{\mathbf r} = -\omega_0^2{\mathbf r}. $$ This is the equation for a simple harmonic oscillator, the kind of motion observed by a mass on a spring. In other words, the interceptor moves as if it were connected to the target by a spring, which gives some insight into why orbital rendezvous is so counter-intuitive.

While this solution is certainly special, it shares many features with the general solution. Denoting the initial position by $(x_0, y_0)$ and the initial velocity by $(\dot{x}_0, \dot{y}_0)$, one can check that the general solution is $$ \begin{aligned} x(t) & = \left(6y_0 + \frac{4\dot{x}_0}{\omega_0}\right) \sin(\omega_0 t) + \frac{2\dot{y}_0}{\omega_0}\cos(\omega_0t) + \left[x_0 - \frac{2\dot{y}_0}{\omega_0} - (3\dot{x}_0 + 6\omega_0y_0)t\right] \\ y(t) & = -\left(3y_0 + \frac{2\dot{x}_0}{\omega_0}\right)\cos(\omega_0 t) + \frac{\dot{y}_0}{\omega_0}\sin(\omega_0t) + \left[4y_0 + \frac{2\dot{x}_0}{\omega_0}\right] \end{aligned} $$

These expressions, known as the Clohessy-Wiltshire equations, initially look complicated, but a little study reveals a surprising simplicity. Defining $$ \begin{aligned} C & = 3y_0 + \frac{2\dot{x}_0}{\omega_0} \\ D & = \frac{\dot{y}_0}{\omega_0} \\ x_c & = x_0 - \frac{2\dot{y}_0}{\omega_0} - (3\dot{x}_0 + 6\omega_0y_0)t \\ y_c & = 4y_0 + \frac{2\dot{x}_0}{\omega_0}, \end{aligned} $$ enables us to write $$ \begin{aligned} x(t) = 2C\sin(\omega_0 t) + 2D\cos(\omega_0 t) + x_c \\ y(t) = -C\cos(\omega_0 t) + D\sin(\omega_0 t) + y_c. \end{aligned} $$

This shows that the interceptor, once again, is traveling in an elliptical path centered about the point $(x_c, y_c)$. In fact, the semi-major axis $a$ and semi-minor axis $b$ of the ellipse are $$ \begin{aligned} a & = 2\sqrt{C^2+D^2} \\ b & = a/2, \end{aligned} $$ which shows that the eccentricity of the path is always $\sqrt{3}/2$.

Notice, however, that the point $(x_c, y_c)$ is moving, since $x_c$ changes with velocity $$ v_{\mbox{drift}} = -(3\dot{x}_0 + 6\omega_0y_0) = -\frac32\omega_0 y_c. $$ This is important because it says that the center of the ellipse drifts left if $y_c \gt 0$ and drifts right if $y_c \lt 0$.
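A direct way to experiment with these formulas is to code the closed-form solution. The sketch below (my own) implements $x(t)$ and $y(t)$ exactly as written above and checks them against the special solution $x = a\cos(\omega_0 t)$, $y = \frac{a}{2}\sin(\omega_0 t)$, whose initial conditions are $x_0 = a$, $y_0 = 0$, $\dot{x}_0 = 0$, $\dot{y}_0 = a\omega_0/2$.

```python
import numpy as np

def cw_position(t, x0, y0, xdot0, ydot0, omega0):
    """Closed-form Clohessy-Wiltshire solution for the in-plane relative motion."""
    s, c = np.sin(omega0 * t), np.cos(omega0 * t)
    x = ((6*y0 + 4*xdot0/omega0) * s + (2*ydot0/omega0) * c
         + x0 - 2*ydot0/omega0 - (3*xdot0 + 6*omega0*y0) * t)
    y = (-(3*y0 + 2*xdot0/omega0) * c + (ydot0/omega0) * s
         + 4*y0 + 2*xdot0/omega0)
    return x, y

# Check against the special solution x = a cos(omega0 t), y = (a/2) sin(omega0 t).
omega0, a = 1.13e-3, 100.0
t = 1000.0
print(cw_position(t, a, 0.0, 0.0, a*omega0/2, omega0))
print(a*np.cos(omega0*t), (a/2)*np.sin(omega0*t))   # should match the line above
```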

Parking orbits

Here are some special orbits that Mueller calls parking orbits.

Type I parking orbit

Remember that the orbit is determined by its initial conditions $(x_0,y_0)$ and $(\dot{x}_0,\dot{y}_0)$. With the right choice of initial conditions, we can arrange that $C=D=0$ so that the ellipse collapses to a point. Mueller’s type I parking orbit has the interceptor in a circular orbit, as seen in the fixed frame, at the same altitude as the target. In the rotating frame, the interceptor is not moving relative to the target.

Type III parking orbit

A type III parking orbit has the interceptor again in a circular orbit but at a different altitude than the target. Since the angular velocity decreases with the altitude, the interceptor appears to move horizontally in the rotating frame. The Clohessy-Wiltshire equations explain this by noting that the center of the ellipse moves left when $y_c\gt0$ and right when $y_c\lt0$.

Type II parking orbit

Things get a little more interesting if we allow $C$ and $D$ to be nonzero so that the interceptor moves on an elliptical trajectory in the rotating frame. A type II parking orbit has $y_c = 0$, so that the ellipse is centered on the $x$-axis. In this case, the center does not move, so the elliptical orbit is stationary. The special solution we looked at earlier is an example. The orbit depicted here shows how an astronaut could retrieve an accidentally dropped tool by just waiting one orbital period for its return.

Type IV parking orbit

A type IV orbit has the interceptor traveling an elliptical orbit whose center is drifting. The orbit shown has $y_c \gt 0$ so the center of the ellipse is moving to the left.

With any solution, one can check that the interceptor moves as if connected by a spring to the moving center of the ellipse, which again speaks to the counter-intuitive nature of the motion.

These orbits explain the behavior of Gemini IV, which we described in the introduction. Imagine that the interceptor is initially in a type I parking orbit, traveling in the same circular orbit as the target and 500 meters away.

Gemini IV in a type I parking orbit

McDivitt aimed at the target and fired Gemini’s thruster, which would make $\dot{x}_0 \gt 0$ and move the craft into the type IV orbit shown here. This explains why the target paradoxically appeared to move down and away from the capsule.

Gemini IV after thrusting forward

One scenario Carroll considers has an astronaut stranded outside the ISS at an initial position of $(100,100)$, which is 100 meters in front of the space station and 100 meters above. Applying a momentary thrust to create a velocity of 1 meter per second aimed directly at the space station leads to a type IV orbit that badly misses the station.

Trajectory of a stranded astronaut

So how can we put the interceptor on a path to dock with the target? We’ll assume the interceptor is at an initial position $(x_0, y_0)$ and choose a time $T$ at which we’d like to dock. Setting $x(T) = 0$ and $y(T) = 0$ gives the equations $$ \begin{aligned} x(T) = 0 & = \left(6y_0 + \frac{4\dot{x}_0}{\omega_0}\right) \sin(\omega_0 T) + \frac{2\dot{y}_0}{\omega_0}\cos(\omega_0T) + \left[x_0 - \frac{2\dot{y}_0}{\omega_0} - (3\dot{x}_0 + 6\omega_0y_0)T\right] \\ y(T) = 0 & = -\left(3y_0 + \frac{2\dot{x}_0}{\omega_0}\right)\cos(\omega_0 T) + \frac{\dot{y}_0}{\omega_0}\sin(\omega_0T) + \left[4y_0 + \frac{2\dot{x}_0}{\omega_0}\right] \end{aligned} $$

Once again, these equations appear complicated, but notice that we know everything except the components of the initial velocity. This means that these are two linear equations for the two components of the unknown initial velocity $(\dot{x}_0, \dot{y}_0)$. Once we have found those components, the interceptor applies a momentary thrust to change its velocity to $(\dot{x}_0, \dot{y}_0)$, bringing it to the target’s position at time $T$.
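In code, this amounts to assembling and solving a $2\times2$ linear system for $(\dot{x}_0, \dot{y}_0)$. The sketch below is my own: the initial position is the one from Carroll's stranded-astronaut scenario quoted above, while the rendezvous time $T$ (half an orbital period) and the value of $\omega_0$ are illustrative choices of mine.

```python
import numpy as np

def rendezvous_velocity(x0, y0, T, omega0):
    """Initial velocity (xdot0, ydot0) that drives the CW solution to (0, 0) at time T."""
    s, c = np.sin(omega0 * T), np.cos(omega0 * T)
    # Collect the coefficients of xdot0 and ydot0 from x(T) = 0 and y(T) = 0,
    # writing the system as A [xdot0, ydot0]^T = b.
    A = np.array([[4*s/omega0 - 3*T, 2*c/omega0 - 2/omega0],
                  [2/omega0 - 2*c/omega0, s/omega0]])
    b = np.array([-(6*y0*s + x0 - 6*omega0*y0*T),
                  -(4*y0 - 3*y0*c)])
    return np.linalg.solve(A, b)

omega0 = 1.13e-3            # rad/s, an assumed low-Earth-orbit value
T = np.pi / omega0          # aim to dock after half an orbital period
xdot0, ydot0 = rendezvous_velocity(100.0, 100.0, T, omega0)
print(xdot0, ydot0)
# To verify, plug this velocity back into the closed-form sketch above (cw_position)
# and check that the position at time T is (0, 0).
```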

Carroll’s paper shows how to apply this idea to reproduce the rendezvous technique used in the Apollo 11 mission as Neil Armstrong and Buzz Aldrin returned from the lunar surface to dock with Michael Collins in the command module.

Summary

NASA learned from Gemini IV’s failure to rendezvous with the booster target, and several months later, the Gemini VI mission successfully navigated to within 130 feet of Gemini VII. The pilot of Gemini VI, Walter Schirra, later explained that the difficulty was getting to that distance but that completing docking from there is relatively straightforward. Carroll’s paper examines that claim and confirms that line-of-sight flying is effective within about 100 feet of the target.

The International Space Station

The International Space Station in 2021.

In the past, spacecraft docking with the International Space Station would fly in formation with the station, which would grab the craft with the Canadarm2 and manually complete the docking procedure. Recently, however, SpaceX has automated the docking process: software guides the craft, and the astronauts are not involved at all.

Would you like to try to dock with the ISS? This SpaceX simulator gives you the chance.


Principal Component Analysis:

Three Examples and some Theory

Very often, especially in applications to the life sciences, useful low-dimensional projections exist and allow humans to grasp a data set that would otherwise be inscrutable.

Tony Phillips
Stony Brook University

Introduction

Principal component analysis (PCA), an algorithm for helping us understand large-dimensional data sets, has become very useful in science (for example, a search in Nature for the year 2020 picks it up in 124 different articles). Human minds are good at recognizing patterns in two dimensions and to some extent in three, but are essentially useless in higher dimensions. The goal of PCA is to produce the most useful possible 2 or 3-dimensional projection of a high-dimensional data set—most useful in that the smallest amount of information is lost by the projection. One can construct artificial examples where all the dimensions must be considered, but very often, especially in applications to the life sciences, useful low-dimensional projections exist and allow humans to grasp a data set that would otherwise be inscrutable. In this column we will look at PCA in three recently published applications (to anthropology, paleontology, and genetics), and go through the mathematics that makes it work.

Example 1. Distinguishing nearby soil samples in a 78,000-year-old burial

burial PCA
In this PCA, 13-dimensional data from some 80 soil samples are projected into the plane spanned by their two principal components. The projection shows a clear distinction (highlighted by the superimposed 95% confidence ellipses) between samples from the burial pit (red dots) and samples (purple dots) from outside the pit at the same level (Layer 19) of the excavation. Image adapted from Nature, 593, 95-100, Extended Data Figure 4

“Earliest known human burial in Africa” was published in Nature on May 5, 2021. The authors are an international team of 36 led by María Martinón-Torres, Nicole Boivin and Michael Petraglia. In documenting the 78,000-year-old grave of a child in Panga ya Saidi, Kenya, they needed to establish that there had actually been a burial; they did this in part by showing that the earth surrounding the skeleton was different from the surrounding earth at that level (layer 19) in the archaeological excavation, i.e. that a pit had been dug to receive the child’s corpse. “The burial fill is a mix of ferruginous silt and sand, compositionally similar to the top of layer 18 and the base of layer 17, and different from layer 19, in which the pit was excavated.” They confirmed the difference by measuring the concentrations of 13 chemical elements occurring in the burial fill and in those three layers. To visualize their ensemble of measurements (80 data points, each with 13 components), “[w]e performed PCA of EDXRF concentrations for the thirteen major, minor, and trace elements more frequently detected (Si, K, Ca, Ti, Mn, Fe, Zn, Ga, As, Rb, Y, Zr and Ba).” As they report, “The burial pit samples markedly differ from layer 19.”

The basic mathematics

In the “Earliest … burial” example, PCA gives a useful 2-dimensional representation of a cloud of points in a 13-dimensional space. (Astronomers describe the general procedure as “finding the principal axes of inertia of the cloud of samples about their mean in feature space.”) The a priori unlikely combination of basic elements of statistics, multivariable calculus and linear algebra that makes it work will be outlined in this section.

To be specific, suppose the data set consists of $n$ measurements taken on $M$ samples; geometrically it would be a set of $M$ points distributed in $n$-dimensional space, where each sample is a point and its $n$ measurements are its $n$ coordinates.

  • To fix notation, let measurements $x_1, \dots, x_n$ be taken on $M$ samples $s_1, \dots, s_M$.

  • For each $x_i$ we calculate the average, or mean

    $$\overline{x_i} = \frac{1}{M}\sum_{j=1}^M x_i(s_j).$$

  • Then for each $x_i$ we calculate the variance: this is the average of the square of the difference between the $x_i$-measurements and the mean: $${\rm Var}(x_i) = \frac{1}{M}\sum_{k=1}^M (x_i(s_k)-\overline{x_i})^2.$$

  • The variance ${\rm Var}(x_i)$ estimates the spread of $x_i$. Squaring the differences puts deviations to the right or the left of the mean on the same footing; it also gives more weight to the larger differences (the “outliers”). (The standard deviation, a more common measure of spread, is defined as the square root of the variance.)

  • For each pair $(x_i, x_j)$ of measurements we calculate the covariance $${\rm Cov}(x_i, x_j) = \frac{1}{M}\sum_{k=1}^M (x_i(s_k)-\overline{x_i})(x_j(s_k)-\overline{x_j}).$$ Note that ${\rm Cov}(x_i, x_i)={\rm Var}(x_i)$.

The various covariances are naturally displayed in  the covariance matrix

$${\bf Cov} ={\bf Cov}(x_1,\dots, x_n)=\left (\begin{array}{cccc}{\rm Cov}(x_1,x_1)& {\rm Cov}(x_1,x_2)&\dots&{\rm Cov}(x_1,x_n)\\ {\rm Cov}(x_2,x_1)&{\rm Cov}(x_2,x_2) &\dots &{\rm Cov}(x_2,x_n)\\ \vdots&\vdots&\vdots&\vdots\\ {\rm Cov}(x_n,x_1)&{\rm Cov}(x_n,x_2)&\dots&{\rm Cov}(x_n,x_n) \end{array} \right ).$$

Note that since ${\rm Cov}(x_i, x_j) = {\rm Cov}(x_j, x_i)$ the matrix ${\bf Cov}$ is symmetric. This fact will be very important when we study its algebraic properties.

So much for the statistics. The next calculations will be simpler to write if we assume our initial measurements have been centered (by subtracting from each one its original mean), so that each one has mean equal to zero. So now ${\rm Cov}(x_i, x_j) = \frac{1}{M}\sum_{k=1}^M x_i(s_k)x_j(s_k), $ for all $i,j=1,\dots,n$.

PCA starts with finding the direction in $n$-space (measurement-space) along which the data are maximally spread out, so that the clustering and spacing of the data points along that direction carry the most information. A direction is given by a unit vector ${\bf a}$ with components $a_1, \dots, a_n$, and the measurement in this direction is the linear combination $y=a_1 x_1 + \dots + a_n x_n$. Since the $x_i$ all have mean zero, the same must hold for $y$, so it is automatically centered. “Maximally spread out” in our context means having the greatest variance ${\rm Var}(y) = \frac{1}{M}\sum_{j=1}^M y(s_j)^2 $.

So we want to find the components $a_1, \dots, a_n$ leading to the maximum of ${\rm Var}(y) = {\rm Var}(a_1 x_1 + \dots + a_n x_n)$, subject to the constraint $ a_1 ^2 +\cdots +a_n^2=1$. Geometrically, we are maximizing ${\rm Var}(y)$ on the surface of the unit sphere in $n$-space. The way we learn to do this in multivariable calculus is to look for the points on the sphere $a_1 ^2 +\cdots +a_n^2=1$ where the gradient of ${\rm Var}(y)$ is perpendicular to the surface of the sphere or, equivalently, parallel to the gradient of the function $a_1 ^2 +\cdots +a_n^2$, which is $(2a_1, \dots, 2a_n)$. One of these points will be the maximum. (The  gradient of a function $f$ of $n$ variables $x_1, \dots, x_n$ is the vector $(\partial f/\partial x_1, \dots, \partial f/\partial x_n)$. It points in the direction of steepest increase.)

Numerically, the statement that the gradients of ${\rm Var}(y)$ and $a_1 ^2 +\cdots +a_n^2$ are parallel means that there is a non-zero scalar $\lambda$ such that one is $\lambda$ times the other (this is the basis of the method of Lagrange multipliers):

$$ \frac{\partial {\rm Var}(y)}{\partial a_1}= 2\lambda a_1$$

$$~~~~~~~~~~~~~~~~\vdots ~~~~~~~~~~~~~~~~(*)$$

$$ \frac{\partial {\rm Var}(y)}{\partial a_n}   =  2\lambda a_n. $$

Let us calculate $\partial {\rm Var}(y)/\partial a_j$.

$$\frac{\partial}{\partial a_j}{\rm Var}(y) = \frac{\partial}{\partial a_j}{\rm Var}(a_1 x_1 + \dots + a_n x_n).$$

Remembering that the $x_i$ are centered,

$${\rm Var}(a_1 x_1 + \dots + a_n x_n)= \frac{1}{M}\sum_{k=1}^M (a_1 x_1(s_k) + \dots + a_n x_n(s_k))^2$$

so by the chain rule

$$\frac{\partial}{\partial a_j}{\rm Var}(y)= \frac{1}{M} \sum_{k=1}^M [2(a_1 x_1(s_k) + \dots + a_n x_n(s_k))\cdot x_j(s_k)]$$

$$= 2a_1{\rm Cov}(x_1,x_j) + \cdots + 2a_n {\rm Cov}(x_n,x_j).$$

Equations $(*)$ then imply that for each $j$ between 1 and $n$,

$$2a_1 {\rm Cov}(x_1,x_j) + \cdots + 2a_n {\rm Cov}(x_n,x_j) = 2\lambda a_j.$$

Here is where linear algebra makes a miraculous intervention. Cancelling the 2s, and remembering that  ${\rm Cov}(x_i,x_j) = {\rm Cov}(x_j,x_i)$, these equations together can be rewritten as

$$ \left (
\begin{array}{cccc}
{\rm Cov}(x_1,x_1)& {\rm Cov}(x_1,x_2)&\dots&{\rm Cov}(x_1,x_n)\\
{\rm Cov}(x_2,x_1)&{\rm Cov}(x_2,x_2) &\dots &{\rm Cov}(x_2,x_n)\\
\vdots&\vdots&\vdots&\vdots\\
{\rm Cov}(x_n,x_1)&{\rm Cov}(x_n,x_2)&\dots&{\rm Cov}(x_n,x_n)
\end{array}
\right ) \left(
\begin{array}{c}a_1\\a_2\\\vdots\\a_n
\end{array} \right )= \lambda \left(
\begin{array}{c}a_1\\a_2\\\vdots\\a_n
\end{array} \right ).~(**)$$

We say that the column vector ${\bf a}$  is an eigenvector for the covariance matrix ${\bf Cov}$, with eigenvalue $\lambda$.

Finding the eigenvalues and eigenvectors for the $n\times n$ matrix ${\bf Cov}$ is a standard set of calculations in linear algebra, which I will summarize here.

  • The eigenvalues of ${\bf Cov}$ are defined as the roots of the degree-$n$ polynomial $\det({\bf Cov}-\lambda {\rm I})=0$, where ${\rm I}$ is the $n\times n$ identity matrix; but for large $n$ this definition is not practical: eigenvectors and eigenvalues are calculated by the Lagrange multiplier method as above. In applications to the life sciences, where there are not likely to be hidden symmetries in the problem, the eigenvalues will be all distinct; furthermore, since ${\bf Cov}$ is a symmetric matrix these are all real numbers, and can therefore be ordered $\lambda_1 > \lambda_2 > \cdots > \lambda_n$.

  • For each index $j$, the vector equation ${\bf Cov} \cdot {\bf a} = \lambda_j {\bf a}$ has a non-zero solution, unique up to a scalar multiple, so unique up to sign if we fix its length to be 1. This solution, call it ${\bf a}_j$, is the eigenvector corresponding to $\lambda_j$.

  • (For future reference, here is a useful fact: because the matrix is symmetric, eigenvectors corresponding to different eigenvalues are perpendicular.)

In particular the column vector ${\bf a}_1$, with components $ a_{11}, \dots, a_{1n}$, corresponding to the maximum eigenvalue $\lambda_1$, gives us the answer to our original question: the direction in measurement-space along which the data are maximally spread out. Notice that the indeterminacy in the sign of ${\bf a}_1$ does not affect its direction. In PCA, the quantity $y_1 = a_{11}x_1 + \dots + a_{1n}x_n$ is called PC1, the first principal component.

What is this maximum variance? The answer involves some calculation.

Remember that for $\displaystyle{y_1 = a_{11}x_1 +\dots + a_{1n}x_n}$,
$${\rm Var}(y_1) ={\rm Var}(a_{11} x_1 + \dots + a_{1n} x_n)= \frac{1}{M}\sum_{k=1}^M (a_{11} x_1(s_k) + \dots + a_{1n} x_n(s_k))^2$$
so
$${\rm Var}(y_1) =\frac{1}{M}\sum_{k=1}^M \sum_{i=1}^n\sum_{j=1}^n a_{1i}a_{1j}x_i(s_k)x_j(s_k)$$

$$= \sum_{i=1}^n\sum_{j=1}^n a_{1i}a_{1j}\frac{1}{M}\sum_{k=1}^M x_i(s_k)x_j(s_k) = \sum_{i=1}^n\sum_{j=1}^n a_{1i}a_{1j} {\rm Cov}(x_i,x_j)$$

$$ = (a_{11} ~ a_{12}~ \cdots ~a_{1n}) \left (
\begin{array}{cccc}
{\rm Cov}(x_1,x_1)& {\rm Cov}(x_1,x_2)&\dots&{\rm Cov}(x_1,x_n)\\
{\rm Cov}(x_2,x_1)&{\rm Cov}(x_2,x_2) &\dots &{\rm Cov}(x_2,x_n)\\
\vdots&\vdots&\vdots&\vdots\\
{\rm Cov}(x_n,x_1)&{\rm Cov}(x_n,x_2)&\dots&{\rm Cov}(x_n,x_n)
\end{array}
\right ) \left(
\begin{array}{c}a_{11}\\a_{12}\\\vdots\\a_{1n}
\end{array} \right ) $$
$$=(a_{11} ~ a_{12}~ \cdots ~a_{1n})~ \lambda_1 \left(
\begin{array}{c}a_{11}\\a_{12}\\\vdots\\a_{1n}
\end{array} \right ) = \lambda_1 (a_{11}^2 + a_{12}^2 + \cdots + a_{1n}^2) = \lambda_1.$$

Amazingly enough, the variance of the linear combination $y_1 = a_{11}x_1 + \dots + a_{1n}x_n$ using the coefficients of ${\bf a}_1$ is exactly the corresponding eigenvalue $\lambda_1$.

Once we have found ${\bf a}_1$, we go on to look for the next most useful direction in which to sort the data. We look in the hyperplane (in measurement-space) perpendicular to ${\bf a}_1$ and repeat the analysis. As before we want to maximize ${\rm Var}(y) = {\rm Var}(b_1 x_1 + \dots + b_n x_n)$, subject to the constraint $b_1 ^2 +\cdots +b_n^2=1$; now we add the additional constraint $b_1a_{11} + \cdots + b_na_{1n} = 0$. The solution will be a point satisfying $b_1 ^2 +\cdots +b_n^2=1$ where the gradient of ${\rm Var}(y)$ is parallel to the gradient of $b_1 ^2 +\cdots +b_n^2$ and lies in the plane $b_1a_{11} + \cdots + b_na_{1n} = 0$. As before, the solution is an eigenvector of the covariance matrix; the condition $b_1a_{11} + \cdots + b_na_{1n} = 0$ eliminates ${\bf a}_1$, leaving ${\bf a}_2$ as the direction maximizing variance; $y_2 = a_{21}x_1 + \cdots + a_{2n}x_n$ will be PC2, the second principal component, with variance calculated as above to be $\lambda_2$.

Some more linear algebra: equation $(**)$ specialized to the vector ${\bf a}_i$ can be written in matrix notation ${\bf Cov}\cdot{\bf a}_i = \lambda_i{\bf a}_i$. Amalgamating the $n$ column vectors ${\bf a}_1, \dots {\bf a}_n$ into a matrix ${\bf A}$ will give $ {\bf Cov}\cdot{\bf A} = {\bf A}\cdot{\bf \Lambda}$, where ${\bf \Lambda}$ is the diagonal matrix with entries $\lambda_1, \dots, \lambda_n$. Now ${\bf A}$, being made up of orthogonal columns of length 1, is an orthogonal matrix; its inverse is easy to calculate: it is the transpose ${\bf A}^{\top}$. Multiplying both sides of the last equation on the left by ${\bf A}^{\top}$ gives ${\bf A}^{\top}\cdot{\bf Cov}\cdot{\bf A} = {\bf A}^{\top}\cdot {\bf A}\cdot {\bf \Lambda}= {\bf \Lambda}$. On the other hand, ${\bf A}^{\top}\cdot {\bf Cov}\cdot{\bf A}$, being similar to ${\bf Cov}$, has the same trace ${\rm Cov}(x_1, x_1) + \cdots + {\rm Cov}(x_n, x_n)$, which then must be equal to the trace of ${\bf \Lambda}$, i.e. $\lambda_1 + \cdots + \lambda_n$. The sum of the eigenvalues is equal to the total variance. Often in applications to the life sciences the first two eigenvalues together account for a large proportion of the variance, and their associated principal components give a satisfactory two-dimensional representation of the entire $n$-dimensional panorama of the measurements (in our first two examples they account for 76% and 95.5%, respectively, of the total variance). Often also the third principal component is calculated to further refine the picture.

To complete the picture, the $M$ points in $n$-space corresponding to the original measurements can be mapped into the (PC1, PC2)-plane using the matrix ${\bf A}$: $y_1(s_k) = a_{11}x_1(s_k)+\cdots+a_{1n}x_n(s_k)$, $y_2(s_k) = a_{21}x_1(s_k)+\cdots+a_{2n}x_n(s_k)$, or into (PC1, PC2, PC3)-space by using also $y_3(s_k) = a_{31}x_1(s_k)+\cdots+a_{3n}x_n(s_k)$.
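For readers who want to experiment, here is a compact numpy sketch of the whole procedure; it is my own illustration, not code from any of the papers discussed. It centers the data, forms the covariance matrix, extracts eigenvalues and eigenvectors, and projects the samples onto the first two principal components. The convention here is that rows of the data matrix are samples and columns are measurements, and the toy data set is fabricated so that most of the variance lies in two directions.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA of a data matrix X (rows = samples, columns = measurements).
    Returns the projected coordinates and the fraction of variance explained."""
    Xc = X - X.mean(axis=0)                   # center each measurement
    cov = Xc.T @ Xc / Xc.shape[0]             # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # symmetric matrix, so eigh is appropriate
    order = np.argsort(eigvals)[::-1]         # sort eigenvalues, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components]   # coordinates in the (PC1, PC2, ...) plane
    explained = eigvals[:n_components].sum() / eigvals.sum()
    return scores, explained

# A toy data set: 80 samples of 13 measurements, with most variation in 2 directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2)) @ rng.normal(size=(2, 13)) + 0.05 * rng.normal(size=(80, 13))
scores, explained = pca(X)
print(scores.shape, f"{explained:.1%} of total variance in PC1 and PC2")
```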

Example 2. Sorting Kuehneotherium teeth

Kuehneotherium
Kuehneotherium (right) and its 200-million- years-ago contemporary Morganucodon are imagined by John Sibbick foraging for insects in what is now part of Wales. The painting, which appeared on the cover of Nature, is reproduced courtesy of Pamela Gill.

This example, which comes from a review article on PCA by Ian T. Jolliffe and Jorge Cadima in Philosophical Transactions of the Royal Society A, “examines a dataset consisting of nine measurements on 88 fossil teeth from the early mammalian insectivore Kuehneotherium. … Kuehneotherium is one of the earliest mammals and remains have been found during quarrying of limestone in South Wales, UK. The bones and teeth were washed into fissures in the rock, about 200 million years ago, and all the lower molar teeth used in this analysis are from a single fissure. However, it looked possible that there were teeth from more than one species of Kuehneotherium in the sample.” The teeth came from the collection of the Natural History Museum London, and the measurements were made by Pamela G. Gill, School of Earth Sciences, Bristol. Her nine measurements are identified in the next figure.

tooth parameters
The nine parameters measured by Pamela Gill on each Kuehneotherium tooth. Image courtesy of Pamela Gill.

Cadima and Jolliffe ran a PCA of a subset of Gill’s measurements (all the teeth from one of the fissures), and obtained the plot shown here.

tooth PCA
Cadima and Jolliffe’s PCA of the Kuehneotherium dataset. The original 88 data points have been projected onto the (PC1, PC2) plane, along with vectors corresponding to the nine original measurements. Image courtesy of Jorge Cadima.

They remark: “The first two PCs account for 78.8% and 16.7%, respectively, of the total variation in the dataset, so the two-dimensional scatter-plot of the 88 teeth given by [the figure] is a very good approximation to the original scatter-plot in nine-dimensional space.” Furthermore “All of the loadings in the first PC have the same sign, so it is a weighted average of all variables, representing ‘overall size’. In [the figure], large teeth are on the left and small teeth on the right. The second PC has negative loadings for the three length variables and positive loadings for the other six variables, representing an aspect of the ‘shape’ of teeth. Fossils near the top of [the figure] have smaller lengths, relative to their heights and widths, than those towards the bottom.”

They conclude: “The relatively compact cluster of points in the bottom half of [the figure] is thought to correspond to a species of Kuehneotherium, while the broader group at the top cannot be assigned to Kuehneotherium, but to some related, but as yet unidentified, animal.”

When the measurement space has very high dimension

In applications to genetics or to image analysis (where each pixel is a “measurement”) the number $n$ of measurements is often enormous compared to the number $M$ of samples. Following the PCA algorithm sketched above would mean dealing with an intractably large matrix.

Now this problem is only potentially $n$-dimensional, because all we have to work with are data from $M$ points. One way to conceptualize it is to think of each of the aggregate measurement results $x_i$ as a vector ${\bf x}_i$ with components $x_i(s_1), \dots x_i(s_M)$. Now we have $n$ vectors in an $M$-dimensional space, with $n>M$, so there is a set of (at most) $M$ linearly independent vectors which give all the ${\bf x}_i$ by linear combination. Suppose for simplicity there are exactly $M$, say ${\bf z}_1,\dots, {\bf z}_M$. Then our problem of finding a variance-maximizing linear combination $y=a_1x_1 + \dots +a_nx_n$, i.e. ${\bf y} = a_1{\bf x}_1 + \dots +a_n{\bf x}_n$ reduces to finding the much smaller linear combination ${\bf y} = c_1{\bf z}_1 + \dots +c_M{\bf z}_M$ maximizing variance. But how do we identify ${\bf z}_1,\dots, {\bf z}_M$?

Again, linear algebra has just the tools we need. With the same notation as above, let us compute for every $i,j$ between 1 and $M$ the quantity $ \frac{1}{n}(x_1(s_i)x_1(s_j)+x_2(s_i)x_2(s_j) + \cdots + x_n(s_i)x_n(s_j))$. These numbers fit in a new, $M\times M$, matrix we can call ${\bf Voc}$ to distinguish it from ${\bf Cov}$ defined above. We calculate the eigenvectors ${\bf b}_1, \dots, {\bf b}_M$ and corresponding eigenvalues $\lambda_1, \dots, \lambda_M$ for ${\bf Voc}$ as we did before.

There is a useful relation between ${\bf Cov}$ and ${\bf Voc}$. Suppose we call ${\bf X}$ the matrix of all the measurements of all the samples, so ${\bf X}_{ij} = x_i(s_j)$, and its transpose ${\bf X}^{\top}_{ij} = {\bf X}_{ji} = x_j(s_i)$. The matrix ${\bf X}$ has $n$ rows and $M$ columns, and vice-versa for ${\bf X}^{\top}$. The matrix product ${\bf X}\cdot {\bf X}^{\top}$ is defined by $[{\bf X}\cdot {\bf X}^{\top}]_{ij} = {\bf X}_{i1} {\bf X}^{\top}_{1j} + \cdots + {\bf X}_{iM} {\bf X}^{\top}_{Mj} = x_i(s_1)x_j(s_1) + \cdots + x_i(s_M)x_j(s_M) = M\, {\rm Cov}(x_i, x_j)$, whereas $[{\bf X}^{\top}\cdot {\bf X}]_{ij} = {\bf X}^{\top}_{i1} {\bf X}_{1j} + \cdots + {\bf X}^{\top}_{in} {\bf X}_{nj} = x_1(s_i)x_1(s_j) + \cdots + x_n(s_i)x_n(s_j) = n\, {\rm Voc}(s_i,s_j)$. So ${\bf X}\cdot {\bf X}^{\top} = M\, {\bf Cov}$, $M$ times the intractable $n\times n$ matrix we are interested in, and ${\bf X}^{\top}\cdot {\bf X} = n\, {\bf Voc}$, $n$ times its smaller, $M\times M$ cousin. Now a little more linear algebra:

  • ${\bf X}^{\top}\cdot {\bf X}$ and ${\bf X}\cdot {\bf X}^{\top}$ have the same non-zero eigenvalues, $\lambda_1, \dots, \lambda_M$
  • For each eigenvector ${\bf b}_i$ of ${\bf X}^{\top}\cdot {\bf X}$, the vector ${\bf X}\cdot {\bf b}_i$ is an eigenvector for ${\bf X}\cdot {\bf X}^{\top}$ with the same eigenvalue $\lambda_i$.

The proofs of these a priori unlikely statements are immediate: If there is a non-zero vector ${\bf b}$ with ${\bf X}^{\top}\cdot {\bf X}\cdot {\bf b}=\lambda{\bf b}$, then ${\bf X}\cdot {\bf X}^{\top}\cdot {\bf X}\cdot {\bf b} =\lambda{\bf X}\cdot {\bf b}$, so ${\bf X}\cdot {\bf b}$ is an eigenvector for ${\bf X}\cdot {\bf X}^{\top}$ with eigenvalue $\lambda$.

Now if ${\bf b}$ is an eigenvector of a matrix ${\bf B}$ with eigenvalue $\lambda$, then ${\bf b}$ will also be an eigenvector of $k$ times ${\bf B}$, with eigenvalue $k\lambda$ (since $k{\bf B}\cdot {\bf b} = k({\bf B}\cdot {\bf b}) = k(\lambda {\bf b})=(k\lambda) {\bf b} $). So once we know the eigenvectors and eigenvalues for the smaller matrix ${\bf Voc}$, we can retrieve the eigenvectors and eigenvalues for ${\bf Cov}$, and complete our PCA.
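A small numpy experiment (my own, with arbitrary random data) makes these two facts concrete: the nonzero eigenvalues of ${\bf X}\cdot{\bf X}^{\top}$ and ${\bf X}^{\top}\cdot{\bf X}$ agree, and multiplying an eigenvector of the small matrix by ${\bf X}$ gives an eigenvector of the large one.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 500, 40                         # many more measurements than samples
X = rng.normal(size=(n, M))            # X[i, j] = x_i(s_j), as in the text

big = X @ X.T                          # n x n  (proportional to Cov): the large matrix
small = X.T @ X                        # M x M  (proportional to Voc): the small one

lam_small, b = np.linalg.eigh(small)
lam_big = np.linalg.eigvalsh(big)

# The nonzero eigenvalues agree...
print(np.allclose(np.sort(lam_big)[-M:], np.sort(lam_small)))
# ...and X @ b_i is an eigenvector of the big matrix with the same eigenvalue.
i = np.argmax(lam_small)
v = X @ b[:, i]
print(np.allclose(big @ v, lam_small[i] * v))
```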

The switch to the transpose makes examples like the next one possible.

Example 3. Parsing Native American genes in Polynesian ancestry

“Native American gene flow into Polynesia predating Easter Island settlement” was published in Nature, July 8, 2020. The authors are a team of 32 led by Alexander G. Ioannidis (Stanford), Javier Blanco-Portillo and Andrés Moreno-Estrada (both at LANGEBIO, Irapuato, Mexico). They begin: “The possibility of voyaging contact between prehistoric Polynesian and Native American populations has long intrigued researchers.” They mention Thor Heyerdahl’s controversial suggestion “that prehistoric South American populations had an important role in the settlement of east Polynesia and particularly of Easter Island (Rapa Nui),” and that the matter is still “hotly contested.” No longer, presumably: “We find conclusive evidence for prehistoric contact of Polynesian individuals with Native American individuals (around AD 1200)… Our analyses suggest strongly that a single contact event occurred in eastern Polynesia, before the settlement of Rapa Nui, between Polynesian individuals and a Native American group most closely related to the indigenous inhabitants of present-day Colombia.”

Part of their evidence is a detailed analysis of the ancestry of present-day inhabitants of Rapa Nui. They split them into two groups: those who have European ancestry and those who do not. They present a PCA comparing these two groups (along with other Pacific islanders) with a spectrum of present-day Native American inhabitants of Central and South America. It shows that the Rapa Nui inhabitants with some European ancestry group with populations in the South of Chile (Mapuche, Pehuenche, Huilliche, Chilote) while those with no European ancestry group with the Zenu, a population in Colombia, at the opposite end of South America.

ioannides PCA
Part of Supplemental Figure 14 from Nature 583, 572-577 (2020), used with permission. For legibility I have cut the PC1 axis in two and have suppressed the top of the range of PC2. The color key is shown separately below.
color key
Color key for the PCA by Ioannidis et al.

The caption in Nature gives more detail:  “Our new weighted ancestry-specific SVD-completed principal component analysis (PCA) applied to the Native American component from Pacific islanders and reference individuals from the Americas. The Native American ancestries of reference individuals are plotted with open circles. The Native American ancestries of Pacific islanders are represented as solid points. … In addition, labels are plotted for the location of the allele frequency vector created from aggregating all Native American ancestry haplotypes in each island population. The Rapanui are split into two populations, those without European ancestry tracts (dark green) and those with high European and Native American ancestry (blue). The first principal component axis separates the reference Native Americans on a north-south axis, while the second principal component separates the Yamana of southern Patagonia. Pacific islanders’ Native American ancestry is seen to cluster with northern South American references with the exception of the Rapanui with high European ancestry, who cluster with the southern references (Chilean).”
Some background details about this research appear in a Stanford press release.

Conversations with Folkert Tangerman were very helpful in the preparation of this column.

Updated 31 August in response to a comment from a reader.

The Battle of Numbers

Our topic is the game called rithmomachia or rithmomachy—literally, the battle of numbers…

Ursula Whitcher
AMS | Mathematical Reviews, Ann Arbor, Michigan

This month, we’re going to explore a very old—indeed, medieval—educational game and correct a mathematical error in a sixteenth-century game manual. But before we delve into the past, let me remind you that the Feature Column is seeking new columnists. If you’re interested in sharing your writing about intriguing mathematical ideas, please get in touch!

Pleasant Utility and Useful Pleasantness

Our topic is the game called rithmomachia or rithmomachy—literally, the battle of numbers. The game is played with pieces shaped like geometric figures and labeled with different numbers, on a board like a double chessboard.

photo of a long gameboard set with white and black geometric pieces

A rithmomachia set. Photo by Justin du Coeur.

The twelfth-century scholar Fortolfus described the experience of rithmomachia as the pinnacle of educated leisure:

Indeed, in this art, which you will admire in two ways, is pleasant utility and useful pleasantness. Not only does it not cause tedium, but rather it removes it; it usefully occupies one uselessly idle, and usefully un-occupies the person uselessly busy.

The game’s rules are elaborate. Their importance, and their draw for medieval intellectuals, lies in their connection to the quadrivium. Arithmetic, geometry, astronomy, and music were the four advanced arts in the medieval liberal arts curriculum. All four required an understanding of ratios and sequences. Playing rithmomachia allowed medieval people to practice their math skills and show off their erudition.

Some rithmomachia proponents even claimed the game made you a better person. They often quoted the late Roman philosopher Boethius’ “demonstration of how every inequality proceeds from equality,” which makes grand claims:

Now it remains for us to treat of a certain very profound discipline which pertains with sublime logic to every force of nature and to the very integrity of things. There is a great fruitfulness in this knowledge, if one does not neglect it, because it is goodness itself defined which thus comes to knowable form, imitable by the mind.

Boethius describes a specific procedure for creating different types of sequences and ratios, beginning with the same number:

Let there be put down for us three equal terms, that is three unities, or three twos, or however many you want to put down. Whatever happens in one, happens in the others.

Now these are the three rules: that you make the first number equal to the first, then put down a number equal to the first and the second, finally one equal to the first, twice the second, and the third.

For example, if we begin with $1, 1, 1$ we obtain $1, 2, 4$.

This is the beginning of a geometric sequence where the numbers double at each step: in Boethius’s language, it is a duplex. Applying the same rule to the new list of numbers will create a list with more complicated relationships.
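Here is a minimal Python sketch of the three rules, treating them as a transformation applied to a triple $(a, b, c)$ (my own formulation of the procedure just described):

def boethius_step(terms):
    a, b, c = terms
    # new first = the first; new second = first + second; new third = first + 2*second + third
    return (a, a + b, a + 2*b + c)

seq = (1, 1, 1)
for _ in range(3):
    print(seq)
    seq = boethius_step(seq)
# prints (1, 1, 1), then (1, 2, 4), then (1, 3, 9)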

Rithmomachia Pieces

Every rithmomachia piece has its own number (or, in some cases, a stack of numbers):

woodcut showing a game board with numbered geometric pieces

A 1556 illustration of a rithmomachia board from Claude de Boissière’s book Le tres excellent et ancien jeu pythagorique, dict Rythmomachie

The choice of numbers is not arbitrary; they are generated by rules similar to Boethius’ rules for creating inequality from equality. Traditionally, the white gamepieces are considered the “evens” team and the black pieces are considered the “odds” team, though as we will see, this split between even and odd only applies to the circles.

Circles

Each side has eight circle pieces, given by the first four even or odd numbers and their perfect squares. (The odd numbers skip 1, which is a more mystical “unity” in the Boethian scheme.)

Evens

2 4 6 8
4 16 36 64

Odds

3 5 7 9
9 25 49 81

Triangles

The triangles in a rithmomachia set appear in pairs that demonstrate superparticular proportions. These are ratios of the form $n+1:n$, such as $3:2$ or $4:3$. Practically speaking, one can lay out the triangle and circle pieces in a table. The numbers for the first row of triangles are obtained by adding the two circle numbers above that number, in the same column. One may find the numbers for the second row of triangles using ratios. In each column, the ratio of the number in the first triangles row to the number in the last circles row and the ratio of the number in the second triangles row to the number in the first triangles row are the same.

I’ll start with partially completed tables, in case you want to try finding the values yourself:

Evens

Circles I 2 4 6 8
Circles II 4 16 36 64
Triangles I 6 20
Triangles II 9
Ratio

3:2

Odds

Circles I 3 5 7 9
Circles II 9 25 49 81
Triangles I
Triangles II
Ratio

In medieval and Renaissance music, different ratios were used to create different musical scales and analyze the differences between musical notes within those scales. For example, the Pythagorean temperament is based on the ratio $3:2$, which appears when finding the first Team Evens triangle values.

Here are all the triangle values:

Evens

Circles I 2 4 6 8
Circles II 4 16 36 64
Triangles I 6 20 42 72
Triangles II 9 25 49 81
Ratio 3:2 5:4 7:6 9:8

Odds

Circles I 3 5 7 9
Circles II 9 25 49 81
Triangles I 12 30 56 90
Triangles II 16 36 64 100
Ratio 4:3 6:5 8:7 10:9

Squares

The triangular rithmomachia gamepieces used ratios of the form $n+1:n$. The squares use ratios of the form $n+(n-1):n$, which we may simplify to the less evocative form $2n-1:n$. This is a special case of the more general superpartient proportions. A superpartient proportion is any ratio of the form $n+a:n$ where $a$ is an integer greater than 1 and $a$ and $n$ are relatively prime (that is, their greatest common divisor is 1).

The numbers for the square pieces may be found by repeating the method for finding the numbers for triangular pieces, but now shifted two rows down. The numbers for the first row of squares are obtained by adding the two triangle numbers above that number, in the same column. One may find the numbers for the second row of squares using ratios. In each column, the ratio of the number in the first squares row to the number in the last triangles row and the ratio of the number in the second squares row to the number in the first squares row are the same.

Evens

Circles I 2 4 6 8
Circles II 4 16 36 64
Triangles I 6 20 42 72
Triangles II 9 25 49 81
Squares I 15 45 91 (pyramid) 153
Squares II 25 81 169 289
Ratio 5:3 9:5 13:7 17:9

Odds

Circles I 3 5 7 9
Circles II 9 25 49 81
Triangles I 12 30 56 90
Triangles II 16 36 64 100
Squares I 28 66 120 190 (pyramid)
Squares II 49 121 225 361
Ratio 7:4 11:6 15:8 19:10
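The tables above can be generated mechanically. Here is a short Python sketch of the construction (my own restatement, not from any rithmomachia manual): each new "I" row is the column-wise sum of the two rows above it, and the matching "II" row repeats the ratio that the "I" row makes with the row just above it.

from fractions import Fraction

def next_two_rows(row_a, row_b):
    row1 = [a + b for a, b in zip(row_a, row_b)]              # e.g. Triangles I = Circles I + Circles II
    ratios = [Fraction(x, b) for x, b in zip(row1, row_b)]    # ratio of the new row to the row above it
    row2 = [int(x * r) for x, r in zip(row1, ratios)]         # apply the same ratio once more
    return row1, row2, ratios

circles1, circles2 = [2, 4, 6, 8], [4, 16, 36, 64]            # Team Evens
tri1, tri2, tri_ratios = next_two_rows(circles1, circles2)
sq1, sq2, sq_ratios = next_two_rows(tri1, tri2)

print(tri1, tri2, [str(r) for r in tri_ratios])  # [6, 20, 42, 72] [9, 25, 49, 81] ['3/2', '5/4', '7/6', '9/8']
print(sq1, sq2, [str(r) for r in sq_ratios])     # [15, 45, 91, 153] [25, 81, 169, 289] ['5/3', '9/5', '13/7', '17/9']

Starting instead from the odd circles [3, 5, 7, 9] and [9, 25, 49, 81] reproduces the Team Odds tables in the same way.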

Pyramids

The pyramids or kings are sums of perfect squares. Ideally, they should be built out of spare pieces of the appropriate color with these values. The Even team's pyramid has the value $1 + 4 + 9 + 16 + 25 + 36 = 91$. The Odd team's pyramid has the value $16 + 25 + 36 + 49 + 64 = 190$.

Moving Pieces

We have already seen the starting board layout, in the illustration from Claude de Boissière’s manual. Black (Team Odds) always moves first. Each shape of piece follows a different movement rule. The following guidelines are based on the 1563 English rithmomachia manual by Lever and Fulke, which was in turn based on de Boissière’s book in French.

  • The circles move one space diagonally.
  • The triangles move two spaces horizontally or vertically. If not taking a piece, they may also make a chess knight’s move (“flying”).
  • The squares move three spaces horizontally or vertically. If not taking a piece, they may also make a “flying” knight-like move that crosses four squares total. This may be either three vertical squares followed by one horizontal square, or three horizontal squares followed by one vertical square.
  • The pyramids may move in the same way as any of the circles, triangles, or squares.

Lever and Fulke give the following diagram illustrating potential moves:

woodcut showing a double checkerboard with letters and rithmomachia pieces

Diagram from Lever and Fulke, 1563

They illustrate the square’s knight-like move by pointing out a square may move from P to Y or T in their diagram.

Capturing Pieces

When a player takes a piece, they change its color to their team’s color (ideally, rithmomachia pieces are two-sided!). The transformed piece moves to the row of the board closest to the player, and may now be used like other pieces. There are many ways to take pieces, using different mathematical properties. Lever and Fulke mention Equality, Obsidion (in some editions, Oblivion), Addition, Subtraction, Multiplication, and Division, as well as an optional Proportion rule.

The simplest capture method is Equality. If a piece could move to another piece with the same number, it takes that piece. The Obsidion capture is a trap: if four pieces prevent another piece from moving horizontally or vertically, it is taken.

If two pieces from one team can each move to the same piece of the other team, and those two pieces can add, subtract, multiply, or divide to make the number on the opposing piece, they capture that piece. Whether one of the two attacking pieces has to move into the space they are attacking depends on when the possible capture appears. If a player moves a piece on their turn, bringing it into position for an addition, subtraction, multiplication, or division capture, then they immediately take the other player’s piece without having to move their piece again. On the other hand, if a player notices a possible capture at the start of their turn, before they have moved a piece, they must place one of their attacking pieces in the captured piece’s space in order to take a piece by addition, subtraction, multiplication, or division.

Pyramids may not be taken by equality. They may be taken by obsidion, by addition, subtraction, multiplication, or division, by the optional proportion capture if this is in play, or by taking the pieces with square numbers that make up the pyramid one by one.

Capture by Proportion

What is the optional rule for taking pieces by proportion? Lever and Fulke refer to arithmetic, geometric, and musical or harmonic proportion, so this optional rule has three sub-rules.

Capture by arithmetic proportion is similar to capture by addition: if two pieces may move into the space of a third and the numbers on all three pieces fit into a partial arithmetic sequence of the form $n, n+a, n+2a$, then the third piece is captured. Three pieces may also capture a fourth by arithmetic proportion. Capture by geometric proportion uses the same idea, but using partial geometric sequences of the form $n, an, a^2n$ or $n, an, a^2n, a^3n$.

Musical proportion only applies to three-term sequences. Lever and Fulke give a “definition” of musical proportion:

Musicall proportion is when the differences of the first and last from the middes, are the same, that is betwene the first and the last, as .3.4.6., betwene .3. and .4. is .1. betwene .4. and .6. is .2. the whole difference is .3. which is the difference betwene .6. and .3. the first and the last.

Unfortunately, this “definition” of musical proportion would apply to any three numbers $a, b, c$. We are comparing $(b-a) + (c-b)$ with $c-a$, but these values are the same! The correct definition of musical proportion (perhaps better known as harmonic proportion) uses ratios. Three numbers $a,b,c$ with $a < b < c$ are in harmonic proportion if $c:a = (c-b):(b-a)$. For example, $4,6,12$ is a musical proportion, because $12:4 = 3:1$ and $(12-6):(6-4) = 6:2 = 3:1$. We can now say that capture by musical proportion happens when two pieces may move into the space of a third and all three pieces fit into a harmonic proportion.
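A small Python helper (a sketch of my own, not anything from Lever and Fulke) makes the corrected test easy to apply:

from fractions import Fraction

def is_musical(a, b, c):
    # a < b < c are in harmonic (musical) proportion when c : a = (c - b) : (b - a)
    return a < b < c and Fraction(c, a) == Fraction(c - b, b - a)

print(is_musical(3, 4, 6))    # True:  6:3 = 2:1 and (6-4):(4-3) = 2:1
print(is_musical(4, 6, 12))   # True:  12:4 = 3:1 and (12-6):(6-4) = 3:1
print(is_musical(2, 3, 4))    # False: 4:2 = 2:1 but (4-3):(3-2) = 1:1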

Victory Conditions

Even figuring out how to take pieces in rithmomachia is complex! Thus, players may agree on any of several victory conditions. These are divided into “common” victories, which are based on capturing enough pieces by some measure, and “proper” victories (also known as triumphs) which involve capturing the enemy’s pyramid and then arranging three or four of one’s own pieces to create an arithmetic, geometric, or harmonic proportion.

Here are the common victories.

  • Victory of bodies: The first player to take a certain number of pieces wins.
  • Victory of goods: The first player to take pieces adding to at least a certain number wins.
  • Victory of quarrel: The first player to take pieces adding to at least a certain number and using a certain total number of digits wins. (This prevents a player from winning by taking a single very high value piece, as might be possible in the victory of goods.)
  • Victory of honor: The first player to take a specified number of pieces adding to at least a certain number wins.

Let us quote Lever and Fulke on how to complete a proper victory or triumph:

When the king is taken, the triumph must be prepared to be set in the adversaries campe. The adversaries campe is called al the space, that is betweene the first front of his men, as they were first placed, unto the neither ende of the table, conteyning .40. spaces or as some wil .48. When you entend to make a triumph you must proclaime it, admonishing your adversarie, that he medle not with anye man to take hym, whiche you have placed for youre triumphe. Furthermore, you must bryng all your men that serve for the triumph in their direct motions, and not in theyr flying draughtes.

To triumphe therefore, is to place three or foure men within the adversaries campe, in proportion Arithmeticall, Geometricall, or Musicall, as wel of your owne men, as of your enemyes men that be taken, standing in a right lyne, direct or crosse, as in .D.A.B. or els .5.1.3. if it consist of three numbers, but if it stande of foure numbers, they maye be set lyke a square two agaynst two.

Anyone who attained a proper victory would indeed feel triumphant!

Further Reading

  • Rafe Lever and William Fulke, The Philosophers Game, posted by Justin du Coeur.
  • Michael Masi, Boethian Number Theory: A Translation of the De Institutione Arithmetica (Rodopi, 1996)
  • Ann E. Moyer, The Philosophers’ Game: Rithmomachia in Medieval and Renaissance Europe (Ann Arbor: University of Michigan Press, 2001).

The Once and Future Feature Column

We’re going to look back at the Column’s history, revisit some of our favorite columns, and talk about what comes next. Spoiler alert: We’re recruiting new columnists!

Ursula Whitcher
AMS | Mathematical Reviews, Ann Arbor, Michigan

The number 24 has many charming properties. For instance, it can be written as $4!$ (that is, $24 = 4 \times 3 \times 2 \times 1$), and it is one of the first nonagonal numbers (the number of dots that can be evenly arranged on the sides of a regular nine-sided polygon). This year, 24 has an even more charming feature: the Feature Column is celebrating its 24th birthday (or, if you prefer, the Feature Column is 4!)

nested nonagons with 1 9 and 24 points

The first three nonagonal numbers. From Eric W. Weisstein, “Nonagonal Number.” (MathWorld–A Wolfram Web Resource.)

Loyal readers of the Column may have noticed some recent changes: a new address (https://blogs.ams.org/featurecolumn/), a shiny new banner incorporating artwork by Levi Qışın, and new navigational tools. This month, we’re going to look back at the Column’s history, revisit some of our favorite columns, and talk about what comes next. Spoiler alert: We’re recruiting new columnists! Check out the last section for information about how to get involved.

The First Feature Columns

The Feature Column was founded in 1997. Its goals were to increase public awareness of the impact of mathematics and to take advantage of the functionality the then-new World Wide Web offered for sharing pictures. The Column appeared before blogs were invented: indeed, the Oxford English Dictionary dates the very first use of the long form “weblog” for a blog-like enterprise to December 1997. By that time, the Column had been running for months.

a brightly colored web of interconnected line segments

A visualization of the 1997 internet, from opte.org. (CC BY-NC 4.0.)

Steven H. Weintraub wrote the first Feature Columns. The early columns were focused on images, including intertwined knots and pictures taken by the Pathfinder Mars rover. Weintraub also took advantage of the internet’s capability to spread news quickly: Feature Columns could be posted right away, rather than adhering to the publication schedule of the AMS Notices or the Bulletin.

photo of a white man with glasses and a beard

Steven H. Weintraub.

Some of the early columns had an element of adventure. Steven Weintraub recalls:

One that I remember in particular was the September 1998 Feature Column “Prize Winners at the 1998 International Congress of Mathematicians“. The column itself was a listing of the winners of the various prizes, with links to further information about them and their work. But there is a more interesting back story. In 1998 email and the internet were far less widespread than they are today. I attended the 1998 ICM in Berlin, and, feeling like an old-time reporter, as soon as the prize ceremony was over, I rushed out to a phone booth, called up Sherry O’Brien, the AMS staff member with whom I worked on WNIM (“What’s New in Mathematics”), and told her who the winners were. She promptly posted the information on the AMS website, and that was how many people first found out the news.

Over the course of the next few years, Tony Phillips, Bill Casselman, and Joe Malkevitch took on roles as regular Feature Columnists. They explored the web’s potential for longer columns, serializing some explorations over multiple months. They were later joined by Dave Austin, and eventually by me. For much of the Column’s existence, it was ably edited by Michael Breen, the longtime AMS Public Affairs expert. I took over the editorial role in 2020.

Some Favorite Columns

The Feature Column has always been curiosity-driven. Though individual columns may riff on current events, the underlying mathematics is enduring. Thus, individual columns have enduring popularity: some have been read and re-read for decades. Here are some of our most popular Feature Columns.

  • In 2009, David Austin made a stirring case for a key concept in applied linear algebra: We Recommend a Singular Value Decomposition. Austin illustrates the Singular Value Decomposition with clear diagrams and inviting applications, from data compression to the $\$1$ million Netflix prize.
  • In 2016, Bill Casselman investigated The Legend of Abraham Wald. We’ve all seen the meme: a diagram of a fighter plane, with red marks for bullet holes on the wings and tail, but not the engine. The legend says that Abraham Wald identified this phenomenon as an example of survivorship bias: airplanes with damage in other places did not survive to be measured. Casselman explores what we do and do not know about the real, historical Abraham Wald.
    diagram of an airplane with red dots concentrated on wings and tail

    Image by Martin Grandjean, McGeddon, and Cameron Moll. (CC 4.0.)

  • In 2004, Joe Malkevitch wrote about Euler’s Polyhedral Formula, describing it as one of his all-time favorite mathematical theorems and also one of the all-time most influential mathematical theorems. Joe Malkevitch’s research focuses on graph theory and discrete geometry, including the properties of polyhedra. His description showcases his expertise, enthusiasm, and long-standing interest in the history of mathematics.
  • In 2008, Tony Phillips described The Mathematics of Surveying. His discussion offers practical, tangible applications for key concepts in school mathematics, from similar triangles to estimated volumes.
  • In the summer of 2020, I (Ursula Whitcher) wrote Quantifying Injustice, describing statistical strategies for assessing predictive policing algorithms. These algorithms can both obscure and magnify police injustices: new research provides tools to identify problems and measure their impact.

New Columnists

We’re looking for mathematicians who are enthusiastic about communicating mathematics in written and visual form to join the Feature Column! The typical commitment is two columns per year, though we occasionally welcome guest columnists. We are particularly excited about involving columnists with a variety of backgrounds and experiences. Please send questions and letters of interest to uaw@ams.org. If you’re ready to apply, include a CV and a writing sample!


An epidemic is a sequence of random events

If a contact is made, then whether or not infection is transferred is much like tossing a (loaded) coin. How can a simulation take all this uncertainty into account?

Bill Casselman
University of British Columbia

Just recently, I started thinking about making my own epidemics. On a computer, of course, with a digital population.

What I had in mind was not very sophisticated. I would assemble a population of `people’, a few of which are infected with the virus, and then progress from one day to the next, letting the virus spread around through interactions in the population. At each moment, a person would be in one of several possible states:

  • S Uninfected but susceptible to infection
  • E Infected, but not yet infectious (exposed)
  • A Infectious, but not yet showing symptoms (asymptomatic)
  • I Infectious and showing symptoms
  • R Recovered, or otherwise incapable of becoming infected (say, through vaccination)

This is not quite a complete representation of reality. For example, in the current epidemic a very small number of people get reinfected. But it is not too far from being correct.

In general, as the simulation goes on, a person would progress from one state in this list to the next, except of course that being vaccinated is a shortcut from the first to the last state. Infections take place because susceptible persons interact with contagious ones. Even when an interaction takes place, whether or not infection is transmitted is a function of many accidental circumstances (for example, surrounding ventilation) as well as how contagious the infected person is.

There is some further internal detail to some of these states. The degree to which a person is infectious changes in time, usually rising to a peak after a few days, and then decreasing to zero. Hence in a simulation each person has attached to him (i) a designation of state and, in states A and I, (ii) a number measuring infectiousness. A further datum is (iii) the frequency of contacts, especially close contacts, a person has with others. This can change with time. For example, when a person starts showing symptoms, he will presumably reduce the frequency of his contacts.
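As a rough sketch (my own names, not from any particular simulation), the per-person bookkeeping might look like this in Python:

from dataclasses import dataclass, field

@dataclass
class Person:
    state: str = "S"                                       # one of "S", "E", "A", "I", "R"
    infectiousness: list = field(default_factory=list)     # chance of transmission per contact, day by day
    contacts_per_day: float = 4.0                          # mean number of close contacts (may drop once symptoms appear)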

Where’s the mathematics? An epidemic is driven by random events. The moment at which a person moves from one state to the next is not fixed by circumstances, but is instead a matter of probabilities. The severity of a person’s infection is a matter of chance, as is the length of time from when he is infected to when he becomes infectious. Even if we know the average rate at which an infectious person makes contacts, the exact number of contacts made in one day is also a matter of chance. If a contact is made, then whether or not infection is transferred is much like tossing a (loaded) coin. How can a simulation take all this uncertainty into account?

Generating contacts

Take the matter of contacts. The most important parameter governing contacts is the average number $c$ of contacts made by a person in one day, but that does not mean that the number of contacts in one day is constant. It might well vary from day to day. Instead, it is reasonable to assume that personal interaction is a Poisson process, which means that the probability of making $k$ contacts during one day is $p_{k} = c^{k} e^{-c} / k!$. Note that the infinite sum of the $p_{k}$ is $1$, because of the well known formula

$$ e^{c} = 1 + c + {c^{2} \over 2!} + { c^{3} \over 3! } + \cdots \, . $$
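For concreteness, here is a small Python sketch (mine, not part of the simulation code discussed later) that tabulates these probabilities for a given mean $c$, truncated at an arbitrary cutoff:

from math import exp, factorial

def poisson_probabilities(c, k_max=25):
    # p_k = c^k e^{-c} / k! for k = 0, 1, ..., k_max
    return [c**k * exp(-c) / factorial(k) for k in range(k_max + 1)]

p = poisson_probabilities(2.5)
print(sum(p))   # very close to 1, by the series for e^c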

For example, here are the graphs of some examples with a couple of values of $c$:

In a simulation, one will be dealing with a large number of people. Each of them will have his own regimen of interactions. Some of them will be more interactive than others. Thus, we are likely to find ourselves simulating a large number of independent Poisson processes, each one a sequence of random events. How to do this? In a program, this will involve a call to a routine, call it p_random(c) that returns on each call a random non-negative integer whose distribution matches the Poisson process with mean $c$.

Almost every programming language has built into it a routine random() that does something like this. On each call it returns a real number uniformly distributed in the open interval $[0,1)$. (David Austin’s previous FC gives some idea of how this works.) What we would like to do is use that routine to generate non-negative integers following a specified Poisson distribution. To give you some idea of how things go, we can see how this technique can generate integers uniformly distributed in any integral range $[0,n-1]$: get a random number $x$ in $[0,1)$ and then replace it by $\lfloor nx \rfloor$, the integral part of $nx$. If $n=2$ this offers a simulation of coin tossing, and if $n=6$ a simulation of throwing a die.
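In Python the idea looks like this (a trivial sketch):

from math import floor
from random import random

def uniform_int(n):
    # a random integer uniformly distributed in [0, n-1]
    return floor(n * random())

print(uniform_int(2))   # a coin toss: 0 or 1
print(uniform_int(6))   # a throw of a die: 0, 1, ..., 5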

There is a reasonably well known procedure that does what we want, and very generally. This is explained in Knuth’s classic text. Suppose we are given an arbitrary probability distribution of integers with given probabilities $p_{k}$ for $k \ge 0$. That is to say, we are looking at some repeated event somewhat like coin tossing, in which a non-negative integer $k$ occurs with probability $p_{k}$. How can a program generate integers distributed according to these statistics?

Let $P(k)$ be the cumulative distribution

$$ P(k) = {\sum}_{i = 0}^{k} p(i) $$

Thus $P(k)$ is the probability that the integer $j$ occurring is $\le k$. The original distribution has the property that each $p_{i} \ge 0$ and ${\sum} p(i) = 1$, so $P(k)$ increases from $0$ to $1$. For example, if $c = 2.5$ and $p(k) = e^{-c} c^{k} / k!$ then the graph of $P$ looks like the figure below. Given a random number $t$ in $[0,1)$ we can determine an integer according to the recipe indicated—draw a line to the right from the point $(0,t)$ and select the $x$-coordinate of the point at which it hits this graph.

There is another suggestive way to see this. Make up a rectangle of total height $1$, partitioned into boxes, with the one labeled $k$ of height $p_{k}$. Given the number $x$, mark a point at height $x$ in the rectangle. Select the label of the box that contains it.

In the long run the number of times you hit the box labeled $k$ will be proportional to its area, hence to $p_{k}$. But how do you tell what that label is? There is one straightforward answer to this question:

from random import random   # the language's built-in uniform random number generator

# p is the array of probabilities p[0], p[1], ..., summing to 1
def p_random():
    x = random()
    s = p[0]
    i = 0
    while s <= x:
        i += 1
        s += p[i]
    # at exit p[0] + ... + p[i-1] <= x < p[0] + ... + p[i]
    return i

But this is somewhat inefficient, since each call will on average involve $n/2$ steps. Does there exist an algorithm that requires a number of steps independent of $n$? The answer is yes. A clever method whose basic idea is apparently due to Alastair Walker does this, at the small cost of building some preliminary structures.

Walker’s trick

As far as I know, Walker never explained how he discovered his method, but an elegant interpretation has been offered by Keith Schwartz. The basic idea is what we have already seen:

  1. Start with a box of some kind. Partition it into smaller labeled boxes in such a way that the area of box $k$ is proportional to $p_{k}$.
  2. To generate integers with a given probability distribution, choose points at random inside the box, and return the label of the region hit.
  3. Arrange a way to assign to every random $x$ in $[0,1)$ a point of the box.

The problem is to figure out how to make the partition in such a way that figuring out the label from the geometry of the partition can be done efficiently.

I’ll explain how Walker’s method works for a few simple cases, but first I’ll generalize the problem so that we are not restricted to the Poisson distribution. Suppose very generally that we are given probabilities $p_{i}$ for $i$ in $[0, n-1]$. We now want a method to generate random integers that follow the distribution assigned by $p_{i}$. That is to say, if we generate in this way a large number of integers, we want the proportion of occurrences of $i$ to be roughly $p_{i}$.

The case $n=2$ is like tossing a biased coin, and there is a simple solution. In this case, we are given two probabilities $p_{0}$, $p_{1}$ with $p_{0} + p_{1} = 1$. Partition the unit square in this fashion:

Choose a point $(x, y)$ randomly in the square. In fact, we do not have to pay any attention to $x$. If $y \le p_{0}$ we return $i = 0$ and otherwise we return $i = 1$.

But now, following Keith Schwartz and intending to show how Walker’s algorithm works in this very simple case, I will set up a rectangular region a bit differently. First of all, make its dimensions $2 \times 1$. Partition it twice: once into halves, each half a unit square …

… and then build in each half, say in the $i$-th half, a box of dimensions $1 \times 2p_{i}$. Label these boxes. Unless $p_{0} = p_{1}$, one of these will overflow at the top:

So then we cut off the overflow and paste it (with label) into the other box:

This shows the case $p_{0} \le p_{1}$. If $p_{1} < p_{0}$ things look like this:

How do we use these diagrams to generate the random integers we want? Choosing a random uniform number $x$ in $[0,1)$ amounts as before to choosing a point in the rectangle. But we do this differently, and we interpret it differently. Given $x$, set $X = 2x$. Let $m$ be the integer part of $X$, which will be either $0$ or $1$: $m = \lfloor X \rfloor$. Let $y = X - m$, the fractional part of $X$. Associate to $x$ the point in the $m$-th square at height $y$. If $y \lt 2p_{m}$, then we are in the box labeled by $m$, otherwise in the box carrying the other label. In either case, the process selects the label of the box that was hit.
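As a Python sketch of this two-outcome case (the function name and the test run are mine):

from math import floor
from random import random

def walker_two(p):
    # p = [p0, p1] with p0 + p1 = 1
    X = 2 * random()
    m = floor(X)                          # which unit square the point lands in (0 or 1)
    y = X - m                             # height within that square
    return m if y < 2 * p[m] else 1 - m   # bottom rectangle keeps label m; the rest carries the other label

counts = [0, 0]
for _ in range(100000):
    counts[walker_two([0.3, 0.7])] += 1
print(counts)   # roughly [30000, 70000]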

Now look at the case $n = 3$, and suppose that we are given probabilities $p_{0}, p_{1}, p_{2}$ with $\sum p_{i} = 1$. We start off with a rectangle of size $3 \times 1$, partitioned into $1 \times 1$ boxes:

There are again different cases, depending on the relative sizes of the $p_{i}$. The easiest case is that in which two of the values of $p$, say $p_{0}$ and $p_{1}$, are less than $1/3$, which implies that the third is greater than $1/3$. Draw the box of dimensions $1 \times 3p_{i}$ in the $i$-th square, like this:

Now cut off two pieces from the large box and paste them into the smaller one, getting:

I’ll explain in a moment how to use this to generate random numbers.

There is a second case, in which two of the $p_{i}$ are larger than $1/3$:

Here, we want to cut off from the tops and fill in the third. It is tempting to cut off exactly the overflows in the large regions and paste them in, but this would give the third region three labels, which is not what we want. So we fill in from just one of the large regions. This will leave some space in it.

We fill in the new empty space from the other large region. We are now finished:

How to use what we have constructed? In each case, we have partitioned the rectangle of size $3 \times 1$. First, into three unit squares, and then each of these in turn into one or two labeled rectangles. Given a random $x$ in $[0,1)$, we want to come up with some integer in $[0,3)$. How? We first scale it to get $X = 3x$. This will lie in the interval $[m, m+1)$ for $m = \lfloor X \rfloor$. We now turn attention to the $m$-th unit square. The integer we return will be one of the labels found in that square. Let $y = X - m$, the fractional part of $X$, which will be in $[0,1)$. If $y$ is less than the height of the bottom rectangle in that square, p_random returns $m$, otherwise the alternate label in that square.

In effect we are assigning principal and alternate labels to the boxes. Except that there won’t be an alternate label if the box is full.

In the literature, the array I call `alternate’ is called alias, and the method described here is called the alias method.

The full algorithm

This method generalizes nicely. The original version seems to be due to Alastair Walker. It became well known because Knuth called attention to it (although mostly in exercises). Michael Vose then came up with a more efficient version, and made it handle rounding errors more stably.

I quote below, almost verbatim, the algorithm found originally in Vose’s paper. It improves the running time of Walker’s program, and corrects its handling of rounding errors. It has two parts. One is an initialization that sets up arrays prob and alias from the probability array $p$. These are used in the function rand, which returns a random variable in the range $[0,n-1]$, whose probabilities are specified in p of length n. The call to random returns a variable uniformly distributed in $[0, 1)$.

There are several loops in the program. The first assigns integers in the range $[0,n-1]$ to one of two explicit arrays large and small. Those in small are the $i$ such that $p_{i} \le 1/n$. As the program proceeds, the integers in these arrays are those whose boxes have not yet been filled. Implicit is a third subset of $[0,n-1]$, which I’ll call finished. This contains all those indices for which no further processing is necessary—i.e. whose box is filled.

In the loop [0] the two arrays small and large are initialized, and the subset finished is left empty. In every run through the loop [1], an index is taken from small, its square is filled, and it is added to finished. This happens by removing filling material from one of the boxes in large, which therefore becomes smaller. That box’s index is then added to either small or large, according to how much is left. In each run through these loops, the total size of large and small is decremented.

def init(p):
    # p is the probability array of length n; b = 1/n is the average box size.
    # The arrays large, small, prob and alias all have length n.
    b = 1/n
    l = 0
    s = 0
    # loop [0]
    for i in range(n):
        if p[i] > b:
            large[l] = i
            l += 1
        else:
            small[s] = i
            s += 1
    # loop [1]
    while s > 0 and l > 0:
        s -= 1
        j = small[s]
        l -= 1
        k = large[l]
        prob[j] = n*p[j]
        alias[j] = k
        p[k] += (p[j] - b)
        if p[k] > b:
            large[l] = k
            l += 1
        else:
            small[s] = k
            s += 1
    # loop [2]
    while s > 0:
        s -= 1
        prob[small[s]] = 1
    # loop [3]
    while l > 0:
        l -= 1
        prob[large[l]] = 1

from math import floor

def p_random():
    x = n*random()
    m = floor(x)
    if (x - m) < prob[m]: return m
    else: return alias[m]

The last loops [2] and [3] of Vose’s code are necessary to deal with rounding errors, which I include without further comment.

Here is a typical run for a Poisson process with mean $\mu = 4$.
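If you want to experiment yourself, here is one way to drive the two routines above (the scaffolding, such as the truncation at $k = 24$, the array setup, and the sample size, is my own, and it assumes init and p_random have been defined as printed):

from math import exp, factorial

n = 25
p = [4.0**k * exp(-4.0) / factorial(k) for k in range(n)]   # Poisson probabilities with mean 4
p[-1] += 1.0 - sum(p)            # fold the tiny truncated tail into the last entry so p sums to 1
prob, alias = [0.0]*n, [0]*n
large, small = [0]*n, [0]*n

init(list(p))                    # pass a copy, since init modifies its argument
draws = [p_random() for _ in range(100000)]
for k in range(10):
    print(k, draws.count(k)/len(draws), round(p[k], 4))     # empirical frequency vs. target probability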

The simulation

Let’s see how to use Walker’s method to simulate how a person just infected goes on to infect others. Suppose that he starts to be infectious on the fifth day, and that the probability that he infects a contact is specified in the following table:

$$ \matrix { i = \hbox{day after infection} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & \ge 10 \cr r_{i} = \hbox{probability of infection} & 0 & 0 & 0 & 0 & 0.1 & 0.3 & 0.4 & 0.4 & 0.2 & 0 \cr } $$

Suppose also that he makes an average of $4$ close contacts per day, and that these follow a Poisson distribution. Applying Walker’s algorithm, we get a sample run of contacts like this:

$$ \matrix { i = \hbox{day after infection} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\ \dots \cr c_{i} = \hbox{number of contacts} & 5 & 2 & 3 & 4 & 3 & 3 & 2 & 1 & 3 & 3\ \dots \cr } $$

In this run, how many people does he infect? There is no unique answer to this question. It depends on something like a coin toss at each contact. What we can calculate is an average. On the fifth day he infects an average of $0.1 \cdot 3$ people, on the sixth … etc. All in all:

$$ \hbox{average number of infections} = {\sum} c_{i} r_{i} = 0.1 \cdot 3 + 0.3 \cdot 3 + 0.4 \cdot 2 + 0.4 \cdot 1 + 0.2 \cdot 3 = 3.0 \, . $$

This is much lower than the number of people he infects on average over all possible runs, which is called the $R_{0}$ value for this example. Here it is $(0.1 + 0.3 + 0.4 + 0.4 + 0.2) \cdot 4 = 5.6$.
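The same arithmetic in a couple of lines of Python (a sketch using the two tables above):

r = [0, 0, 0, 0, 0.1, 0.3, 0.4, 0.4, 0.2, 0]   # probability of infecting a contact, by day
c = [5, 2, 3, 4, 3, 3, 2, 1, 3, 3]             # contacts in the sample run, by day

expected_in_this_run = sum(ci * ri for ci, ri in zip(c, r))
print(round(expected_in_this_run, 6))   # 3.0
print(round(sum(r) * 4, 6))             # 5.6, the R0 value when contacts average 4 per day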

Reading further

In Praise of Collaboration

Take a look at an extraordinary collaboration in discrete geometry and related geometrical mathematics, the collaboration of Branko Grünbaum and Geoffrey Colin Shephard.

Joe Malkevitch
York College (CUNY)

Introduction

Point and line do many activities together—their collaborations create a rich texture for many mathematicians and geometers, as well as artists/designers/weavers, to enjoy, study and admire. But point and line are generally undefined terms in the axiom systems that are at the foundations of different types of geometry (Euclidean, projective, hyperbolic, taxicab, etc.) as mathematical investigations. Despite this, the study of the interactions of points and lines is definitely worthy of exploration.

Pappus configuration containing 9 points and 9 lines



Figure 1 (A Pappus configuration in the Euclidean plane. There are 9 points and 9 lines with three points per line and three lines passing through each point. In the Euclidean plane the line displayed in the space between the two labeled lines MUST pass through the three points indicated.)

Finite Fano plane containing 7 points and 7 lines



Figure 2 (The finite Fano plane: seven points and seven lines in a finite plane where every line intersects each other line in exactly one point. With exactly 3 points per line, it cannot be drawn with straight lines in the Euclidean plane; this fact is related to the Sylvester-Gallai Theorem.)

Scholarly work in experimental physics is typically published under the names of a large team of scientists, often not at the same institution, who worked together on the experimental design and the carrying out of the project involved. By comparison, mathematics is sometimes viewed as a rather solitary activity—a subject to which, armed only with pencil and paper, one can make major or minor original contributions. Even before the advent of the Web and the internet, mathematics had examples, usually of pairs of people, who enriched each other’s work by a collaboration. My goal here is to make the results of one such collaboration, by Branko Grünbaum (1929-2018) and Geoffrey Shephard (1927-2016) in the area of discrete geometry, more widely known. (Grünbaum and Shephard are sometimes abbreviated G&S below—shades of the great operetta collaboration of Gilbert and Sullivan!) At the same time I want to point out collaboration as a model for doing effective and important research in mathematics.

Mathematical collaborations

The list of superstar contributions to mathematics that are attached to the names of individual people dates back in time within many cultures—names include:

Euclid, Archimedes, Newton, Euler, Gauss, Kovalevskaya, Noether, Lagrange, Ramanujan, …

These distinguished individuals, whose names belong to a list that stretches over long periods of time and in many countries/cultures, did not work in a vacuum. They were inspired by others and in turn inspired others to do important mathematics. Famously, Newton pointed out:

“If I have seen further it is by standing on the shoulders of giants.”

However, mathematical collaborations prior to the 20th century seem to have been relatively rare. There are examples of mathematicians consulting with one another via correspondence, but these exchanges did not typically result in joint publications. In past eras scholars often could only communicate easily via written letters, because travel between distant places (e.g. England and Germany) was costly and time consuming.

One often-cited example of a productive collaboration was between Godfrey Harold (G. H.) Hardy (1877-1947) and John Edensor Littlewood (1885-1977). Hardy is perhaps most famous for his book A Mathematician’s Apology but also made significant contributions to number theory, many of these with Littlewood.

Photo of G.H. Hardy Photo of J. E. Littlewood

Figure 3 (Photos of G.H. Hardy (left) and J. E. Littlewood (right). Courtesy of Wikipedia)

Photo of Hardy and Littlewood

Figure 4 (A photo of G. H. Hardy and J. E. Littlewood)


One of the conjectures made by Hardy and Littlewood is still unsolved to this day. The conjecture states that the number of primes in the interval $(m,m+n]$ (this is the interval not containing its left endpoint but containing its right endpoint) will be less than or equal to the number of primes in the interval $[2,n]$ (this interval includes both its left and right endpoints).
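In the usual prime-counting notation, where $\pi(x)$ denotes the number of primes less than or equal to $x$, the conjecture can be written compactly (this restatement is mine):

$$ \pi(m + n) - \pi(m) \le \pi(n) \qquad \hbox{for integers } m, n \ge 2. $$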

Hardy and Littlewood certainly had a successful collaboration and it seemed to survive with time. According to their contemporary Harald Bohr, they operated their collaboration using the following rules:

  • It didn’t matter whether what they wrote to each other was right or wrong.
  • There was no obligation to reply, or even to read, any letter one sent to the other.
  • They should not try to think about the same things.
  • To avoid any quarrels, all papers would be under joint name, even if one of them had contributed nothing to the work.

There is considerable discussion about whether these rules are ethical, but presumably they did guide Hardy and Littlewood.

Below, I would like to take a look at an extraordinary collaboration in discrete geometry and related geometrical mathematics, the collaboration of Branko Grünbaum and Geoffrey Colin Shephard. I will begin by giving some biographical information about these two distinguished mathematicians, then discuss their collaboration and some of its accomplishments.

Brief biography of Geoffrey Colin Shephard

Photo of Geoffrey Shephard

Figure 5 (Photo of Geoffrey Shephard. Photo courtesy of the Oberwolfach photo collection.)



Shephard was born in 1927. He studied mathematics at Cambridge and received his undergraduate degree in 1948. His doctorate degree was also from Cambridge University (Queens College), completed in 1954. His doctoral thesis adviser was John Todd. The title of his thesis was Regular Complex Polytopes. After working at the University of Birmingham he later moved to the University of East Anglia in 1967 and retired from there in 1984. He had at least two doctoral students—the geometer Peter McMullen (who works on problems about regular, primarily convex polyhedra in 3-dimensions and higher) and the specialist in convexity geometry, Roger Webster. You can get some idea of the mathematical topics that Shephard made scholarly contributions to from this list generated by MathSciNet:

  • Combinatorics
  • Convex and discrete geometry
  • Geometry
  • Group theory and generalizations
  • History and biography
  • Linear and multilinear algebra; matrix theory
  • Manifolds and cell complexes
  • Number theory

Shephard, who was an active member of the London Mathematical Society (LMS), provided money for LMS to fund what is known as the Shephard Prize. Here is its description:

"The Shephard Prize is awarded by the London Mathematical Society to a mathematician or mathematicians for making a contribution to mathematics with a strong intuitive component which can be explained to those with little or no knowledge of university mathematics, though the work itself may involve more advanced ideas."

The prize has been awarded to Keith Ball and Kenneth Falconer (both of whom are fellows of the Royal Society).

Brief biography of Branko Grünbaum

Photo of Branko Grünbaum

Figure 6 (A photo of Branko Grünbaum, courtesy of Wikipedia.)


Grünbaum was born in Osijek, in what today is known as Croatia, which at various times was part of the conglomerate country known as Yugoslavia. Though he was of Jewish background, he lived with his Catholic maternal grandmother during World War II, a period during which the Nazis occupied Yugoslavia. In high school he met his future wife, Zdenka Bienenstock, from a Jewish family, who survived the war hidden in a convent. Grünbaum began studies at the University of Zagreb, but unsettled times in Yugoslavia resulted in Grünbaum and his bride-to-be emigrating to Haifa in Israel in 1949. Grünbaum found work and eventually resumed his mathematical studies at Hebrew University, earning a Master’s degree in 1954. This was also the year he married Zdenka, who trained to be a chemist. He did work for the Israeli air force in operations research while continuing his studies, culminating with his doctorate in 1957. While many think of Grünbaum’s doctoral thesis adviser Aryeh Dvoretzky (1916-2008) as a functional analyst, it was Grünbaum’s budding talent as a geometer that Dvoretzky fostered. His doctoral dissertation was in Hebrew and entitled On Some Properties of Minkowski Spaces.

At the completion of his military service Grünbaum came to America with his family to do postdoctoral work at the Institute for Advanced Study. By 1960 he had gone on to the University of Washington in Seattle, where the distinguished geometer Victor Klee (1925-2007) was already teaching. Eventually, after returning for a period to Israel, Grünbaum made his way back to a long career at the University of Washington, where he taught until his retirement in 2001 and was part of the group of scholars there working in discrete geometry. He had 19 doctoral students and many more mathematical descendants. It was problems that he posed that I solved in my own doctoral dissertation. Grünbaum’s academic descendants include some of the most prominent discrete geometers of our time, in particular Noga Alon and Gil Kalai. Over the years he won many prizes including the Lester R. Ford Award (1976) and the Carl B. Allendoerfer Award (2005).

The list of research publication areas of Grünbaum from MathSciNet can be compared with the one earlier for Shephard:

  • Algebraic topology
  • Combinatorics
  • Computer science
  • Convex and discrete geometry
  • Differential geometry
  • Functional analysis
  • General
  • General topology
  • Geometry
  • History and biography
  • Manifolds and cell complexes
  • Mathematical logic and foundations
  • Ordinary differential equations
  • Topology

Some of Geoffrey Shephard’s "solo" work:

Before his collaboration with Grünbaum, Shephard was particularly interested in the properties of convex sets, not merely convex polyhedra. For example he did important work related to convex polytopes and geometric inequalities. He also wrote several papers about the Steiner Point. Shephard also studied the issue of when various convex sets could be written as a Minkowski sum. The idea was to see when convex sets could not be "factored" into "simpler" sets, an approach modeled on the idea of the importance of prime numbers in number theory. A convexity problem due to Shephard that is still actively being looked at concerns centrally symmetric convex closed and bounded sets $U$ and $V$ in $n$-dimensional Euclidean space. Suppose that for every hyperplane, the volume of the projection of $U$ onto that hyperplane is less than or equal to the volume of the projection of $V$. Shephard conjectured that in this case the volume of $U$ is less than or equal to that of $V$.

One particularly intriguing question has recently been associated with Geoffrey Shephard’s name. The problem involves an idea which is often linked to the name of Albrecht Dürer.

Portrait of Albrecht Dürer

Figure 7 (A self-portrait of Albrecht Dürer. Courtesy of Wikipedia.)



Dürer (1471-1528), in his work Underweysung der Messung, explored ways to represent 3-dimensional objects in paintings in a "realistic" manner. This led to his investigating questions about perspective and in particular perspective involving 3-dimensional polyhedra. He had noticed that one way to represent a polyhedron on a flat piece of paper was to show the polygons that make up the polyhedron as a drawing in the plane with each polygon shown by a flat region which shared one or more edges with other polygons that made up the polyhedron. Dürer produced examples of such drawings for Platonic and other solids which consisted of regular polygons. Figure 8 shows an example of the kind of diagram that interested Dürer, in this case for the regular dodecahedron, a Platonic Solid with 12 congruent regular polygons as faces.

Net of a regular dodecahedron

Figure 8 (The polygons shown can be assembled into a regular dodecahedron by properly gluing the edges.)

Such drawings have come to be called nets, but there is considerable variance in exactly what is meant by this term. It would be best to restrict the term to a collection of polygons that arises from a (strictly) convex bounded 3-dimensional polyhedron $P$ (sometimes called a 3-polytope) by cutting the vertex-edge graph of the polyhedron along a set of edges that is connected, contains no circuit, and includes all of the vertices of the polyhedron (i.e., a spanning tree of the polyhedron). If the initial polyhedron has $V$ vertices then the number of edges of the spanning tree whose cuts enable the unfolding of the polyhedron will be equal to $V - 1$. Since each edge of the cut-tree is incident to exactly 2 faces, and so appears twice on the boundary of the unfolding, the boundary polygon of a net will be a simple polygon with $2V - 2$ sides. (For the cube, $V = 8$, and indeed each of the 11 nets in Figure 9 is bounded by a simple polygon with 14 unit-length sides.) Note that while the original polyhedron may be strictly convex, the polygonal boundary of the net, while a simple polygon (when the unfolding forms a net), is typically not convex and will often have pairs of edges that lie along a straight line. (See Figure 9 for all of the nets of a regular $1\times 1\times 1$ 3-dimensional cube.)

11 nets of the 3-cube

Figure 9 (The 11 nets of the 3-dimensional cube. Courtesy of Wikipedia.)



Even for some tetrahedra (4 vertices, 4 faces and 6 edges) it is known that one can cut along the 3 edges of a spanning tree and "unfold" the resulting polygons so that the result overlaps in the plane. However, no convex polyhedron has ever been found for which EVERY spanning tree leads to an overlapping unfolding; in other words, for every convex polyhedron examined so far, cutting the edges of some spanning tree results in an assemblage of polygons without overlap. The complexity of the issues involved in such folding and unfolding problems has been explored by Eric Demaine (MIT), Joseph O’Rourke (Smith), and others. Shephard specifically looked at the question of when such an unfolding would result in a convex polygon in the plane, but in light of his work on this collection of ideas I like to refer to the following still open problem as:

Shephard’s Conjecture

Every strictly convex bounded 3-dimensional polyhedron can be unfolded to a net (non-overlapping polygons, one for each face of the polyhedron) by cutting edges of the polyhedron that form a spanning tree. (Often one gets different results depending on which spanning tree one cuts.)

Intriguingly, experts lack consensus as to whether the result should be true or false. There are many polyhedra where nearly all spanning trees lead to overlapping unfoldings while there are some infinite classes of convex polyhedra where a net has been shown to exist. One source of confusion about the concept of a net is that if one starts with a polyhedron $P$ and cuts along a spanning tree to result in the net $N$, it is possible that the net can be folded, by gluing along its boundary edges, into a polyhedron different from the one one started with—Alexandrov’s Theorem. Thus, to recover the polyhedron one started with one has to provide gluing instructions about which edges to glue to which other edges. There are "nets" that fold (edges glued to edges) to several non-isomorphic convex polyhedra, and there are "nets" that, when glued along boundary edges one way, give rise to a convex polyhedron but a different gluing of boundary edges yields a non-convex polyhedron.

Some of Branko Grünbaum’s "solo work"

Grünbaum’s webpage, still in place after his death in 2018, includes a list of his published works from 1955 to 2003, though he published many more works between 2003 and his death. It is intriguing to compare the first 10 and last 10 titles in this bibliography:

  1. On a theorem of Santaló.

    Pacific J. Math. 5(1955), 351 – 359.

  2. A characterization of compact metric spaces. [In Hebrew]

    Riveon Lematematika 9(1955), 70 – 71.

  3. A generalization of a problem of Sylvester. [In Hebrew]

    Riveon Lematematika 10(1956), 46 – 48.

  4. A proof of Vázsonyi’s conjecture.

    Bull. Research Council of Israel 6A(1956), 77 – 78.

  5. A simple proof of Borsuk’s conjecture in three dimensions.

    Proc. Cambridge Philos. Soc. 53(1957), 776 – 778.

  6. Two examples in the theory of polynomial functionals. [In Hebrew]

    Riveon Lematematika 11(1957), 56 – 60.

  7. Borsuk’s partition conjecture in Minkowski planes.

    Bull. Research Council of Israel 7F(1957), 25 – 30.

  8. On common transversals.

    Archiv Math. 9(1958), 465 – 469.

  9. On a theorem of Kirszbraun.

    Bull. Research Council of Israel 7F(1958), 129 – 132.

  10. On a problem of S. Mazur.

    Bull. Research Council of Israel 7F(1958), 133 – 135.

  1. A starshaped polyhedron with no net.

    Geombinatorics 11(2001), 43 – 48.

  2. Isohedra with dart-shaped faces (With G. C. Shephard)

    Discrete Math. 241(2001), 313 – 332.

  3. Convexification of polygons by flips and by flipturns.

    (With J. Zaks)

    Discrete Math. 241(2001), 333 – 342.

  4. The Grunert point of pentagons.

    Geombinatorics 11(2002), 78 – 84.

  5. Levels of orderliness: global and local symmetry.

    Symmetry 2000, Proc. of a symposium at the Wenner–Gren Centre, Stockholm. Hargitai and T. C. Laurent, eds. Portland Press, London 2002. Vol. I, pp. 51 – 61.

  6. No-net polyhedra.

    Geombinatorics 11(2002), 111 – 114.

  7. Connected (n4) configurations exist for almost all n – an update.

    Geombinatorics 12(2002), 15 – 23.

  8. "New" uniform polyhedra.

    Discrete Geometry: In Honor of W. Kuperberg’s 60th Birthday

    Monographs and Textbooks in Pure and Applied Mathematics, vol. 253.

    Marcel Dekker, New York, 2003. Pp. 331 – 350.

  9. Convex Polytopes. 2nd ed., V. Kaibel, V. Klee and G. M. Ziegler, eds. Graduate Texts in Mathematics, vol. 221. Springer, New York 2003.
  10. Families of point-touching squares. Geombinatorics 12(2003), 167 – 174.

Grünbaum had a particularly remarkable talent for seeing new geometry ideas "hiding in plain sight" where other geometers had not noticed these issues. A good example of this is extending the definition of what should be meant by a regular polyhedron, which led to many interesting new regular polyhedra. Today these new regular polyhedra are known as the Grünbaum/Dress polyhedra because Andreas Dress noticed one example missing in Grünbaum’s initial enumeration.

You can find some samples of the remarkable work of Grünbaum in this earlier Feature Column article.

An amazing collaboration

In trying to determine how Shephard and Grünbaum started their collaboration, one might look to MathSciNet for their earliest joint paper, but in the case at hand this would not return information about a joint venture they were part of. Grünbaum’s highly influential book Convex Polytopes was published in 1967 and one has to look at the table of contents and preface (or the list of Related names on MathSciNet) to realize that some of the chapters were prepared by other people. One of these people was Geoffrey Shephard. In 2005 this book won the American Mathematical Society Steele Prize for Mathematical Exposition.

While their earlier research had some aspects of convexity and the properties of polyhedra in common, Grünbaum’s work in many cases had an enumerative flavor. This approach to geometry was less present in Shephard’s work. It was aspects of their common roots but non-identical styles of work that perhaps made their collaboration so fruitful. Shephard points to the fact that G&S met for the first time face-to-face in Copenhagen, Denmark in 1965 at a convexity conference. At a meeting organized by the London Mathematical Society in July of 1975 it seems G&S set in motion the idea that they work together on a 12-chapter book about geometry, with discussion of many ideas about what it might contain. They conceived the idea of starting by writing a single chapter of such a book on patterns and tilings. It turned out that 12 chapters were necessary to do justice to these ideas. This took about 11 years to develop and swelled to about 700 pages of writing, and it was this work that became their book Tilings and Patterns. The originally planned geometry book never got to be written!

There are many aspects of tilings and patterns that predate the work of G&S on this topic. However, many of these results are not part of the "standard" topics included in the geometry curriculum in America. While many people have noticed with pleasure the multitudinous frieze patterns that they see on buildings, fabrics, and in art, they are often unaware of a surprising mathematical result about such friezes. Although there are infinitely many different artistic designs that can constitute the "content" of a frieze (letters of the alphabet, stylized flowers, and so on), there is a mathematical sense in which there are only seven types of such friezes. The 7 types of patterns are shown below (Figure 10).

Examples of 7 types of friezes

Figure 10 (Diagrams illustrating the 7 different kinds of friezes. Image courtesy of Wikipedia.)

Rather amazingly, G&S, by developing the definition of a discrete pattern, were able to breathe fresh life into this rather old topic. By using their "new" notion of a discrete pattern they were able to "refine" the sense in which there are 7 types of friezes into a finer classification under which there are 15 such "discrete frieze patterns."

In Figure 11 below, using a small asymmetric triangle as a "building block," one can see how the 7 frieze patterns above can be refined into the 15 patterns pioneered by the work of G&S. The rows of the diagram show how each of the 7 types of friezes can sometimes be refined from one "frieze type" into several discrete pattern types (the letter labels are one method of giving names to these patterns). In thinking about the way the new system refines the old one, notice that in some of these patterns there is a "motif" in a single row, while in others the motifs can be thought of as lying in two rows. The new approach of G&S involves the interaction between the symmetries of the motif and the symmetries related to the "global" pattern arising from translating along the frieze. A few more details can be found in the discussion further along.

Refined classification of 15 frieze patterns



Figure 11 (15 types of discrete frieze patterns; diagram courtesy of Luke Rawlings.)

The diagram shown in Figure 12 illustrates some of the challenges faced by researchers, such as Grünbaum and Shephard, who were interested in the classification and enumeration of "symmetric" tilings and patterns. Interest in symmetry has complicated roots, related both to work in art and fabrics and to science, through the study of minerals, crystals, and flowers.

A pattern made up of rows of friezes

Figure 12 (A symmetric part of a wallpaper that extends infinitely in both the horizontal and vertical directions. Image courtesy of Wikipedia.)

What do we see in Figure 12? We see a portion of what is meant to be an infinite object, extending both vertically and horizontally. We see color, and perhaps we are not sure what is foreground and what is background. In Figure 13, do we have a white design on a black background or the other way around?

Black and white pattern

Figure 13 (A tiling of a square using black and white regions. It is not clear if the "design" is black on white or white on black. Image courtesy of Wikipedia.)


In Figure 12 it appears that the background might be the "rouge red," but one could perhaps give other interpretations. We see regions where the alternating rows appear to be "separate" bands or friezes. The design in rows 2 and 4 is perhaps reminiscent of something one might see on a Greek temple, while the flower-like pattern of rows 1, 3, and 5 seems to have rotational symmetry as well as mirror (reflection) symmetry. Geometers have developed a system for classifying symmetry in the Euclidean plane using distance-preserving functions (mappings): translations, rotations, reflections, and glide-reflections. These distance-preserving transformations are known as the isometries. But in addition to such distance-preserving transformations, for diagrams like the one in Figure 13 one might also consider color-interchange transformations.

There is another interesting difference between the rows that make up this symmetric pattern. The first and second rows both have translational symmetry in the horizontal direction, and thus each can be viewed as a frieze (band, strip) pattern in its own right. But the first row can be viewed as having a "discrete" motif which "generates" the pattern, something not true for the second row. While there was a long tradition of classifying friezes and wallpapers with a periodic structure, rather remarkably the idea of looking at "patterns" with a motif like the one in row 1 had never been systematically discussed before Grünbaum and Shephard. Historically, rather than looking at motifs, scholars subdivided the "infinite" design in row 2 into a fundamental region (often there were different choices about how to do this) which was then "propagated" via transformations into the periodic frieze or wallpaper.

G&S discovered that whereas there was a long, if not always "accurate," tradition of classifying and thinking about polyhedra, both convex and non-convex, the situation for tilings (and the even less studied idea of patterns) was much less developed. There was some work on tilings that are the analogues of the Platonic solids (work by Pappus) and of the Archimedean solids (work by Kepler), but little else. There were also questions about what rules one should allow for a tile. Most examples of tilings treated regions touching edge to edge as the "legal" tiles, but did it make sense to allow a region like the one in Figure 14 as a tile or as a motif for a pattern?

possible tile with two squares joined by a line segment



Figure 14 (Is this "shape" allowed as a tile in a tiling of the plane?)

When one uses tiles to fill the plane, such as the tiling by squares which meet edge to edge shown in Figure 15, one has to think about whether the square tile is an "open" set (one that does not include its boundary points) or a "closed" set (one that contains its boundary points). For enumerations, note that one can get infinitely many "different" tilings with squares whose tiles don't all meet edge to edge, by sliding some finite or even infinite number of the columns in Figure 15 up or down by half the length of a side of the square. What G&S did in their joint work was to start from scratch and give a coherent and accurate account of the "foundations" of a theory of tiling. They also extended classic enumerations by counting how many different "patterns" there can be when the motif used is a dot, a segment, or an ellipse. Rather surprisingly, this had not been done previously. Along the way they had to invent new words for a variety of concepts that helped organize their new theory.

Tiling of the plane with squares



Figure 15 (One of the three "regular" Euclidean tilings of the plane known from ancient times. Image courtesy of Wikipedia.)

Tiling of the plane by regular 3,4, and 6-gons



Figure 16 (A tiling of the plane with regular hexagons, squares and equilateral triangles. Courtesy of Wikipedia.)

While the study of polyhedra has deep roots and various aspects of its theory have been "reinvented" several times, the seemingly neighboring idea of a tiling, while widely present in the art, fabrics, and designs of various countries going back thousands of years, had a less obvious footprint of mathematical investigation. Perhaps this is because the mathematical tools for investigating infinite graphs (one way of conceptualizing tilings) and for discussing the symmetry of something "infinite" (the idea of an isometry group) were developed much later than the tools the Greeks found for studying polyhedra. While in modern times symmetry is studied using graph theory, group theory, and geometric transformations, in the early study of polyhedra (and tilings) the fundamental approach was to examine the pattern of polygons surrounding a vertex. Thus, even the identity that today is often called Euler's polyhedral formula, which states that for a bounded convex polyhedron the numbers of vertices, faces, and edges are related by
$$V + F - E = 2,$$
which might well have been discovered in ancient times but was in fact "discovered" only around 1750 by Leonhard Euler. This formula, which involves the 0-, 1-, and 2-dimensional faces of a 3-dimensional polyhedron, can be generalized to $n$ dimensions. G&S were able to find "rules" for fruitful definitions of tiling and pattern, helped by their finding a tiling version of the $V + F - E = 2$ result. Being able to draw on graph theory and combinatorics was a big factor in their successful leap beyond prior work. They were also able to draw insights from the study of things that are "asymmetric," such as fractals.
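As a quick check of the formula (a standard example, not one taken from G&S), a cube has $V = 8$ vertices, $F = 6$ faces, and $E = 12$ edges, and indeed
$$8 + 6 - 12 = 2.$$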

Another thread tied together by the work of G&S was the question of where the notion of an aperiodic tiling fits in. While tilings with translational symmetry in two different directions were the major topic studied by mathematicians and crystallographers, Roger Penrose had made the spectacular observation that there are sets of tiles which tile the plane but admit no periodic tiling, that is, no tiling with translational symmetry in two directions. The work of G&S clarified and laid the foundation for further discoveries related to non-periodic and aperiodic tilings. Some confusion exists about these two concepts. While the definitions one sees are still in flux, here is the intuitive idea: a particular tiling of the plane by one or more tiles is non-periodic if it does not have translational symmetry, whereas a set of tiles is aperiodic if it tiles the plane but there is no way to use the tiles to get a periodic tiling. Historically, the infinite tilings (and patterns) in the plane that received attention from the mathematics community had translational (shift) symmetry in two directions. There are tiles which are convex quadrilaterals that can tile the plane periodically (all triangles and quadrilaterals tile the plane) and also with rotational symmetry, but that do not tile the plane aperiodically.

Tilings and Patterns

For Geoffrey Shephard, MathSciNet lists his "Earliest Indexed Publication" as 1952 and his "Total Publications" as 149. For Grünbaum, MathSciNet lists his "Earliest Indexed Publication" as 1955 and his "Total Publications" as 271. Many of these were joint publications, and in fact the individual and joint published works of G&S exceed these numbers. While MathSciNet lists research papers written jointly by Grünbaum and Shephard, perhaps their most famous collaboration was the book Tilings and Patterns. This book originally appeared in 1987 and was relatively recently reprinted and updated by Dover Publications. Remarkably, the book contained many results not previously published in research papers, complementing their joint work in scholarly journals. Tilings and Patterns transformed the landscape of the theory of tilings. Grünbaum and Shephard's innovation was to combine the local and global approaches to tilings taken in the past, so that a more comprehensive look at the phenomenon of tilings and patterns became possible. While the word pattern is widely used in mathematics, its use in conjunction with the word tilings is rather novel. Yes, patterns appear in many tilings, but by placing the emphasis on a discrete "motif" G&S opened many new doors.

The collaborative work of Branko Grünbaum and Geoffrey Shephard has brought the areas of discrete geometry involving tilings and patterns to a much broader audience and has resulted in many new discoveries based on their ideas. Just as importantly, their partnership should be an inspiration to the mathematics community: collaboration is exciting, personally rewarding, and can result in giant leaps forward for mathematics and its applications.

Dedication

I would like to dedicate this column to the memory of the remarkable geometers Branko Grünbaum and Geoffrey Shephard. It was my very good fortune to have had interactions with them both!

References

Note: No attempt has been made to give a complete set of the joint papers of Grünbaum and Shephard but several particularly important and interesting examples of their collaborative efforts are listed.

  • Brinkmann, G. and P. Goetschalckx, S. Schein, Comparing the constructions of Goldberg, Fuller, Caspar, Klug and Coxeter, and a general approach to local symmetry-preserving operations. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473 (2017), 20170267.
  • Conway, J. and H. Burgiel, C. Goodman-Strauss, The Symmetry of Things, A.K. Peters, Wellesley, MA, 2008.
  • Delgado, O. and D. Huson, E. Zamorzaeva, The classification of 2-isohedral tilings of the plane, Geometriae Dedicata 42 (1992) 43-117.
  • Delgado, O. and D. Huson, 4-regular vertex-transitive tilings of E3. Discrete & Computational Geometry, 24 (2000) 279-292.
  • Demaine, E. and J. O’Rourke, Geometric Folding Algorithms, Cambridge University Press, NY, 2007.
  • Dress, A., A combinatorial theory of Grünbaum’s new regular polyhedra, Part I: Grünbaum’s new regular polyhedra and their automorphism group, Aequationes Mathematicae 23 (1981) 252-65.

  • Dress, A., A combinatorial theory of Grünbaum’s new regular polyhedra, Part II: Complete enumeration, Aequationes Mathematicae 29(1985) 222-243.
  • Escher, Maurits C., and D. Schattschneider, Visions of symmetry. Thames & Hudson, 2004.
  • Friedrichs, O. Delgado and D. Huson, Tiling space by platonic solids, I. Discrete & Computational Geometry 21, (1999) 299-315.
  • Grünbaum, B., Convex Polytopes, John Wiley, New York, 1967. (Parts of this book were developed by V. Klee, M. Perles, and G. C. Shephard.)
  • Grünbaum, B., Convex Polytopes, Second Edition, Springer, New York, 2003. (This includes work from the first edition by V. Klee, M. Perles, and G. Shephard, and additional input from V. Kaibel, V. Klee and G. Ziegler.)
  • Grünbaum, B. Arrangements and Spreads, American Mathematical Society, for Conference Board of the Mathematical Sciences, Providence, 1972.
  • Grünbaum, B., The angle-side reciprocity of quadrangles. Geombinatorics, 4 (1995) 115-118.
  • Grünbaum, B., Configurations of Points and Lines, American Mathematical Society, Providence, 2009.
  • Grünbaum, B., Side-angle reciprocity – a survey. Geombinatorics, 2 (2011) 55-62.
  • Grünbaum, B. and G.C. Shephard, The eighty-one types of isohedral tilings in the plane, Math. Proc. Cambridge Phil. Soc. 82(1977) 177-196.
  • Grünbaum, B. and G.C. Shephard, Tilings and patterns, Freeman, 1987. (A second updated edition appeared in 2016, published by Dover Publications, NY. Errors from the first edition were corrected and notes were added to indicate progress in understanding tilings and patterns. There also exists a shorter version of the original, published by Freeman in 1989, with only the first seven chapters of the original version.)
  • Grünbaum, B. and G.C. Shephard, Interlace patterns in Islamic and Moorish art, Leonardo 25 (1992) 331-339.
  • Huson, D., The generation and classification of tile-k-transitive tilings of the euclidean plane, the sphere and the hyperbolic plane, Geometriae Dedicata 47 (1993) 269-296.
  • McMullen, P. and G.C. Shephard, Convex Polytopes and the Upper Bound Conjecture, London Math. Society, Cambridge U. Press, Cambridge, 1971.
  • O’Rourke, J. How to Fold It, Cambridge University Press, NY 2011.
  • Rawlings, L., Grünbaum and Shephard’s classification of Escher-like patterns with applications to abstract algebra, Doctoral thesis, Mathematics Education, Teachers College, 2016.
  • Roelofs, H. The Discovery of a New Series of Uniform Polyhedra, Doctoral dissertation, 2020.
  • Schattschneider, D., The plane symmetry groups: their recognition and notation. American Mathematical Monthly 85 (1978) 439-450.
  • Schattschneider, D., The mathematical side of MC Escher, Notices of the AMS 57 (2010) 706-718.
  • Shephard, G.C., Convex polytopes with convex nets, Math. Proc. Camb. Phil. Soc., 78 (1975) 389-403.
  • Shephard, G.C., My work with Branko Grünbaum, Geombinatorics, 9(1999) 42-48. (Note the whole Volume 9, Number 2, October 1999 issue of Geombinatorics is a special issue in honor of the 70th birthday of Branko Grünbaum.)
  • Stevens, P., Handbook of regular patterns: An introduction to symmetry in two dimensions, MIT Press, Cambridge, 1981.
  • Washburn, D. and D.W. Crowe, Symmetries of culture: Theory and practice of plane pattern analysis. University of Washington Press, 1988. (Reprinted 2021 by Dover Publications.)
  • Washburn, D. and D.W. Crowe, editors, Symmetry comes of age: the role of pattern in culture, University of Washington Press, 2004.

Those who can access JSTOR can find some of the papers mentioned above there. For those with access, the American Mathematical Society’s MathSciNet can be used to get additional bibliographic information and reviews of some of these materials. Some of the items above can be found via the ACM Digital Library, which also provides bibliographic services.

Lost (and found) in space

There are standard references, such as the Sloan Digital Sky Survey, that provide an atlas for the stars. What's needed is a way to search through this enormous atlas to find the view presented by a given image.

David Austin
Grand Valley State University

My son recently sent me this picture of the comet Neowise that he took from southern California in the summer of 2020.

night sky and comet Neowise

Photo credit: Sam Austin

I had seen Neowise and knew roughly where it appeared, but being a comet, of course, its position changed over time. I wondered if there was a way to locate precisely where it was at the particular instant this picture was taken.

Well, there's an app for that. I uploaded the image to Astrometry.net and learned 36.1533 seconds later that we were looking here:

celestial sphere with white highlighting square zoomed in view of Ursa Major constellation

The image on the left places the picture on the celestial sphere, an imagined distant sphere centered at Earth. The coordinates are declination (Dec), the angular distance north or south of the celestial equator (a projection of Earth's equator), and right ascension (RA), the angular distance east of the First Point of Aries, a reference point on the celestial equator. A closer view appears on the right, where we see that we're looking at a portion of the constellation Ursa Major, part of which may be more familiar as the Big Dipper.

Here are some more details:

Center (RA, Dec): (135.836, 55.212)
Center (RA, hms): 09h 03m 20.603s
Center (Dec, dms): +55° 12' 44.873"
Size: 16 x 10.7 deg
Radius: 9.616 deg
Pixel scale: 9.6 arcsec/pixel
Orientation: Up is 313 degrees E of N
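As a quick sanity check (my own arithmetic, not part of Astrometry.net's output), the hour-minute-second and degree-minute-second forms above convert back to the decimal center coordinates, using the fact that one hour of right ascension corresponds to 15 degrees:

```python
# Convert the reported (RA, Dec) center from hms/dms to decimal degrees.
ra_h, ra_m, ra_s = 9, 3, 20.603
ra_deg = 15 * (ra_h + ra_m / 60 + ra_s / 3600)    # 15 degrees per hour of RA
dec_d, dec_m, dec_s = 55, 12, 44.873
dec_deg = dec_d + dec_m / 60 + dec_s / 3600
print(round(ra_deg, 3), round(dec_deg, 3))        # 135.836 55.212
```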

Astrometry.net also labels other features in the picture:

stars near view of comet Neowise highlighted in green

Measuring the positions and motions of celestial bodies falls to the branch of astronomy known as astrometry, and the problem of placing an image into a standard reference, as illustrated above, is called calibrating the image's astrometry. The work of Astrometry.net, calibrating images without human intervention, is known as auto-calibration.

There are a few reasons why this is an important tool. First, calibrating an image taken by a camera mounted in a fixed pose on a spacecraft can be used to automatically determine the spacecraft's orientation. Similarly, image calibration can help a telescope track a target as Earth rotates.

Perhaps more importantly, decades of research have created a vast number of astronomical images. Sharing these images with a wide community of researchers is hampered by the fact that the meta-data surrounding these images is often inaccessible, of poor quality, or presented in a variety of formats. Automating the calibration process allows one to construct high quality meta-data in a standard format, enabling astronomical data to be shared and used more easily.

In broad terms, auto-calibration is a search problem similar to Google's internet search engine. There are standard references, such as the Sloan Digital Sky Survey, that provide an atlas for the stars. What's needed is a way to search through this enormous atlas to find the view presented by a given image.

Let's go behind the scenes to understand how Astrometry.net performs this search. There are really two phases. We begin with a known reference, like the Sloan Survey, and create an "index," as we'll soon describe. We only need to do this once so we don't mind committing time and computational power to this task. Next, we develop a means of searching the index to locate a specific image.

The basic idea is to detect certain features in an image and describe them in a way that is both independent of the orientation and scale of the image and easily searchable. The features we will use are configurations of four stars, called quads.

Describing quads

Given an astronomical image, it's fairly straightforward to pick out the brightest stars using standard image processing software. Using the OpenCV software library, I was able to pick out the 100 brightest stars from our original image. To make them more apparent, I reversed the image so that brighter parts of the original image appear darker here.

image with stars as black dots

We will construct a description of quads, configurations of four stars, that is independent of their orientation and scale in the image. To illustrate, here's a quad with stars labeled $A$, $B$, $C$, and $D$.

diagram with 4 labeled stars

We label the four stars so that $A$ and $B$ are separated by the greatest distance. We then use those two stars to form an orthogonal coordinate system with $A$ at the origin and $B$ at $(1,1)$. We only consider quads for which the other two stars, $C$ and $D$, are within the unit square in this coordinate system.

quad coordinate system

If $(x_C, y_C)$ and $(x_D, y_D)$ are the coordinates of the interior stars, we associate the four-dimensional point $(x_C, y_C, x_D, y_D)$ to the quad.

Of course, there's some ambiguity in how we label $A$ and $B$. Swapping the labels on $A$ and $B$ performs the transformation: $$ (x_C, y_C, x_D, y_D) \mapsto (1-x_C, 1-y_C, 1-x_D, 1-y_D). $$ We choose the labeling of $A$ and $B$ that gives $x_C + x_D \leq 1$. There are also two choices for how we label $C$ and $D$, and we choose the one with $x_C\leq x_D$. With these choices, the point $(x_C, y_C, x_D, y_D)$ is uniquely associated to the quad. For instance, the quad above is uniquely associated to $(0.36367, 0.53644, 0.61088, 0.32359)$.

This association works well for our application since it doesn't change if the quad appears in a different orientation. For instance, if the quad is translated, rotated, or scaled, we still obtain the same four-dimensional point $(x_C, y_C, x_D, y_D)$.
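To make the construction concrete, here is a small sketch (my own code, following the conventions just described, not Astrometry.net's implementation) that computes the four-dimensional code of a quad from its four pixel positions, returning None if the two interior stars do not land in the unit square:

```python
from itertools import combinations

def quad_code(points):
    """points: four (x, y) pixel positions; returns the 4-d code or None."""
    pts = [complex(x, y) for x, y in points]
    # Choose A and B as the most widely separated pair of stars.
    i, j = max(combinations(range(4), 2), key=lambda p: abs(pts[p[0]] - pts[p[1]]))
    A, B = pts[i], pts[j]
    C, D = [pts[k] for k in range(4) if k not in (i, j)]

    def coords(P):
        w = (1 + 1j) * (P - A) / (B - A)       # similarity: A -> (0,0), B -> (1,1)
        return w.real, w.imag

    xC, yC = coords(C)
    xD, yD = coords(D)
    if not all(0 <= t <= 1 for t in (xC, yC, xD, yD)):
        return None                            # C or D falls outside the unit square
    if xC + xD > 1:                            # swapping A and B sends t -> 1 - t
        xC, yC, xD, yD = 1 - xC, 1 - yC, 1 - xD, 1 - yD
    if xC > xD:                                # swap the labels on C and D
        xC, yC, xD, yD = xD, yD, xC, yC
    return (xC, yC, xD, yD)

print(quad_code([(0, 0), (10, 10), (4, 5), (6, 3)]))   # (0.4, 0.5, 0.6, 0.3)
```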

transformed quad transformed quad with coordinates

To create an index, Astrometry.net chooses a collection of quads from a reference, such as the Sloan Survey, that covers the entire sky. As we'll see next, representing quads as four-dimensional points allows us to easily search for specific quads. When presented with an image to be calibrated, we find a set of quads in the image and then search the index for them. When we find a match, it's straightforward to construct the coordinate transformation between the pixels of the image and the celestial sphere.
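As a toy version of that last step (my own simplification, ignoring the projection onto the sphere and any optical distortion), one can fit the similarity transform that best carries the matched image stars onto their reference positions and then apply it to any other pixel:

```python
import numpy as np

def fit_similarity(img_pts, ref_pts):
    """Least-squares fit of T(z) = a*z + b (rotation, scale, translation)."""
    z = np.array([complex(x, y) for x, y in img_pts])
    w = np.array([complex(x, y) for x, y in ref_pts])
    A = np.column_stack([z, np.ones_like(z)])
    (a, b), *_ = np.linalg.lstsq(A, w, rcond=None)
    def transform(x, y):
        c = a * complex(x, y) + b
        return c.real, c.imag
    return transform

# Hypothetical matched quad: image-pixel positions and reference positions.
to_ref = fit_similarity([(0, 0), (10, 10), (4, 5), (6, 3)],
                        [(100, 200), (110, 210), (104, 205), (106, 203)])
print(to_ref(5, 5))   # where an arbitrary image pixel lands in the reference
```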

kd-Trees

Due to noise present in astronomical images, we cannot expect that the four-dimensional point associated to a quad obtained from our image will exactly match a point in the index. Images are distorted, for instance, by the particular optics of a telescope and by the refraction of Earth's atmosphere. If we're given a quad, what we'd really like to do is find all the nearby quads and consider each as a potential match.

But how can we efficiently search through all the quads to find nearby quads? We organize our four-dimensional points into a $kd$-tree. We'll illustrate by organizing the following collection of two-dimensional points into such a tree.

cloud of points

Any set of points sits inside a bounding box, the smallest rectangle that contains the points and whose sides are parallel to the coordinate axes. The set of all points and their bounding box constitutes the root of the tree.

point cloud in bounding box

Next, we determine the dimension along which the points are most widely separated. In our example, the points are more spread out in the horizontal direction so we divide the points into two equal sets at the median of the horizontal coordinates. The points to the left and their bounding box form the left child of the root and the points to the right form the right child.

points collected in 2 rectangles

Now, continue subdividing until each point lies in a single node. The resulting tree structure is known as a $2d$-tree due to the two-dimensional nature of the point set.

subdivision at step 3 subdivision at step 4

Suppose that we are presented with a new point $p$ and a distance $r$, and we'd like to find all the points in our original set that are within a distance $r$ of $p$.

search tree and new circled point

Given a bounding box, we can compute the minimum distance from $p$ to the bounding box.

point joined to bounding box by a diagonal line segment point joined to bounding box by a vertical line segment

To begin the search, start with the root of the tree and ask whether the minimum distance from $p$ to the root's bounding box is less than $r$. If not, there are no points in the point set within a distance $r$ of $p$, and our search concludes. If the minimum distance is less than $r$, then we ask the same question of the two children and continue the search down the tree.

search tree with closest quad highlighted in red

In this way, we eventually find all the points within a distance $r$ of $p$. In the example illustrated above, for instance, we have searched through 150 points by examining only 14 bounding boxes to find five points within the given distance.
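The following sketch (my own simplified code, not Astrometry.net's implementation) builds such a tree by recursively splitting along the widest dimension at the median, and answers a "find everything within distance $r$" query by pruning any subtree whose bounding box lies farther than $r$ from the query point:

```python
import numpy as np

class Node:
    def __init__(self, points):
        self.points = points                    # an (n, d) array of points
        self.lo = points.min(axis=0)            # corners of the bounding box
        self.hi = points.max(axis=0)
        self.left = self.right = None
        if len(points) > 1:
            d = np.argmax(self.hi - self.lo)    # dimension of widest spread
            order = np.argsort(points[:, d])
            mid = len(points) // 2              # split at the median
            self.left = Node(points[order[:mid]])
            self.right = Node(points[order[mid:]])

def min_dist_to_box(node, p):
    # distance from p to the nearest point of the node's bounding box
    return np.linalg.norm(p - np.clip(p, node.lo, node.hi))

def range_search(node, p, r, found):
    if min_dist_to_box(node, p) > r:
        return                                  # prune: the box is too far away
    if node.left is None:                       # a leaf holding a single point
        if np.linalg.norm(node.points[0] - p) <= r:
            found.append(node.points[0])
        return
    range_search(node.left, p, r, found)
    range_search(node.right, p, r, found)

# usage: find every stored 4-d quad code within 0.05 of a query code
codes = np.random.rand(1000, 4)                 # a stand-in for a real index
tree = Node(codes)
near = []
range_search(tree, np.random.rand(4), 0.05, near)
print(len(near), "nearby codes")
```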

Our example is a $2d$-tree; Astrometry.net constructs an index by organizing the quads present in the reference into a searchable $4d$-tree.

Verifying a match

Once we have identified a quad in our image, we search through the $4d$-tree for quads that are within a reasonable tolerance. Since the index does not contain every quad, it's possible we do not find any quads that match. However, there are plenty of quads in our image so we continue with another. Usually, there will be a large number of possible matches so we need to determine which is the best.

A quad in the index that is close to a quad in our image produces an "alignment" of the image onto the celestial sphere, like the one returned by my search:

Ursa Major constellation with highlighted points

With many nearby quads, how do we know which to accept? If a quad produces the correct alignment, we will have two lists of stars, one in the image and one in the reference, and a correspondence between them. Therefore, we search through a second $kd$-tree to find stars that are near to those in the reference quad and check to see if the corresponding stars are present in the image. Of course, even with the correct alignment, it's possible that stars do not appear in their expected location as they may be occluded by a planet, satellite, or comet or they may not be correctly identified as stars when the image is processed.

Due to the probabilistic nature of this question, the choice of whether to accept a proposed alignment is made using Bayesian decision theory. This enables Astrometry.net to set a high threshold for accepting a potential match, which eliminates most false positives. In tests, Astrometry.net is able to correctly recognize 99.9% of testing images with no false positives.
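As a toy illustration of such a decision rule (my own made-up numbers and a deliberately simplified model, not the one actually used by Astrometry.net), one can accumulate log-odds: each predicted star found near its expected position counts in favor of the alignment, each missing star counts against it, and the alignment is accepted only if the odds clear a high threshold:

```python
from math import log

p_hit_if_true, p_hit_if_false = 0.8, 0.05   # assumed detection probabilities
prior_log_odds = log(1e-6)                  # a correct alignment is a priori unlikely
threshold = log(1e6)                        # demand overwhelming evidence

def log_odds(hits, misses):
    # Each observed star multiplies the odds by a likelihood ratio.
    per_hit = log(p_hit_if_true / p_hit_if_false)
    per_miss = log((1 - p_hit_if_true) / (1 - p_hit_if_false))
    return prior_log_odds + hits * per_hit + misses * per_miss

for hits, misses in [(5, 5), (20, 5), (40, 10)]:
    verdict = "accept" if log_odds(hits, misses) > threshold else "reject"
    print(hits, "hits,", misses, "misses ->", verdict)
```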

Summary

The central idea that leads to the impressive results achieved by Astrometry.net is the ability to identify certain features, quads, that can be expressed in a form that is easily searchable.

One can imagine other choices. For instance, using configurations of three stars, "tris," would allow us to represent features using 2-dimensional points. Our index would then squeeze all of our reference features into the unit square. Remember that noise in the images we are processing means that there is uncertainty in the coordinates of these features. Suppose, for instance, that we can only guarantee that a coordinate lies in an interval of width 0.1. The tri then lies somewhere in a square of area 0.01, which is 1% of the area of the unit square. We can detect at most 100 distinct tris.

Because quads live in 4-dimensional space, however, a quad would only occupy a volume of 0.0001 under these assumptions, which means we could distinguish 10,000 distinct quads.

Clearly, using a configuration of five stars would enable us to distinguish an even greater number of configurations. The trade-off, however, is that searching the $kd$-tree takes longer because there are more coordinates to work with. The choice to use quads is a compromise between the efficiency of the search and the number of nearby quads (and resulting proposed alignments) that need to be checked.
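The capacity argument above is easy to redo for any configuration size (a back-of-the-envelope calculation under the article's assumption that each coordinate is only known to within 0.1; "quints" is just a made-up label):

```python
# A configuration of k stars has 2*(k - 2) code coordinates (two per interior star).
for k, name in [(3, "tris"), (4, "quads"), (5, "quints")]:
    coords = 2 * (k - 2)
    print(name, "->", int((1 / 0.1) ** coords), "distinguishable configurations")
# tris -> 100, quads -> 10000, quints -> 1000000
```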

Astrometry.net, created by Dustin Lang, David Hogg, Keir Mierle, Michael Blanton, and Sam Roweis, enables a vast collection of images, including those made by amateur astronomers, to be used in astronomical research. What's presented here is a fairly broad overview that skims over many details of the implementation. Readers wanting to know more are encouraged to read the collaborators' paper or Lang's thesis, which presents a complete picture.

References