Feature Column

Elliptic curves come to date night

Ursula Whitcher — Mon, 01 Apr 2024 04:01:53 +0000

Ursula Whitcher
Mathematical Reviews (AMS)

In game theory, the Nash equilibrium describes a situation where neither player will benefit from changing strategies. Recently, two experts in applied geometry, İrem Portakal and Bernd Sturmfels, showed that a different notion of equilibrium has an intriguing geometric structure. Let's explore the shape of decision-making in the context of a simple cooperative game: what to do for date night.

İrem Portakal at Oberwolfach (photo used by permission from Portakal)

Bernd Sturmfels at Oberwolfach (CC BY-SA 2.0 DE DEED, cropped for display)

Board games for date night

Imagine a couple, Willa and Cara, who like to play board games on Friday night. Willa's favorite game is Wingspan, a game where players learn about different kinds of birds. Cara's favorite is the city-building game Carcassonne. Willa works from home on Fridays, while Cara spends all day on campus, so the pair has agreed that Willa will clear the table and set up a game, while Cara brings home a suitable dessert: Belgian waffle fixings for Wingspan, or delicate cookies from their favorite French bakery for Carcassonne. However, Friday mornings are busy, and sometimes the pair doesn't communicate effectively: Willa might set up Wingspan while Cara grabs cookies, or Willa might try to please her partner by setting up Carcassonne while Cara is choosing ice cream and fruit to top waffles. This isn't the end of the world, but both women agree that date night is better when everything goes according to plan.

A Wingspan accessory shaped like a bird feeder (Pongrácz Zsolt, CC BY-SA 3.0 DEED)

Willa is an economist and Cara is a mathematician, so together they have decided to turn the problem of which game to play into a separate meta-game. Because they both love numbers, Willa and Cara start by creating matrices called "payoff matrices" to quantify exactly how happy they are in each situation. The rows represent the choice of board game (Wingspan or Carcassonne), while the columns represent the night's dessert (waffles or cookies). Here's Willa's matrix:

\[ \begin{pmatrix}7 & 0 \\ 0 & 3 \end{pmatrix}\]

(Willa's zero represents "fine": Wingspan and cookies is OK, she reasons, but Wingspan and waffles is so much better.)

Cara enjoys the local bakery's cookies even when they don't match the game, so her matrix looks a little different:

\[ \begin{pmatrix}3 & 1 \\ 0 & 6 \end{pmatrix}\]

(Game theory experts may notice that Willa and Cara's date night scenario is similar to a standard example game called "Bach or Stravinsky?" The Willa and Cara example uses slightly less special payoff matrices. It's also easy to imagine playing the same board game over and over, while only a few lucky couples live in towns where they can choose between full Bach and Stravinsky concerts on a regular basis.)

Willa and Cara could solve their communication problem by always defaulting to waffles and Wingspan. This way, nobody has to make a decision, and everyone always has positive happiness. There is no incentive for Cara to cheat by bringing home cookies instead, since she knows Willa is already setting up Wingspan; similarly, there's no reason for Willa to back out of the agreement and start setting up Carcassonne when she already knows that Cara is buying waffle fixings. "Always waffles and Wingspan" is an example of a Nash equilibrium point. There's another equilibrium point for "always cookies and Carcassonne".
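For readers who like to check such claims by machine, here is a minimal Python sketch. The payoff matrices are the ones above; the function name `pure_nash_equilibria` is a hypothetical helper invented for this illustration. It tests every game-and-dessert combination to see whether either player could do better by switching on her own.

```python
# Payoff matrices from the text. Rows: 0 = Wingspan, 1 = Carcassonne.
# Columns: 0 = waffles, 1 = cookies.
WILLA = [[7, 0], [0, 3]]
CARA = [[3, 1], [0, 6]]


def pure_nash_equilibria(row_payoff, col_payoff):
    """Return all (row, column) pairs where neither player gains by deviating alone."""
    n_rows, n_cols = len(row_payoff), len(row_payoff[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Willa chooses the row: compare with every other row in column j.
            row_ok = all(row_payoff[i][j] >= row_payoff[k][j] for k in range(n_rows))
            # Cara chooses the column: compare with every other column in row i.
            col_ok = all(col_payoff[i][j] >= col_payoff[i][k] for k in range(n_cols))
            if row_ok and col_ok:
                equilibria.append((i, j))
    return equilibria


print(pure_nash_equilibria(WILLA, CARA))
```

The two pairs it reports, (0, 0) and (1, 1), are "always waffles and Wingspan" and "always cookies and Carcassonne". The sketch only looks at pure strategies, not mixed ones.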

Although always choosing the same game and dessert combination leads to happiness in the short term, over time the partner who never gets to play her favorite game may begin to feel misused. Willa and Cara could try to address this problem using probability. Perhaps Willa reasons that $7/10$ of her possible happiness comes from setting up Wingspan, so she should do so $70\%$ of the time. Similarly, Cara notices that $1/10 + 6/10 = 7/10$ of her possible happiness comes from buying cookies, so she decides to buy cookies $70\%$ of the time. This kind of probabilistic approach is called a "mixed strategy" in the economics literature.

What happens when Willa and Cara implement their $70\%$ strategies simultaneously? We can represent the possible outcomes using a probability matrix with entries $p_{ij}$. Again, rows represent the choice of game, while columns represent the choice of dessert. We use the index 1 for Willa's favorites, and the index 2 for Cara's favorites.

\[ \begin{pmatrix}p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix} = \begin{pmatrix}21/100 & 49/100 \\ 9/100 & 21/100 \end{pmatrix}\]

We see $p_{12} = 49/100$. That means Willa and Cara will end up with Wingspan and cookies, nobody's favorite outcome, almost half the time!
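The matrix above is just the product of the two independent $70\%$ choices. A small sketch, with hypothetical names and exact fractions rather than decimals, makes that explicit:

```python
from fractions import Fraction


def independent_joint(row_probs, col_probs):
    """Joint probability matrix when the two players randomize independently."""
    return [[r * c for c in col_probs] for r in row_probs]


willa_mix = [Fraction(7, 10), Fraction(3, 10)]  # Wingspan, Carcassonne
cara_mix = [Fraction(3, 10), Fraction(7, 10)]   # waffles, cookies

P = independent_joint(willa_mix, cara_mix)
for row in P:
    print([str(entry) for entry in row])
```

The independence assumption is the whole point here: it is what forces the mismatched Wingspan-and-cookies entry to be as large as $7/10 \cdot 7/10$.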

If Willa and Cara can coordinate their random choices, other solutions emerge. For example, they could flip a coin on Friday mornings and choose Wingspan and waffles for heads but Carcassonne and cookies for tails. Here's the resulting, much simpler probability matrix:

\[ \begin{pmatrix}1/2 & 0 \\ 0 & 1/2 \end{pmatrix}\]

One way to compare the coin-flip strategy with the uncoordinated $70\%$ strategy is to multiply each probability by the corresponding entries in the payoff matrices to compute an expected payoff. For the uncoordinated strategies, we have:

\[(7\cdot\frac{21}{100} + 0 \cdot \frac{49}{100} + 0 \cdot \frac{9}{100} + 3 \cdot \frac{21}{100}) + (3\cdot\frac{21}{100} + 1 \cdot \frac{49}{100} + 0 \cdot \frac{9}{100} + 6 \cdot \frac{21}{100}) = 4.48.\]

For the coordinated coin-flip strategy, we have

\[(7\cdot\frac{1}{2} + 0 \cdot 0 + 0 \cdot 0 + 3 \cdot \frac{1}{2}) + (3\cdot\frac{1}{2} + 1 \cdot 0 + 0 \cdot 0 + 6 \cdot \frac{1}{2}) = 9.5.\]

That's a quantifiably massive increase in happiness!
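Both totals can be reproduced in a few lines. This is only a sketch of the arithmetic above; the names `expected_payoff` and `total_happiness` are invented for the illustration.

```python
from fractions import Fraction

WILLA = [[7, 0], [0, 3]]
CARA = [[3, 1], [0, 6]]


def expected_payoff(payoff, probs):
    """Sum of payoff times probability over all four outcomes."""
    return sum(payoff[i][j] * probs[i][j] for i in range(2) for j in range(2))


def total_happiness(probs):
    """Willa's expected payoff plus Cara's expected payoff."""
    return expected_payoff(WILLA, probs) + expected_payoff(CARA, probs)


uncoordinated = [[Fraction(21, 100), Fraction(49, 100)],
                 [Fraction(9, 100), Fraction(21, 100)]]
coin_flip = [[Fraction(1, 2), 0],
             [0, Fraction(1, 2)]]

print(float(total_happiness(uncoordinated)))  # 4.48
print(float(total_happiness(coin_flip)))      # 9.5
```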

A different type of equilibrium

İrem Portakal and Bernd Sturmfels are interested in a notion of game-theoretic equilibrium developed by the German philosopher Wolfgang Spohn. The idea is to use conditional expected payoffs: the payoffs players can anticipate when they happen to make a specific choice. For example, the total probability that Willa will choose Wingspan is $p_{11} + p_{12}$. That means her conditional expected payoff from Wingspan is:

\[7\cdot \frac{p_{11}}{p_{11} + p_{12}} + 0 \cdot \frac{p_{12}}{p_{11}+ p_{12}}.\]

If we use the values for the probability matrix from the simultaneous $70\%$ strategies, that makes Willa's conditional expected payoff from Wingspan

\[7\cdot \frac{21/100}{21/100+49/100} + 0 \cdot \frac{49/100}{21/100+49/100} = 2.1.\]

Using the same values, her conditional expected payoff from Carcassonne is

\[0\cdot \frac{p_{21}}{p_{21} + p_{22}} + 3 \cdot \frac{p_{22}}{p_{21}+ p_{22}} =\]
\[0 \cdot \frac{9/100}{9/100+21/100} + 3 \cdot \frac{21/100}{9/100+21/100} = 2.1.\]

Since Willa's conditional expected payoff is the same for either game, she has no incentive to push for a probability matrix where Wingspan happens slightly more often or slightly less often.
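Here is a hedged sketch of the same computation for either player. The function names are hypothetical, and the code assumes every row and column has positive total probability, so that no division by zero occurs.

```python
from fractions import Fraction

WILLA = [[7, 0], [0, 3]]
CARA = [[3, 1], [0, 6]]

P = [[Fraction(21, 100), Fraction(49, 100)],
     [Fraction(9, 100), Fraction(21, 100)]]


def conditional_payoffs_rows(payoff, probs):
    """Row player's conditional expected payoff, one value per row (game)."""
    result = []
    for i in range(len(probs)):
        row_total = sum(probs[i])  # probability that row i is chosen
        weighted = sum(payoff[i][j] * probs[i][j] for j in range(len(probs[i])))
        result.append(weighted / row_total)
    return result


def conditional_payoffs_cols(payoff, probs):
    """Column player's conditional expected payoff, one value per column (dessert)."""
    n_rows, n_cols = len(probs), len(probs[0])
    result = []
    for j in range(n_cols):
        col_total = sum(probs[i][j] for i in range(n_rows))
        weighted = sum(payoff[i][j] * probs[i][j] for i in range(n_rows))
        result.append(weighted / col_total)
    return result


print(conditional_payoffs_rows(WILLA, P))  # Willa: Wingspan, Carcassonne
print(conditional_payoffs_cols(CARA, P))   # Cara: waffles, cookies
```

Willa's two values agree at $21/10$. Running the same check for Cara gives $21/10$ for waffles and $5/2$ for cookies, so at this particular matrix only Willa's conditional payoffs are balanced.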

Spohn defines a mixed strategy (for us, a probability matrix) to be a dependency equilibrium if every player has the same conditional expected payoff for each choice they can make. This style of analysis sounds a lot like the Nash equilibrium, and indeed, every Nash equilibrium point is also a dependency equilibrium. (This claim needs one caveat: it only works when the notion of dependency equilibrium is defined. Because we can't divide by zero, conditional expected payoff doesn't make sense when there are choices that a player never makes. If we want to analyze scenarios like the "always waffles and Wingspan" strategy described above, we'll need to use other methods; Spohn suggests using a limit argument.)

Nash equilibrium points are typically isolated: unless the payoff matrices are very special, they form a finite set. In contrast, there can be many dependency equilibrium points. For Willa and Cara's payoff matrices, to find all of the dependency equilibria, we need to solve the system of simultaneous equations

\[7\cdot \frac{p_{11}}{p_{11} + p_{12}} + 0 \cdot \frac{p_{12}}{p_{11}+ p_{12}} = 0\cdot \frac{p_{21}}{p_{21} + p_{22}} + 3 \cdot \frac{p_{22}}{p_{21}+ p_{22}}\]

and

\[3\cdot \frac{p_{11}}{p_{11} + p_{21}} + 0 \cdot \frac{p_{21}}{p_{11}+ p_{21}} = 1\cdot \frac{p_{12}}{p_{12} + p_{22}} + 6 \cdot \frac{p_{22}}{p_{12}+ p_{22}}\]

for the probabilities $p_{11}$, $p_{12}$, $p_{21}$, and $p_{22}$. (As usual, Willa's choices correspond to rows, while Cara's choices correspond to columns.)

Portakal and Sturmfels point out that if we clear denominators, we can transform the system to a simpler-looking matrix problem. For Willa's equation, we use the matrix

\[ M_W = \begin{pmatrix} p_{11} + p_{12} & 7 p_{11} + 0 p_{12} \\
p_{21}+ p_{22} & 0 p_{21} + 3p_{22} \end{pmatrix}. \]

For Cara's equation, we use the matrix

\[ M_C = \begin{pmatrix} p_{11} + p_{21} & 3 p_{11} + 0 p_{21} \\
p_{12}+ p_{22} & 1 p_{12}+ 6 p_{22} \end{pmatrix}. \]

The dependency equilibria are given by values of $p_{11}$, $p_{12}$, $p_{21}$, and $p_{22}$ such that $\det M_W = \det M_C = 0$. For Spohn, the algebraic intricacy of the set of all dependency equilibria is a drawback. For Portakal and Sturmfels, it's an exciting opportunity to apply the geometry of polynomials.
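To see the determinant conditions in action, here is a small sketch that evaluates $\det M_W$ and $\det M_C$ exactly. The function names are hypothetical, and the test matrix with entries $3/48$, $39/48$, $5/48$, $1/48$ is not from Portakal and Sturmfels; it is simply a point found by hand for this illustration.

```python
from fractions import Fraction


def det_willa(p11, p12, p21, p22):
    """Determinant of M_W for Willa's payoffs (7, 0, 0, 3)."""
    return (p11 + p12) * (0 * p21 + 3 * p22) - (7 * p11 + 0 * p12) * (p21 + p22)


def det_cara(p11, p12, p21, p22):
    """Determinant of M_C for Cara's payoffs (3, 1, 0, 6)."""
    return (p11 + p21) * (1 * p12 + 6 * p22) - (3 * p11 + 0 * p21) * (p12 + p22)


# The independent 70% strategies from earlier in the column.
seventy = (Fraction(21, 100), Fraction(49, 100), Fraction(9, 100), Fraction(21, 100))

# Hypothetical test point (an assumption of this sketch, found by hand).
candidate = tuple(Fraction(w, 48) for w in (3, 39, 5, 1))

print(det_willa(*seventy), det_cara(*seventy))
print(det_willa(*candidate), det_cara(*candidate))
```

For the $70\%$ matrix, Willa's determinant vanishes but Cara's does not. For the second matrix both determinants are zero, so it satisfies both equations at once.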

A geometric space for probabilities

We want to describe the set of all dependency equilibria for Willa and Cara's game night game. But before we do so, we need to describe the set they live in: the space of all possible probability matrices for a game. There are several ways to go from a probability matrix to a point in a geometric space. For our two-player, two-outcome game, a simple way to find a point in a geometric space would be to think of the four possible probabilities, $p_{11}$, $p_{12}$, $p_{21}$, and $p_{22}$, as coordinates of a vector $(p_{11}, p_{12}, p_{21}, p_{22})$ in $\mathbb{R}^4$. We know each individual probability must be greater than or equal to zero, and we have the constraint that $p_{11} + p_{12} + p_{21} + p_{22} =1$. Thus, the possible probability vectors lie inside a region in $\mathbb{R}^4$ bounded by the hyperplane $p_{11} + p_{12} + p_{21} + p_{22} =1$ and the four coordinate hyperplanes. The shape of this region is called a simplex; simplices are higher-dimensional generalizations of triangles and tetrahedra.

Portakal and Sturmfels use a different strategy. Their method is a bit more involved, but it will allow us to lean on three-dimensional geometric intuition. We start by observing that if we have a list of any four positive numbers $(a,b,c,d)$, we can convert to a legal probability four-tuple by dividing by their sum: we get $(\frac{a}{a+b+c+d},\frac{b}{a+b+c+d},\frac{c}{a+b+c+d},\frac{d}{a+b+c+d})$, which satisfies the equation

\[\frac{a}{a+b+c+d}+\frac{b}{a+b+c+d}+\frac{c}{a+b+c+d}+\frac{d}{a+b+c+d} = 1.\]

Of course, $(2a,2b,2c,2d)$ would yield the same probability four-tuple, because the 2s in the numerator and denominator would cancel. In fact, multiplying $a$, $b$, $c$, and $d$ by any positive number results in the same probability four-tuple.
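A quick sketch of this normalization (the function name `to_probabilities` is made up for the example) confirms that rescaling the four numbers changes nothing:

```python
from fractions import Fraction


def to_probabilities(weights):
    """Divide each positive weight by the total so the results sum to 1."""
    total = sum(weights)
    return tuple(Fraction(w) / total for w in weights)


original = to_probabilities((1, 2, 3, 4))
doubled = to_probabilities((2, 4, 6, 8))

print(original)
print(original == doubled)
```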

This equivalence is reminiscent of a famous geometric space, the real projective space $\mathbb{R}\mathbb{P}^3$. We can build $\mathbb{R}\mathbb{P}^3$ by starting with the four-dimensional space $\mathbb{R}^4$, removing the origin $(0,0,0,0)$, and imposing the rule that scalar multiples of the remaining points are equivalent. That is, $(a,b,c,d)$ and $(ka,kb,kc,kd)$ are equivalent for any $k \in \mathbb{R} \setminus \{0\}$.

Thinking about the shape of all of real projective space at once can be mind-bending. But $\mathbb{R}\mathbb{P}^3$ contains a friendly copy of $\mathbb{R}^3$: as long as $d \neq 0$, we can multiply all the points by $\frac{1}{d}$, yielding a subspace containing points of the form $(a,b,c,1)$. Portakal and Sturmfels focus on probability four-tuples where all four probabilities are positive. They view these four-tuples $(p_{11}, p_{12}, p_{21}, p_{22})$ as points inside the region of $\mathbb{R}\mathbb{P}^3$ where all coordinates are positive. If we like, we can divide through by the last coordinate and view probability four-tuples as specifying points in $\mathbb{R}^3$ where all the coordinates are positive. If we do end up in a situation where we need to allow some coordinate to be zero, we can always take a limit.

The geometry of dependency equilibria

Now we're ready to visualize the dependency equilibria for Willa and Cara's game! Remember, we're considering the points where $\det M_W = \det M_C = 0$. Let's divide through by the last coordinate as we discussed above, setting $a = p_{11}/p_{22}$, $b=p_{12}/p_{22}$, and $c = p_{21}/p_{22}$. That means we're looking at solutions to the system of equations $-7ac - 4a + 3b = -2ab + bc + 3a + 6c = 0$. The result is a curve called the Spohn curve, realized as the intersection of two surfaces in $\mathbb{R}^3$.
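One way to find actual points on this curve is elimination. The sketch below is an illustration under stated assumptions, not the method of Portakal and Sturmfels: it substitutes $(p_{11}, p_{12}, p_{21}, p_{22}) = (a, b, c, 1)$ directly into $\det M_W$ and $\det M_C$, solves $\det M_W = 0$ for $b$, and is left with a quadratic in $c$ whose coefficients were worked out by hand. All function names are hypothetical.

```python
import math


def det_willa(a, b, c):
    """det M_W after setting p11 = a, p12 = b, p21 = c, p22 = 1."""
    return (a + b) * 3 - 7 * a * (c + 1)


def det_cara(a, b, c):
    """det M_C under the same substitution."""
    return (a + c) * (b + 6) - 3 * a * (b + 1)


def spohn_points(a):
    """Real points (a, b, c) on the curve for a given nonzero a.

    det M_W = 0 gives b = a(7c + 4)/3. Substituting into det M_C = 0 leaves
    7a c^2 + (18 + 4a - 14a^2) c + (9a - 8a^2) = 0 (derived by hand).
    """
    qa = 7 * a
    qb = 18 + 4 * a - 14 * a * a
    qc = 9 * a - 8 * a * a
    disc = qb * qb - 4 * qa * qc
    if disc < 0:
        return []  # no real points with this value of a
    points = []
    for sign in (1, -1):
        c = (-qb + sign * math.sqrt(disc)) / (2 * qa)
        b = a * (7 * c + 4) / 3
        points.append((a, b, c))
    return points


for point in spohn_points(3):
    print(point, det_willa(*point), det_cara(*point))
```

For $a = 3$ the solution with all coordinates positive is $(a, b, c) = (3, 39, 5)$, which corresponds to the probability matrix with entries $3/48$, $39/48$, $5/48$, and $1/48$. The other solution has a negative coordinate, so it lies on the curve but does not describe a probability matrix.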

The red and blue surfaces intersect in the Spohn curve for Willa and Cara's game.

Portakal and Sturmfels show that every $2 \times 2$ game (that is, every game with two players, each of whom has two choices) has a Spohn curve of a very special type: it's an elliptic curve. If we look at solutions to their equations over the complex numbers, elliptic curves have the topology of a torus, that is, a donut shape. For Spohn curves, we're only considering a real slice.

A mathematical torus

Elliptic curves are famous for their multitude of applications. In elliptic curve cryptography, points on specific elliptic curves are used to build systems for encrypting and decrypting messages. Loyal readers of the Feature Column may recall that elliptic curves also appear in physics problems involving Feynman diagrams.

Maybe it's time for Willa and Cara to consider adding donuts to their date night rotation!

Photo by 5th Luna, CC BY-NC 2.0

Is this $p$-hacking?

Ursula Whitcher — Fri, 01 Mar 2024 05:01:22 +0000

If you have to ask, it probably is.

Sara Stoudt
Bucknell University

I got asked an amazing question last fall semester while I was giving a crash-course workshop on statistics (shout out to the Internship Network in the Mathematical Sciences). It was one of those questions that I really wanted to dig into, and now I finally am.

The question had to do with $p$-hacking, which is a concept that comes up a lot when thinking about the reproducibility and replicability of statistical results. There has been plenty of discourse about the concept, but the gist is that you keep digging around in the data until you find something significant. By doing this, we run the risk of making false discoveries that couldn’t be replicated in a different study. We trade long-term progress for short-term gains.

The context of this question was regression and how to treat a categorical variable with more than two categories as a covariate in a model. Before I reveal the specific question, let’s make sure we’re on the same page with the statistical setting.

We’ll start with a concrete scenario. Suppose I want to model ice cream consumption and one covariate is what flavor is on sale (vanilla, chocolate, or strawberry) and one covariate is the temperature outside. Note: in this fictional world, one and only one ice cream flavor can be on sale at any given time.

Ice cream bowl by Twemoji, CC BY 4.0 Deed; Strawberry by anavrin-stock, CC BY-NC-ND 3.0 Deed; thermometer by HitomiAkane, CC BY-SA 4.0 Deed; vanilla and chocolate are public domain.

Now we could assign different numbers to each sale flavor: 1 for vanilla, 2 for chocolate, and 3 for strawberry. But that feels unsatisfying because it’s not like chocolate being on sale counts for twice as much as vanilla being on sale. What if we break up this one categorical variable into multiple binary categorical variables instead? So we make the so-called “dummy” variable isVanilla and set it equal to 1 if vanilla is on sale or 0 if it isn’t. Then isChocolate is 1 if chocolate is on sale or 0 if it isn’t. That leaves strawberry as what we call the baseline: if isVanilla and isChocolate are both zero, that means we’re looking at a situation where strawberry is on sale.
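As a minimal sketch of this encoding (the function `dummy_encode` and its arguments are hypothetical, not part of any particular statistics package), here is how one sale flavor turns into a pair of 0/1 variables once a baseline is chosen:

```python
def dummy_encode(flavor, baseline="strawberry",
                 levels=("vanilla", "chocolate", "strawberry")):
    """Return 0/1 dummy variables for every level except the baseline."""
    if flavor not in levels:
        raise ValueError(f"unknown flavor: {flavor}")
    return {
        f"is{level.capitalize()}": int(flavor == level)
        for level in levels
        if level != baseline
    }


print(dummy_encode("vanilla"))     # {'isVanilla': 1, 'isChocolate': 0}
print(dummy_encode("strawberry"))  # {'isVanilla': 0, 'isChocolate': 0}
print(dummy_encode("vanilla", baseline="chocolate"))
```

Changing the `baseline` argument changes which flavor is represented by "all zeros", and therefore which comparisons the fitted coefficients describe.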

Now when we fit this model, we’ll get coefficients on both isVanilla and isChocolate, and these will tell us how these flavors being on sale are associated with ice cream consumption in comparison to how strawberry being on sale is associated with ice cream consumption.

If the coefficient on isVanilla is -1.2 and the one on isChocolate is 0.8, that means that if the ice cream flavor that is on sale is vanilla, we expect to see ice cream consumption decrease by 1.2 units on average compared to what ice cream consumption would be if strawberry was on sale, holding all else constant. Phew, that’s a mouthful! Similarly, if the ice cream flavor that is on sale is chocolate, we expect to see ice cream consumption increase by 0.8 units on average compared to what ice cream consumption would be if strawberry was on sale, holding all else constant.

Now, here’s where it gets weird. Each of those coefficients has a level of significance. But what statistical test is being performed? Each coefficient’s inference values actually come from a fancy difference in means test between strawberry and vanilla and strawberry and chocolate respectively.

We’re finally to the part where I can tell you the original question that inspired this blog post. Are you ready? The question was...

Can you $p$-hack by fiddling with the baseline level in a model? We all want our model to be significant, right?!

Now, $p$-hacking is really what statisticians call a “multiple testing” problem. (xkcd explains the issue here too.) One of the things that a $p$-value cutoff of 0.05 implies is that the probability of a false positive (rejecting the null hypothesis when you shouldn’t) is only 5%. That seems pretty unlikely. But what happens if you start doing more tests? The probability of at least one unlikely thing happening across many, many tests turns out to be, well, not that unlikely anymore. By the time we run ten independent tests, we already have just over a 40% chance of having at least one false positive.
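The 40% figure comes from a one-line calculation. This sketch assumes the tests are independent and that every null hypothesis is actually true; the function name is invented for the example.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance of one or more false positives across independent tests at level alpha."""
    return 1 - (1 - alpha) ** num_tests


for n in (1, 3, 10, 20):
    print(n, round(prob_at_least_one_false_positive(n), 3))
```

With real data the tests are rarely independent, so treat the numbers as a rough guide rather than an exact error rate.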

So, for instance, suppose I’m a bit of a chocoholic and I choose chocolate as the baseline.

Is it possible for the significance of the coefficients on isVanilla and isStrawberry to not match the significance of the coefficients on isVanilla and isChocolate in the other model?

Let’s sketch it out! Below we have some hypothetical relationships between the flavors and the response variable, complete with a confidence interval. Each interval represents one of the flavors: vanilla, chocolate, or strawberry. Roughly, if the intervals don’t overlap, the significance test comparing them would turn up significant. In this example, two comparisons are significant, while one isn’t. If the baseline is chosen as in Scenario 1, the model results are going to have two significant coefficients while in the other two scenarios, there will only be one significant coefficient.
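Here is a sketch of that picture in code. The interval endpoints are made up for illustration (they are not taken from any data or from the figure), and the sketch uses the rough rule above: a coefficient counts as significant when its interval does not overlap the baseline's interval.

```python
# Hypothetical (lower, upper) intervals, invented for this example.
intervals = {
    "vanilla": (1.0, 2.0),
    "chocolate": (1.5, 2.5),
    "strawberry": (4.0, 5.0),
}


def overlaps(first, second):
    """True when two closed intervals share at least one point."""
    return first[0] <= second[1] and second[0] <= first[1]


def significant_coefficients(intervals, baseline):
    """Flavors whose interval does not overlap the baseline's interval."""
    return [
        name
        for name, interval in intervals.items()
        if name != baseline and not overlaps(interval, intervals[baseline])
    ]


for baseline in intervals:
    print(baseline, significant_coefficients(intervals, baseline))
```

With these made-up intervals, choosing strawberry as the baseline reports two significant coefficients, while choosing vanilla or chocolate reports only one. Non-overlap of intervals is only a heuristic, not the actual test a regression performs.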

Now with three levels of the categorical variable, things can’t get too wild. The difference between reporting one and two significant variables doesn’t sound too sneaky if we’re thinking in the context of $p$-hacking. But the number of comparisons is going to escalate quickly. If we have four flavors of ice cream, we go from Scenario 1 showing three significant variables in its model outputs to the rest of the scenarios only reporting one. That seems a bit misleading.

But what if we had a categorical variable with more levels? For example, the Census breaks income into 9 brackets (for example, see Table A2 here). Can you draw a picture of 9 intervals where a strategic choice of baseline results in an even more dramatic discrepancy in significance output? Similarly, what if we started adding interaction terms between our dummy variables and the temperature variable? There is a lot of room for things to get weird.

So what’s the take-away from this tale? Choosing a baseline often happens by default, but it can have an impact on the results we see. Even though messing around with the baseline may not seem as pernicious as some other more egregious ways of $p$-hacking, thinking about this scenario can make us more aware of the impact that our modeling choices can have.

Impossible?

Ursula Whitcher — Thu, 01 Feb 2024 05:01:57 +0000

Mathematics helps develop definitions to compare different means to make something better or more fair. But there are inherent limitations, expressed as mathematical impossibility theorems...

Joe Malkevitch
York College (CUNY)

Introduction

When you got up after sleeping last night, what were the possibilities that you could achieve in the new day? This is far from the question of whether humans have free will, or whether their choices are constrained by events, genetics or economics that limit their freedom. Humans are not genies, who in mythology have the power to make a person's wishes come true. What is possible for you and what is not? Here, I am interested in the limits that mathematics places on what we can know using mathematics as a tool. What are the consequences of mathematical impossibilities for the way that democracies (or autocracies, for that matter) can be run for the benefit of their citizens? I'll say more on the fairness issues after setting the stage with a look at the limits of mathematical insight.

Impossibility in mathematics

People debate whether mathematics is discovered or invented. The first framework is that the facts of mathematics, often described as theorems, are out there and await some clever person (or AI system?) to find them. That mathematics is an invented framework suggests that until a person (or perhaps a computer?) uses their reasoning skills and creativity to invent mathematical ideas and concepts, in the same way that humans have invented devices such as the cell phone, washing machine or electric automobile, those ideas do not exist. Lurking in the background of this debate is the question of what can be achieved and what cannot in understanding the "world of mathematics." After some initial looks at the limits of mathematics, I want to discuss the limits that mathematics places on the ways individuals or groups of individuals can implement being fair!

The remarkable fact often known as the Pythagorean Theorem has been thought about by many cultures. In modern notation, the Pythagorean Theorem states that for a right triangle in what is commonly called the Euclidean Plane, where $a$ and $b$ are the lengths of the sides that meet at a right angle and $c$ is the length of the side opposite the right angle,

$$a^2 + b^2 = c^2.$$

It is worth noting how many ideas and conventions are needed to write down this theorem. The result was stated in English rather than (for example) Bengali and the symbols $+$ and $=$ are used in writing the theorem down. We need to know what the Euclidean Plane is, what a triangle in the Euclidean plane is, what is meant by the length of the side of a triangle, etc. We also need to know what an angle is, what a right angle is, that $7^2$ means $7 \times 7$ and thus has the value 49 expressed in decimal place notation arithmetic, etc.

Part of the reason the Pythagorean Theorem is such a rich fact is that it can also be interpreted as a statement about areas rather than about lengths. Thus, for a right triangle the areas of the squares on the sides $a$ and $b$ of the triangle add up to the area of the square on the third side $c$, often called the hypotenuse of the triangle, using language derived from discussions of ancient Greek mathematics.

The Pythagorean Theorem illustrated. (Diagram courtesy of Wikipedia)

The richness of the equation $a^2 + b^2 = c^2$ as a source of ideas and questions, and the fact that it was in essence found by many cultures who did not have much contact with each other, has been used to argue that the equation and its ramifications were mathematics waiting to be discovered rather than invented.

Once mathematicians get started with a nifty idea, they continue to mine this territory for interesting patterns. For example:

What positive integer triples $(a, b, c)$ satisfy $a^2 + b^2 = c^2$?
Are there triples $(a, b, c)$ that form an arithmetic progression and satisfy $a^2 + b^2 = c^2$?
Are there pairs $(a, b)$ and choices for $c$ where exactly 7 choices for the pair $(a, b)$ satisfying $a^2 + b^2 = c^2$ are possible?
What is the analog of the Pythagorean theorem in geometries other than the Euclidean plane (e.g., Euclidean 3-dimensional geometry, or what has come to be known as hyperbolic geometry, also known as the Bolyai-Lobachevsky plane)?
Invent your own question to place here!
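The first two questions are easy to explore by brute force. The sketch below (function names are hypothetical) lists all triples with $a \le b$ and $c$ up to a bound, then filters for arithmetic progressions.

```python
import math


def pythagorean_triples(max_c):
    """All positive integer triples (a, b, c) with a <= b, c <= max_c, a^2 + b^2 = c^2."""
    triples = []
    for a in range(1, max_c + 1):
        for b in range(a, max_c + 1):
            c_squared = a * a + b * b
            c = math.isqrt(c_squared)
            if c <= max_c and c * c == c_squared:
                triples.append((a, b, c))
    return triples


def is_arithmetic_progression(triple):
    """True when consecutive entries differ by the same amount."""
    a, b, c = triple
    return b - a == c - b


triples = pythagorean_triples(20)
print(triples)
print([t for t in triples if is_arithmetic_progression(t)])
```

Up to $c = 20$, the only triples in arithmetic progression are multiples of $(3, 4, 5)$. A short algebraic argument shows this holds in general: writing the triple as $(b-d, b, b+d)$ and expanding gives $b = 4d$.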

In developing ideas about the concept of numbers to count and to measure it was noticed that something seemed to be special about the length of the hypotenuse of a right triangle whose other sides (its legs) are 1 and 1. Equation $a^2 + b^2 = c^2$ implies that this length is a number whose square is 2. We usually denote this number $\sqrt{2}$ and it is referred to as the square root of 2.

Can this number be written in the form $a/b$ where $a$ and $b$ are positive integers? This collection of numbers is now known as the rational numbers. So the question we raise is whether there are numbers which are not rational and how to describe such numbers. It can be shown that it is impossible to represent the length of the hypotenuse of some particular Euclidean triangles as a rational number. (Some right triangles can have rational length hypotenuses: for example, triangles with legs of lengths 3 and 4 or 5 and 12, which have, in fact, integer length.)

Early on it was shown that humans could demonstrate the impossibility of writing the square root of 2 as a rational number. It is possible to find rational numbers, where the representation $a/b$ gives a very good approximation to the square root of 2. For example 41/29 or 99/70 might be used as an approximation.
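The two approximations just mentioned are not isolated. As a sketch (the recurrence below is a standard one and is an addition of this example, not something stated in the text above), the rule $(a, b) \to (a + 2b, a + b)$ starting from $(1, 1)$ produces fractions $a/b$ with $a^2 - 2b^2 = \pm 1$, so their squares miss 2 by only $1/b^2$.

```python
from fractions import Fraction


def sqrt2_approximations(count):
    """First `count` fractions a/b from the rule (a, b) -> (a + 2b, a + b), starting at 1/1."""
    a, b = 1, 1
    result = []
    for _ in range(count):
        result.append(Fraction(a, b))
        a, b = a + 2 * b, a + b
    return result


for frac in sqrt2_approximations(6):
    error = frac.numerator ** 2 - 2 * frac.denominator ** 2
    print(frac, float(frac), error)
```

The fifth and sixth fractions in the list are $41/29$ and $99/70$. Since $a^2 - 2b^2$ is never zero here, none of these fractions is exactly $\sqrt{2}$, which is consistent with the impossibility result.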

Compass and straight edge construction impossibilities

Euclid's important book on geometry, the Elements, includes discussions of what lengths of segments, as well as other constructions can be accomplished using a straight edge (a ruler without length markings) and compass (allowing one to draw a circle with a particular center and radius). Three famous problems emerged out of these discussions:

Trisecting any angle (Given an arbitrary angle, can one obtain an angle whose measure is $\frac{1}{3}$ of the measure of the angle we started with?)
Note: To this day, despite many proofs that one cannot trisect arbitrary angles with a straight edge and compass, many people claim that they can accomplish this feat. These individuals have not been convinced that this task is impossible!
Duplicating the cube (Find the edge length of a cube whose volume is exactly double the volume of a cube with side length 1, and hence volume 1.)
Squaring the circle (Find a square whose area is equal to the area of a given circle.)

Eventually, it was proven that it is impossible to carry out these constructions with straight edge and compass. Of course, one has to specify what constitutes a proof. Intuitively, a proof is a collection of logical steps based on known facts (theorems) or axioms (statements assumed to be true) which lead to the desired statement (in a finite number of steps). The understanding of what constitutes a rigorous proof has changed with time. Today, computer systems are being used to "make sure" that certain theorems are really facts. The reason for this recent thread of mathematical work is that the proofs humans have provided for some mathematical conjectures have become so long and complicated that it is unclear whether other mathematicians can check them with certainty! Some important mathematical facts have required proofs that go on for hundreds of pages. Making sure that such proofs do not suffer from subtle errors is essential. Sometimes theorems generally accepted by the mathematical community are shown not to apply as fully as thought, and sometimes new proofs have had to be provided because an existing proof was shown to have a mistake, often because some case that might a priori occur was not considered in the proof.

In high school you probably learned a formula which, given the coefficients of a quadratic equation, enabled you to write down an algebraic expression that gives the roots of that quadratic equation. Over time mathematicians developed similar "closed form" formulas for polynomial equations with integer coefficients of degree 3 or 4 (the cubic and quartic equations). However, it came as a surprise when Évariste Galois (1811-1832), Niels Henrik Abel (1802-1829) and Paolo Ruffini (1765-1822) developed tools that showed that for the analogous polynomial equations of degree 5 or higher it was not possible to find such a formula! Similarly, if you have studied some calculus you know that when functions can be expressed in terms of polynomials, logarithms, exponential functions and trigonometric functions, it is usually not that difficult to find the derivative of such a function. But for relatively simple examples (e.g., $\sqrt{\sin(x)}$) it is not possible to find the indefinite integral of such a function without using tools specially invented to describe the solutions of such integrals (e.g., elliptic functions).

Condensing the issues considerably, David Hilbert (1862-1943) was a pioneer in trying to understand what mathematics is knowable from formal systems built on a collection of undefined terms and axioms.

Photo of David Hilbert (Courtesy of Wikipedia)

Reacting to programs developed by Hilbert, the mathematician Kurt Gödel (1906-1978) totally surprised the mathematical community by showing that in a formal mathematical system there are legal statements that can be neither proven nor disproved. Such statements are sometimes called undecidable.

Photo of Kurt Gödel (Courtesy of Wikipedia)

This result of Gödel is often labeled his Incompleteness Theorem. Others after Gödel, notably Julia Robinson (1919-1985) (who was a President of the American Mathematical Society) and Yuri Matiyasevich, showed that long-standing questions that interested the mathematics community were undecidable.

Photo of Julia Robinson (Courtesy of Wikipedia)

Photo of Yuri Matiyasevich (Courtesy of Wikipedia)

After many years of the study of impossibility and undecidability issues, examples of statements which are undecidable have been found in many subareas of mathematics ranging from geometry to group theory (algebra) and to combinatorics. In a general way, however, these impossibility questions belong to the domain of mathematics known as logic.

Impossibility in computer science

As a scholarly discipline, computer science is much newer than mathematics. Being able to study for a degree in computer science at a college first became possible relatively recently. It appears that Purdue University was the first American university to form a department of computer science, in 1962. Some computer scientists are primarily concerned with building the hardware on which problems are solved. Others are interested in designing languages that let computers solve problems on particular hardware, and still others are concerned with the algorithms that can be used to solve problems on a computer. A pioneer of computers, both in the sense of what they could do and in making hardware to actually do the work, was Alan Turing (1912-1954). Today there is a vast array of complexity classes, which lump questions together in terms of how hard it is to solve them using algorithms.

Photo of Alan Turing (Courtesy of Wikipedia)

Democracies

In a representative democracy, the goal is to "translate" the wants and desires of the people being governed, using a legislature or parliament to carry out the will of the people. When democracies work well, majority opinion is implemented while the interests and rights of minorities are maintained. The system used by the United States is unusual: a strong President executes the laws passed by a bicameral Congress. The President is elected nationally (formally through the Electoral College), members of the House of Representatives are elected from districts within the individual states, and two senators are elected for each state. The system common in most European democracies has a single parliament (legislature) where the representatives are selected based on the percentage of the vote that competing parties get. The parties represent the range of views of the populace. The aim is that for a parliament with $h$ seats (house size, a positive integer), if party A gets 17% of the vote it would get about 17% of the seats. The Dutch parliament has 150 seats, so in this example we might calculate $.17(150) = 25.5$ seats and conclude party A "deserves" 25.5 representatives. So should party A get 25 seats, 26, or perhaps some other number, and would this be considered a fair apportionment of seats to party A? In recent years many European democracies have been using a hybrid system to elect legislators, where a combination of percentage vote for parties and electing representatives from districts is used.

To get the major ideas across here I will confine myself to discussing the election of, say, a mayor as the chief executive officer of a small city where any eligible voter can vote for the Mayor. In principle there may be many candidates running for Mayor, where perhaps each candidate is linked to an affiliation to some party, though for this discussion the issue of party is not relevant. To model what is going on we will assume there are $n$ eligible voters who actually voted, and $c$ candidates (here $c$ is a positive integer at least 2).

In an election, each voter uses a ballot, which is a way to express voter views about the candidates. Based on the ballots cast, a prespecified decision method, which I will refer to as an election method, is used to "translate" the views of the individual voters into a SINGLE choice (no ties) for who becomes mayor. This may sound simple but when trying to design an "ideal" system there is little agreement! To see the complications, consider the issue of what kind of ballot to use. In practice, and in theory, here are some examples of the kinds of ballots used in America in the past or proposed for use. Of course, the nature of the election method will have to be tailored to the kind of ballot that is used. For a specific type of ballot there are usually many choices as to what election method to use based on the ballots.

Ballot types

(Plurality ballot) Vote for exactly one of the $c$ candidates.
Rank all the candidates without ties, from most preferred to least preferred.

There are many ways to represent such a ballot, but let me offer two. The first representation uses a diagram with the most preferred candidates towards the top:

This ballot means that of the three candidates this voter preferred B to C and to A, and C to A. Note that "how much," or with what intensity, B is preferred to C, or C preferred to A, is not information available to those who must count the ballots in this system. Another way to represent this ballot, using the symbol > to mean preferred (for numbers this is the symbol used for greater than), is:

$$B > C > A.$$

Ballots that allow one to express an order of preference but not intensity are sometimes called ordinal ballots, to contrast them with cardinal ballots where some system to express intensity of preference is offered. Note that some words of explanation are necessary for most people to understand how to read the meaning of the ballot. Perhaps without words a person will understand that the first notation system ranks candidate B above C and A. Some people might find the second notation less clear because they are not familiar with the greater than symbol.

Rank all the candidates with ties, from most preferred to least preferred.

Here is a way to represent a ballot for 5 candidates in this framework.

$$A > D = E > C > B.$$

The equal sign is used to indicate ties in preference.

Note that while D and E are liked equally the ballot does not allow the voter to express that E is preferred to C by a small amount but that the voter likes C a lot more than B.

Rank some subset of the candidates with or without ties allowed.

This kind of ballot allows a voter to purposely truncate (omit) the collection of candidates the voter chooses to rank. In the example below, where A, B, and C are the choices seeking office, a voter has ranked only A and C, and B does not appear on the ballot.

Why might a voter not list all of the candidates on the ballot? It might be because the voter has no information about some of the candidates and hence does not feel comfortable listing them. However, another reason may have to do with the fact that the voter has knowledge about the election method which will be used to decide the election. The voter might predict, while sincerely preferring B to C, that listing B might result in B being elected rather than the voter's preferred choice A. Putting it baldly, the voter "lies" about what the voter truly feels and produces an insincere ballot. This approach to voting by one or more voters is often called strategic voting. Strategic voting might be more widely practiced if there are polls which show how other voters in the electorate are leaning. This additional information might convince some voter or group of voters to vote using something other than their sincere preferences. A remarkable impossibility theorem shows that, for essentially any election method other than dictatorship, there are situations in which strategic voting is advantageous!

The voter indicates which candidates are "approved."

What is meant by approving a candidate? The Board of Elections presumably indicates instructions for the meaning of this term and based on the ballots, what decision method will be used to find a winner. One way to think of giving a candidate an approval vote is that one is willing to have this person serve if elected. To the extent that an approval ballot allows a voter to express the amount that the voter likes the candidates it is very "crude" in that it treats all of the approved candidates equally. It is similar to a preference ballot with truncation where all of the candidates that are not omitted are tied at the same level of intensity.

For each candidate the voter indicates a vote of yes or no for that candidate.

Again, the voter might be allowed not to provide a yes/no vote for some subset of the candidates. If the voter is required to vote yes or no on every candidate, this system is identical to approval voting.

How might one design choice and voting systems where the ballot allows for voters to indicate the intensity of their feeling about the candidate? One way of thinking of such a ballot is that each voter gives a "grade" to each choice (candidate) in much the same way that teachers give grades to students in school. What are some of the possible grading or intensity of preference scales? Here are some examples which perhaps you have used in the past or are knowledgeable about.

A, B, C, D, F
A+, A, A-, B+, B, B-, C+, C, C-, D+, D, D-, F
0 to 100 (100 high)
0 to 100 (100 low)
1 to 99 (99 high)
1, 2, 3, 4, 5 (5 high)
0,1, 2, 3, 4, 5, 6, 7, 8, 9, 10 (10 high)
Very poor, poor, neutral, good, very good

Note: There are many variants of verbal scales that solicit voters' (choosers') feelings about choices.

Cardinal ballots can be thought of as offering each voter the opportunity to express an intensity of opinion by giving a numerical grade or value (or using a verbal description) to the candidates. If on the 0 to 100 scale (higher scores are better) Heather gives Mary a score of 90, John a score of 84 and Susan a score of 45, then one can conclude that Mary is preferred to John is preferred to Susan. Mary is above John by 6 points and John is above Susan by 39 points. If Constance gives Mary, John and Susan the score 84, we can conclude that Constance is indifferent to the three choices, but can one conclude that Heather likes Mary twice as much as she likes Susan? Do equal differences in assigned scores mean something? Can voters meaningfully assign a 45 rather than a 44 to a particular candidate?

Most of us are used to the fact that temperatures can be reported by using either the Fahrenheit scale or the Centigrade scale. What makes these temperature scales different from the scales listed for "grading" candidates is that there is a fixed way to get from the number on one scale to the other: water at sea level boils at 100 Centigrade, 212 Fahrenheit. If one wants to change from Centigrade to Fahrenheit then $F = \frac{9}{5}C + 32$ is the standard way to do this. Few people agree on how to convert the grade of B+ on a mathematics writing project to a score on the scale 0-100. Scholars intensely debate the tradeoffs between using ordinal ballots and cardinal ballots as well as the pros and cons of different kinds of these ballots.

Impossibility in achieving "fairness"

Perhaps after this discussion you have a new respect for the complexity of making a democracy work. A community of scholars trained as philosophers, mathematicians, computer scientists, economists, political scientists, and so on has contributed further insights into the complications of running a democracy. Let me begin with the mathematical economist Kenneth Arrow (1921-2017).

Photo of Kenneth Arrow (Courtesy of Wikipedia)

Arrow was a tremendously important figure in election theory and mathematical economics, and his best-known work is perhaps his book Social Choice and Individual Values (1951, second edition 1963). Some of this work is already present in his doctoral dissertation in economics, written (1951) under the direction of Harold Hotelling (1895-1973), a mathematical statistician at Columbia University.

Arrow proved that it is IMPOSSIBLE to design an election method that meets a small list of fairness rules. His initial result had a small technical error, and over the years the fairness rules on Arrow's list have been reformulated and expanded in various ways, but the essence of what he showed, because it is a theorem rather than a theory, has to be lived with. The original framework of Arrow was an election where rather than choosing a single winner, what was required was a ranking for society (allowing ties) based on the input ballots of the individuals making up the society. Here are some samples of the kind of fairness rules that Arrow looked at. Typically, what is involved is consistency in what should happen in two closely related sets of ballots, the result in two similar elections.

There is no dictator. The choice for society does not always coincide with the choice of a particular voter - the dictator.
There is no imposed choice. The ranking (winner) depends on the votes of the voters rather than being selected on the basis of some expert or wise person (oracle).
Monotonicity. Getting more votes should not harm a candidate.
Any set of ballots should be assigned a "winner." Those counting the votes based on ballots cannot reject some election results as being too "weird" to be acted upon.
Independence of irrelevant alternatives. The relative position of two candidates in the society ranking should depend only on how the voters rank these two candidates relative to each other and not on the presence of other choices.

Intuitively, one can state that Arrow's Impossibility Theorem means that for elections (or economic decision making) where there are 3 or more candidates (choices) using ordinal ranked ballots with ties allowed (the third ballot type described above), there are no election decision methods which are "perfect" in the sense that they obey a short list of fairness conditions.

In addition to Arrow's Theorem there is another, in some ways more dramatic, impossibility theorem. This result is known as the Gibbard-Satterthwaite Theorem, named for the philosopher Allan Gibbard and the economist Mark Satterthwaite (who teaches at a business school). What they independently showed is that in very general systems of voting where there are 3 or more choices (candidates) to decide among, the only election decision method that avoids the incentive for voters to use strategic voting (that is, to submit ballots that do not represent what they truly believe and so lie about their preferences) is dictatorship.

There are by now many different impossibility theorems of this kind, where the properties that the election decision method obeys differ in the differing results but the theorems have in common that they show that if certain fairness rules are important in your view, there is no method that obeys all of the desirable conditions. Part of the reason that attempts to reform or improve election methods fail is that reformers typically differ on what fairness conditions are the essential ones. Another complexity is that some appealing election decision methods are computationally hard. For elections with many voters and candidates no computer can decide the winner/ranking in a reasonable amount of time. Experts also differ on whether the voters will accept a voting method to replace plurality if the description of the system is very complex to explain.

To help give you a sense of these matters using a specific example, let's consider the election below where 55 people have ranked 5 candidates, where there are no ties or truncations.

Take a moment to decide who you think should win this election.

Now verify that these five election decision methods each choose a different winner!

Plurality (Candidate with most first place votes wins.)
Run-off (The winner is whoever wins a run-off between the two candidates with the highest number of first-place votes.) The voters do not have to go to the polls to vote again because the ballots code the information about their preferences about all the choices. Of course, if the voters went to the polls a second time their preferences might change, and typically many fewer people vote when a physical run-off election is needed at a later date than in the original vote.
Sequential run-off (If no candidate has a majority, eliminate the candidate with the lowest number of first-place votes, transferring the votes for the eliminated candidate to the next highest ranked person still in the race, and repeat until a single winner emerges.)
Borda Count (Candidates are assigned points from a ballot based on the number of candidates below the particular candidate and the candidate with the most points wins. Using the ballots above B gets $18(0) + 12(4) + 10(3) + 9(1) + 4(3) +2(1)$ points.)
Condorcet (If there is a candidate who beats every other candidate in a two-way race that candidate wins. There are ballots for which no candidate meets the criterion.)

The example above shows that for some elections there are many appealing methods which do not agree as to who should win!
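The five rules are easy to state but tedious to check by hand, so here is a minimal Python sketch of them. The preference profile in it is an assumption: it is not copied from the article's table, but it has 55 voters, 5 candidates, no ties or truncation, and it reproduces the Borda tally for B quoted above. The function names are illustrative, and ties between candidates are not handled.

```python
from collections import Counter

# ASSUMED profile, not copied from the article: (voters, ranking best-to-worst).
PROFILE = [(18, "ADECB"), (12, "BEDCA"), (10, "CBEDA"),
           (9, "DCEBA"), (4, "EBDCA"), (2, "ECDBA")]


def first_place_counts(profile, remaining):
    """First-place votes among the candidates still in the race."""
    counts = Counter({c: 0 for c in remaining})
    for n, ranking in profile:
        counts[next(c for c in ranking if c in remaining)] += n
    return counts


def beats(profile, x, y):
    """True if a strict majority of voters rank x above y."""
    x_votes = sum(n for n, r in profile if r.index(x) < r.index(y))
    return 2 * x_votes > sum(n for n, _ in profile)


def plurality(profile):
    counts = first_place_counts(profile, profile[0][1])
    return max(counts, key=counts.get)


def runoff(profile):
    # Two-way race between the top two candidates by first-place votes.
    (x, _), (y, _) = first_place_counts(profile, profile[0][1]).most_common(2)
    return x if beats(profile, x, y) else y


def sequential_runoff(profile):
    remaining = set(profile[0][1])
    total = sum(n for n, _ in profile)
    while True:
        counts = first_place_counts(profile, remaining)
        leader = max(counts, key=counts.get)
        if 2 * counts[leader] > total:
            return leader
        # Drop the candidate with the fewest first-place votes and recount.
        remaining.discard(min(counts, key=counts.get))


def borda_scores(profile):
    # A candidate earns one point per candidate ranked below it on a ballot.
    scores = {c: 0 for c in profile[0][1]}
    for n, ranking in profile:
        for position, c in enumerate(ranking):
            scores[c] += n * (len(ranking) - 1 - position)
    return scores


def borda(profile):
    scores = borda_scores(profile)
    return max(scores, key=scores.get)


def condorcet(profile):
    # Returns None when no candidate beats every other one head-to-head.
    candidates = profile[0][1]
    for x in candidates:
        if all(beats(profile, x, y) for y in candidates if y != x):
            return x
    return None


for name, rule in [("Plurality", plurality), ("Run-off", runoff),
                   ("Sequential run-off", sequential_runoff),
                   ("Borda", borda), ("Condorcet", condorcet)]:
    print(name, rule(PROFILE))
```

On this assumed profile the five rules pick A, B, C, D and E respectively. Calling `condorcet` on the three-voter cycle `[(1, "ABC"), (1, "BCA"), (1, "CAB")]` returns `None`, an instance of ballots for which no candidate meets the Condorcet criterion.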

While the general public widely underestimates the role that mathematics plays in the technologies (e.g. cell phones, streaming video) that it enjoys so much, it is also not commonly understood that mathematics pays so much attention to questions about fairness and equity. Unfortunately, although mathematics and computer science have identified some of the issues that make it hard to design fair systems for the benefit of society, the public has not taken the opportunity to adopt systems that DO BETTER than what we currently use, even if these better systems have faults. For example, most scholars believe that election systems based on ordinal preference ballots are superior to voting with a plurality ballot, but very few elections are conducted using ordinal preference ballots. Since impossibility theorems show that no perfect system is available, reformers constantly argue about what method to move to, and when a change is made, an election whose results seem "unintuitive" sometimes causes the reform to be undone or discourages reforms in other places.

There is also an impossibility theorem that limits how fair a system of assigning parties their share of seats can be in a legislature with $h$ seats. This theorem is known as the Balinski-Young Theorem, after the mathematicians Michel Balinski (1933-2019) and H. Peyton Young. Here is an intuitive statement of this theorem.

(Balinski-Young) There exists no method to assign the $h$ seats in a parliament based on the percentage of the votes for each party that satisfies both of the following conditions:

Population monotone (Having a larger percentage vote for a party does not decrease the number of seats the party gets.)
Obeys quota (Let $s$ be the party's fraction of the vote times the house size $h$. The number of seats that the party gets is $s$ itself if $s$ is an integer, and otherwise is $s$ rounded down to the integer below it or rounded up to the integer above it.)
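As a small illustration of the quota condition, the sketch below computes the exact share $s$ and the two seat counts that obeying quota would allow, for a party with 17% of the vote in a 150-seat house. The function name `quota_bounds` is made up for this example, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction
import math


def quota_bounds(vote_share, house_size):
    """Return the exact share s and the seat counts allowed by the quota rule."""
    s = vote_share * house_size
    return s, math.floor(s), math.ceil(s)


s, lower, upper = quota_bounds(Fraction(17, 100), 150)
print(s, lower, upper)  # 51/2 25 26
```

So a method that obeys quota may give this party 25 or 26 seats, but not 24 or 27.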

The reason for the term population monotone above is related to a version of the apportionment problem for a legislature that arises in the United States. Each state has a certain population. The number of seats assigned to each of the states (currently 50) every 10 years based on census data is required by the US Constitution to be proportional to the population of the state. (A complication in the US version of the problem is that each state is required to get at least one seat in the House of Representatives.) A method used in the past allowed a state to lose representation (based on the same population data for the states) when the total house size $h$ increased. This phenomenon came to be known as the Alabama Paradox, named for the state it affected.
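The text does not name the method that produced the paradox. The sketch below assumes it is the largest-remainder rule usually attributed to Hamilton, which is the method historically associated with the Alabama Paradox. The three state populations are invented purely for illustration.

```python
from fractions import Fraction
import math


def hamilton(populations, house_size):
    """Largest-remainder apportionment (assumed here to be the method in question)."""
    total = sum(populations)
    quotas = [Fraction(p * house_size, total) for p in populations]
    # Every state first receives its quota rounded down.
    seats = [math.floor(q) for q in quotas]
    leftovers = house_size - sum(seats)
    # The remaining seats go to the states with the largest fractional parts.
    by_remainder = sorted(range(len(populations)),
                          key=lambda i: quotas[i] - seats[i], reverse=True)
    for i in by_remainder[:leftovers]:
        seats[i] += 1
    return seats


populations = [6, 6, 2]  # hypothetical states, same data for both house sizes
print(hamilton(populations, 10))  # [4, 4, 2]
print(hamilton(populations, 11))  # [5, 5, 1]
```

With these made-up populations, enlarging the house from 10 to 11 seats takes a seat away from the smallest state even though no population changed, which is the behavior the paradox describes.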

Algorithmic fairness

The rapid growth of the availability and power of computers has spurred the development of artificial intelligence methods. Increasingly, decisions about who should get bail after being arrested for a crime, who should get a loan for a house they want to buy, or who should get access to affordable housing are being carried out by computer programs. Some of these programs are "trained" to make decisions based on data about the success of decisions made in the past by humans. However, it has become apparent that some of the systems trained on data that was biased to begin with exhibit the same biases that originally occurred when the decisions were made by biased humans (see the July 2020 Feature Column for one example). Researchers are beginning to explore axioms describing fairness in this context. Extrapolating from voting theory, one might expect to obtain impossibility theorems showing that no matter how an algorithm is trained, there are axioms one might wish to hold that cannot be met simultaneously.

In conclusion, in democratic societies we pursue ways to make these systems work better or more fairly. Mathematics helps develop definitions to compare different means to make something better or more fair. But there are inherent limitations, expressed as mathematical impossibility theorems, that show the limits of how much it is possible, no matter how well meaning a society may be, to be totally fair.

Daniel Ellsberg and the Science of Extortion

Ursula Whitcher — Mon, 01 Jan 2024 05:01:45 +0000

I don't believe that responsible people should indulge in anything that can be even remotely considered ultimatums or threats. That is not the way to reach peaceful solutions.—President Eisenhower, July 8, 1959.

Bill Casselman
University of British Columbia

Daniel Ellsberg died just last summer, on June 16, 2023. It was an occasion for obituaries. He was famous because in the late 1960s and early 1970s he made copies of thousands of secret, classified papers (the "Pentagon Papers") documenting the catastrophic failure of the American prosecution of the war in Vietnam, and then released them to public scrutiny. This led eventually, by a remarkable and circuitous route, to the resignation of Richard Nixon as President of the United States. Even more remarkably, it did not lead to a criminal conviction of Ellsberg.

Daniel Ellsberg at a press conference in 1972

The obituaries didn't say much about Ellsberg's early career, but in fact it was quite interesting—in the twilight zone, you might say, where economics, mathematics, and even military strategy overlap. Many obituaries did say that he had been involved with game theory, the mathematical child of John von Neumann and Oskar Morgenstern. This was presumably because of his employment at the RAND Corporation, a research institute funded by the U.S. Air Force that was well known for its investigations into, and publications about, applications of game theory. But Ellsberg's involvement with the technical aspects of game theory was in fact extremely weak. Instead, like his sometime colleague Thomas Schelling, he was more interested in what the mathematics could suggest, rather than compute. They were both interested in the general question: How do people in a conflict make decisions, of an economic nature or otherwise? In particular, what useful theory can one propose about how people deal with extortionate threats? At that time, the most prominent threats involved nuclear warfare, but there were others of great interest.

Schelling had published in 1956 An essay on bargaining, one of the most influential economics papers of the twentieth century, and went on to be awarded the 2005 Nobel Memorial Prize in Economic Sciences. Ellsberg's contributions to economics were never so significant, but his own research had much in common with Schelling's, and was largely independent of it. But whereas Schelling went on to a productive if more conventional academic career, Ellsberg pursued access to power.

It will be useful to keep in mind the following timeline:

1952: Ellsberg graduates A. B. summa cum laude in economics from Harvard College. Nominated for a Junior Fellowship at Harvard.
1952-53: But instead attends Cambridge University for one year.
1953-54: Declines again to pursue a Junior Fellowship, and instead enrolls as an officer candidate in the U.S. Marine Corps. Serves for three years, eventually becoming a company commander.
1957: Leaves the Marines, takes up his Junior Fellowship. Attends summer school course in the mathematics of probability at Stanford. Visits the RAND Corporation informally, invited to return officially as consultant the following summer.
Summer 1958: Spends the summer at RAND, where he works on problems of national defense. Lectures on threats in economics, politics, and war as a kind of bargaining game.
1958-9: Returns to Harvard, gives in March 1959 six public lectures on the same topic at the Lowell Institute in Boston, titled The art of coercion. Two of these were repeated in a Harvard seminar run by Henry Kissinger, who was at that time on the Harvard faculty. It has been said that many of Ellsberg's ideas were later translated into action by Kissinger when he became Secretary of State—as extortionist, rather than as victim.
Summer 1959: Ellsberg moves to RAND, where he worked off and on for many years. (It was because of his position there that he had access to the papers that he leaked.)
1961-64: Consultant to the Department of Defense under a contract with RAND.
1962: Research for his Ph.D. thesis in economics at Harvard seems to have been finished quite early, but he didn't finish writing it up until 1962, while at RAND.
1964: Goes to work as special assistant to John McNaughton, Assistant Secretary of Defense (International Security Affairs), who in turn worked under Robert McNamara (known by President Johnson, who inherited him from Kennedy, as 'the man with the Stacomb').

Almost everyone who encountered Ellsberg during this period seems to have been impressed by his intelligence. His writings in this period fall roughly into three categories: (a) his senior thesis, (b) the Lowell lectures and work at RAND, (c) his Ph. D. thesis. The first and last are concerned with a topic only a dedicated economist could love, but they played a role, if only implicit, in all his early work. Ellsberg's notes for the Lowell lectures are extant and available for public reading. Although unfortunately in an unfinished state, they are both amusing and instructive.

Game theory

As far as I can see, Ellsberg was familiar with only the rudiments of game theory, and I do not believe he was interested at all in its more mathematical aspects. Nonetheless, to understand his work, it will help to know those rudiments.

I'll demonstrate how things go by three examples. Each of these is a game with two players, and each player has two possible moves, making four outcomes in all. The moves are made simultaneously, so a player does not know what his opponent's move is when he makes his own. Each outcome has a numerical evaluation, which amounts to a payment to (say) player #1 (whom I'll call "R" for "Row"), which is taken from player #2 (whom I'll call "C" for "Column"). A negative payment is taken to be a positive payment to C. These payoffs are recorded in a $2 \times 2$ matrix.

In the first game, each player writes down either $H$ or $T$ on a slip of paper and puts it upside down. When both slips have been placed, they are turned over. There are four outcomes:

Both choose $H$. Player R is paid $\$1$ by C.
Both choose $T$. Player C is paid $\$1$ by R.
The choices are $HT$ or $TH$. No payments are made.

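The payoff matrix, with rows giving R's call, columns giving C's call, and entries giving the payment to R, is

$$\begin{array}{c|cc} & H & T \\ \hline H & 1 & 0 \\ T & 0 & -1 \end{array}$$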

If R calls $T$, she can do no better than $0$, and she might have to give a dollar away. If she calls $H$, she can do no worse than $0$, and might get $1$. She should therefore certainly call $H$. Likewise, C should call $T$. This situation is called a saddle point. (The analogy to saddle points in multivariable calculus becomes plain when considering matrix games with more than 2 strategies.) In the paper Theory of the reluctant duelist, based on his undergraduate thesis, Ellsberg complained about the lack of excitement in this reckoning, but it's hard to fault the logic.
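The saddle-point reasoning can be checked mechanically. The sketch below rebuilds the first game's payoffs from the three outcomes listed above (rows are R's call, columns are C's call, entries are payments to R) and looks for entries that are at once the smallest in their row and the largest in their column. The helper name `saddle_points` is illustrative.

```python
# Payments to R in the first game, read off from the outcomes listed in the text.
# Rows are R's call (H, T); columns are C's call (H, T).
PAYOFF = [[1, 0],
          [0, -1]]
MOVES = ["H", "T"]


def saddle_points(matrix):
    """Index pairs whose entry is the minimum of its row and the maximum of its column."""
    points = []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            column = [r[j] for r in matrix]
            if value == min(row) and value == max(column):
                points.append((i, j))
    return points


for i, j in saddle_points(PAYOFF):
    print(f"R calls {MOVES[i]}, C calls {MOVES[j]}, value {PAYOFF[i][j]}")
# R calls H, C calls T, value 0
```

A row minimum is the worst R can do with that call, and a column maximum is the worst C can do with that call, so neither player gains by moving away from such an entry.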

The second game is like the first, except the payoff matrix is

Here there is no best play for either player. In fact, each of them might as well toss a coin, and call whatever comes up. This is in effect a randomized or mixed strategy, which is one of the principal contributions of von Neumann to the theory of games. In real games, some degree of randomization has been common practice for a long time, for example in the familiar strategy of bluffing in games of poker, which must be done unpredictably in order to be effective. Von Neumann made possible, in principle, a quantitative analysis of this phenomenon, and in the book with Morgenstern illustrated this with a simplified analogue of poker.

Incorporating random choices of moves in an encounter is perhaps the main contribution of von Neumann and Morgenstern to economics. In general, if a player has $n$ choices of moves, a pure strategy is one in which the player chooses one of these, and a mixed strategy is an array of $n$ probabilities according to which the player chooses one of them, say by consulting a random number generator. In this example, the array is $(1/2, 1/2)$ and the random number generator is a coin flip.

The third game will be more complicated, but also a bit more related to practical applications. R and C are at war. R is planning to ship a valuable cargo by airplane from one place to another. She has two planes available to do this, one of them (say #1) better protected from attack but more costly than #2. Both planes will make the trip. C will attack just one of them, with the intention of doing as much damage as possible.

Precise numerical values: If plane #1 is attacked, the probability of destruction is $0.2$. For plane #2 this is $0.4$. The value of the cargo is $\$400$K, that of plane #1 is $\$100$K, that of plane #2 is $\$80$K.

The payoff matrix will display the damage done to R. But this is not straightforward to evaluate. An attack on a plane is not guaranteed to do any damage at all. Insurance companies deal with this sort of thing by using the expected value of damage. This is the average amount of damage one would see if a huge number of attacks were made. For example, if 1,000 attacks were made on plane #1, around 200 would be successful. If it were carrying the cargo, each of these would amount to a loss of $100 + 400 = 500$ (in thousands of dollars). The total loss would be $200 \cdot \$500$K, or $\$100{,}000$K, and the average over the 1,000 attacks would be $\$100$K. This is also called the expected loss, and it's what goes in the payoff matrix. Its other entries are calculated similarly.
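The remaining entries can be worked out the same way. This sketch assumes that R's two pure strategies are "cargo on plane #1" and "cargo on plane #2", that only the attacked plane is at risk, and that a destroyed plane takes any cargo aboard with it. The function name `expected_loss` is made up here.

```python
from fractions import Fraction

# Figures from the text, in thousands of dollars.
CARGO = 400
PLANE_VALUE = {1: 100, 2: 80}
P_DESTROYED = {1: Fraction(1, 5), 2: Fraction(2, 5)}  # 0.2 and 0.4


def expected_loss(cargo_plane, attacked_plane):
    """Expected damage to R when the cargo flies on `cargo_plane`
    and C attacks `attacked_plane` (assumption: only that plane is at risk)."""
    at_stake = PLANE_VALUE[attacked_plane]
    if cargo_plane == attacked_plane:
        at_stake += CARGO
    return P_DESTROYED[attacked_plane] * at_stake


# Rows: plane carrying the cargo. Columns: plane attacked by C.
damage = {(r, c): expected_loss(r, c) for r in (1, 2) for c in (1, 2)}
for r in (1, 2):
    print([int(damage[(r, c)]) for c in (1, 2)])
```

Under these assumptions the four expected losses are $100$, $32$, $20$ and $192$ (in thousands of dollars).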

How can this be used to find the mixed strategy to be used by R? We can find expected values of expected values! The recipe for the mixed strategy can be found around p. 71 in J. D. Williams' light-hearted book The Compleat Strategyst. It is graphical in nature.

Suppose C plays strategy #2, but R plays a mixed strategy $(x, 1-x)$. The expected damage will be found as in the diagram on the left below. The expected damage from either of C's strategies can be seen in the middle. The dark path represents the maximal possible damage done by C. R's best mixed strategy $x$ is that corresponding to the minimum of the maximum likely damages. It is the horizontal coordinate $\sigma$ of the intersection of the two line segments. In this example, $\sigma = 17/60$.

Choosing the mixed strategy associated to $\sigma$ will give R the smallest possible value of the maximum expected damage, whichever plane C chooses to attack.
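The intersection point can also be found by arithmetic rather than from the picture. This is a sketch under assumptions: the four expected damages are the ones obtained from the figures above when R's pure strategies are "cargo on plane #1" and "cargo on plane #2", and $\sigma$ is read as the weight R puts on plane #2.

```python
from fractions import Fraction

# Expected damage to R in thousands of dollars (assumed layout:
# rows = plane carrying the cargo, columns = plane attacked by C).
a, b = Fraction(100), Fraction(32)    # cargo on plane #1
c, d = Fraction(20), Fraction(192)    # cargo on plane #2

# With weight w on "cargo on plane #2" the two lines are
#   C attacks #1:  a*(1 - w) + c*w     (falls from 100 to 20)
#   C attacks #2:  b*(1 - w) + d*w     (rises from 32 to 192)
# One falls and the other rises, so the maximum of the two is smallest
# where they cross. Setting them equal and solving for w gives:
w = (a - b) / ((a - b) + (d - c))
value = a * (1 - w) + c * w

print(f"weight on plane #2: {w}")
print(f"expected damage at the crossing: {float(value):.1f}K")
```

With these numbers the crossing is at $17/60$, in agreement with the value of $\sigma$ quoted above, and the expected damage there is about $\$77.3$K whichever plane C attacks.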

Remark. In these examples, the payoff for one player is the negative of the payoff to the other. These games are said to be zero-sum. (Another commonly used term is strictly competitive.) Few real-life applications are so strict. In general, the payoff matrix should record payoffs to both players, and they need not even be in the same currency. Nor, in real life, is it plausible that payoffs can be evaluated so precisely.

The last example is reminiscent of more interesting RAND reports, for example the various notes (co)authored by the well-known statistician David Blackwell. RAND has posted hundreds of such reports. A number of other mathematicians who became prominent in more conventional mathematical pursuits also worked for a while at RAND, for example Israel Herstein, John Milnor, and John Nash.

RAND

RAND was founded in the late 1940s with the hope that it would let the Air Force call upon scientific advice on how to conduct warfare in the age of nuclear weapons. This was meant to continue a successful collaboration during the Second World War, and RAND was responsible for a lot of research in game theory, including some well-known textbooks. As I count it, 26 people (all men, I am afraid) worked some time at RAND and were later awarded the Nobel Prize in Economics. Among them was John Nash, who has an independent reputation in purer mathematics. Henry Kissinger was also associated with RAND, but has an independent reputation of a different sort. So was Herman Kahn, on whom Doctor Strangelove is said to have been modeled.

Considering his later activities and the reputation of RAND, it might seem strange that Ellsberg went there. He himself says in some autobiographical comments:

In 1959, I became a strategic analyst at the RAND Corporation under the delusion—acquired as a summer consultant at RAND the previous year, and shared by all my colleagues and most of those who had access to Top Secret intelligence estimates—that a “missile gap” favoring the Soviets made the problem of deterring a Soviet surprise attack the overriding challenge to U.S. and world security. ... I had been drawn to the RAND Corporation because it was in the forefront of the emerging field of “decision theory,” the focus of my academic interests. Once there, I chose to apply my own work on individual decision-making under uncertainty to the most fraught, and possibly final, such decision in human history: the choice by the President of the U.S. or the Soviet premier—or, as I discovered, conceivably by one of their many subordinates—of whether to initiate all-out nuclear war.

Neither Ellsberg nor Schelling was much interested in technical aspects of game theory. What did interest them was that game theory suggested how to break up problems of making decisions into simpler factors. Both found payoff matrices and elementary probabilistic computations useful in explaining such problems. And together they managed to formulate a clear policy for how to be a more successful blackmailer! (Take a look at Schelling's An essay on bargaining and the last few pages of Ellsberg's fourth Lowell lecture.)

The economist's notion of utility

The main contribution of Von Neumann and Morgenstern to economics was the idea of seeing economic interactions as analogues of games. But there was a second, if more obscure, contribution to the theory of utility. This was the topic of both of Ellsberg's theses, undergraduate and graduate.

This topic can be introduced by the question, "What kind of entries are allowed in a payoff matrix?" Sure, the entries could be in monetary terms, but that is not all that common. Nations fight over land, corporations fight over access to markets, people fight over honour and status. Classically, economists dealt with this by pointing out that decisions often reduce to an expression of preference. This only allows one to order items in a sequence, as we shall see in a moment in a rough payoff matrix. But Von Neumann and Morgenstern pointed out that by making some very simple assumptions, one could assign to every object under consideration a numerical utility that allowed one, at least in principle, to deduce preferences. (Apples can be compared to oranges!) What was new in this was the introduction of probability and mathematical expectations in preferences. Given items $A$, $B$, and $C$, one could be asked if one preferred $A$ to a lottery ticket in which $B$ has probability $p$ of occurring and $C$ has probability $1-p$. Of course in the real world comparing apples and oranges can still be a difficult problem, but in recent years the theory of numerical utilities has been incorporated in the applications of game theory to computer networking. Take a look at the book Game theory for wireless engineers.
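The lottery comparison described above can be written as a formula. As a sketch, write $u$ for the numerical utility; the symbol $u$ is notation introduced here for illustration and does not appear in the text. The characteristic property of a Von Neumann-Morgenstern utility is that the utility of a lottery is the expected utility of its prizes:

$$u\bigl(pB + (1-p)C\bigr) = p\,u(B) + (1-p)\,u(C).$$

So $A$ is preferred to the lottery ticket exactly when $u(A) > p\,u(B) + (1-p)\,u(C)$. Conversely, the value of $p$ at which someone is indifferent between $A$ and the ticket pins down $u(A)$ once $u(B)$ and $u(C)$ have been fixed.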

I bring up this subject because Ellsberg brings it up constantly in his Lowell lectures.

One curious feature of this theory is that utility cannot be directly interpreted as money. This has been known at least since the phenomenon of marginal utility was discovered. A fixed amount of money is of different utility to a rich person and a poor one.

The Lowell lectures

The following is a quote from Ellsberg's first Lowell lecture, The theory and practice of blackmail. It is based in turn on an article in the December 4, 1958 issue of the New York Herald Tribune.

What went through the mind of the bank teller in New York last December, as he read the note that a "little old lady" pushed through his window? "I have acid in a glass," the note said, "and if you don't give me what I want I'll splash it on you." He looked up, saw about ninety customers in the bank, a grey-haired lady in a brown coat before his window, and on his ledge a six-ounce water glass with a colorless liquid in it. He returned to the note, and read: "I have two men in here. I'll throw the acid in your face, and somebody will get shot. Put all the fives, tens, and twenties in this bag." He complied.

To be precise, the teller handed over a paper bag with $\$3,420$ inside. Just after leaving the bank, the woman dropped the paper bag onto the sidewalk. When somebody picked it up and tried to hand it to her, she ran away.

There are ways in which this blackmail attempt is both similar to and different from the two-person zero-sum games we saw earlier.

The blackmailer first presents her demand, along with a threat if it is not agreed to.
The clerk has basically two options (but with variations): (i) he can comply with the demand or (ii) resist it.
The blackmailer then has her own choice of options, depending on what the clerk does: (i) If the clerk complies, she simply departs with the money. (ii) Otherwise, she can either carry out her threat or forget about it.

The most evident difference from the earlier games is that moves are not made simultaneously. Another is that payoffs to either participant are not so clear. Nonetheless, we can make up an imprecise payoff matrix:

The numbers here represent only relative damage to the victim: $[1]$ is less damaging than $[2]$, which in turn is less damaging than $[3]$. Because we are looking at a case of extortion, the ordering in the payoff matrix is determined: the consequence of complying has to be more damaging than a successful resistance (otherwise there would be nothing to resist), and the consequence of an unsuccessful resistance has to be the worst outcome of all.

But in order to really understand what's going on, we have to know more. When the bank teller receives the demand, he asks himself two questions: (1) How serious would the damage be if I did not agree to the demand? (2) If I don't agree, will the threat be carried out? Or, more precisely: how probable would it be that the punishment be carried out?

To answer these, we have to assign real numerical evaluations of the consequences, the Von Neumann-Morgenstern utilities. Here is one possibility:

This matrix, too, is roughly determined—the $0$ and the $100$ just fix the scale of the payments, and the $20$ has to be somewhere in the interval between them. Otherwise we would not be considering an extortion. What is the significance of the $20$?

Ellsberg's main contribution to the subject of extortionate demands is this:

There exists a critical probability $\sigma$ with this property: if the probability that the threat will be carried out is greater than $\sigma$, then the victim should agree to the demand. Otherwise not. This probability can be computed from the Von Neumann-Morgenstern payoff matrix.

Ellsberg calls this the critical risk, but "risk" is a word used by economists (perhaps confusingly for the rest of us) to denote probability. Of course in real life it is impossible to measure all contributions so precisely that this probability can really be computed, but it is useful to have Ellsberg's analysis of what's going on.

How to compute the critical probability? Just as we computed mixed strategies. For the payoff matrix exhibited above, we make up the following diagram:

So the critical probability in this case is $1/5$.
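The same number comes out of a short computation. The sketch below assumes that the three entries are damages to the victim: $0$ for resisting when the threat is not carried out, $20$ for complying, and $100$ for resisting when the threat is carried out. That assignment is an assumption made for this illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def critical_probability(comply, resist_unpunished, resist_punished):
    """Probability of punishment at which complying and resisting have the
    same expected damage. Resisting costs
        (1 - p) * resist_unpunished + p * resist_punished,
    and setting this equal to `comply` gives the ratio below."""
    return Fraction(comply - resist_unpunished,
                    resist_punished - resist_unpunished)


# Assumed damages: 0 (resist, no punishment), 20 (comply), 100 (resist, punished).
sigma = critical_probability(comply=20, resist_unpunished=0, resist_punished=100)


def should_comply(p, sigma=sigma):
    """The victim should agree to the demand when punishment is likelier than sigma."""
    return p > sigma


print(f"critical probability: {sigma}")
print(should_comply(Fraction(1, 4)), should_comply(Fraction(1, 10)))
```

With these entries the $20$ is exactly what fixes the critical probability: moving it closer to $100$ raises the threshold, and moving it closer to $0$ lowers it.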

Ellsberg hints at this computation throughout his Lowell lectures, but details can be found in Lecture #4, which happens to be the most technical of all.

We can now understand the blackmailer's basic problem: she wants the victim to pay up without fuss. In practice, there are costs to her of punishing a failure to pay up; for example, penalties can become much more serious. So her goal is to make the victim understand that the probability of being punished for failure to pay is higher than the critical probability. Much of the Lowell lectures is concerned with elaborating on this thread. There is much discussion in particular of Hitler's talent for extortion, and of the use of (possibly simulated) madness to be convincing.

In our example, of course the bank teller paid up after a rather brief consideration. But you'll have to look for yourself at Theory and practice of blackmail to see the very satisfactory surprise ending.

Reading further

Ellsberg's early writings

The Ellsberg Project at the University of Massachusetts

The Project Archivist is Jeremy Smith, whom I wish to thank for finding copies of a number of Ellsberg's papers for me.
Theory of the reluctant duelist, American Economic Review, December 1956, pp. 909-923. Reprinted in Bargaining: Formal Theories of Negotiation, edited by Oran R. Young and published by University of Illinois Press, 1975.
Classic & Current Notions of Measurable Utility, The Economic Journal, 1954, pp. 528-556

The two above are extracted from his undergraduate thesis.
Risk, Ambiguity, & the Savage Axioms, reprinted November 1961, The Quarterly Journal of Economics, 1961, pp. 644-661.

This is an extract from an early version of his Ph. D. It introduces what is now called the Ellsberg paradox, which hinges on how a person's attitude to uncertainty influences his decisions, and affects the validity of the Von Neumann-Morgenstern utility theory. This was Ellsberg's permanent contribution to economics.
The published version of the Harvard Ph. D. thesis:
Risk, ambiguity, and decision

The Lowell Lectures

There is not much mathematics in these, but his analysis of some historical events in the light of his own studies of conflict and threats is not to be missed.

Preface

Theory and practice of blackmail
The threat of violence

The ending, and one page at the beginning, are missing.
An analysis of conflict

Also known as The crude analysis of strategic choices, a lecture presented at an economics conference.
Power economics
The political uses of madness
The really intelligent detonator

Page 21 is missing.

The doomsday machine.

One of two large autobiographies. This one covers the threat of nuclear war, in particular the Cuban missile crisis, which was certainly a game of "chicken".

Secondary material

Game theory

John von Neumann and Oskar Morgenstern, The theory of games and economic behaviour. Published by the Princeton University Press in several editions.
J. D. Williams, The compleat strategyst, published in the RAND series of McGraw-Hill.

A very elementary introduction to $2 \times 2$ game theory, and a bit more.
Duncan Luce and Howard Raiffa, Games and decisions, Wiley, 1957.

The standard introduction in the 1950s.
Thomas Schelling, The strategy of conflict, Harvard University Press, 1963.

A classic of economics, which undoubtedly influenced Schelling's Nobel Prize award. His work and Ellsberg's during these years overlap to a considerable extent. Their periods of work at RAND also overlapped, and they undoubtedly talked to each other. Nonetheless, they each agree that they came up with many ideas independently. However, in the long run Schelling applied himself more steadily to problems of conflict and negotiation, and it paid off.
The authorized history of RAND

Utility theory

Daniel Bernoulli, Exposition of a new theory on the measurement of risk, Econometrica 22 (1954), pp. 23-36.

Translation from Latin. The origin of marginal utility.
Mark Dean, Lecture notes from courses in microeconomics at Columbia University.

For economists, the notion of utility—allied to that mythical figure the Economic Man—is no joke.
Israel Herstein and John Milnor, An axiomatic approach to measurable utility

A very clean analysis of a version of the Von Neumann-Morgenstern axioms for utility.

Sampled Poems Contain Multitudes

Ursula Whitcher — Fri, 01 Dec 2023 05:01:19 +0000

Sampled Poems Contain Multitudes

Sara Stoudt
Bucknell University

The basic principle of statistical inference is motivated by the fact that we can rarely observe the entire population. For example, we often can’t talk to everyone to ask, “Who is your favorite poet?” It would take too long and be too expensive. Instead, we rely on a sample of the population. If the sample is representative, or “looks like” the population, we can use our findings from the sample to infer properties about the full population.

But how can we make our sample “look like” the population? Different sampling approaches exist that target different sub-populations to make sure they appear in the sample. To see these sampling approaches at work we are going to sample lines from Walt Whitman’s poem “Song of Myself” to make new poems. Follow along with the code needed to produce these sampled poems here.

Walt Whitman was an influential 19th-century poet who is often emblematic of the American poetic style of the time. “Song of Myself” was revised several times in different editions of his book, Leaves of Grass. Here I will use the final version that has 52 stanzas. Yes, this is a long poem, but as one of its most famous lines states, “I am large, I contain multitudes.” There are many different ways to recombine these lines to create new poems. And hey, if we don’t have time to read the whole poem, maybe we can still get a good sense of Whitman’s style and message by looking at a sample of the full text.

Frontispiece to an 1860 edition of Leaves of Grass

The sampled poems created here are examples of found poetry that “take existing texts and refashion them, reorder them, and present them as poems.” (There’s even some other found poetry based on Whitman, this time via erasure poems, where poets choose what to erase from a pre-existing piece, rather than what to keep, like we do here.) Let’s see what we can find!

Simple Random Sampling

The baseline sampling approach is to take a random sample. Then we rely on probability to lead us towards representativeness. For example, if 25% of the population’s favorite poet is Whitman, and we take a random sample of 100 people, we would expect about 25 of them to answer “Whitman”.

An illustration of simple random sampling.

Here is what we come up with when we form a poem that has the same number of lines as there are stanzas in the original poem:

And these tend inward to me, and I tend outward to them,
My voice goes after what my eyes cannot reach, 
In vain the elk takes to the inner passes of the woods, 
I do not ask the wounded person how he feels, I myself become the wounded person,  
Whatever interests the rest interests me, politics, wars, markets, newspapers, schools,  
They do not think whom they souse with spray.  
Dung and dirt more admirable than was dream’d,             
This hour I tell things in confidence,         
And would fetch you whoever you are flush with myself. 
O despairer, here is my neck,    
The smallest sprout shows there is really no death,      
Embody all presences outlaw’d or suffering,       
I heard his motions crackling the twigs of the woodpile,      
Maternal as well as paternal, a child as well as a man,    
Lack one lacks both, and the unseen is proved by the seen,     
Were mankind murderous or jealous upon you, my brother, my sister?   
How they contort rapid as lightning, with spasms and spouts of blood!     
Out of the dimness opposite equals advance, always substance and increase, always sex,  
Dash me with amorous wet, I can repay you. 
At home in the fleet of ice-boats, sailing with the rest and tacking, 
Where the steam-ship trails hind-ways its long pennant of smoke,    
Magnifying and applying come I,            
The half-breed straps on his light boots to compete in the race,   
For I who am curious about each am not curious about God,  
I anchor my ship for a little while only,       
And filter and fibre your blood.  
Two well serv’d with grape and canister silence his musketry and clear his decks.  
I help myself to material and immaterial,  
Partaker of influx and efflux I, extoller of hate and conciliation,   
What I guess’d while I lay alone in my bed,    
Where the life-car is drawn on the slip-noose, where the heat hatches pale-green eggs in the dented sand,
Not a youngster is taken for larceny but I go up too, and am tried and sentenced.
I hear bravuras of birds, bustle of growing wheat, gossip of flames, clack of sticks cooking my meals,  
All goes onward and outward, nothing collapses,      
We found our own O my soul in the calm and cool of the daybreak.    
The press of my foot to the earth springs a hundred affections, 
Seeing, hearing, feeling, are miracles, and each part and tag of me is a miracle.     
This is the press of a bashful hand, this the float and odor of hair,   
Some made a mad and helpless rush, some stood stark and straight,    
Our vessel riddled and slowly sinking, preparations to pass to the one we have conquer’d, 
With music strong I come, with my cornets and my drums,   
And brown ants in the little wells beneath them,   
Flames and ether making a rush for my veins,                                                             
If our colors are struck and the fighting done?                                                         
I tighten her all night to my thighs and lips.                                                           
A youth not seventeen years old seiz’d his assassin till two more came to release him,                  
Waiting responses from oracles, honoring the gods, saluting the sun,                                    
And what is yet untried and afterward is for you, me, all, precisely the same.                           
Trippers and askers surround me,                                                                         
Lovers of me, bafflers of graves.                                                                        
Do I astonish more than they?                                                                            
I reach to the leafy lips, I reach to the polish’d breasts of melons.

Aside from some odd punctuation, this isn’t too bad for a randomly-generated poem. There are some awkward transitions, but one could read this as the product of more of a stream of consciousness approach than the more meticulously revised original.
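For readers who want to try this on their own text, here is a minimal Python sketch of a simple random sample of lines. It is not the code linked above. The "poem" is a made-up stand-in of labelled lines, and the stanza sizes and seed are arbitrary.

```python
import random

# Hypothetical stand-in for a poem: stanza number -> number of lines.
stanza_sizes = {1: 13, 2: 6, 3: 20, 4: 9}
poem = [(stanza, f"stanza {stanza}, line {i}")
        for stanza, n_lines in stanza_sizes.items()
        for i in range(1, n_lines + 1)]

rng = random.Random(1855)            # arbitrary seed, for reproducibility only
n_sampled = len(stanza_sizes)        # as many lines as there are stanzas

# Every line is equally likely to be chosen, and no line is chosen twice.
sampled = rng.sample(poem, k=n_sampled)
for _, text in sampled:
    print(text)
```

Nothing here guarantees that the short second stanza shows up at all, which is the motivation for the next approach.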

Stratified Random Sampling

But what if there are certain subgroups of the population that do not have many members? Since they are so rare, it will be hard to pick them as part of the sample, just by chance alone. Stratified random sampling allows us to split the population into groups and then sample members of each group to form our full sample. This way each group is represented in the final sample.

For example, there are some stanzas of Whitman’s poem that have only 6 lines, where the median stanza length is about 20. If we stratify by stanza, so that each one is represented by one line, we end up with a poem like this.

This sampling approach lets every stanza have equal representation. We could also sample proportionally. That way each stanza is still represented, but longer stanzas aren’t under-represented. That one gets a little long to print here, but check out the code to see an example and make your own.
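Both flavors of stratified sampling fit in a few lines. Again this is a Python sketch on a made-up stand-in poem, not the linked code. The sampling fraction of one quarter is an arbitrary choice.

```python
import random
from collections import defaultdict

# Hypothetical stand-in for a poem: stanza number -> number of lines.
stanza_sizes = {1: 13, 2: 6, 3: 20, 4: 9}
by_stanza = defaultdict(list)
for stanza, n_lines in stanza_sizes.items():
    for i in range(1, n_lines + 1):
        by_stanza[stanza].append((stanza, f"stanza {stanza}, line {i}"))

rng = random.Random(52)  # arbitrary seed, for reproducibility only

# Equal allocation: exactly one line from every stanza.
one_per_stanza = [rng.choice(lines) for lines in by_stanza.values()]

# Proportional allocation: the same fraction of every stanza,
# but never fewer than one line, so short stanzas still appear.
fraction = 0.25
proportional = []
for lines in by_stanza.values():
    k = max(1, round(fraction * len(lines)))
    proportional.extend(rng.sample(lines, k))

print(len(one_per_stanza), len(proportional))
```

In the proportional version the 20-line stanza contributes more lines than the 6-line one, while the equal version gives each stanza a single voice.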

Cluster Random Sampling

Sometimes sampling comes with logistical challenges. For example, what if you had to travel to do the sampling? You might want to limit the number of cities you go to. You could cluster people by city and then sample at the city level, thereby limiting your travel. Here, we could cluster by stanza and pick a random sample of only stanzas for our new poem. This has the added benefit of keeping lines within a stanza together and limiting awkward transitions between lines. Jump to the poem here.
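Cluster sampling by stanza might look like the following Python sketch. The stand-in poem, the choice of two stanzas, and the seed are all assumptions made for illustration.

```python
import random

# Hypothetical stand-in for a poem: stanza number -> list of lines.
stanza_sizes = {1: 13, 2: 6, 3: 20, 4: 9}
stanzas = {s: [f"stanza {s}, line {i}" for i in range(1, n + 1)]
           for s, n in stanza_sizes.items()}

rng = random.Random(3)  # arbitrary seed, for reproducibility only

# Sample whole stanzas (the clusters), then keep every line in each one.
chosen = sorted(rng.sample(sorted(stanzas), k=2))
cluster_poem = [line for s in chosen for line in stanzas[s]]

print(f"stanzas kept: {chosen}, lines in the new poem: {len(cluster_poem)}")
```

Because the randomness acts only on which stanzas are picked, lines inside a chosen stanza stay in their original order.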

Systematic Sampling

Systematic sampling is also motivated by pragmatism. You start with a randomly-selected person and then systematically skip $k$ people on the list to choose the next person, and keep taking every $k$th person until you reach the end of the list… This can be helpful if you don’t know exactly how many people are available for sampling at the beginning of the study. For example, consider an exit poll where every fifth person to leave the polls is questioned.

A systematic dot selection.

Starting with the 50th line and taking every 50th line afterwards results in the following poem:

Stout as a horse, affectionate, haughty, electrical,                                                             
How could I answer the child? I do not know what it is any more than he.                                         
The youngster and the red-faced girl turn aside up the bushy hill,                                               
Twenty-eight young men and all so friendly;                                                                      
The litter of the grunting sow as they tug at her teats,                                                         
The regatta is spread on the bay, the race is begun, (how the white sails sparkle!)                              
Breathe the air but leave plenty after me,                                                                       
I find no sweeter fat than sticks to my own bones.                                                               
I believe you refuse to go back without feeling of me,
Unscrew the locks from the doors!
The little light fades the immense and diaphanous shadows,                                                       
A tenor large and fresh as the creation fills me,                                                                
The insignificant is as big to me as any,                                                                        
A gigantic beauty of a stallion, fresh and responsive to my caresses,                                            
Upon the race-course, or enjoying picnics or jigs or a good game of base-ball,                                   
My course runs below the soundings of plummets.                                                                  
They have clear’d the beams away, they tenderly lift me forth.
His was the surly English pluck, and there is no tougher or truer, and never was, and never will be;
It is I let out in the morning and barr’d at night.                                                              
I do not ask who you are, that is not important to me,                                                           
The day getting ready for me when I shall do as much good as the best, and be as prodigious;                     
Making a fetich of the first rock or stump, powowing with sticks in the circle of obis,                          
All below duly travel’d, and still I mount and mount.                                                            
I know I have the best of time and space, and was never measured and never will be measured.                     
And I swear I will never translate myself at all, only to him or her who privately stays with me in the open air.
If you do not say any thing how can I say any thing?

I love how the last three lines come together here!
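Taking every 50th line is a one-line slice in Python. The sketch below uses a made-up list of 230 numbered lines in place of the poem, and the helper name `systematic_sample` is hypothetical.

```python
def systematic_sample(items, k, start):
    """Take the item at 1-based position `start`, then every k-th item after it."""
    return items[start - 1::k]


# Hypothetical stand-in: a "poem" of 230 lines, identified by line number.
line_numbers = list(range(1, 231))

# Start with the 50th line and take every 50th line afterwards.
picked = systematic_sample(line_numbers, k=50, start=50)
print(picked)
```

To begin from a randomly selected starting point, as in the general description above, `start` could be drawn with `random.randint(1, k)` instead of being fixed at 50.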

Multi-stage Sampling

Multi-stage sampling combines multiple sampling schemes together. For example, we could do a cluster random sample and then take a simple random sample within each cluster. We wouldn’t be able to talk to everyone within a city like we proposed in the clustering example above anyway.

Here is a small-scale example of what this can look like for Whitman:

(No doubt I have died myself ten thousand times before.)                                         
Of the moon that descends the steeps of the soughing twilight,                                   
I ascend from the moon, I ascend from the night,                                                 
Of the turbid pool that lies in the autumn forest,                                               
I hear you whispering there O stars of heaven,                                                   
And as to you Death, and you bitter hug of mortality, it is idle to try to alarm me.             
And what I assume you shall assume,                                                              
I, now thirty-seven years old in perfect health begin,                                           
Hoping to cease not till death.                                                                  
For every atom belonging to me as good belongs to you.                                           
Creeds and schools in abeyance,                                                                  
I celebrate myself, and sing myself,                                                             
The grave of rock multiplies what has been confided to it, or to any graves,                     
Somehow I have been stunn’d. Stand back!                                                         
That I could forget the mockers and insults!                                                     
That I could look with a separate look on my own crucifixion and bloody crowning.                
That I could forget the trickling tears and the blows of the bludgeons and hammers!              
Inland and sea-coast we go, and pass all boundary lines,                                         
One of the Nation of many nations, the smallest the same and the largest the same,               
At home on the hills of Vermont or in the woods of Maine, or the Texan ranch,                    
A farmer, mechanic, artist, gentleman, sailor, quaker,                                           
A novice beginning yet experient of myriads of seasons,                                          
A Southerner soon as a Northerner, a planter nonchalant and hospitable down by the Oconee I live,
At home in the fleet of ice-boats, sailing with the rest and tacking,                            
He staid with me a week before he was recuperated and pass’d north,                              
I tuck’d my trowser-ends in my boots and went and had a good time;                               
In the late afternoon choosing a safe spot to pass the night,                                    
And gave him a room that enter’d from my own, and gave him some coarse clean clothes,            
And brought water and fill’d a tub for his sweated body and bruis’d feet,                        
Wandering amazed at my own lightness and glee.
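A two-stage version of this idea can be sketched in Python as follows: first sample stanzas, then take a simple random sample of lines within each chosen stanza. The stand-in poem, the two stanzas, and the three lines per stanza are arbitrary assumptions, not the settings used for the poem above.

```python
import random

# Hypothetical stand-in for a poem: stanza number -> list of lines.
stanza_sizes = {1: 13, 2: 6, 3: 20, 4: 9}
stanzas = {s: [f"stanza {s}, line {i}" for i in range(1, n + 1)]
           for s, n in stanza_sizes.items()}

rng = random.Random(7)  # arbitrary seed, for reproducibility only

# Stage one: a cluster sample of stanzas.
stage_one = rng.sample(sorted(stanzas), k=2)

# Stage two: a simple random sample of lines within each chosen stanza.
lines_per_stanza = 3
stage_two = []
for s in stage_one:
    k = min(lines_per_stanza, len(stanzas[s]))
    stage_two.extend(rng.sample(stanzas[s], k))

for text in stage_two:
    print(text)
```

Lines from the same stanza stay next to each other in the result, but their order within the stanza is shuffled by the second stage.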

Up until now, we have been sampling by line to ease the understandability of the found poem, but what if we instead sampled by word? There is an app for that, so you can go through this same process, now sampling by word, with your favorite poem or song lyric. What do you notice about the poems here and the poems discovered using the app for the same Whitman poem? Want to make an easy dataset to work with like the one used here? Check out this poem parser tool. Share with us the poems you find, and remember each poem, song, and sample contains multitudes!

Stratified Sampling Poem

I harbor for good or bad, I permit to speak at every hazard,                                                                                 
You shall not look through my eyes either, nor take things from me,                                                                          
Nor any more youth or age than there is now,                                                                                                
Apart from the pulling and hauling stands what I am,                                                                                         
And I know that the spirit of God is the brother of my own,                                                                                  
Tenderly will I use you curling grass,                                                                                                       
I hasten to inform him or her it is just as lucky to die, and I know it.                                                                    
The hurrahs for popular favorites, the fury of rous’d mobs,                                                                                  
And roll head over heels and tangle my hair full of wisps.                                                                                   
On a bank lounged the trapper, he was drest mostly in skins, his luxuriant beard and curls protected his neck, he held his bride by the hand,
Dancing and laughing along the beach came the twenty-ninth bather,                                                                           
Blacksmiths with grimed and hairy chests environ the anvil,                                                                                  
And consider green and violet and the tufted crown intentional,                                                                              
The pert may suppose it meaningless, but I listening close,                                                                                  
The machinist rolls up his sleeves, the policeman travels his beat, the gate-keeper marks who pass,                                          
I resist any thing better than my own diversity,                                                                                             
If they are not yours as much as mine they are nothing, or next to nothing,                                                                  
I blow through my embouchures my loudest and gayest for them.                                                                                
This hour I tell things in confidence,                                                                                                       
I find no sweeter fat than sticks to my own bones.                                                                                           
Earth of the slumbering and liquid trees!                                                                                                    
Partaker of influx and efflux I, extoller of hate and conciliation,                                                                          
It alone is without flaw, it alone rounds and completes all,                                                                                 
I speak the pass-word primeval, I give the sign of democracy,                                                                                
I crowd your sleekest and best by simply looking toward you.                                                                                
I am cut by bitter and angry hail, I lose my breath,                                                                                         
I have instant conductors all over me whether I pass or stop,                                                                                
Deluding my confusion with the calm of the sunlight and pasture-fields,                                                                      
Rich showering rain, and recompense richer afterward.                                                                                        
The insignificant is as big to me as any,                                                                                                    
And the cow crunching with depress’d head surpasses any statue,                                                                             
Myself moving forward then and now and forever,                                                                                              
Approaching Manhattan up by the long-stretching island,                                                                                      
They were the glory of the race of rangers,                                                                                                 
One of the pumps has been shot away, it is generally thought we are sinking.                                                                 
Stretch’d and still lies the midnight,                                                                                                       
In at the conquer’d doors they crowd! I am possess’d!                                                                                        
I troop forth replenish’d with supreme power, one of an average unending procession,                                                         
They are wafted with the odor of his body or breath, they fly out of the glance of his eyes.                                                 
Spread your palms and lift the flaps of your pockets,                                                                                       
It is middling well as far as it goes—but is that all?                                                                                       
Whatever interests the rest interests me, politics, wars, markets, newspapers, schools,                                                      
I take my place among you as much as among any,                                                                                              
Now on this spot I stand with my robust soul.                                                                                                
He joins with his partners a group of superior circuit,                                                                                     
I know I have the best of time and space, and was never measured and never will be measured.                                                 
(It is you talking just as much as myself, I act as the tongue of you,                                                                       
And there is no object so soft but it makes a hub for the wheel’d universe,                                                                  
O suns—O grass of graves—O perpetual transfers and promotions,                                                                               
Perhaps I might tell more. Outlines! I plead for my brothers and sisters.                                                                    
Very well then I contradict myself,                                                                                                          
The last scud of day holds back for me.

Cluster Sampling Poem

I harbor for good or bad, I permit to speak at every hazard,                                                                                 
You shall not look through my eyes either, nor take things from me,                                                                          
Nor any more youth or age than there is now,                                                                                                
Apart from the pulling and hauling stands what I am,                                                                                         
And I know that the spirit of God is the brother of my own,                                                                                  
Tenderly will I use you curling grass,                                                                                                       
I hasten to inform him or her it is just as lucky to die, and I know it.                                                                    
The hurrahs for popular favorites, the fury of rous’d mobs,                                                                                  
And roll head over heels and tangle my hair full of wisps.                                                                                   
On a bank lounged the trapper, he was drest mostly in skins, his luxuriant beard and curls protected his neck, he held his bride by the hand,
Dancing and laughing along the beach came the twenty-ninth bather,                                                                           
Blacksmiths with grimed and hairy chests environ the anvil,                                                                                  
And consider green and violet and the tufted crown intentional,                                                                              
The pert may suppose it meaningless, but I listening close,                                                                                  
The machinist rolls up his sleeves, the policeman travels his beat, the gate-keeper marks who pass,                                          
I resist any thing better than my own diversity,                                                                                             
If they are not yours as much as mine they are nothing, or next to nothing,                                                                  
I blow through my embouchures my loudest and gayest for them.                                                                                
This hour I tell things in confidence,                                                                                                       
I find no sweeter fat than sticks to my own bones.                                                                                           
Earth of the slumbering and liquid trees!                                                                                                    
Partaker of influx and efflux I, extoller of hate and conciliation,                                                                          
It alone is without flaw, it alone rounds and completes all,                                                                                 
I speak the pass-word primeval, I give the sign of democracy,                                                                                
I crowd your sleekest and best by simply looking toward you.                                                                                
I am cut by bitter and angry hail, I lose my breath,                                                                                         
I have instant conductors all over me whether I pass or stop,                                                                                
Deluding my confusion with the calm of the sunlight and pasture-fields,                                                                      
Rich showering rain, and recompense richer afterward.                                                                                        
The insignificant is as big to me as any,                                                                                                    
And the cow crunching with depress’d head surpasses any statue,                                                                             
Myself moving forward then and now and forever,                                                                                              
Approaching Manhattan up by the long-stretching island,                                                                                      
They were the glory of the race of rangers,                                                                                                 
One of the pumps has been shot away, it is generally thought we are sinking.                                                                 
Stretch’d and still lies the midnight,                                                                                                       
In at the conquer’d doors they crowd! I am possess’d!                                                                                        
I troop forth replenish’d with supreme power, one of an average unending procession,                                                         
They are wafted with the odor of his body or breath, they fly out of the glance of his eyes.                                                 
Spread your palms and lift the flaps of your pockets,                                                                                       
It is middling well as far as it goes—but is that all?                                                                                       
Whatever interests the rest interests me, politics, wars, markets, newspapers, schools,                                                      
I take my place among you as much as among any,                                                                                              
Now on this spot I stand with my robust soul.                                                                                                
He joins with his partners a group of superior circuit,                                                                                     
I know I have the best of time and space, and was never measured and never will be measured.                                                 
(It is you talking just as much as myself, I act as the tongue of you,                                                                       
And there is no object so soft but it makes a hub for the wheel’d universe,                                                                  
O suns—O grass of graves—O perpetual transfers and promotions,                                                                               
Perhaps I might tell more. Outlines! I plead for my brothers and sisters.                                                                    
Very well then I contradict myself,                                                                                                          
The last scud of day holds back for me.

What I Think About When I Think About Voting

Ursula Whitcher — Wed, 01 Nov 2023 04:01:31 +0000

Inevitably, I think back to my favorite result in mathematics: when Diaconis used the representation theory of the symmetric group to show us that psychologists just don’t get along…

What I Think About When I Think About Voting

Sarah Wolff
Denison University

It’s November. Here in Ohio, that means cozy sweaters, crisp mornings, pumpkin everything, Thanksgiving, and sometimes a first snow. November also means election season, and as someone who likes to think about voting I find myself lost in thought more than usual around this time.

Inevitably, election coverage in the news will start me down a rabbit hole that quickly expands to much broader scenarios than choosing a presidential candidate. You see, the type of voting that I like to think about is ranked-choice voting, aka any scenario where a group of people is asked to choose and rank from a list of items. This could be used for anything from electing a mayor to filling out a survey of preferences. So yes, I’m quickly thinking about a group of diners ranking their favorite food items. Or a group of students choosing a new Denison mascot—Buzzy the turkey vulture? Denideer the deer? Swasey the walking chapel?

Mascot images from hypothetical election used courtesy of Denison University Communications.

Or if I’m in a nostalgic mood I might think about how my college soccer coach made all 20 of his players rank each other from 1 to 20. And, yes, he then shared that data with us. At the time it felt… cruel. But now that I know just how hard it really is to understand this type of data and also how hard it is for humans to handle ranking more than five choices, it feels, well, misguided.

When I think about voting, sometimes I wonder if I could get my hands on that 15+ year old data set. Sometimes I think about the incredible amount of data in this world that comes from asking people to rank their choices, and the incredible amount of variation in analyzing that data (see e.g. Bargagliotti et al.). But inevitably, I think back to my favorite result in mathematics: when Diaconis used the representation theory of the symmetric group to show us that psychologists just don’t get along. That is the result that I’d like to build to here.

Proceeding to Choose a Procedure

Let’s back up a bit and talk about the different considerations that go into setting up an election. First and foremost: which voting method will we use? In other words: how will the voters vote? Next, which voting procedure will we use, i.e., how will a winner be determined from the votes? Once the votes are in, will we analyze that data? If so, how? Will the analysis support the outcome or point to other considerations?

Of course there is also the question of whether the voters were grouped into districts, which opens up an entire world of mathematical and moral questions (see for example the work of the MGGG Redistricting Lab).

So how will the voters vote? Well, there are many different voting methods. One is plurality voting, where voters select their favorite candidate from a list (see for example the 2023 AMS Presidential election). Another, often used for electing committee members, is approval voting, where voters select any candidate they approve of from a list, sometimes with a set maximum number (see for example the 2023 AMS Editorial Boards Committee election). As with all voting methods, plurality and approval voting have pros and cons. For example, approval voting could potentially create a committee composed entirely of members reflecting the values of 51% of the electorate without a single member reflecting the values of the remaining 49%.

One voting method rising in popularity is ranked-choice voting. While its implementation in the 2021 New York City mayoral elections has made ‘ranked-choice voting’ seem synonymous with ‘instant runoff voting’ (IRV), ranked-choice voting is a voting method while IRV is a voting procedure. Ranked-choice voting just means that voters rank some or all of the candidates in order of preference.

Given an election between $n$ candidates consider $S_n$, the symmetric group on $n$ letters. Each element of $S_n$ is a permutation—ranking—of the $n$ letters and corresponds to one of the choices a voter can make. We represent a ranked-choice election by a function $p:S_n\rightarrow \mathbb{C}$ where $p(\sigma)$ gives the number of voters who voted for ranking $\sigma$. In the voting theory literature $p$ is often called a profile. Note that in a typical election $p(\sigma)$ is an integer; however, working with normalized data quickly puts us outside the realm of the integers and viewing our functions with range $\mathbb{C}$ allows us to do interesting representation theory.

Here is a possible profile. (University Communications would like me to note that this is a hypothetical scenario: no such vote for a Denison mascot has happened.)

\[\begin{array}{lccccr}
p\large(\text{Buzzy, Denideer, Swasey})=12 &&&& p\large(\text{Buzzy, Swasey, Denideer})=7\\
p\large(\text{Denideer, Buzzy, Swasey})=22 &&&& p\large(\text{Denideer, Swasey, Buzzy})=5\\
p\large(\text{Swasey, Buzzy, Denideer})=25&&&& p\large(\text{Swasey, Denideer, Buzzy})=3 \end{array}
\]

Given a profile there are many different ways to select a winner. For example, we could apply a weighting vector $\mathbf{w}$ to each ranking that gives weight $w_j$ to the candidate ranked in the $j$th position. A weighting vector of $\mathbf{w}=[1,0,\dots,0]$ recovers plurality voting while $\mathbf{w}=[1,1,\dots,1,0]$ would be anti-plurality—voters indicating their least-desired candidate—and $\mathbf{w}=[n-1, n-2,\dots,1,0]$ is the well-known Borda count.
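To make the weighting-vector idea concrete, here is a minimal Python sketch that scores the hypothetical mascot profile above. The function name `positional_scores` and the dictionary layout are illustrative choices, not part of any standard library.

```python
# Hypothetical mascot profile from the text: ranking (1st, 2nd, 3rd) -> votes.
PROFILE = {
    ("Buzzy", "Denideer", "Swasey"): 12,
    ("Buzzy", "Swasey", "Denideer"): 7,
    ("Denideer", "Buzzy", "Swasey"): 22,
    ("Denideer", "Swasey", "Buzzy"): 5,
    ("Swasey", "Buzzy", "Denideer"): 25,
    ("Swasey", "Denideer", "Buzzy"): 3,
}


def positional_scores(profile, weights):
    """Total score per candidate when position j earns weights[j] per voter."""
    scores = {}
    for ranking, votes in profile.items():
        for position, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + weights[position] * votes
    return scores


plurality = positional_scores(PROFILE, [1, 0, 0])       # w = [1, 0, ..., 0]
anti_plurality = positional_scores(PROFILE, [1, 1, 0])  # w = [1, ..., 1, 0]
borda = positional_scores(PROFILE, [2, 1, 0])           # w = [n-1, ..., 1, 0]

for name, scores in [("plurality", plurality),
                     ("anti-plurality", anti_plurality),
                     ("Borda", borda)]:
    print(name, scores, "winner:", max(scores, key=scores.get))
```

With these numbers Swasey has the most first-place votes (28), while Buzzy has the highest Borda score (85), so the choice of weighting vector already changes the outcome.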

Outside of assigning weighting vectors there are plenty of other options. In 1785 the Marquis de Condorcet proposed Condorcet’s criterion: if there is a candidate that wins every head-to-head contest then that candidate should win. The 2021 New York mayoral election used instant runoff voting, which first checks whether some candidate is ranked first by more than 50% of the voters. If not, the candidate with the fewest first-place votes is eliminated, the rankings are updated for the remaining candidates, and the process repeats.
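The two procedures just described can be sketched in a few lines of Python and run on the hypothetical mascot profile. The function names are illustrative, every voter is assumed to rank all candidates, and ties are broken arbitrarily, which a real election rule would have to specify.

```python
# Hypothetical mascot profile from the text: ranking (1st, 2nd, 3rd) -> votes.
PROFILE = {
    ("Buzzy", "Denideer", "Swasey"): 12,
    ("Buzzy", "Swasey", "Denideer"): 7,
    ("Denideer", "Buzzy", "Swasey"): 22,
    ("Denideer", "Swasey", "Buzzy"): 5,
    ("Swasey", "Buzzy", "Denideer"): 25,
    ("Swasey", "Denideer", "Buzzy"): 3,
}


def condorcet_winner(profile):
    """Return the candidate who wins every head-to-head contest, or None."""
    candidates = next(iter(profile))
    for a in candidates:
        beats_all = all(
            sum(v for r, v in profile.items() if r.index(a) < r.index(b))
            > sum(v for r, v in profile.items() if r.index(b) < r.index(a))
            for b in candidates if b != a
        )
        if beats_all:
            return a
    return None


def instant_runoff(profile):
    """Eliminate the candidate with the fewest top votes until a majority appears."""
    remaining = set(next(iter(profile)))
    while True:
        tallies = {c: 0 for c in remaining}
        for ranking, votes in profile.items():
            # Each ballot counts for its highest-ranked surviving candidate.
            top = next(c for c in ranking if c in remaining)
            tallies[top] += votes
        leader = max(tallies, key=tallies.get)
        if 2 * tallies[leader] > sum(tallies.values()) or len(remaining) == 1:
            return leader
        remaining.remove(min(tallies, key=tallies.get))  # ties broken arbitrarily


print("Condorcet winner:", condorcet_winner(PROFILE))
print("IRV winner:", instant_runoff(PROFILE))
```

For this profile Buzzy wins both head-to-head contests (44 to 30 against Denideer, 41 to 33 against Swasey), yet instant runoff eliminates Buzzy first and then elects Denideer, 39 to 35.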

Arrow Takes Aim

Different voting procedures have the potential to lead to different election outcomes. Indeed, for the profile $p$ above representing a contest between Denison mascots, plurality, Borda, and IRV each produce a different winner. So which procedure should we use?

Interestingly, that question is much harder to answer than it would seem. In 1951, economist Kenneth Arrow investigated reasonable conditions that a voting procedure should satisfy. Applied to a ranked-choice election these are:

Unrestricted Domain: voters should be allowed to choose any ranking of the $n$ candidates.
Pareto Principle: if all voters prefer candidate $A$ to candidate $B$, then candidate $A$ should place above candidate $B$.
Independence of Irrelevant Alternatives (IIA): the presence of an irrelevant candidate, e.g., $C$, should not affect the head-to-head ranking of candidates $A$ and $B$.

Arrow then proved his impossibility theorem, often summarized as: “the only voting procedure with three or more candidates that satisfies the above conditions is a dictatorship.”

While it may seem like Arrow’s theorem declares that voting is broken, we can see that there is more to the story. The first two conditions are exceedingly reasonable but IIA is one that quickly leads to debate. A research mentor once explained IIA this way: suppose you are at a restaurant and the server tells you that you can choose between lemon meringue pie and caramel cake. You choose the cake. But then the server comes back, having remembered that they also offer ice cream sundaes. You say: “Oh! Well in that case I’ll have the pie!”

Caramel cake, lemon chiffon pie, and sundae images used under CC BY 2.0.

Even with this simple example, people often come up with reasons why the interaction could make sense. And in an election, the introduction of a new, seemingly ‘irrelevant’ candidate could absolutely change some voters’ ranking of two other candidates.

Really, Arrow is telling us to be thoughtful and purposeful about elections: when choosing the voting system we will use, we need to think through which criteria are most important to us.

Method $\neq$ Procedure $\neq$ Analysis

Think of how much ranked-choice data is out in the world. Any time you have been asked to fill out a survey of preferences, you were adding to a particular ranked-choice data set. I don’t know about you, but I’ve contributed a lot of data in my life.

Despite this sea of data, there is no standard method of analysis. Analyses vary widely but usually fall into one of three categories: descriptive methods, regression methods and clustering methods. As a simple example of a descriptive method, we could provide the average ranking of each candidate or a table of how many times each candidate is ranked in each position. A rich discussion of these three categories can be found in Bargagliotti et al., which provides examples of ranked-choice data in the fields of education and psychology, delves into how the data is traditionally analyzed and used in decision-making in these fields, and proposes new techniques that could “more fully leverage information contained in ranked data” (page 17).

To me, however, the most fascinating analysis of ranked data falls outside of the three categories above: using a generalized Fourier transform to reveal patterns within the data. To be completely honest, it can be hard to convince a non-mathematics audience to do a Fourier transform on the symmetric group to analyze their data set. The categories above are perhaps more practical—easier to explain, easier to interpret, easier to talk about with a broad audience—but the structure that a Fourier transform can reveal is just too beautiful to leave out of the conversation. My favorite example is Diaconis’s work delving into the results of the 1980 American Psychological Association (APA) presidential election.

The classical discrete Fourier transform (DFT) of a function $f:\{0,1,\dots,n-1\}\rightarrow \mathbb{C}$ is the map $f\mapsto\{\hat{f}(k)\mid k=0,\dots,n-1\}$ arising from expressing $f$ in the form
\[f=\sum_{k=0}^{n-1} \hat{f}(k)\omega_{k},\]
where $\omega_k:\{0,1,\dots,n-1\}\rightarrow \mathbb{C}$ is defined by $\omega_{k}(j)=e^{2\pi ikj/n}$. We call $\hat{f}(k)$ a Fourier coefficient.

From an algebraic perspective we see that $f$ is a function on the cyclic group $C_n$ and therefore an element of the group algebra $\mathbb{C}C_n$ of complex-valued functions on $C_n$. Taking a representation-theoretic perspective, the set of functions $\{\omega_0,\dots,\omega_{n-1}\}$ forms a complete set of inequivalent irreducible representations of $C_n$, and thus a DFT is a change-of-basis map that uses an orthogonal basis coming from the irreducible representations of $C_n$.
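The change-of-basis reading of the DFT can be checked numerically. The sketch below treats a function on the $n$ points $0,\dots,n-1$ as a Python list and uses the convention of the expansion above, so the factor $1/n$ sits on the coefficients; many libraries place it on the inverse transform instead. The function names are illustrative.

```python
import cmath


def omega(k, j, n):
    """Value at j of the k-th basis function: exp(2*pi*i*k*j/n)."""
    return cmath.exp(2j * cmath.pi * k * j / n)


def dft_coefficients(f):
    """Coefficients fhat(k) such that f = sum_k fhat(k) * omega_k."""
    n = len(f)
    # Orthogonality of the omega_k gives fhat(k) = (1/n) * sum_j f(j) * conj(omega_k(j)).
    return [sum(f[j] * omega(k, j, n).conjugate() for j in range(n)) / n
            for k in range(n)]


def reconstruct(fhat):
    """Evaluate sum_k fhat(k) * omega_k at every point j."""
    n = len(fhat)
    return [sum(fhat[k] * omega(k, j, n) for k in range(n)) for j in range(n)]


f = [3, 1, 4, 1, 5]
fhat = dft_coefficients(f)
rebuilt = reconstruct(fhat)
print([round(abs(c), 4) for c in fhat])
print([round(value.real, 4) for value in rebuilt])
```

The coefficient $\hat{f}(0)$ is the average of $f$, and expanding in the basis and summing again returns the original values up to rounding.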

Taking this viewpoint allows for immediate generalization to any finite group $G$: take a function $f\in\mathbb{C}G$ and consider the Fourier coefficients that come from rewriting $f$ using a complete set of irreducible representations of $G$—an orthogonal basis for $\mathbb{C}G$. These are broad strokes: for more details see for example Section 3 of Rockmore's "Some applications of generalized FFTs" or see Crisman and Orrison for an excellent survey of representation theory applications in voting theory.

Remember how we captured election data using a profile $p$? Realizing that $p\in\mathbb{C}S_n$, we can project it into orthogonal subspaces determined by the irreducible representations of $S_n$! One reason it might make sense to use irreducible representations is that these subspaces are invariant under the action of $S_n$. This has the nice voting-theoretic interpretation that the analysis is invariant under relabeling of candidates. We definitely wouldn’t want our analysis to change if we just swapped the labels of candidates! Also, the representation theory of the symmetric group is both intrinsically beautiful and meaningful. Each irreducible representation corresponds to a partition of $n$, which can point us to relationships amongst the candidates.
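As a toy illustration of such a projection, the sketch below decomposes the hypothetical three-candidate mascot profile into the three isotypic pieces of $\mathbb{C}S_3$. It uses the standard character formula for the projection, $\frac{d_\chi}{|G|}\sum_{h}\chi(h)\,f(hx)$, which is valid here because the characters of $S_3$ are real. The encoding of rankings as tuples and all function names are assumptions made for this example.

```python
from itertools import permutations

CANDIDATES = ("Buzzy", "Denideer", "Swasey")
VOTES = {
    ("Buzzy", "Denideer", "Swasey"): 12,
    ("Buzzy", "Swasey", "Denideer"): 7,
    ("Denideer", "Buzzy", "Swasey"): 22,
    ("Denideer", "Swasey", "Buzzy"): 5,
    ("Swasey", "Buzzy", "Denideer"): 25,
    ("Swasey", "Denideer", "Buzzy"): 3,
}

GROUP = list(permutations(range(3)))  # the six elements of S_3

# Encode each ranking as a permutation: entry j is the index of the candidate in position j.
profile = {tuple(CANDIDATES.index(c) for c in r): v for r, v in VOTES.items()}


def compose(g, h):
    """Return the permutation i -> g[h[i]]."""
    return tuple(g[i] for i in h)


def sign(g):
    inversions = sum(g[i] > g[j] for i in range(len(g)) for j in range(i + 1, len(g)))
    return -1 if inversions % 2 else 1


def fixed_points(g):
    return sum(i == gi for i, gi in enumerate(g))


# (dimension, character) for the three irreducible representations of S_3.
IRREDUCIBLES = {
    "trivial": (1, lambda g: 1),
    "standard": (2, lambda g: fixed_points(g) - 1),
    "sign": (1, sign),
}


def isotypic_component(f, dim, chi):
    """Project f onto the isotypic subspace of the character chi."""
    scale = dim / len(GROUP)
    return {x: scale * sum(chi(h) * f[compose(h, x)] for h in GROUP) for x in GROUP}


components = {name: isotypic_component(profile, d, chi)
              for name, (d, chi) in IRREDUCIBLES.items()}
energy = {name: sum(v * v for v in comp.values()) for name, comp in components.items()}
print(energy)
```

The three components add back up to the profile and their squared lengths add up to the squared length of the profile (1336 here), which is what makes it meaningful to ask where the data is concentrated. After the trivial component, which only records the average number of votes per ranking, most of the remaining squared length sits in the two-dimensional standard representation.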

Using similar ideas, Diaconis took the profile $p\in\mathbb{C}S_5$ corresponding to the APA’s 1980 presidential election among 5 candidates and projected $p$ into isotypic subspaces corresponding to each irreducible representation of $S_5$. The data seemed to be concentrated in the subspace $V_3$ corresponding to the irreducible representation labeled by the Young diagram of the partition $(3,2)$.

This could indicate that there was a strong ‘pairs’ effect—that voters placed a pair of candidates together, either above or below the remaining three.

This observation led Diaconis to consider second-order information, that is, the positions of pairs of candidates (see Table 5 in his article “A generalization of spectral analysis with application to ranked data”). He found a large effect for ranking candidates 1 and 3 together, either both at the top or both at the bottom. It seemed that the voters either liked both or hated both. A similar effect was found for candidates 4 and 5.

Back to the APA election: at least back in 1980, the academicians and clinicians in the APA were on uneasy terms. Diaconis’s analysis captured that dynamic! That particular year, two of the candidates were academicians, two were clinicians, and one fell a bit more in the middle. “Voters seem to choose one type or the other, and then choose within, but the group effect predominates” (page 956).

This is one small example of how a seemingly obscure generalization of a Fourier transform can extract meaningful information in elections and surveys. But remember: extracting meaningful information is different from choosing a winner. While analysis may lead to considerations in choosing a voting procedure, analyzing voting data is not the same as choosing a voting procedure, which in turn is not the same as choosing a voting method.

So how should we vote? How should we analyze the votes? And how should we use this analysis? Maybe we should vote on it.

Correcting Errors

Ursula Whitcher — Sun, 01 Oct 2023 04:01:27 +0000

Who might have realized that when the mathematics community studied the properties of cubes, they might be used in error correction technology?

Correcting Errors

Joe Malkevitch
York College (CUNY)

Introduction

Humans sometimes make errors. Perhaps you have been sloppy with a calculation on a mathematics examination. You made an error. While many see machines and communications systems as more reliable than humans, machines and communications systems make errors too. One can think of this as the consequence of there being noise in any physical communication system, which by chance alters the information involved. Advances in sending information electronically, including streaming sound and video, have led mathematicians to take an interest in increasing the reliability of data which is transferred electronically. Digital technologies work so well that we have often lost track of the mathematical triumphs that make these technologies as reliable as they are. An earlier column looked at some aspects of using codes to correct errors in digital technology systems. Here I will look at a kind of error that communications systems are subject to: errors that arise when a symbol in a code word is dropped (or inserted) rather than changed. Deletion errors have recently gotten renewed attention and are much less understood than other kinds of transmission errors. First, let me provide some background related to using mathematics to correct errors.

Can the data on a damaged disc still be read? (Image by Nathanael Coyne, CC BY-NC-ND 2.0)

Basic ideas

Mathematics is often viewed as studying properties of patterns involving numbers or shapes, the investigations that give rise to the areas of mathematics called algebra and geometry (topology). We are familiar with numbers and live with the fact that the number seventeen in English or dix-sept in French can be represented in different ways, depending on the system we choose for representing it. Thus, seventeen is written as 17 using the symbol alphabet 0, 1, 2, ..., 9; as XVII using the Roman numeral system (where the alphabet is I, V, X, L, C, D, and M); and as 10001 in binary (alphabet 0 and 1). But a recent insight is that a number written in decimal, say 130542, can be treated simply as a string of symbols, setting aside its properties as a number written in base 10. Thus, 111011 is a string using the alphabet 0, 1. GCTTAG is a short string (length 6) representing a small piece of DNA, where the alphabet consists of A, C, G, T. The letters used are the first letters of the nucleotides that make up DNA: adenine, cytosine, guanine and thymine. When DNA codes for proteins, the letters are read in groups of 3 symbols.

One leader in this approach was the Russian mathematician Vladimir Iosifovich Levenshtein (1935-2017; there are variant spellings of his name). It was he, together with other scholars both before and after him, notably Richard Hamming (1915-1998), who studied in detail ways to compute the distance between two strings (how far apart they are), as well as ways to use strings to represent or code information for a wide variety of purposes.

Photo of Richard Hamming (1915-1998). Courtesy of Wikipedia.

Very briefly, the idea is to count the minimum number of insertions, deletions or substitutions needed to transform one string into another. From this point of view the string WORD is distance 2 from the string SWARD: one substitution, O to A, and the insertion of an S. This approach to distance is one way to construct spell checking software for a word processor on a computer. If a typed string is not in the dictionary of words stored on the computer, it may not really be a word. The software then looks for dictionary words that the data enterer might have meant by locating one or more words that are close to the typed string, using some way to measure the distance between strings.
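The distance described here is commonly computed with a dynamic-programming table. The following Python sketch is one standard way to do it, keeping only one row of the table at a time. The function name and the tiny word list are illustrative and are not taken from any particular spell checker.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] is the distance between the first i-1 symbols of a and the first j of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i symbols into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute ca by cb, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("WORD", "SWARD"))

# A toy spell-check step: pick the closest entry in a hypothetical dictionary.
dictionary = ["WORD", "WARD", "SWORD", "SWARD"]
typed = "SWRD"
print(min(dictionary, key=lambda word: levenshtein(typed, word)))
```

The first call reproduces the distance 2 between WORD and SWARD from the text. For the misspelling SWRD, both SWORD and SWARD are at distance 1, and `min` simply returns whichever comes first in the list.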

Here is a very small example to give you the flavor of what can be done. To send an image, the image would be divided into a certain number of cells, often called pixels. In the simplest setting, each pixel might be black or white and one might send an image of the letter H represented in a 3x3 grid as shown below, using the code 0 for white and 1 for black. The image would be sent as 101111101 (one binary digit for each pixel). One may use the 0's and 1's to reconstruct the image, top to bottom and left to right.

A 3x3 pixel grid; grid used to represent the capital letter H with 9 black or white cells.
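The reconstruction step can be sketched in a few lines of Python. This is only an illustration under my own conventions: the function name `render` is hypothetical, and I draw black cells as `#` and white cells as `.`.

```python
def render(bits: str, width: int) -> str:
    """Lay a bit string out row by row: '#' for 1 (black), '.' for 0 (white)."""
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "\n".join(row.replace("1", "#").replace("0", ".") for row in rows)


print(render("101111101", 3))
# #.#
# ###
# #.#
```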

In recent times, a ubiquitous use of black and white cells in digital images is the QR (for Quick Response) code, a sample of which is displayed below.

A QR digital code. Courtesy of Wikipedia.

When codes are used to represent information, it is traditional to use codes where the number of symbols in each code word is the same. This makes it possible to work in an environment where some kind of delimiter, such as a comma or a space, is not needed to separate code words. Thus, if 00 represents white and 11 represents black then one might send the message 00111100 rather than 00, 11, 11, 00.

It is worth remembering the way that using mathematics in the world and doing mathematics for its own sake reinforce each other. Mathematicians in many cultures have been fascinated by cubes since ancient times. Their interest stemmed from the fact that cubes are regular polyhedra. Cubes showed up in Euclid's Elements as one of the 5 convex solids in 3-space whose vertices are all alike and whose faces are (convex) regular polygons. Euclid does not mention the issue of convexity, a concept developed later by other mathematicians. These five polyhedra are now known as the Platonic solids. Later mathematicians looked at the question of what regular convex solids exist in dimensions higher than 3. Intriguingly, there are 6 regular convex solids in 4-dimensional space, but in each dimension 5 or more there are only 3. But who might have realized that the properties of cubes studied by the mathematics community might one day be used in error correction technology?

Ancient scholars (and artists) interested in polyhedra did not have the modern tool of coordinates. However, eventually it was observed that the vertices of an $n$-dimensional cube could be labeled with all of the possible binary strings of length $n$. One can get from the labeled version of an $n$-cube to an $(n+1)$-cube in a delightfully appealing way that is illustrated below. What is shown is one way to represent a labeled 3-dimensional cube with its 8 vertices labeled with the 8 binary strings of length 3. Note that the bottom square in the diagram has all of its third coordinates 0 while the top square has all of its third coordinates 1. The 4 vertical edges join corresponding vertices in the two copies of the square (also known as a 2-cube). More generally, to get an $(n+1)$-cube from two copies of a labeled $n$-cube, in one copy of the $n$-cube one uses the binary sequences of length $n$ with a 0 added in position $n+1$, while in the other copy one adds a 1 in position $n+1$.

A 3-dimensional cube labeled with the 8 binary strings of length 3

The 3-cube has 8 vertices and 12 edges. You can verify for yourself that the first diagram below can be thought of as a 4-dimensional cube with 16 vertices and 32 edges (as well as 24 square, two-dimensional faces). Cubes often give rise to attractive and symmetrical drawings. The first diagram below is appealing, but the fact that it represents a 4-dimensional cube is not so clear. The 5-cube and the 4-cubes which make it up can also be seen below. Based on the constructions shown, you can convince yourself that an $n$-cube has $2^n$ vertices and $n 2^{n-1}$ edges.

A 4-cube diagram. Courtesy of Wikipedia.

A diagram representing the 5-cube, built up from two 4-cubes.
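The labeling rule and the vertex and edge counts can be checked with a short Python sketch. The function names are hypothetical, chosen for this illustration; an edge is taken to join two labels that differ in exactly one coordinate.

```python
from itertools import product


def cube_vertices(n: int) -> list[str]:
    """All binary strings of length n, one per vertex of the n-cube."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def cube_edges(n: int) -> list[tuple[str, str]]:
    """Pairs of vertex labels that differ in exactly one coordinate."""
    edges = []
    for v in cube_vertices(n):
        for i in range(n):
            if v[i] == "0":  # list each edge once, from its 0 end
                edges.append((v, v[:i] + "1" + v[i + 1:]))
    return edges


def next_cube(vertices: list[str]) -> list[str]:
    """Labels of the (n+1)-cube: one copy with 0 appended, one with 1."""
    return [v + "0" for v in vertices] + [v + "1" for v in vertices]


for n in range(1, 6):
    print(n, len(cube_vertices(n)), len(cube_edges(n)))
```

For $n = 3$ and $n = 4$ this reports 8 vertices with 12 edges and 16 vertices with 32 edges, in agreement with $2^n$ and $n 2^{n-1}$.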

Given a binary string of length $n$, we can think of this string as a label for a vertex of an $n$-dimensional cube. The string we send conveys the information, for instance the gray level of a pixel in an image. If a string that arrives is not one of the strings in the dictionary of code words, we know an error has been made, but typically this does not allow us to correct the string that arrived to the string that was sent. So the way we will conceptualize a code word is that some of its digits are information digits and that other digits are added with the goal of correcting errors.

If we want to code black or white pixels, we might use 0 to represent white and 1 to represent black, but if a 0 is sent a 1 might arrive and if a 1 is sent a 0 might arrive. A first idea might be that we could send the code word repeatedly, and though we are wasting space and time by sending a longer string than we need, perhaps this longer string will allow correction. Thus with the code 0, 1 we can send 00 for 0 and 11 for 1. However, note that if a single error occurs and 10 arrives, we cannot be sure whether 00 was sent with an error in position one or 11 was sent with an error in position two. Thus, a single repetition is not good enough. We need something longer: 000 and 111 would work, because now, when a single error occurs, we can use the notion of Hamming distance to tell which string was more likely sent. For two strings of the same length, the Hamming distance between them is the number of positions in which the two strings differ. Thus, the strings 000 and 111 have Hamming distance 3, and the strings 101 and 110 have Hamming distance 2.
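Here is a small Python sketch of Hamming distance and of decoding by picking the nearest code word. The helper name `nearest_code_words` is my own; it returns every code word tied for closest, which makes the failure of the code 00, 11 visible.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_code_words(received: str, code: list[str]) -> list[str]:
    """All code words at minimum Hamming distance from `received`."""
    best = min(hamming_distance(received, word) for word in code)
    return [word for word in code if hamming_distance(received, word) == best]


print(hamming_distance("000", "111"))             # 3
print(hamming_distance("101", "110"))             # 2
print(nearest_code_words("10", ["00", "11"]))     # ['00', '11']: a tie
print(nearest_code_words("010", ["000", "111"]))  # ['000']
```

With two repetitions the received string 10 is equally close to both code words, while with three repetitions a single error always leaves a unique nearest code word.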

Major contributions to the theory of error correcting codes have come from practitioners of many backgrounds. I have already mentioned Richard Hamming, but also remarkable is the work of Vera Pless (1931-2020), who made important contributions to coding theory as a researcher, as a teacher, and in applied settings.

Photo of Vera Pless. Courtesy of Wikipedia.

Code transmission errors

Suppose one wants to code whether the pixels in an image are black or white. Thus, one is trying to code two states or pieces of information. One would prefer to code this with short strings rather than longer ones. Using one bit will not work, because if one codes black with 1 and white with 0 and a single bit is deleted, what one receives is the empty message, and one cannot tell if a 0 or a 1 was sent. If one uses 10 for black and 01 for white, then the received messages with one deletion are either 1 or 0. However, if 1 arrives, this can be because 01 was sent and a 0 was deleted or because 10 was sent and a 0 was deleted. If one uses 00 and 11 as the code words, then if 0 arrives 00 was sent, while if 1 arrives 11 was sent, and one can recover the original information.

Think about this situation. Consider the following collection of binary code words of length 5:

$$00000, 10001, 01010, 11011, 11100, 00111$$

For each code word above, let us write down the result of one digit in the code word being deleted:

0000 (one string)
0001, 1001, 1000 (3 strings)
1010, 0010, 0110, 0100, 0101 (5 strings)
1011, 1111, 1101 (3 strings)
1100, 1110 (2 strings)
0111, 0011 (2 strings)

Notice that the number of possibilities can vary from one code word to another. In this example, the 16 strings listed (1 + 3 + 5 + 3 + 2 + 2 = 16) are all different, and there are exactly 16 binary strings of length 4. So no shortened string can arise from two different code words, every binary string of length 4 is accounted for, and we can uniquely decide which code word must have been sent for any string that arrives.
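The bookkeeping above is easy to automate. The following Python sketch (with hypothetical helper names of my own) lists the strings obtained from one deletion and checks that no shortened string arises from two different code words.

```python
def single_deletions(word: str) -> set[str]:
    """All distinct strings obtained by deleting one symbol of `word`."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def corrects_one_deletion(code: list[str]) -> bool:
    """True if no shortened string can come from two different code words."""
    origin = {}
    for word in code:
        for shortened in single_deletions(word):
            if origin.setdefault(shortened, word) != word:
                return False
    return True


CODE = ["00000", "10001", "01010", "11011", "11100", "00111"]
print([len(single_deletions(word)) for word in CODE])  # [1, 3, 5, 3, 2, 2]
print(corrects_one_deletion(CODE))                     # True
print(corrects_one_deletion(["10", "01"]))             # False
```

The same check confirms the two-bit discussion earlier in this section: the pair 10, 01 fails, while 00, 11 succeeds.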

When a set of code words is sent using hardware, it is possible that elements of the coded message may incur errors. Here are some ways that a communication channel might alter a sent code word:

10111 was sent but 1111 arrived (though 5 symbols were sent, only 4 were received).
10111 was sent but 100111 arrived (though 5 symbols were sent, 6 were received).
10111 was sent but 1?111 arrived (5 symbols were sent and 5 were received, but the receiver is not sure what symbol was sent in the second position; it could be a 0 or a 1, which the ? indicates).
10111 was sent but 11111 arrived (5 symbols were sent and 5 were received, but there was an error in the second position: a 1 arrived though a 0 was sent, so the received string is 11111).

To design ways to digitally work with texts and images sent via a communications system, the notion of a mathematical channel was pioneered. The idea is that one transmits a sequence of data in the form of binary digits, but what one receives at the other end of the channel may differ from what was transmitted in specific ways. Thus, in one model a 1 might change to a 0 with a certain probability $p$, and a 0 might change to a 1 with the same probability $p$. This channel is probably the most studied and is known as the binary symmetric channel. Other kinds of communications channels can be described that relate to the kinds of errors that were briefly looked at above.
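A binary symmetric channel is simple to simulate. In this Python sketch the function name, the seed, and the value $p = 0.1$ are all my own choices for illustration; the seed is fixed only so that repeated runs agree.

```python
import random


def binary_symmetric_channel(bits: str, p: float, rng: random.Random) -> str:
    """Flip each bit independently with probability p."""
    flip = {"0": "1", "1": "0"}
    return "".join(flip[b] if rng.random() < p else b for b in bits)


rng = random.Random(2024)  # arbitrary seed, for repeatability only
sent = "10111" * 2000
received = binary_symmetric_channel(sent, 0.1, rng)
error_rate = sum(s != r for s, r in zip(sent, received)) / len(sent)
print(error_rate)  # the observed fraction of flipped bits, typically near p
```

Note that this channel never changes the length of the string, which is exactly what distinguishes it from channels that delete or insert symbols.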

Deletion error codes

The specific case that interests me is that of a communications channel where binary strings of the same length are used to encode information, and where, when a string is sent, it may be corrupted by the deletion of a single one of the sent bits. Note that if one is expecting a code word with, say, 5 bits and only 4 bits arrive, one knows that a deletion error has occurred. It turns out that there is a strong relationship between the theory of codes involving deletion errors and insertion errors, but only deletion error codes will be addressed here. In addition to Levenshtein, other pioneers of error deletion codes were Grigory Tenengolts and Rom Rubenovich Varshamov (1927-1999), a Soviet Armenian mathematician.

So the general question to be studied is: what is the largest number of binary code words of a fixed length $n$ that can be used to transmit information when a code word might be subjected to the deletion of one bit? (If there are $k$ symbols, characteristics, etc. to be encoded, one needs at least $k$ code words.) Above we saw that we could construct a binary code of length 5 with six code words which could correct one deletion error. You can verify that for strings of length 3, a code with two code words exists: 000, 101. Can you find a collection of binary code words of length 4 that will correct one deletion error? For strings of length 6 it is known that a code with ten code words exists, but it is not so easy to find such a code.
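One well-known way to produce such codes, not spelled out in this column, is the construction associated with Varshamov and Tenengolts: keep the binary strings $x_1 x_2 \ldots x_n$ for which $\sum_i i\,x_i$ is divisible by $n+1$. The Python sketch below is my own illustration of that rule (the name `vt_code` and the parameter `a` are hypothetical). If you want to try the length-4 question by hand, do so before running it with `n = 4`.

```python
from itertools import product


def vt_code(n: int, a: int = 0) -> list[str]:
    """Binary strings x_1...x_n with sum(i * x_i) congruent to a mod (n + 1)."""
    words = []
    for bits in product("01", repeat=n):
        checksum = sum(i for i, b in enumerate(bits, start=1) if b == "1")
        if checksum % (n + 1) == a % (n + 1):
            words.append("".join(bits))
    return words


print(vt_code(3))       # ['000', '101']
print(vt_code(5))       # the six code words of length 5 used above
print(len(vt_code(6)))  # 10
```

For $n = 5$ the rule reproduces exactly the six code words listed earlier, and for $n = 6$ it gives a code with ten code words.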

We have seen that the code 000 and 101 can be used to correct one deletion. So can the code 111 and 010, which arises from the first code by interchanging 1 and 0. You can check your understanding by considering the following question. If the set of binary strings $B$ of length $n$ allows one to use these strings to correct one deletion, will the set $B^*$ arising by interchanging the 0's and 1's in the strings of $B$ also give rise to a code which will enable one to correct one deletion error?

In using the error deletion codes discussed above, we have in essence imagined that we have a dictionary of the erroneous strings that arise and we correct the erroneous string using this dictionary. Study the idea of how one could correct the erroneous string by doing calculations on the erroneous string and based on these calculations, reconstruct the string that was sent.
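As one possible answer, here is a sketch of a calculation-based decoder. It assumes the code is of the Varshamov-Tenengolts type, that is, every code word $x_1 \ldots x_n$ satisfies $\sum_i i\,x_i \equiv 0 \pmod{n+1}$, which holds for the six code words of length 5 used above. The rule implemented is the one usually credited to Levenshtein; the function name `vt_decode` is my own, and the sketch assumes its input really is a code word with exactly one bit deleted.

```python
def vt_decode(received: str, a: int = 0) -> str:
    """Recover a length-n code word from `received`, which has length n - 1."""
    n = len(received) + 1
    weight = received.count("1")
    checksum = sum(i for i, b in enumerate(received, start=1) if b == "1")
    deficiency = (a - checksum) % (n + 1)
    if deficiency <= weight:
        # A 0 was deleted: put it back with `deficiency` ones to its right.
        position, ones_seen = len(received), 0
        while ones_seen < deficiency:
            position -= 1
            ones_seen += received[position] == "1"
        return received[:position] + "0" + received[position:]
    # A 1 was deleted: put it back with deficiency - weight - 1 zeros to its left.
    zeros_needed = deficiency - weight - 1
    position, zeros_seen = 0, 0
    while zeros_seen < zeros_needed:
        zeros_seen += received[position] == "0"
        position += 1
    return received[:position] + "1" + received[position:]


print(vt_decode("1111"))  # 11011
print(vt_decode("0001"))  # 10001
```

The idea is that deleting a 0 lowers the weighted sum by the number of 1's to its right, while deleting a 1 lowers it by more than the number of 1's that remain, so the size of the shortfall reveals which kind of bit was lost and where it belongs.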

Compared with other kinds of error correction codes, surprisingly little is known about error deletion codes. The discussion above can be extended to have the possibility of more than one deletion error (codes for when 2 error deletions might occur) or where the alphabet involves more than the two symbols 0 and 1. (For DNA, the alphabet uses 4 symbols.)

Enjoy thinking about and learning about using mathematics to study digital communications systems!

Math Meets Congress

Courtney Gibbons — Fri, 01 Sep 2023 04:01:02 +0000

A view of the Capitol dome from the entrance to the Capitol Visitors Center

Math Meets Congress

Or, A Mathematician Goes to Washington

Courtney Gibbons
Hamilton College

A train leaves the Dirksen Senate Office Building heading toward the Capitol traveling at fourteen miles per hour…

Thanks to the American Association for the Advancement of Science and their Science and Technology Policy Fellowships, I’ve spent the last year working for Senator Gary C. Peters with the majority staff of the Senate Committee on Homeland Security and Governmental Affairs (HSGAC, which we pronounce “HISS-gack”). Broadly speaking, my portfolio has lived in the “Governmental Affairs” realm. I’ve had a chance to work on things related to federal grants and cooperative agreements, federal data policies, artificial intelligence—and math! Especially the mathematics and statistics that power different kinds of AI systems (and why the math is relevant to the policy).

I’ve had the opportunity to serve in the Senate at the same time as Duncan Wright, the American Mathematical Society Congressional Fellow, worked in Senator Young’s office. Duncan tells me he’ll be writing about his experiences soon, too!

Dr. Duncan Wright, the AMS Congressional Fellow, hanging out with my family and our Senator-in-training and part-time dinosaur impersonator

One of the most thrilling aspects of working for the United States Senate has been using my training as a mathematical problem-solver to work on public policy problems (what can I say, I’m easy to thrill), in a very different way than following a traditional path like being an NSF rotator or working for the DOD in some capacity.

In my “normal” life, I work on problems in commutative and homological algebra—not exactly the most sought-after technical knowledge in Congress. But the nuts and bolts of what I do when thinking about math are very similar to the nuts and bolts of thinking about a policy problem or solution.

For example, when I say that I think about “rings and modules” to another mathematician, we have to achieve some clarity to keep communicating. To me, “ring” means unital, commutative, probably Noetherian, and almost certainly local or graded with a unique homogeneous maximal ideal. When I say “module,” I mean finitely generated. Change any of those properties, and the tools I use—like Nakayama’s Lemma—are off the table.

It’s the same in policy. When I started, I jumped on one of the office priorities: simplifying and coordinating the federal grant application process for recipients (and applicants, and potential applicants, and…). The first thing my mentor had me do was find the 1970s legislation that defines what financial assistance from the government means. And just like the word “ring” comes with many flavors of adjectives now, the word “grant” does, too. Is it a competitive grant? A formula grant? Is it for basic research? Disaster relief? I very quickly needed experts to help me understand the nuances.

Luckily for me, Congress has many experts: analysts at the Congressional Research Service and auditors at the Government Accountability Office investigate topics at the request of Congress, and many, many, many people have asked for reports and analyses of grants policies. Senator Peters held a hearing on grants to learn about the issue from people with additional hard-won insights they’ve collected after years (in some cases, careers!) of navigating the systems and processes required to get, use, and report on a grant.

Senator Peters also hosted the best-ever Take Your Kids to Work “Hearing”

Eventually, my teammates and I started looking for existing solutions to problems that might work in this situation, and, just like algebraists borrowed Betti numbers from topologists to study invariants of rings and modules, we started borrowing from other policy areas to put together some options.

Like math papers, potential legislation goes through a kind of peer review called “technical assistance” where people provide feedback on the bill text. And, like peer review, navigating different (sometimes conflicting) suggestions makes it interesting to figure out how to move forward. (The first idea I had got the equivalent of a bright red REJECT stamp from some external parties, but with lots of helpful feedback that informed my next attempt. Helpful review is truly a wonderful gift!)

The next steps for legislation include finding cosponsors, introducing the bill, shepherding it through the markup process where the committee with jurisdiction over the legislation has a chance to debate and change it, and eventually getting it to the floor of the Senate; then it goes to the House; then, hopefully, it gets signed into law. I’ve only been here since October, so I will only see the bills I worked on through part of their journey. As with my math research work, it’s unclear which ideas, if any, will make it to the end of the process (or if I’ll recognize them when they do). But it’s been a gratifying experience to see up close how one Senate office (from the staff to the boss himself!) approaches the work of making the country a better place. The people I’ve worked with have renewed my hope and confidence in this country’s strange and byzantine processes. And I’ll keep my eye on S. 2286, the Streamlining Federal Grants Act, with the same tenderness I feel for my best and most fun mathematical collaborations.

Stepping from math professor life into Senate staffer life (at the same time as becoming a mom!) has been a strange but rewarding change-up. Take it from me: it’s never too late (or too early) to start finding your way to policymaking. Thanks to this fellowship and all the different policy areas I had a chance to learn about and work on, every night I went home thinking about policy in ways that made my neurons tingle just like when I’m in math mode.

My future Senator arranging his policy priorities

And, reader, aside from your tenacity in thinking about problems, you probably also have practitioner expertise as a working mathematician. Congress has a lot of opinions about what kinds of scientific activities (including research!) matter. Think math is apolitical? Think again! In 2020, a major piece of legislation (Pub. L. 116-283) required NSF to collaborate with other agencies and industry partners to create AI Research Institutes, determine the feasibility of a National Artificial Intelligence Research Resource, study the artificial intelligence workforce, and more. The new TIP directorate at NSF? You guessed it: Congress! The CHIPS and Science Act (Pub. L. 117-167) has an entire title devoted to telling the NSF how to prioritize its work. Someone, somewhere, needs you to speak up on behalf of math and its value.

I’ll vouch for Congress as a productive place to be that voice. (Did I mention they have trains?)

E Pluribus Unum

A Summer set.seed() Sestina

Ursula Whitcher — Tue, 01 Aug 2023 04:01:10 +0000

Sara Stoudt
Bucknell University

Are you taking a break from mathematical thoughts and cozying up with a good beach read this summer? Understandable! But sometimes a reading “break” isn’t a complete break from the kinds of mathematical thinking you might do during work hours. Thanks to Sarah Hart, we can all become more aware of the connections between mathematics and literature these days. If you haven’t picked up a copy of “Once Upon a Prime” yet, I highly recommend it. She walks through mathematical themes, mathematical structures, and even mathematical characters in her new book that aims to show us that mathematics shares the creativity of literature, and literature thrives off of that pursuit of mystery that can inspire mathematical discovery. There isn’t a split between “numbers” people and “words” people. (Want more of this or can’t wait for your copy of the book to reach you at the library or in your mailbox? Marian Christie has more poetry and math resources to enjoy online.)

One of the topics Hart dives into is constraints on the writing itself that are imposed by mathematical concepts. For example, a sestina is a poem with permutation structure. A poet chooses six words, and those six words must appear at the end of certain lines in a six-line stanza. Then each word’s line number changes in each consecutive stanza according to a permutation. (For a self-aware sestina check out this one that Larry Lesser shared with me.)

Sestina permutations
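For readers who like to see a permutation in action, here is a small Python sketch. It assumes the traditional sestina pattern, in which the new stanza's lines end with the words from lines 6, 1, 5, 2, 4, 3 of the previous stanza. The function names are hypothetical, and the six end words are the ones I settle on later in this column.

```python
def next_stanza(end_words: list[str]) -> list[str]:
    """Reorder six end words by the traditional 6-1-5-2-4-3 pattern."""
    pattern = [6, 1, 5, 2, 4, 3]  # new line k reuses the word from old line pattern[k-1]
    return [end_words[p - 1] for p in pattern]


def sestina_stanzas(first: list[str]) -> list[list[str]]:
    """End-word orders for all six stanzas, starting from `first`."""
    stanzas = [list(first)]
    for _ in range(5):
        stanzas.append(next_stanza(stanzas[-1]))
    return stanzas


words = ["shower", "read", "screen", "fan", "couch", "trip"]
for number, stanza in enumerate(sestina_stanzas(words), start=1):
    print(number, stanza)
```

Applying the pattern a sixth time brings back the starting order, and over the six stanzas each word lands in each line position exactly once.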

I wasn’t always familiar with this type of poem. In the fall, I worked with a poetry professor here at Bucknell University, Katie Hays, to do a crossover class between her Intro to Poetry course and my first-year seminar on Storytelling with Data. Katie suggested that we work with the sestina when I had mentioned that we were learning about order mattering in the presentation of plots and tables. The more we brainstormed, the more we thought that by playing with the order of words in a story-like poem, we could drive that point home. Katie and my students collaborated on a sestina, but I never approached writing one myself. Until now!

However, as a statistician, I had to put a little extra wrinkle in. How might I add an element of chance to the endeavor of writing a sestina? Enter the set.seed idea. I could use a random number generator to tell me which order the words in the sestina should start in. I ended up with:

Shower
Read
Screen
Fan
Couch
Trip

Then the sestina permutation takes over, governing which words fall in which order from there on. And with that, here goes nothing!

An Academic’s Summer - set.seed(721)

(The first reprieve)

In academia, when it rains, it pours. A deluge of papers to grade, lectures to give - not a
sprinkling, not even a mere shower.

But then April’s more than showers bring May, full stop, and hobbies poke out of the ground to
bloom: I can go for a walk, I can brunch, I can read.

An unencumbered breath of outside air with a view of trees subs in for mask-filtered air and the
computer screen.

What to make of this new found quiet? No Slack clickety-clack, no inbox whoosh or ding.
Routineless? I’m a big fan!

It’s time to reacquaint myself with life’s simple pleasures: my non-dress pants, my power
playlist, my couch.

Time is infinite. I am limitless… maybe I should take a trip.

(The ambitious time)

Wouldn’t it be great if I could just get away? Turn off every notification and galavant off on the
perfect getaway trip.

I can already feel the stress evaporate just picturing it. No storm clouds in sight, just the
occasional sunshower.

Picture me taking in the sights, eating food not out of a Tupperware container, not ever thinking
about a desperate need for even the briefest of respites on the couch.

Think of all of that energy and inspiration just mine for the taking, making me ready to write just
for myself, for fun, not for progress. And in the same way, read.

With all of this cool new work I’ll produce, maybe I’ll amass more than one reader that counts
themselves as my academic fan.

I can dream, right? Me shining so bright it’s others who will need the sunscreen.

(The changing plan)

But then the friction kicks in - all that pre-planning, the jet-lag, the inevitable TSA screen.

I want to relax, not play travel agent to myself. I may not be optimizing my classroom time, but
instead, that pesky trip.

There is so much pressure to make the most of this “free time”. Of that, I am never a fan.

Not to mention that travel requires getting used to a new bed, and a new shampoo that
accompanies that new shower.

So I go ahead and leave those e-mails from my former ambitious self about that dream vacation
on read.

As June flies by and my friends ask about that vacation I was definitely going to take, I start to
couch.

(The veg out)

Sure a dream vacation would be nice, but there are so many beautiful things you can do from
the comfort of your couch.

On a stay-cation I can give myself over to rom-coms, reality shows, my guiltiest of pleasures.
Yes, I’m still watching. Thanks for checking, TV screen.

I can make an adult version of a blanket fort and timelessly read.

I can finally go through the registry and pick that perfect gift for my friend’s baby shower.

I can bask in complete silence and just stare up at my ceiling fan.

(The turning point)

But at some point, my surroundings and the silence start to lose their charm, even that hypnotic
ceiling fan.

Is that an urge to venture beyond the couch?

Do my sweatpants suddenly feel too comfortable? Am I perhaps up for a pre-noon shower?

I’m not ready to go all in, mind you, just a next semester pre-screen.

But… if I am going to eventually conjure some bold, new ideas, I’ll need Post-It Notes and new
colored pens (glitter preferably) to keep track of them. Time for a Staples trip!

There were all of those interesting pedagogy papers I e-mailed myself over the course of the
last year. Maybe it’s time to give them a quick, no pressure read.

A passive read turns active with highlighters, notes, mind maps, the occasional mutterings to
myself, and before I know it I’m a butterfly emerged recharged with strong wings like an
intricate fan.

You didn’t even need a fancy trip to get to this moment. That time was unwasted on the couch.

Without dread I return to my computer screen.

I even catch myself daydreaming about fun stats in the wild examples in the shower.

(The welcome back)

Welcome back students! Did any of you take a trip? What brought you joy… anything you read?

In this class you will be both a statistics shower and (story) teller. And yes, we’ll learn the details
too, like what residual plot shapes are dreaded. Oh no, not the fan!

I know the temptation of the couch, believe me, but I hope this class provides some activation
energy, helping you realize that you have power beyond your screen.

An inviting couch

Poetic process

Now that you’ve read my sestina experiment, I’ll get a little meta and tell you how I approached writing it. As I’ve tried to lean into my creative side and write poetry, I’ve been helped along by the power of constraints to help me face the blank page. Write a poem? Daunting! Write a poem with specific rules about each line? Doable! I was searching for advice for writing a sestina and came across this source that framed a sestina as a story and then proceeded to give a sample sestina about a cousin’s job. That made me think of doing one about my job, and specifically my job now that the semester is over. (Don’t teachers just frolic all summer?)

With a premise in place I just had to pick my words and get going. Half of the words were repurposed from another poetry project I was stuck on, and the other half came from trying to round out a summer story (including thinking of words that could be used in multiple ways). Before I started to write, I listed phrases or ideas that could end with each of the terms. Then I tried to find sub-themes in these that would help me write each stanza. I decided on a narrative arc that tracks me through the summer: initial relief at getting a break, an ambitious start for trying to use all my free time wisely, a mini burn-out that requires real rest, and finally a return of energy and hopes for the future school year. I decided to explicitly label each stanza (see the parenthetical labels) for a bit of extra clarity, a roadmap through time if you will. Another thing you might have noted as you read was that my lines can get fairly long. I’m more of a prose-poem kind of person.

Enough about me, it’s your turn! Set the seed to something else and make your own academic summer sestina using my words. Or come up with your own words and go through the same process with a different theme. Be free! Does this still seem too daunting? Try out a tritina, the “square root of the sestina,” instead. Now randomness will play two roles: one to sample 3 out of 6 initial words and one to provide an initial ordering. Happy writing! Feel free to share your poems with me at sas072@bucknell.edu, on Twitter (@sastoudt), or in the comments here.

Putting a period on mathematical physics

Ursula Whitcher — Sat, 01 Jul 2023 04:01:52 +0000

One of the fundamental forces in the universe is the weak force. The weak force is involved in holding atoms together or breaking them apart...

Ursula Whitcher
Mathematical Reviews (AMS)

You've heard of periods at the ends of sentences and periods of sine waves. The word period also has a special meaning in number theory. These periods are surprisingly useful for solving problems in particle physics. In this month's column, I'll tell you more about what periods are, where the physics comes in, and how all of this relates to the geometry of doughnuts.

From doughnuts to integrals

Maybe you've heard the joke that a topologist can't tell the difference between a coffee cup and a doughnut. (If it's new to you, check out a beautiful illustration of the transformation by Keenan Crane and Henry Segerman.) Geometers are able to distinguish coffee cups from doughnuts. We can even tell the difference between types of doughnuts. For example, here's a fat, cakey doughnut:

Photo by 5th Luna, CC BY-NC 2.0

Here's a skinny, crunchy doughnut:

Photo by Janet Bianchini, CC BY-NC 2.0

But the geometry of doughnuts is so fascinating that, once you begin examining it, it's hard to think about anything else!

Let's describe the difference between our two doughnuts more formally. An idealized mathematical doughnut surface is called a torus. We can characterize the shape of a torus using two circles, one that goes around the outside and one that goes through the hole in the center. On a fat, cakey torus, these two circles are roughly the same size.

On a skinny, crunchy torus, the outer circle is much larger than the inner circle.

In these examples, the circles are easy to measure. But sometimes tori appear in more complicated ways. For example, suppose $x$ and $y$ are complex variables and $t$ is a complex parameter. Consider the solutions to the equation

\[y^2 = x(x-1)(x-t).\]

This is the famous (for number theorists) Legendre family of elliptic curves. If we throw in a solution "at infinity," then, topologically speaking, it is a family of tori. It's hard to graph the solution to an equation in two complex variables, but we can graph the real values. Here's what it looks like when the parameter $t$ is set to be equal to 3:

You can think of graphing the real points as slicing through the doughnut at an angle. In this graph, you see a skewed version of one of the circles and part of a second circle.
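To see where the two pieces of the real graph come from, here is a minimal Python sketch. The function name `real_points` and the sample $x$-values are illustrative choices, not part of the original discussion. It checks for which real $x$ the right-hand side $x(x-1)(x-t)$ is non-negative, since only those values give real $y$.

```python
import math

def real_points(x, t=3.0):
    """Return the real y-values with y^2 = x(x-1)(x-t), or [] if there are none."""
    rhs = x * (x - 1.0) * (x - t)
    if rhs < 0:
        return []            # y would be purely imaginary here
    if rhs == 0:
        return [0.0]         # the curve crosses the x-axis
    root = math.sqrt(rhs)
    return [-root, root]

# Sample a few x-values for t = 3 (hypothetical sample points).
for x in [-1.0, 0.0, 0.5, 1.0, 2.0, 3.0, 4.0]:
    print(x, real_points(x))
```

For $t=3$, real points appear only when $0 \leq x \leq 1$ (the closed loop) or $x \geq 3$ (the open branch), which matches the picture of one full circle and part of another.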

Measuring the lengths of these two circles is tricky. There is a general mathematical strategy from calculus class that we can try: set up an integral to measure the arclength. In this case, the appropriate integral turns out to be:

\[\int_\gamma \frac{dx}{y} = \int_\gamma \frac{dx}{\sqrt{x(x-1)(x-t)}} \]

Here, the integral is over an appropriate simple closed curve $\gamma$ in the torus/elliptic curve.

But there's a problem! I'll let a cartoon of an easily confused orange cat explain it.

The cat isn't lying: this integral is really hard. Standard techniques from calculus class do not work. In fact, this is an elliptic integral: its antiderivative cannot be written in terms of elementary functions.

Periods and differential equations

The integral $\int_\gamma \frac{dx}{\sqrt{x(x-1)(x-t)}}$ is an example of a period. For a number theorist, a period is a number that you get by finding the integral of an algebraic expression over an appropriate subspace. (Technically speaking, we should be able to describe the regions we are integrating over using inequalities and systems of algebraic equations whose coefficients are rational numbers.)

Many interesting constants, such as $\pi$ and $\log 2$, can be written as periods. There are huge and interesting questions about periods: for example, how can we characterize which numbers arise as periods? Using operations on integrals, one can show that adding or multiplying periods produces a new period. This gives periods the structure of a ring. Another big open question is describing all the relations that the ring of periods satisfies.
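For instance, the two constants just mentioned have standard integral representations. These are well known but are not spelled out in the discussion above. In each one, the integrand is a rational function and the region is cut out by polynomial inequalities with rational coefficients:

\[\pi = \iint_{x^2+y^2 \leq 1} dx\,dy, \qquad \log 2 = \int_1^2 \frac{dx}{x}.\]

Multiplying the two integrals and combining them into a single integral over a three-dimensional region illustrates why a product of periods is again a period:

\[\pi \log 2 = \iiint_{x^2+y^2 \leq 1,\; 1 \leq z \leq 2} \frac{dx\,dy\,dz}{z}.\]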

Let's get back to trying to understand our specific period, $\int_\gamma \frac{dx}{\sqrt{x(x-1)(x-t)}}$. We know that the result of the integral is a number that depends on the parameter $t$, so let's think of the integral as a function $P(t)$. We can take derivatives of $P(t)$:

$$\frac{d}{dt} \int_\gamma \frac{dx}{\sqrt{x(x-1)(x-t)}} = \int_\gamma \frac{d}{dt} \frac{dx}{\sqrt{x(x-1)(x-t)}}.$$

As we take derivatives, the expression under the integral sign becomes more complicated, but it keeps the same general shape. By finding a common denominator, we can identify a relationship between $P(t)$, $P'(t)$, and $P''(t)$:

\[ t(t-1) P''(t) + (2t-1) P'(t) + \frac{1}{4} P(t) = 0.\]

This is a differential equation! (It's called the Picard-Fuchs equation, after the French mathematician Émile Picard and the German Jewish mathematician Lazarus Fuchs.) As a second-order differential equation, this Picard-Fuchs equation has two independent solutions. These solutions correspond to the two different circles on the torus.
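One way to build confidence in this equation is to test it numerically. The sketch below is a rough check under stated assumptions, not the derivation itself. It assumes $0 < t < 1$ and takes $\gamma$ to be a cycle for which the period reduces, up to a constant factor, to $\int_1^\infty \frac{dx}{\sqrt{x(x-1)(x-t)}}$. The substitution $x = 1/\sin^2\theta$ turns this into the smooth integral $2\int_0^{\pi/2} \frac{d\theta}{\sqrt{1 - t\sin^2\theta}}$. The function names, step counts, and finite-difference step are illustrative choices.

```python
import math

def period(t, steps=2000):
    """Approximate P(t) = 2 * integral_0^{pi/2} dtheta / sqrt(1 - t sin^2 theta).

    Assumes 0 <= t < 1. Uses composite Simpson's rule; steps must be even.
    """
    h = (math.pi / 2) / steps

    def f(theta):
        return 1.0 / math.sqrt(1.0 - t * math.sin(theta) ** 2)

    total = f(0.0) + f(math.pi / 2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return 2.0 * total * h / 3.0

def picard_fuchs_residual(t, dt=1e-3):
    """Evaluate t(t-1)P'' + (2t-1)P' + P/4 using central finite differences."""
    p_minus, p_mid, p_plus = period(t - dt), period(t), period(t + dt)
    d1 = (p_plus - p_minus) / (2 * dt)
    d2 = (p_plus - 2 * p_mid + p_minus) / dt ** 2
    return t * (t - 1) * d2 + (2 * t - 1) * d1 + 0.25 * p_mid

print(picard_fuchs_residual(0.3))
```

The printed residual should be close to zero, limited only by the finite-difference approximation. A constant multiple of $P(t)$ satisfies the same equation, so the choice of normalization for $\gamma$ does not affect this check.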

A standard method for solving differential equations is to use an infinite series. In this case, one of the solutions to the differential equation for our period can be written in terms of the following series:

\[\sum_{n=0}^{\infty} \frac{((\frac{1}{2})(\frac{1}{2} + 1)\cdots (\frac{1}{2} +n-1 ))^2}{(n!)^2}t^n.\]

The numerator involves an expression, $(\frac{1}{2})(\frac{1}{2} + 1)\cdots (\frac{1}{2} +n-1 )$, that looks rather like a factorial, except that the factors start at $\frac{1}{2}$ instead of 1. This is a rising factorial. (When $n=0$ the product is empty and equals 1, so the series begins with the constant term 1.) If we replace this expression by the shorthand $(\textstyle{\frac{1}{2}})_n$, we get a more compact notation for our series:

\[\sum_{n=0}^{\infty} \frac{(\textstyle{\frac{1}{2}})_n^2}{(n!)^2}t^n.\]

This is a famous series known as a hypergeometric series, with numerator parameters $\frac{1}{2}, \frac{1}{2}$ and denominator parameter 1. (One factor of $n!$ appears in the denominator of every hypergeometric series of this type; the second factor of $n!$ is the rising factorial $(1)_n$, which accounts for the denominator parameter 1.) The whole series is sometimes expressed by the even more compact notation ${}_2F_1\left(\textstyle{\frac{1}{2}, \frac{1}{2}}; 1 \,|\, t \right)$.
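As a sanity check, the series can be compared numerically with a period integral. The sketch below rests on two assumptions that go beyond the text. First, it takes $0 < t < 1$ and uses the standard Euler integral representation, which gives $\int_1^\infty \frac{dx}{\sqrt{x(x-1)(x-t)}} = \pi \cdot {}_2F_1\left(\textstyle{\frac{1}{2}, \frac{1}{2}}; 1 \,|\, t \right)$. Second, it evaluates that integral after the substitution $x = 1/\sin^2\theta$. The series is summed from its constant term 1, using the ratio $\left(\frac{n + 1/2}{n+1}\right)^2$ between consecutive coefficients. All function names and the choice $t = 0.5$ are illustrative.

```python
import math

def hypergeometric_series(t, terms=200):
    """Partial sum of sum_{n>=0} ((1/2)_n / n!)^2 t^n; assumes |t| < 1."""
    total = 0.0
    coeff = 1.0                      # ((1/2)_n / n!)^2 at n = 0
    for n in range(terms):
        total += coeff * t ** n
        coeff *= ((n + 0.5) / (n + 1)) ** 2   # step from n to n + 1
    return total

def real_period(t, steps=2000):
    """Simpson's rule for 2 * integral_0^{pi/2} dtheta / sqrt(1 - t sin^2 theta)."""
    h = (math.pi / 2) / steps

    def f(theta):
        return 1.0 / math.sqrt(1.0 - t * math.sin(theta) ** 2)

    total = f(0.0) + f(math.pi / 2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return 2.0 * total * h / 3.0

t = 0.5
print(hypergeometric_series(t), real_period(t) / math.pi)
```

The two printed numbers should agree to many decimal places. The series converges only for $|t| < 1$, so this comparison says nothing about larger values of the parameter.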

For more details about the solution process, including a description of the second independent period, see Don Zagier's in-depth essay The arithmetic and topology of differential equations. I'd like to show you a more complicated period that shows up in theoretical physics.

Sunsets and Feynman diagrams

In particle physics, describing the interactions between fundamental particles such as electrons and photons involves doing difficult integrals. (Even worse, from a mathematician's standpoint, these integrals may not always be well-defined!) Physicists organize these computations using diagrams called Feynman diagrams of increasing complexity. There are specific rules for creating and manipulating Feynman diagrams, but at a first approximation, one can imagine they tell stories about particles that meet, interact and perhaps undergo a transformation, then go their separate ways.

One of the fundamental forces in the universe is the weak force. The weak force is involved in holding atoms together or breaking them apart. It's the force that controls the process of radioactive decay and makes carbon-14 dating possible.

One can calibrate carbon-14 dating using tree rings. Photo by Bill Kasman (public domain).

To do computations involving the weak force, one must work with Feynman diagrams that contain loops. Here's a Feynman diagram with two loops that is sometimes called the sunset diagram.

The American mathematician Spencer Bloch and the French physicist Pierre Vanhove teamed up to study the sunset diagram. To simplify the problem, they worked with a model where there are only 2 space-time dimensions. (Imagine particles moving back and forth along a line as time passes.) They assumed that all the particles generated during the interaction have equal mass $m$, that there's a fixed external momentum $K$, and they threw in a constant $\mu$ to balance the units. The result is the following sunset integral:

\[\mathcal{I}_\circleddash = \frac{\pi^2 \mu^2}{m^2} \int_0^\infty \int_0^\infty
\frac{dx\,dy}{(1+x+y)(x+y+xy) - xy \frac{K^2}{m^2}} \]

This integral is really, really hard!

One of the key problems is that the denominator, $(1+x+y)(x+y+xy) - xy \frac{K^2}{m^2}$, might be 0. To understand more about where the denominator vanishes, we can set $t=\frac{K^2}{m^2}$. The result is a family of curves that depends on the parameter $t$:

$$(1+x+y)(x+y+xy) - t xy =0.$$

Here's the resulting graph for $t=11$.

The features of this graph might look familiar. We've got a skewed circle and part of another circle—the doughnut slices are back! In other words, $(1+x+y)(x+y+xy) - t xy =0$ is a parametrized family of elliptic curves.
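The real points of this curve can be located with a short Python sketch. The function names are illustrative, and the code assumes $x \neq -1$ so that the equation is genuinely quadratic in $y$. Expanding the product in powers of $y$ gives $(1+x)y^2 + \left((1+x)^2 + x - tx\right)y + x(1+x) = 0$, which the quadratic formula solves for each fixed $x$.

```python
import math

def sunset_denominator(x, y, t):
    """The denominator of the sunset integrand, with t = K^2 / m^2."""
    return (1 + x + y) * (x + y + x * y) - t * x * y

def real_y_values(x, t):
    """Real solutions y of sunset_denominator(x, y, t) = 0 for fixed x != -1."""
    a = 1 + x
    if a == 0:
        raise ValueError("x = -1 makes the equation degenerate; not handled here")
    b = (1 + x) ** 2 + x - t * x
    c = x * (1 + x)
    disc = b * b - 4 * a * c
    if disc < 0:
        return []                    # no real points above this x
    root = math.sqrt(disc)
    return [(-b - root) / (2 * a), (-b + root) / (2 * a)]

# At x = y = 1 the denominator equals 9 - t, so it is negative when t = 11.
print(sunset_denominator(1.0, 1.0, 11.0))
print(real_y_values(1.0, 11.0))
```

At $x = 1$ and $t = 11$ the quadratic reduces to $y^2 - 3y + 1 = 0$, so there are two positive real solutions. This confirms that the denominator really does vanish inside the region of integration for this value of $t$.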

Bloch and Vanhove pursued a strategy that might seem familiar. They set $\mathcal{J}_\circleddash = \frac{m^2}{\pi^2 \mu^2} \mathcal{I}_\circleddash$ to simplify the units, then looked for a differential equation involving $\mathcal{J}$:

\[\frac{d}{dt} \left(t(t - 1)(t - 9) \frac{d}{dt} \mathcal{J}_\circleddash \right) + (t-3) \mathcal{J}_\circleddash = -6.\]

Because the right-hand side of this differential equation is not zero, solving it is more complicated than solving the differential equation we saw earlier. Standard differential equation methods approach this kind of problem in two steps. First, solve the homogeneous equation where we pretend the right-hand side is zero. Then, find one particular solution to our inhomogeneous equation, where the right-hand side is a nonzero constant. Every solution of the inhomogeneous equation is the sum of that particular solution and a homogeneous solution.

Bloch and Vanhove showed that the homogeneous solutions to the Picard-Fuchs differential equation for $\mathcal{J}_\circleddash$ can be written in terms of the classical hypergeometric series ${}_2F_1\left(\textstyle{\frac{1}{12}, \frac{5}{12}}; 1 \,|\, - \right)$. This series places rising factorials involving $\frac{1}{12}$ and $\frac{5}{12}$ in place of the $\frac{1}{2}$ we saw earlier. I've used $-$ to indicate that a more complicated expression is plugged in for the series variable.

To solve the full inhomogeneous equation, we need another special function, $\mathrm{Li}_2(z)$, known as the dilogarithm. The dilogarithm can be written as an infinite series. When $|z|<1$,

\[\mathrm{Li}_2(z) = \sum_{k=1}^\infty \frac{z^k}{k^2}.\]

The dilogarithm is also a period! We can write it using a double integral.

\[\mathrm{Li}_2(z) = \iint_{0 \leq u \leq v \leq z} \frac{du\,dv}{(1-u)v}.\]
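The series and the integral can be compared numerically. In the sketch below, the inner integral over $u$ is done by hand, giving $\int_0^v \frac{du}{1-u} = -\log(1-v)$. This leaves the single integral $\int_0^z \frac{-\log(1-v)}{v}\,dv$, which is evaluated with Simpson's rule. The function names, the test value $z = \frac{1}{2}$, and the classical identity $\mathrm{Li}_2(\frac{1}{2}) = \frac{\pi^2}{12} - \frac{(\log 2)^2}{2}$ used as a reference are additions for illustration and assume $0 < z < 1$.

```python
import math

def dilog_series(z, terms=200):
    """Partial sum of sum_{k>=1} z^k / k^2; assumes |z| < 1."""
    return sum(z ** k / k ** 2 for k in range(1, terms + 1))

def dilog_integral(z, steps=2000):
    """Simpson's rule for integral_0^z -log(1 - v)/v dv; assumes 0 < z < 1."""
    def f(v):
        # The integrand tends to 1 as v -> 0.
        return 1.0 if v == 0 else -math.log1p(-v) / v

    h = z / steps
    total = f(0.0) + f(z)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return total * h / 3.0

z = 0.5
reference = math.pi ** 2 / 12 - math.log(2) ** 2 / 2
print(dilog_series(z), dilog_integral(z), reference)
```

All three printed values should agree closely. The reference value is itself built from $\pi$ and $\log 2$, two of the periods mentioned earlier.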

Thus, periods give us a precise way to describe the solutions to the sunset diagram integral—as well as a reason to eat doughnuts!

Acknowledgments

I thank the Isaac Newton Institute for Mathematical Sciences, Cambridge, England for support and hospitality during the K-theory, algebraic cycles and motivic homotopy theory program, where I presented a version of this material during the institute's 30th birthday celebrations. This work was supported by EPSRC grant no EP/R014604/1.
