On October 9, I was interviewed by Macalester College President Brian Rosenberg about the Census. This was paired with a talk that Moon Duchin and I gave on campus two days earlier titled “Mathematical Interventions in Fair Voting,” and with a feature article about the Census in the Fall 2019 Macalester alumni magazine. I thought you might be interested in this topic, so I decided to write about it here, to dive in a bit further and reach a different audience.
Many thanks to my colleagues Ron Wasserstein and Steve Pierson at the American Statistical Association (ASA); they know much more about this topic than I do, and generously share their expertise. Interestingly, the ASA was formed in November 1839 in Boston as a means to promote the 1840 Census. Ron did a nice interview on this topic.
Carrying out a good Census and making secure datasets available to researchers both involve statistics and mathematics. Additionally, the Census is the first step in the redistricting process, which is quickly evolving to involve more and more work of statisticians and mathematicians. This column is *not* about redistricting.
What is the history of the Census, and why do we do it?
Article 1, Section 2 of the Constitution includes the phrase:
“Representatives and direct Taxes shall be apportioned among the several States which may be included within this Union, according to their respective Numbers, which shall be determined by adding to the whole Number of free Persons, including those bound to Service for a Term of Years, and excluding Indians not taxed, three fifths of all other Persons. The actual Enumeration shall be made within three Years after the first Meeting of the Congress of the United States, and within every subsequent Term of ten Years, in such Manner as they shall by Law direct. The Number of Representatives shall not exceed one for every thirty Thousand, but each State shall have at Least one Representative…”
This tells us that the Census must be done for the purpose of (re)apportionment (and taxes), how often it must be done, and that there are lower and upper bounds for the size of the House of Representatives. Congress first met in 1789, and the first national Census was held in 1790. Current law controlling the Census requires that the Census be conducted on or about April 1. The returns must be made available within nine months in order to apportion members of the House of Representatives to each of the states.
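To make the lower and upper bounds concrete, here is a minimal sketch of the arithmetic implied by the quoted clause: at least one Representative per state, and no more than one per 30,000 people. The function name is hypothetical, and the inputs (the 2010 resident population of 308,745,538, 50 states, and the current House size of 435, which is set by statute) are figures brought in for illustration rather than taken from the text above.

```python
def house_size_bounds(population, number_of_states):
    """Return (lower, upper) bounds on House size under Article 1, Section 2."""
    # Each state must have at least one Representative.
    lower = number_of_states
    # There may be at most one Representative for every 30,000 people.
    upper = population // 30_000
    return lower, upper


# Assumed inputs, not from the column: 2010 resident population and 50 states.
lower, upper = house_size_bounds(308_745_538, 50)
print(lower, upper)  # 50 10291

# The current House size of 435 (an outside figure) sits well inside the bounds.
print(lower <= 435 <= upper)  # True
```

The gap between the two bounds is enormous, which is why the actual size of the House is a matter of law rather than arithmetic.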
While we are constitutionally mandated to do the Census in order to reapportion Congressional seats, Census data also determine how a significant amount of federal funds are distributed. For example, in fiscal year 2015, Census data were used to determine the allocation of about \$675 billion across more than a hundred programs. According to a report by the Census Bureau, the top five programs by amount of funds that used Census-based population numbers and population characteristics to determine fund distribution in fiscal year 2015 were: Medicaid, the Supplemental Nutrition Assistance Program (SNAP), Medicare Part B, Highway Planning and Construction, and the Federal Pell Grant program. The demographic data are also used by businesses to determine, for example, where to build new supermarkets, and by emergency responders to locate injured people after natural disasters.
How many questions are there, and have there been? What are the changes in 2020?
The first Census (in 1790) had six questions; it simply asked for the name of the head of the household and the number of people in the household of the following descriptions:
- Free White males of 16 years and upward
- Free White males under 16 years
- Free White females
- All other free persons
- Slaves
The distinction between the first two categories was made, in part, to determine the country’s military potential. And you probably don’t need the unpleasant reminder that, for the purpose of apportionment, slaves were counted as three-fifths of a person; the 14th Amendment, ratified in 1868, removed this fractionalization. You may also have noticed the bit about Indians who are not taxed; this did not change until 1940, when the Attorney General ruled that there were no longer any “Indians not taxed.” US marshals took the Census in the original 13 states, plus the districts of Kentucky, Maine, and Vermont, and the Southwest Territory (Tennessee); they rode from house to house on horseback.
The number of questions in the decennial Census has varied widely between the first in 1790 and 2000, when a multi-page form with dozens of questions was sent to one out of every six households (the “long form”). It probably isn’t surprising that reading about these questions is quite interesting (to me at least) and offers some great insights into political history. 1910 provides rich examples. Instructions to enumerators include:
“For persons born in the double Kingdom of Austria-Hungary, be sure to distinguish Austria from Hungary. For persons born in Finland, write Finland and not ‘Russia.’ For persons born in Turkey, be sure to distinguish Turkey in Europe from Turkey in Asia.”
“If the Indian is of mixed blood, write in column 36, 37, and 38 the fractions which show the proportions of Indian and other blood, as (column 36, Indian) 3/4, (column 37, white) 1/4, and (column 38, negro) 0. For Indians of mixed blood all three columns should be filled, and the sum, in each case, should equal 1, as 1/2, 0, 1/2; 3/4, 1/4, 0; 3/4, 1/8, 1/8; etc. Wherever possible, the statement that an Indian is of full blood should be verified by inquiry of the older men of the tribe, as an Indian is sometimes of mixed blood without knowing it.”
Also, in 1910 it seems it was relevant to know if the wives of a polygamous (Indian) man were sisters, or not. Enough on the 1910 Census.
More generally, I find it interesting to see which languages are listed on the Census as commonly spoken in the US, which jobs are offered as options, and how these choices have changed over the decades.
In 2010, the Census Bureau cut down the length of the questionnaire, and for 2020 it remains short. You can see the 2020 Census form for yourself. A more detailed list of 72 questions, called the American Community Survey (ACS), is sent to selected households (and has been sent since 2005), in non-decennial years, to allow the Bureau to do statistical sampling. About 3.5 million households are selected to receive the ACS each year.
In 2020, households will have the option of responding online, by mail, or by phone. There are nine questions for “person 1” (and seven are asked for each further member of the household). Notable changes for 2020 include new write-in areas under the race question for those who identify as white and/or black (“Irish” and “Somali” are among the provided options). There are also new household relationship categories that allow couples living together to identify their relationships as either “same-sex” or “opposite-sex.”
What about the citizenship question?
I figured you would ask about that. After much back and forth, the citizenship question is NOT going to be on the 2020 Census form. But, it has been included in the past; who knows what will happen in the future.
The last time a citizenship question was among the Census questions for all US households was in 1950, though smaller Census Bureau surveys have included questions about citizenship. Commerce Secretary Wilbur Ross had claimed that the Justice Department needed data from the question to help enforce the Voting Rights Act. Critics pushed back, arguing that adding the question would discourage non-citizens, especially unauthorized immigrants, from participating at all. The Supreme Court ruled in June 2019 that the question could not appear on the Census. The court’s opinion stated that the Trump administration had the right to add the question, but that the reason it supplied was not compelling.
All this said, the Census Bureau already releases some data on citizenship, gathered in the ACS. But, the ACS annually reaches only about 2.5% of all households, compared with the roughly 16% that received the Census long form.
Further, an Executive Order issued by the President on July 11, 2019 commits the Census Bureau to releasing Citizen Voting-Age Population (CVAP) data by March 31, 2021. The Executive Order states: “Nevertheless, we shall ensure that accurate citizenship data is compiled in connection with the Census by other means.” These data will be produced by combining administrative data from a number of federal, and possibly state, agencies into a separate micro-data file that will contain a “best citizenship” variable for every person in the 2020 Census. Current sources of citizenship data include the Social Security Administration, Housing and Urban Development, Medicare and Medicaid.
Interestingly, the citizenship question, if asked, was predicted to affect apportionment. Had it been on the 2020 Census, the following states would probably have been affected:

| Lose seats | Avoid losing seats | Gain seats |
|---|---|---|
| California loses 2 | Alabama | Montana gains 1 |
| New Jersey loses 1 | Minnesota | |
| Texas gains 2 instead of 3 | Ohio | |
How much does it cost now? Can we afford it?
The 2010 Census cost \$13 billion, approximately \$42 per capita. For comparison, the cost of China’s 2010 census was about US\$1 per capita, and India’s was about US\$0.40. A 2019 report estimates that the 2020 Census will cost approximately \$15.6 billion. Census funding is currently in jeopardy, as is all federal funding (because we are living under a “continuing resolution” which will keep the government running until November 21).
The Census Bureau was established as a permanent agency within the Department of the Interior in 1902. It currently employs about 4,285 permanent staff members, and is in the midst of hiring hundreds of thousands of temporary workers for the 2020 Census. Census 2010 employed 635,000 temporary workers. Certainly, hiring so many is costly.
But it costs so much for many reasons. One thing I learned from Ron Wasserstein’s podcast is that non-response drives up the cost significantly: if you don’t fill in your form, the Census Bureau sends someone to your door to get you to answer. Adding the citizenship question would (probably) have created many more households that need this door knocking, so asking the question would have driven up the cost. (It is ironic that President Trump pushed for the citizenship question while consistently requesting less money for the Census in his annual budgets.) The lingering toxicity and fear raised by the visibility of that question and the legal case may still drive up the cost in this way, even though the question will not appear on the questionnaire.
This raises the question...
Does one have to fill out the Census?
Easy. Yes. There are fines for non-response and for providing false responses. In 1790 the fine was \$20. Today, failure to respond can result in a \$100 fine; providing false answers is a more severe offense, and carries a \$500 fine. Recent news reports that I found by googling, however, indicate that punishment for failure to respond is not usually enforced.
What can you say about the use of technology and the Census?
Oooh, another interesting question. You may be surprised to hear that the 1890 Census was the first to be processed by machine. Herman Hollerith adapted punch cards and developed an electric tabulating machine to speed the tallying of the 1890 Census (punch cards were first developed around 1800 by Joseph Marie Jacquard for the loom, so that textiles could be manufactured at scale by unskilled workers). Later Census technology included UNIVAC I, the TIGER system, early adoption of computer tape, CD-ROM technology, and the Internet. This history is interesting, and is explained more fully on the Census website.
From a Science article by Jeffrey Mervis: “Protecting confidentiality has been a priority for the Census Bureau for most—but not all—of its existence. After the first US Census was conducted in 1790, officials posted the results so that residents could correct errors. But in 1850, the interior secretary decreed that the returns would be kept confidential. They were “not to be used in any way to the gratification of curiosity and Census officials,” or “the exposure of any man’s business or pursuits,” notes an official history of the Census published in 1900. In 1954 the agency’s confidentiality mandate was codified in Title 13 of the U.S. Code.”
This time, the Census Bureau will be adopting differential privacy to protect the identity of everyone whose information is contained in the data it releases. “Differential privacy addresses the paradox of learning nothing about an individual while learning useful information about a population.” That sounds like exactly what we want with Census data (not to mention with medical data, etc.).
Differential privacy protects individuals in a dataset by adding noise, so that a researcher using the dataset is not able to reverse engineer it to discover the identity of any specific person. Differential privacy was developed in 2006, around the time of the Netflix challenge, which aimed to improve the company’s movie recommender system and went wrong when researchers re-identified users in the supposedly anonymized data. Matthew Francis, in SIAM News, gives a readable and more mathematical description of differential privacy (from which I stole the image to the right).
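To give a feel for what “adding noise” means, here is a minimal sketch of the Laplace mechanism, a standard textbook way to release a count with differential privacy. It is not the Census Bureau’s actual procedure; the function names, the privacy parameter values, and the block population of 37 are all hypothetical choices made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0."""
    # The difference of two independent exponential draws with rate 1/scale
    # follows a Laplace distribution with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Release a count using the Laplace mechanism with privacy parameter epsilon."""
    # Adding or removing one person changes a count by at most 1,
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(2020)
block_population = 37  # hypothetical true count for one Census block

# Smaller epsilon means stronger privacy and noisier published counts.
for epsilon in (0.1, 1.0, 10.0):
    releases = [noisy_count(block_population, epsilon, rng) for _ in range(5)]
    print(epsilon, [round(r, 1) for r in releases])
```

Any single published number may be off, but the noise averages out over many blocks, which is the sense in which one can learn about a population without learning about an individual.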
When will we know the results?
The Census Bureau is expected to announce the new state population counts by December 31, 2020, the deadline for sending the count to the president for the purpose of reapportioning congressional seats. Further data are released later, some of them publicly. Two types, “small-area data” and “microdata,” will be available for researchers to use as they wish. Small-area data provide the basic characteristics of residents (age, sex, and race/ethnicity) by Census block. A Census block is the smallest geographic area for which data are reported. There were 11,155,486 blocks in 2010. Blocks cover the entire country, and need not contain inhabitants. Microdata, which give full information about individuals, are provided for “Public Use Microdata Areas,” which each contain at least 100,000 people and, again, cover the entire country.