
Weighing the Pros and Cons of a More Technology-Dependent Census

The 2020 census will rely on more digital innovation than ever, which is creating both concerns and opportunities.
Census workers inform ethnic Russians of the upcoming census count in the Russian enclave of Brighton Beach on March 7th, 2010, in the Brooklyn borough of New York.

The 2010 census found that the population of the United States had reached 308 million people. But it managed to miss a whole lot of others: about 2.1 percent of black Americans and 1.5 percent of Hispanics nationally, together accounting for some 1.5 million people. Young children were among the most undercounted.

Now the U.S. is gearing up for another head count, one that will include some big changes. For one thing, the 2020 census will be the first to use the Internet as the primary (and preferred) means of collecting household data. You'll still receive a postcard in the mail, but you'll be asked to go online to submit your responses to the short questionnaire, including questions such as the number of people residing at the address and their race. This change could compound some of the problems we saw in 2010, particularly when it comes to including members of minority and poorer communities who may not have Internet access. And because the 2020 census will for the first time include a question about citizenship status, response rates in some of those same communities may fall further.

But technology could also play a role in solving a dual problem that has long vexed the national head count: knowing where to send those questionnaires in the first place and how to ensure that every household responds.

The census has traditionally derived its address data from past surveys. It has also cross-checked addresses with local governments through a program known as the Local Update of Census Addresses and used census workers to physically canvass neighborhoods. Despite these measures, the 2010 census appears to have missed a substantial number of households in some places. In 2011, for example, New York City documented a shortfall of at least 50,000 residents, the result of erroneously assuming a number of housing units to be vacant in selected areas of the city.

Much of the undercounting is due to the invisibility of certain types of addresses, including those resulting from illegal subdivision of existing buildings, new construction, or unusual living arrangements. Think of people living unofficially in the basements of commercial buildings, apartments shared by several families because of high real estate costs even though only one family is on the lease, or homeless people shuttling between friends' couches and homeless shelters. The Census Bureau has admitted to missing more "ethnic and racial minorities disproportionately liv[ing] in hard-to-count circumstances." Because respondents need to have an address recognized by the Census Bureau, vulnerable populations living in unofficial housing are more likely to be missed.

In addition to traditional methods of address list verification, the 2020 census will try to cut down on missing addresses by cross-checking its address lists with digitally available federal "administrative records" such as Social Security records, tax data from the Internal Revenue Service, and records from programs like the Supplemental Nutrition Assistance Program, Temporary Assistance for Needy Families, and the Special Supplemental Nutrition Program for Women, Infants, and Children. While the cross-check is a positive step, according to the Urban Institute, there is still concern that some vulnerable and hard-to-reach subpopulations "may not have the same body or quality of administrative records as other groups." Immigrant groups, Native Americans, and children of color are among the groups traditionally undercounted and also underrepresented in federal records.

That brings us to the second challenge: non-responsiveness. The Census Bureau cannot force someone to fill out the questionnaire, but it does try to urge households to answer by using both multiple mailings and in-person visits to addresses believed to be occupied. The catch is that workers assigned to check on unresponsive households in particular neighborhoods will not attempt to make contact if the Census Bureau's systems indicate that the address is vacant. Currently, the only way the bureau double-checks whether unresponsive addresses are indeed vacant is by checking against the "administrative records" databases, which themselves are more likely to miss certain populations living in unusual situations.

Local knowledge of neighborhoods can help add new or unusual addresses and prompt in-person attempts at reaching those residents. Some local governments like New York City have been active in verifying addresses and investing resources in ensuring a fair count, but other local governments may not be in a position to do so. The opportunity is to leverage a variety of local community records to improve verification; the challenge is how to do so efficiently and transparently.

Local groups can help the Census Bureau conduct a more concerted and accurate count by helping it add previously unenumerated addresses and determine whether households are truly vacant or simply not responding. The bureau could allow trusted local organizations with a high degree of knowledge of the community (civil rights groups, community organizations, local schools, and even religious institutions) to monitor response rates in their neighborhoods in real time. Local groups are likely to have "street level" knowledge as to whether census workers have missed an unconventional address or families are doubling up. They could even look at, say, whether increased school attendance suggests that a community is growing and thus could use a more robust effort at following up with non-responses.

And that's where technology comes in. One option is to use a database that accepts responses from carefully vetted local sources. Such databases could even be piloted to focus on areas at risk of undercounting. For example, local schools might review a list of unresponsive addresses in the neighborhood to see whether they know that some of their students do, in fact, live at those seemingly empty addresses. Similarly, schools could add any known addresses that do not appear on conventional lists, or note whether several families appear to be doubling up at a single address.
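To make the database idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the source IDs, the `FollowUpRegistry` class, and its method names are illustrative assumptions, not an existing Census Bureau system. It shows only the core logic described above: vetted sources may report that a non-responding address is occupied or add an address missing from the list, and any such address is queued for an in-person visit.

```python
from dataclasses import dataclass, field

# Hypothetical IDs for vetted local sources (assumption, for illustration only).
VETTED_SOURCES = {"ps-101", "community-center-a"}


@dataclass
class AddressRecord:
    address: str
    responded: bool = False
    presumed_vacant: bool = False
    # IDs of vetted sources that report people living at this address.
    occupancy_reports: set = field(default_factory=set)


class FollowUpRegistry:
    """Toy registry of addresses and occupancy reports from vetted sources."""

    def __init__(self, vetted_sources):
        self.vetted = set(vetted_sources)
        self.records = {}

    def add_address(self, address, responded=False, presumed_vacant=False):
        self.records[address] = AddressRecord(address, responded, presumed_vacant)

    def report_occupied(self, source, address):
        # Only vetted sources may contribute.
        if source not in self.vetted:
            raise PermissionError(f"{source!r} is not a vetted source")
        record = self.records.get(address)
        if record is None:
            # The address was not on the list at all, so add it.
            record = AddressRecord(address)
            self.records[address] = record
        record.occupancy_reports.add(source)

    def needs_visit(self):
        # Non-responding addresses that at least one vetted source says are occupied.
        return sorted(
            address
            for address, record in self.records.items()
            if not record.responded and record.occupancy_reports
        )
```

In this sketch, an address the system presumes vacant would still be queued for a visit once a school or community center reports that someone lives there, which is exactly the case the current process skips.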

A newer option is to use a permissioned blockchain platform rather than a single database. A blockchain is essentially a technology platform that keeps track of multiple independent validations of transactions without relying on a single authority. If we think of a census entry as the beginning of a "block," the local groups can "validate" that it's accurate by accepting the block, or can fail to accept it, thus prompting a reverification of the original information.

In the census case, local community groups could be given permission to validate information such as "one family at XXX address," "YYY address is vacant," or even the number of household members. The more entities that validate the census notation, the greater the mathematical probability (and the public confidence) that everyone was accurately counted.

Conversely, a lack of validations could signal that something needs to be re-checked: an apparent vacancy might actually have residents, or a particular household may have more members than previously believed. The upshot is that census workers can focus their efforts on double-checking those addresses or obtaining a response.
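The validation flow described above can be sketched as a toy ledger in Python. This is an illustration under stated assumptions, not a real blockchain platform: the `PermissionedLedger` class, the validator names, and the acceptance threshold are all hypothetical. Validations are kept as plain metadata next to each block, whereas a production system would record them as signed transactions.

```python
import hashlib
import json


def block_hash(index, prev_hash, payload):
    # Hash the block contents together with the previous block's hash,
    # so that altering an earlier entry breaks the chain.
    body = json.dumps(
        {"index": index, "prev": prev_hash, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class PermissionedLedger:
    """Toy hash-chained ledger; only permitted validators may accept blocks."""

    GENESIS = "0" * 64

    def __init__(self, validators, threshold):
        self.validators = set(validators)  # hypothetical permitted local groups
        self.threshold = threshold         # acceptances needed (assumed policy)
        self.blocks = []

    def append(self, payload):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        index = len(self.blocks)
        self.blocks.append({
            "index": index,
            "prev": prev_hash,
            "payload": payload,
            "hash": block_hash(index, prev_hash, payload),
            "accepted_by": set(),
        })
        return index

    def validate(self, validator, index):
        if validator not in self.validators:
            raise PermissionError(f"{validator!r} may not validate entries")
        self.blocks[index]["accepted_by"].add(validator)

    def needs_recheck(self):
        # Entries with too few acceptances are sent back for reverification.
        return [b["index"] for b in self.blocks
                if len(b["accepted_by"]) < self.threshold]

    def chain_intact(self):
        prev = self.GENESIS
        for b in self.blocks:
            expected = block_hash(b["index"], b["prev"], b["payload"])
            if b["prev"] != prev or b["hash"] != expected:
                return False
            prev = b["hash"]
        return True
```

A census notation such as `{"address": "YYY", "status": "vacant"}` would be appended as a block; if fewer than `threshold` local groups accept it, `needs_recheck()` returns its index so a worker can follow up.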

A pattern of non-validations in a particular neighborhood can highlight a more systemic problem, prompting an effort to address it as part of the normal census verification process. Armed with such visibility, we may be better able to fix mistakes before the count is completed, rather than rely on post-census challenges that are difficult to win. While blockchain-based census verification hasn't yet been broadly tested, it holds promise for supporting a more accurate count, and pilots could be run even in time for 2020.
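As a rough illustration of spotting such a pattern, the Python sketch below computes the share of under-validated entries per neighborhood and flags those above a cutoff. The function name, the input shape, and both thresholds are assumptions made for the example; the article does not specify how a pattern would be defined.

```python
from collections import defaultdict


def flag_neighborhoods(entries, min_validations=2, max_unvalidated_share=0.3):
    """Return neighborhoods where too many entries lack enough validations.

    entries: iterable of (neighborhood, validation_count) pairs.
    Both thresholds are illustrative defaults, not official values.
    """
    totals = defaultdict(int)
    unvalidated = defaultdict(int)
    for neighborhood, validation_count in entries:
        totals[neighborhood] += 1
        if validation_count < min_validations:
            unvalidated[neighborhood] += 1
    return sorted(
        n for n in totals
        if unvalidated[n] / totals[n] > max_unvalidated_share
    )
```

A neighborhood returned by this function would be a candidate for a broader review of its address list while the count is still under way, rather than through a post-census challenge.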

Why is all of this ultimately so important? The census count determines the number of congressional seats. In 2010, 10 states lost congressional seats, and estimates based on that census predict that another undercount could jeopardize the number of congressional seats representing states including New York, Texas, Arizona, and North Carolina. An undercount also jeopardizes the amount of federal benefits going to a locale. This time, almost $600 billion of federal funding is at stake.

The 2020 census is more technology-dependent than ever, raising both concerns and opportunities. The time to prepare is now, by supporting access for digitally disenfranchised communities; ensuring sensitive data, particularly citizenship status, is not inappropriately shared; and leveraging technology for better real-time accuracy tracking that can help minimize the undercounting that has always been a problem for poorer and minority communities.

This story originally appeared in New America's digital magazine, New America Weekly, a Pacific Standard partner site.