
Geography, Race, and LOLs

The online lexicon spreads through racial and ethnic groups as much as it does through geography and other traditional linguistic measures.
(Photo: nnova/Flickr)


Everyone who has driven across the United States or even across a state knows that not everybody speaks the same way. There are regional dialects and slang, and within regions, there are demographic differences as well. Something similar is true of emoticons, abbreviations, and other forms of online shorthand, except that online, race plays an outsized role compared with other more traditional measures like geographical distance.

Given the global reach of the Internet and communication systems like Twitter, you might reasonably expect at least some of their neologisms to have similar reach. In many cases, you'd be right. Other coinages, though, may be less familiar to you. The abbreviation ikr (short for "I know, right?") is much more common in Detroit than in other parts of the U.S., and that's just one of many examples. Exactly what determines how such words spread, however, is an open question.


So Jacob Eisenstein and colleagues at Georgia Tech, University of Massachusetts-Amherst, and Carnegie Mellon University decided to collect three years of Twitter data (107 million tweets sent by 2.7 million users between 2009 and 2012, complete with the users' locations) to see what kinds of patterns they could find. They focused on 2,603 words, which range from the obvious ("crzy") to the more cryptic ("ion," meaning "I don't," as in "ion even care") and beyond (";3," which apparently has something to do with anime cats).

But Eisenstein and company weren't interested in merely cataloging Twitter dialects. Instead, they built a mathematical model that describes how words like "ikr," "ion," and ";3" spread between different parts of the United States over time. Using that model, they constructed a network of connections between cities—technically, Metropolitan Statistical Areas—and, finally, looked at demographic and geographic factors that could explain those connections.

The single most important predictor of whether words would spread between two cities was the difference in the percentage of African Americans living there. That was closely followed by the difference in the percentage of Hispanic residents. More traditional predictors of the spread of a lexicon, such as geography, played important but somewhat smaller roles. For example, three pairs of cities share similar demographics: Boston and Seattle; Los Angeles and Miami; and Washington, D.C., and New Orleans. Each of those pairs was quite likely to share a similar Twitter lexicon despite the thousands of miles between the cities.

The particularly strong effects of racial similarity between cities, the authors write in PLoS One, suggest that at least one pattern prevalent in the physical world is retained, perhaps even amplified, in the online world. "In spoken language, African American English differs more substantially from other American varieties than any regional dialect; our analysis suggests that such differences persist in the virtual and disembodied realm of social media."