All the Dead We Cannot See

How a tech geek is using machine learning to hold human rights abusers accountable.
Ixil indigenous women attend a hearing during the genocide trial of former dictator Efrain Rios Montt in Guatemala City on August 25th, 2015.

In August of 2011, Patrick Ball sat alone in an empty storefront with the blinds drawn, counting the dead. Their names ran across his laptop screen in ghostly green type: Jacinto Ramirez, male, born in 1934, killed in 1988 by the army, an indigenous Ixil Mayan. Pedro Caba, also Ixil and male, born in 1933, disappeared in 1981 by guerrillas. Tens of thousands of names, all Guatemalans reported dead in the country's 36-year civil war. Ball was there to divine what wasn't on the list: How many bodies hadn't been counted? And was there a pattern to the violence?

To crack the case, he'd spent 70 hours a week for the past two months coding in a vacant building on San Francisco's Mission Street, once home to a bridal shop and porn studio. (The Electronic Frontier Foundation, where his long-term partner, Cindy Cohn, was legal director, had bought the building but not occupied it.) In his mid-forties, Ball was stout, with a nose and beard that would qualify him to play Santa. He sat at a folding table amid heaps of stripped wallpaper, tapping the keys like a piano player.

Ball had started with various lists of conflict-related deaths from the Guatemalan government, the Catholic Church, a United Nations truth commission, and a local non-governmental organization. He spent weeks training a machine-learning model to accurately recognize the same person across the four lists (even if the "Perez" in one was the "Peres" in another), a process called record linkage. He then used a technique called multiple imputation to account for records missing critical information, such as the victim's ethnicity or the perpetrator's affiliation. This involved breaking up a single record into fractions and essentially rolling heavily weighted dice, based on the person's other known characteristics, to fill in the blanks. Finally, Ball applied a statistical method called multiple systems estimation, typically used for wildlife counts, to estimate how many names were missing from all four lists and whether certain groups were systemically excluded. By comparing the number of overlapping records to the number that were unique, as in a Venn diagram, he could estimate the total number of deaths, including those that were unobserved. The result: By Ball's estimate, no one had recorded a quarter of all deaths in Guatemala's Ixil region between April of 1982 and July of 1983.
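The overlap logic behind multiple systems estimation can be sketched in a few lines. This is a minimal illustration rather than Ball's method: it uses only two lists instead of four, replaces his trained record-linkage model with a simple string-similarity cutoff (the 0.8 threshold is an assumption), and applies the classic two-list Lincoln-Petersen estimator. The names and function names are invented for the example.

```python
from difflib import SequenceMatcher

def same_person(a, b, threshold=0.8):
    """Crude record linkage: two names match if their spelling is similar enough.
    The 0.8 threshold is an illustrative assumption, not a tuned value."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def count_overlap(list_a, list_b):
    """Count records in list_a that match at least one record in list_b."""
    return sum(any(same_person(a, b) for b in list_b) for a in list_a)

def lincoln_petersen(n_a, n_b, overlap):
    """Two-list capture-recapture estimate of the total population size."""
    if overlap == 0:
        raise ValueError("no overlap between lists: the total cannot be estimated")
    return n_a * n_b / overlap

# Invented names, for illustration only.
list_a = ["Juan Perez", "Maria Lopez", "Ana Cruz", "Luis Mendez"]
list_b = ["Juan Peres", "Ana Cruz", "Rosa Tum"]

overlap = count_overlap(list_a, list_b)           # people found on both lists
observed = len(list_a) + len(list_b) - overlap    # distinct people on either list
total = lincoln_petersen(len(list_a), len(list_b), overlap)
unobserved = total - observed                     # estimated people on neither list
```

The smaller the overlap relative to the size of each list, the larger the estimated number of people who appear on no list at all. With more than two lists, the same idea is usually fitted with statistical models that also allow for lists that are not independent of one another.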

Next, he had to put all the pieces together. He created a file called "make-magic-numbers.R," an ironic in-joke, since his process was anything but magic. He typed inscrutable words on the black screen—"prettyNum," "round," "nhats"—mumbling absently to the machine as he worked. Combining all the chunks of code he'd crafted over weeks, he finally saw the number he'd been chasing: 7.92. That meant you were about eight times more likely to be killed by the army in the Ixil region during that period in the early 1980s if you were indigenous. As survivors had insisted for nearly 30 years, the violence wasn't random; the pattern pointed to genocide. And Ball proved it without leaving his computer.
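A figure like 7.92 is a relative risk: the death rate in one group divided by the death rate in another. The sketch below shows only the arithmetic. The counts are invented round numbers chosen to give a ratio of about eight; they are not Ball's estimates, and `relative_risk` is a hypothetical helper, not a function from his code.

```python
def relative_risk(deaths_a, pop_a, deaths_b, pop_b):
    """Ratio of the death rate in group A to the death rate in group B."""
    rate_a = deaths_a / pop_a
    rate_b = deaths_b / pop_b
    return rate_a / rate_b

# Hypothetical counts, for illustration only.
rr = relative_risk(deaths_a=400, pop_a=10_000,  # group A: 4 percent killed
                   deaths_b=10, pop_b=2_000)    # group B: 0.5 percent killed
```

A ratio near 1 would mean both groups faced similar risk. In an analysis like Ball's, the death counts going into such a ratio are the estimated totals, including unrecorded deaths, rather than the raw tallies from any one list.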


Counting the victims of the world's most atrocious violence is hard. Most human rights violations happen in the dark. Even if witnesses remain, they can be unreliable or too afraid to talk. When government actors are at fault, they can cover up their actions or promote alternate storylines. As a result, even well-intentioned attempts to document abuses can lead to data riddled with dangerous gaps. And without evidence, perpetrators can deny what happened and evade prosecution.

Ball, a statistician, has spent the last two decades finding ways to make the silence speak. He helped pioneer the use of formal statistical modeling, and, later, machine learning—tools more often used for e-commerce or digital marketing—to measure human rights violations that weren't recorded. In Guatemala, his analysis helped convict former dictator General Efraín Ríos Montt of genocide in 2013. It was the first time a former head of state was found guilty of the crime in his own country.

"One could take testimonies and forensic evidence and say these terrible crimes occurred," says Elizabeth Oglesby, an associate professor at the University of Arizona and another expert witness in the trial. "But Patrick's analysis proved that the army intended to kill Mayans—that's the key to being able to prove genocide."

Ball has testified in four other human rights trials and advised nine truth commissions, four U.N. missions, and dozens of advocacy groups in more than 30 countries. The San Francisco-based non-profit he co-founded, the Human Rights Data Analysis Group, has trained and advised a cadre of researchers implementing similar techniques at universities and advocacy groups.

Patrick Ball.

"HRDAG represents, without a doubt, the world's top people doing statistical work on human rights violations," says Brian Root, senior quantitative analyst at Human Rights Watch, who considers Ball a mentor. "They were the first in human rights to take the data that's known and figure out what's unknown—the dark data."

For Ball, instances of flawed death statistics abound, from journalists "mixing guesses together" to tally casualties during the Vietnam War, to Iraq Body Count, which draws primarily on press reports to chronicle civilian deaths related to the 2003 United States-led military intervention. According to Ball, that database fails to include many smaller-scale violent events, which has led analysts to not only underestimate the true death toll but overlook attacks by Shia militias on Sunni Iraqis, some of whom went on to support ISIS. "Statistics could've illuminated this problem in real time, but they didn't. Instead, by treating the database as comprehensive, some observers accepted the naive interpretation, and we blew it," he says. A more recent example, which Ball described in a 2016 Granta article, involves attempts to quantify police killings in the U.S. The Guardian's crowdsourced effort, for example, concluded that officers killed fewer than 1,150 people annually in recent years, while the Bureau of Justice Statistics calculated around 930 a year. Ball and his colleagues, after applying statistical techniques, think the number is closer to 1,500, or 8 to 10 percent of all homicides.

As new technologies have made it easier to collect vast amounts of data, our urge to quantify things has only gotten stronger. But incomplete estimates can be dangerous, Ball warns, since fancy graphs create false confidence in the wrong conclusion. The data that does exist about violence isn't necessarily a representative sample—we don't know what we don't know. "It's not a little boo-boo," Ball says. "We're fucking wrong! And we're talking about the worst things that ever happened to these people—we can't afford to be wrong."


One Wednesday afternoon, I met Ball in HRDAG's office, located above a hair salon on a quieter street, a block away from his 2011 hideout. He now works out of a corner of the one-room suite, flanked by ferns and a whiteboard bedecked with equations. That day, Ball wore a fleece-lined hoodie, an Innocence Project T-shirt, and hiking boots. I wasn't surprised when he confessed to being a Burning Man devotee, like many Bay Area tech nerds. (Three years ago, as a 50th birthday present to himself, he built an art installation called the Magic Mirror that echoes your dance movements with multicolored lights, like a smart lava lamp.)

Ball introduced me to a young computer engineer crunching numbers at a machine, and to Eleanor, a computer server named after Eleanor Roosevelt, who championed the Universal Declaration of Human Rights. Then he pointed to a small flyer mounted near his desk: "This is the thing that got me going in my career."

It featured a black-and-white headshot of Juan Orlando Zepeda, a mustachioed former general in mid-salute. Below was a tally of the unsavory accomplishments he oversaw: arbitrary executions (210), illegal detention (110), torture (64). The list went on, with 310 victims in all. In 1991, on a break from his sociology Ph.D. at the University of Michigan, Ball ended up working with a human rights group in El Salvador to build a database of alleged abuses by military officers during the country's civil war. Flyers like this one outing top perpetrators papered the capital, and the worst 100 offenders were forced to resign during the country's peace process. "I was like, 'Wow, that was really cool.' I was really proud of it," Ball says. He expected to return to an academic career, but human rights groups kept calling, impressed with his ability to turn narrative testimonies into quantitative data.

Ball pointed out other posters lining the walls, mementos of his team's projects around the world. In Guatemala, his testimony helped convict Colonel Héctor Bol de la Cruz, the former national police chief, in the kidnapping and disappearance of a student labor activist. "He's the only guy who scares me," Ball says. At a war crimes tribunal for the former Yugoslavia, he presented statistical evidence that suggested Slobodan Milošević's forces were most likely responsible for systematically killing Albanians in Kosovo and forcing them from their homes (the ex-dictator died before a verdict was issued). Ball's testimony helped convict Chad's former president, Hissène Habré, of crimes against humanity, war crimes, and torture in 2016. He has even deployed an algorithm to ferret out mass graves in Mexico.

"This one might be my favorite," Ball says, pausing in front of a chilling black-and-white photo of a prison cell in Chad. On the wall, someone had scrawled a poem in childlike letters: L'homme est né pour la mort et pour la souffrance. Man is born for death and for suffering.


I checked in with Ball again after General Ríos Montt died in April of 2018. He had recently recovered from surgery to repair a ruptured tendon in his right arm. "It's what you get for working every waking hour for 30 years," he told me.

Guatemala's highest court had overturned the genocide verdict on a technicality, and Ball had returned in March to testify again in a retrial. But as in Milošević's case, the gears of justice turned too slowly.

I asked what it felt like to work so hard to reveal the facts, only to have a perpetrator evade accountability. "That blows," he said. "You have to enjoy the pursuit because the moments of reward are so few and improbable and fragile. But what are we going to do with our lives? Help people serve ads to sell sneakers?"

A woman walks near a common grave during the burial of 25 unidentified or unclaimed corpses at the San Rafael municipal pantheon in Ciudad Juarez, Chihuahua state, Mexico, on November 28th, 2018.

Mapping Mexico's Hidden Graves—With an Algorithm

According to a Mexican government database, at least 37,000 people have vanished in Mexico since 2007, after officials there declared a "war on drugs." Officials have discovered the remains of some 2,000 people in more than 1,000 clandestine graves. But where are the missing bodies, the unmapped graves?

"It's hard to remember that the blank spots in the map are because you just don't have any data," Ball says. "Why don't we have any reports from those places? The answer is the people who do the reporting risk getting killed."

Last year, Ball decided to fill in the blanks. With colleagues from Data Cívica, a non-profit, and the Human Rights Program at Ibero-American University, both in Mexico City, he developed a machine-learning model to predict the location of hidden graves. They started with data on existing graves from press reports and local prosecutors, plus a host of seemingly unrelated variables about the country's municipalities, such as miles of paved highway and proportion of teenagers attending school. Applying an algorithm called "random forest," they determined the places most likely to contain secret burial sites. Based on data from 2016, they identified 13 municipalities that had at least a 65 percent chance of harboring graves, but where neither the press nor local officials had reported any. These included Apatzingán in the state of Michoacán, González and Altamira in Tamaulipas, and Pueblo Nuevo in Durango.
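To show roughly how a random forest turns municipal statistics into a probability, here is a small from-scratch sketch using only Python's standard library. It is not the team's model: the data are synthetic, the two features merely echo the kinds of variables mentioned above (highway length and school attendance), and every function name is hypothetical. Each tree is trained on a bootstrap sample and considers a random subset of features at each split; the forest's output is the average of the trees' votes.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels, features):
    """Find the (feature, threshold) pair with the lowest weighted impurity."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def grow_tree(rows, labels, rng, depth=0, max_depth=3):
    """Grow one tree; a leaf stores the share of positive labels it holds."""
    if depth == max_depth or len(set(labels)) <= 1:
        return sum(labels) / len(labels)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # random feature subset per split
    split = best_split(rows, labels, rng.sample(range(n_features), k))
    if split is None:
        return sum(labels) / len(labels)
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            grow_tree([r for r, _ in left], [y for _, y in left], rng, depth + 1, max_depth),
            grow_tree([r for r, _ in right], [y for _, y in right], rng, depth + 1, max_depth))

def tree_predict(node, row):
    """Walk from the root to a leaf and return the leaf's positive share."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=50, seed=0):
    """Train each tree on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(grow_tree([rows[i] for i in idx], [labels[i] for i in idx], rng))
    return forest

def predict_proba(forest, row):
    """Average the trees' outputs into a probability-like score."""
    return sum(tree_predict(tree, row) for tree in forest) / len(forest)

# Synthetic municipalities: [highway length, school attendance share].
# Label 1 means a grave has been reported there. All values are invented.
data_rng = random.Random(1)
rows, labels = [], []
for _ in range(30):
    rows.append([data_rng.uniform(0, 40), data_rng.uniform(0.7, 1.0)])
    labels.append(0)
for _ in range(30):
    rows.append([data_rng.uniform(60, 100), data_rng.uniform(0.2, 0.5)])
    labels.append(1)

forest = fit_forest(rows, labels)
high_risk = predict_proba(forest, [80.0, 0.3])  # resembles labeled-1 places
low_risk = predict_proba(forest, [10.0, 0.9])   # resembles labeled-0 places
```

In the spirit of the approach described above, a municipality with no reported graves whose score exceeds a cutoff such as 0.65 would be flagged for attention. Real data would be far noisier than this cleanly separated toy set, and a production analysis would more likely rely on an established library than on hand-written trees.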

The locations weren't shocking—they were known centers of violence—but Ball says singling them out with technology created a powerful advocacy tool. "People know where the violence is, but nobody can say it openly," he says. "There's no witness you can kill to silence a machine-learning model."

Relatives of disappeared people in one border state have used the data to convince local prosecutors to investigate. Ball's partners have presented the model to authorities leading Mexico's newly created National Search Commission, and the team plans to publish updated findings early next year. Eventually, they hope to refine the model with more granular data, such as the type of terrain where bodies have been discovered or their distance from roads and rivers, so it can lead people closer to specific gravesites.