“Unlike his opponent ... [Union Gen. Samuel R.] Curtis had not been satisfied to report his casualties in round figures. That would have been neither respectful to the dead nor indicative of sound administration.”
—Shelby Foote, The Civil War: A Narrative
How many people have died in the Syrian civil war? The answer to that question affects world policy: a high number impels outsiders to intervene, while a low number generates tut-tutting but little action. It also affects the aftermath: who gets put against the wall by the victors or, in a kinder outcome, how many narratives of loss the truth and reconciliation commission hears.
According to the United Nations, the number likely has reached the psychologically important roundness of 100,000. Most of them, like the more than 6,000 children killed, probably weren’t actively fighting for any side.
“The public announcement of an impressively large sounding number, regardless of its origins or validity, can generate prominent press coverage, which in turn legitimates and perpetuates the use of the number,” as Peter Andreas and Kelly M. Greenhill wrote in the introduction to their 2010 book, Sex, Lies, and Body Counts. Even if for many in unaffected areas the figure is just a tidbit in their daily news consumption, those numbers, whether used as parliamentary bludgeons or lures, count for political leaders, NGOs, historians, and even public health officials.
And yet, as Andreas and Greenhill stated, “Statistics—both good and bad—are often uncritically accepted and reproduced because they are assumed to have been generated by experts who possess specialized knowledge and know what they are doing.”
In the case of Syria, and in fact in most of the globe’s violent regions, there are indeed experts attempting to produce good statistics from the chaos, marshaling the smartest survey techniques they can in the conflict zone and the latest techniques in making sense of that raw data. The U.N., for instance, is using numbers that have been analyzed by the Human Rights Data Analysis Group (HRDAG), a San Francisco-based non-profit “that applies rigorous science to the analysis of human rights violations around the world.”
The group, led by Patrick Ball, has been working under various parents for almost two decades, originally the American Association for the Advancement of Science and then the non-profit Benetech before going it alone in February. In many ways its members have written the book, or at least some key chapters, on counting civilian casualties. The HRDAG website lists 15 different projects around the world, from hot zones like Syria to cooling ones like East Timor and Guatemala.
HRDAG strives to be non-political. “We don’t take political sides in a conflict,” explained Megan Price, their director of research, “but we’re always on the side of human rights.” That shows up in their critics, she added. “The majority of critics criticize us for our science, which is what we prefer. That’s how science works and that’s how science improves.”
Unsatisfied with an earlier estimation of the dead in Syria (they were certain their own analysis offered too low a figure), HRDAG has partnered with the Center for Human Rights Science at Carnegie Mellon University to develop new statistical techniques to get even more accurate counts in this conflict and in future ones.
While organized militaries can usually provide pretty detailed and precise breakdowns of their losses—and are dab hands at estimating those of their uniformed foes—civilian casualties have always been harder to estimate. Some 2,403 Americans died at Pearl Harbor (68, by the way, were civilians), for example, while the number of almost entirely civilian Germans who died in the Dresden firebombing was roughly 25,000—or maybe it was 35,000 or even 100,000. Thanks to the extra chaos of a civil conflict, with battlelines at every corner, the task of counting is even less straightforward, especially when perpetrators may hide their deeds, or killings occur outside the limelight.
The latest U.N. report prepared by HRDAG is considered an “enumeration,” which the authors stress “is not the complete number of conflict-related killings” in Syria, but a count of the documented killings, i.e., those where the identity of the victim, along with the date and place of their death, is known. (That helps navigate around concerns that can arise from using sampling to get a count.) Because multiple data sources are used, the enumeration itself may be a little high, even while it’s almost certainly too low as an accounting of all the dead.
Specifically addressing a February count which this latest report updates, Price said the organization “definitely think[s] the number is too low, but we don’t take that as any criticism of the data-gathering process. My gosh, collecting data in the midst of a conflict is difficult and these groups are doing amazing work. It’s more out of our experience analyzing data that comes out of conflicts that, almost by definition, there’s always going to be violence that someone wants to hide.”
It’s not just the perps who may want to hide, she added. Sometimes witnesses or survivors aren’t comfortable relating their tales, assume no one wants to hear their story, or see no advantage in talking.
HRDAG is not collecting raw data in Syria itself. It taps eight different sources of casualty information—including the Syrian government, the opposition, and a variety of groups with names like the Syrian Center for Statistics and Research and the Violation Documentation Center—that each creates its own lists of observed deaths and carries its own set of peculiarities.
“Each group collecting information might have a different definition for what constitutes conflict-related death,” Price offered. They also range on the spectrum of how clear and transparent each is about how they code their findings and how much of the sausage-making they’re willing to share. (The groups also post much of their finished data online, while individual Syrians share accounts and obituaries through social media, setting the stage for an information free-for-all that provides rich new veins of data—and headaches.)
While HRDAG will continue to enumerate the war deaths in Syria—an average of 5,000 a month at the current intensity—at the same time it’s working toward a full accounting of the tragedy. In short, they are figuring out what’s missing from the record.
There are parallels to the U.S. Census. It features a lot of easily obtained information from people who fill out their forms promptly and accurately, which in turn gives a decent approximation of the number of Americans. Yet noise such as duplications or omissions inevitably creeps into the signal, noise the bureaucrats are particularly keen to expunge. Plus there’s missing data—from the homeless, people like undocumented migrants who avoid any brushes with officialdom, those who moved at just the wrong time.
To arrive at a true number, or at least a truer one, top-flight statisticians like Carnegie Mellon’s Stephen Fienberg have been honing Census data after it’s collected since 1950. And now Fienberg is heading CMU’s contribution to the Syrian project.
“Everything in data analysis is an approximation,” Fienberg said. “The question is, how good is it?” Given good information in, evolving techniques can make it very good information out. “My principal contribution here is bringing new methodology to bear, looking with real care at the implications of what’s been done so far, trying to understand the structure of the data and potential problems with the data, and how that might propagate into any effort to estimate the number of killings that have been missed. That’s a crucial component in these settings.”
The base method being used in Syria (and in the Census) is known as capture-recapture or multiple recapture, an area Fienberg has been working in for more than four decades. Ironically, capture-recapture is usually associated with figuring out how many beings are alive, not dead. The traditional example comes from counting wildlife: pull some fish out of a pond and tag them before returning them to the water. Then do it again, this time looking to see how many have tags. From that initial count and the overlap you can estimate what the total fish population is. That the process is also dubbed “multiple systems estimation” correctly suggests that the process and estimations can be a tad more complex. HRDAG’s Anita Gohdes has written that the number of lists they’re working with creates the possibility of 32 overlaps, but not all those overlaps are possible over the full length of the conflict (when did it officially start?) or the breadth of the country.
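For readers who want to see the fish-pond arithmetic, here is a minimal Python sketch of the simplest two-list case, the textbook Lincoln-Petersen estimator, along with Chapman's small-sample variant. It is an illustration only, not HRDAG's or Fienberg's actual model, which works with more than two lists and adjusts for dependence between them. The function names and all of the numbers are invented for the example.

```python
def lincoln_petersen(n_first, n_second, n_both):
    """Two-list estimate of the total population.

    n_first  -- number caught (or documented) on the first pass
    n_second -- number caught on the second pass
    n_both   -- number appearing on both passes (the tagged recaptures)
    """
    if n_both == 0:
        raise ValueError("no overlap between the lists; estimate is undefined")
    return n_first * n_second / n_both


def chapman(n_first, n_second, n_both):
    """Chapman's variant, which is less biased when the overlap is small."""
    return (n_first + 1) * (n_second + 1) / (n_both + 1) - 1


def estimated_missing(n_first, n_second, n_both):
    """Estimated number that neither list recorded."""
    documented = n_first + n_second - n_both  # unique individuals on either list
    return lincoln_petersen(n_first, n_second, n_both) - documented


# Hypothetical pond: tag 100 fish, later catch 80, of which 20 carry tags.
print(lincoln_petersen(100, 80, 20))   # 400.0
print(estimated_missing(100, 80, 20))  # 240.0
```

The logic is that if a fifth of the first catch turns up in the second, the second catch is assumed to be about a fifth of the whole pond. That assumption fails when some fish (or some deaths) are systematically harder to record than others, which is one reason the multi-list versions are more involved.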
Plus, Fienberg warned, “Every application has its own special features that you ignore at your peril.”
Given the diversity of sources listing the dead in Syria (the loyalists, the rebels, and people both in between and above it all), “a big piece that our team worries about so much is collection bias, this idea of what’s getting documented versus what isn’t,” Price said. “The beauty of capture-recapture is it makes it possible to model those potential biases and adjust for them.”
One concern in the Syrian project is more quotidian: assuming you can actually match up names and files on different lists. “If you have small lists, you do it by eye,” Fienberg related. “You read the records and you say, ‘Aha!’ You read off somebody’s name, their address, the date that they were killed, and where they were killed, and you locate that on two different lists, and you say, these records go together. But if lists get larger and larger, that’s not feasible, and you have to actually do things using one or more different algorithms.” Just as flawed data from the scene creates follow-on concerns, the accuracy of the algorithm and the uncertainties it creates affect the ability to analyze the linked data. “And those are questions that we’re really focused on,” said Fienberg.
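To give a flavor of what matching "using one or more different algorithms" can look like, here is a deliberately naive Python sketch. It links two lists by requiring the same date and place and a closely similar spelling of the name, using the standard library's difflib. This is not the method HRDAG or Carnegie Mellon uses. The field names, the 0.85 threshold, and every record below are invented for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity between two names, from 0.0 (nothing shared) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_match(r1, r2, threshold=0.85):
    """Treat two records as the same person if date and place agree
    and the names are spelled almost alike (threshold is an assumption)."""
    return (
        r1["date"] == r2["date"]
        and r1["place"] == r2["place"]
        and name_similarity(r1["name"], r2["name"]) >= threshold
    )


def link_records(list_a, list_b, threshold=0.85):
    """Compare every pair across the two lists; return index pairs judged to match."""
    return [
        (i, j)
        for i, ra in enumerate(list_a)
        for j, rb in enumerate(list_b)
        if is_match(ra, rb, threshold)
    ]


# Invented records: the first entry on each list differs by one letter.
list_a = [
    {"name": "Ahmad Khalil", "date": "2012-03-14", "place": "Homs"},
    {"name": "Samir Haddad", "date": "2012-05-02", "place": "Aleppo"},
]
list_b = [
    {"name": "Ahmed Khalil", "date": "2012-03-14", "place": "Homs"},
    {"name": "Layla Nasser", "date": "2012-05-02", "place": "Aleppo"},
]

print(link_records(list_a, list_b))  # [(0, 0)]
```

Even this toy version shows where the uncertainty comes from. Set the threshold too low and different people are merged, which shrinks the count. Set it too high and one person is counted twice. Comparing every pair also becomes impractical as the lists grow, which is the scaling problem Fienberg describes.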
That focus comes at a price, and in this case the currency is time. The U.N. enumeration released this week covered the period between March 2011 and April 2013, a relatively speedy two-month lag for good numbers. But the Fienberg-led re-evaluation will take much longer, in part because the innovative models and algorithms it promises need to be created.
Given the strenuous efforts this attention to accuracy entails, wouldn’t a good-enough casualty count suffice? Price approached that question with another question, one that HRDAG asks before taking on a new project. “Does the truth matter? Of course, in the broad, philosophical sense, you always want the answer to that question to be yes,” Price said. “But what we mean on our team when we ask that is ... in this particular conflict, is that more rigorous scientific analysis going to serve some function? Is there some possibility of change after quantitative analysis?
“We want to provide policymakers with the best, most defensible rigorous scientific analysis possible. It’s really up to those policymakers and other advocates to give those numbers meaning and the life and the interpretation that context gives quantitative analysis.”