Does Big Data Belong in Courtrooms? - Pacific Standard
The criminal justice system has been using predictive algorithms for decades, but research shows even the best algorithms are no better than humans at predicting recidivism, and neither is very good.

Algorithms are behind pretty much everything these days, from music and movie recommendations to GPS navigation to forensic technology. Ever the early adopter of new technologies, the criminal justice system has been using predictive algorithms for decades to guess everything from where crimes might occur to which criminals will appear at court hearings or re-offend. The data-driven tools were supposed to take human error out of the equation in a system plagued by racism and discrimination. But even as these tools have become increasingly sophisticated over the last few years, significant flaws have also come to light. A 2016 ProPublica investigation, for example, found that one common tool for predicting recidivism risk was more likely to falsely identify blacks as high risk and whites as low risk—perpetuating existing disparities within the criminal justice system.

Courtrooms across the country contract with many different for-profit companies, each with its own proprietary algorithms, but the ProPublica investigation looked at one tool in particular, called Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), made by the software company Equivant. When Julia Dressel, then a computer science student at Dartmouth College, and her advisor, Hany Farid, saw the ProPublica investigation, they thought: Surely we can do better than this.

"Underlying the whole conversation about algorithms in general was this assumption that algorithmic predictions are inherently superior to human predictions," explains Dressel, now a software engineer at Apple. However, when the Dressel and Farid began digging, they couldn't find any research proving that algorithms were any better than humans at predicting re-offenders. "When you're building algorithms to replace human tasks, it seems reasonable to ask what's the baseline?" Farid says. "How well do humans do?"

"At the very, very least, COMPAS should be outperforming humans who are not trained, who have no criminal justice experience," Dressel adds.

In a new study, the team first set out to see how COMPAS's accuracy stacked up against that of humans. The researchers had COMPAS and a group of people assess the same set of defendants, asking each to predict whether the defendants would commit another crime within the next two years. COMPAS uses 137 questions that consider age, sex, criminal history, job status, and family criminal history (but not race) to make predictions. The human participants, who responded to an online survey and presumably have little to no criminal justice experience, were given much less information about the defendants: just age, sex, and criminal history.

Still, the researchers found there were no significant differences in accuracy between the humans, who correctly predicted recidivism 63 percent of the time, and COMPAS, which got it right 65 percent of the time. Both humans and COMPAS were more likely to erroneously predict that black defendants would re-offend than that white defendants would; the false-positive rate for black defendants was close to 40 percent for both humans and COMPAS, while the false-positive rate for white defendants was below 30 percent. Farid calls that accuracy rate, and the asymmetry of the mistakes, "seriously problematic."
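
For readers who want the two measures spelled out, here is a minimal Python sketch of how overall accuracy and a per-group false-positive rate are computed. The records are made-up toy values, not the study's data, and the function names, field names, and group labels are hypothetical.

```python
from collections import defaultdict

def accuracy(records):
    """Share of records where the prediction matched what actually happened."""
    correct = sum(1 for r in records if r["predicted"] == r["reoffended"])
    return correct / len(records)

def false_positive_rate(records):
    """Among people who did NOT re-offend, the share wrongly predicted to re-offend."""
    negatives = [r for r in records if not r["reoffended"]]
    if not negatives:
        return None  # undefined when nobody in the group stayed out of trouble
    false_positives = sum(1 for r in negatives if r["predicted"])
    return false_positives / len(negatives)

def fpr_by_group(records, key="group"):
    """Compute the false-positive rate separately for each group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r)
    return {g: false_positive_rate(rs) for g, rs in groups.items()}

# Hypothetical toy records (NOT real data): two groups of four people each.
TOY_RECORDS = [
    {"group": "A", "predicted": True,  "reoffended": True},
    {"group": "A", "predicted": True,  "reoffended": False},
    {"group": "A", "predicted": False, "reoffended": False},
    {"group": "A", "predicted": False, "reoffended": True},
    {"group": "B", "predicted": True,  "reoffended": True},
    {"group": "B", "predicted": False, "reoffended": False},
    {"group": "B", "predicted": False, "reoffended": False},
    {"group": "B", "predicted": False, "reoffended": False},
]

overall = accuracy(TOY_RECORDS)
per_group = fpr_by_group(TOY_RECORDS)
```

The point of splitting the false-positive rate by group is that a single overall accuracy figure can hide uneven mistakes: in this toy set the predictor is wrong about non-re-offenders in group A far more often than in group B.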

But improving the algorithm is trickier than it seems. The team created their own predictive algorithm based on just two data points—age and number of previous convictions—and found that it performed just as well as COMPAS, which uses 137 data points. "That was definitely a shocking result," Dressel says.
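
The idea of a predictor built from just two inputs can be sketched in a few lines of Python. This is an illustration only: the article does not say what kind of model the team built, so logistic regression is an assumption here, and the training rows, scaling constants, and function names are all hypothetical.

```python
import math

# Hypothetical toy rows (NOT real data): (age, number of prior convictions).
TOY_ROWS = [(19, 4), (22, 6), (25, 3), (30, 5), (35, 7),
            (24, 1), (45, 0), (52, 1), (38, 0), (60, 2), (41, 1), (28, 0)]
# 1 = re-offended, 0 = did not (made-up labels for illustration).
TOY_LABELS = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

def scale(age, priors):
    """Bring both inputs to a similar numeric range (constants chosen arbitrarily)."""
    return ((age - 40) / 10.0, priors / 5.0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic regression weights with plain batch gradient descent."""
    w_age, w_priors, bias = 0.0, 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        g_age = g_priors = g_bias = 0.0
        for (age, priors), y in zip(rows, labels):
            x_age, x_priors = scale(age, priors)
            err = sigmoid(w_age * x_age + w_priors * x_priors + bias) - y
            g_age += err * x_age
            g_priors += err * x_priors
            g_bias += err
        # Step against the average gradient of the log loss.
        w_age -= lr * g_age / n
        w_priors -= lr * g_priors / n
        bias -= lr * g_bias / n
    return w_age, w_priors, bias

def predict_proba(model, age, priors):
    """Return the model's score (between 0 and 1) for one person."""
    w_age, w_priors, bias = model
    x_age, x_priors = scale(age, priors)
    return sigmoid(w_age * x_age + w_priors * x_priors + bias)

MODEL = train(TOY_ROWS, TOY_LABELS)
```

Calling `predict_proba(MODEL, 20, 6)` returns a score between 0 and 1; on this toy data, a young person with many priors scores higher than an older person with none. Nothing about these scores reflects real recidivism rates.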

"No matter how little data or how much data, how simple or complex the classifiers were, the algorithms couldn't break 65 percent, and neither could our human participants," Farid says. "That's odd because usually, as you add more data to a problem, your accuracy should go up, and when it doesn't it suggests, but does not prove, that perhaps there's no signal here." In other words, it could mean that about 65 percent accuracy is as good as it gets when it comes to predicting the future behavior of defendants.

That doesn't mean these predictive algorithms should be abandoned, according to the authors, but rather that we may need to reconsider how much weight we give their predictions in court.

"Imagine you're a judge, and someone says, 'We have big data, and it has made a prediction.' Now imagine someone says, 'Well, I asked 10 people online what they thought,'" Farid says. "We are trying to say to the court, 'When you see this prediction from this software or similar software, you should know that it is as accurate as people without any criminal justice experience responding to an online survey, and you should weight that risk assessment proportionately.'"

The trouble is, while we know that COMPAS and other software influence judges' decisions over matters as critical as bail (a hugely important issue, as those who aren't released on bail while awaiting trial often lose their jobs and their homes), we don't know exactly how these algorithms work. The makers of COMPAS and other similar software are opposed to revealing the inner workings of their algorithms. While good for business, that decision leaves defendants at a disadvantage. "A defendant can't really argue with the COMPAS predictions in a courtroom because they don't know how these predictions came about," Dressel says. "It's really important for any company to be transparent about how their software works and to provide research, or proof, that their algorithms are actually effective and accurate, because the decisions that these algorithms are making can have a really profound impact on someone's life."

It's not just an issue of trusting a company with a financial stake in a tool to evaluate its performance. Software code can contain unintentional errors, which independent evaluators might be able to spot. Transparency doesn't necessarily mean the company has to post its code on GitHub, Farid says. The courts could have outside experts sign non-disclosure agreements before evaluating the product. Farid does it all the time as an expert witness in patent cases.

"As a technologist, I am sympathetic to intellectual property and competitive edges," he says. However, "I think if you're going to unleash this type of technology on the public, we should be insisting on more accurate and more transparent technology."
