Bias in the Court

How biased are forensic psychologists by the legal team that picks them? More than they think.


On November 14, 1978, a Texas jury found Thomas Barefoot guilty of the murder of Bell County police officer Carl Levin. Based on the gravity of the crime and the testimony of two psychiatrists who claimed that Barefoot would pose a continued menace to society, that same jury recommended the death penalty.

Barefoot appealed. The psychiatrists, he argued, had no grounds on which to predict his future dangerousness. The case made it all the way to the Supreme Court, which rejected his claim, affirming the merit of the mental health experts and denying a stay of execution. On October 30, 1984, Barefoot was put to death by lethal injection.

That same year, another man, Leroy Hendricks, was convicted in Kansas for his own crimes: child molestation, in this case. He went on to serve nearly 10 years of his five- to 20-year sentence and was almost a free man when the state government stepped in and, again on the basis of a mental health expert’s opinion, moved to confine him under its Sexually Violent Predator Act. The law, passed only two years earlier, allowed the state to indefinitely confine sexual predators it considered too dangerous to release, placing them in mental hospitals after they’d served their sentences. It was enacted after the rape and murder of a young college student at the hands of a convicted rapist who had been released from prison. Hendricks, in fact, would be the Act’s first case.

Forensic psychologists and psychiatrists are frequent elements in criminal proceedings, and their opinions clearly carry weight. But these experts are often recruited by a particular side—defense or prosecution—and paid for their time and expertise. So here’s the too-little-asked question: Are forensic mental health experts influenced by the team or individual that picks them?

That’s the question asked by a team of researchers out of the University of Virginia and Sam Houston State University in Texas. And according to their new study out in the journal Psychological Science, not only is the answer a clear yes, but it may be that most evaluators don’t even realize when their work is compromised.

“Sometimes the public or attorneys talk about hired guns, but nobody thinks they’re the hired gun, right?” says lead author Daniel Murrie, who is also the director of the University of Virginia’s Institute of Law, Psychiatry, and Public Policy. “But what this research seems to suggest is that many clinicians are vulnerable to some degree after all.”

The work that led up to this conclusion kicked off a few years ago, when notices from Murrie’s institute started popping up in the inboxes of forensic psychologists and psychiatrists around the country. They were invited, the notices read, to participate in free training on a couple of popular risk assessment tools (the Psychopathy Checklist-Revised, or PCL-R, and the Static-99) and to earn a few continuing education credits, provided they would come back in a few weeks’ time to score case files for an out-of-state “justice system” at $100 per file.

Granted, it wasn’t standard as consultation gigs go, but it wasn’t unheard of either. One hundred and eighteen Ph.D.s, Psy.D.s, and M.D.s signed on, streaming in from 15 states. Three weeks later, all but 10 returned to meet with an attorney, who explained that a large-scale legal case was underway and that they would be scoring a group of sex offenders being considered for long-term civil commitment. Armed with case files and the evaluation forms they had studied earlier, each scored three to four individuals before heading home.

Only four of the 108 participants guessed the truth: The entire affair was a lie. Everyone there actually scored the same three main files, all real cases from the past, stripped of identifying information. (Some also scored a final case very low in psychopathy, just to see if there was a possible floor effect. There was, to an extent.) The office space was a borrowed university satellite building. The money, as much as $400 for a participant who scored four files, came from a National Science Foundation grant. And as for the attorney? The retired lawyer-turned-professor was the key to the whole thing: To half of the participants, he claimed to be with the prosecution; to the other half, he was the defense; and to both, he gave 15 minutes’ worth of team-tailored pep talk, then disappeared. As manipulations go, it wasn’t very strong. But it was enough.

On all three main cases, average prosecution scores on the PCL-R were about three points higher than average defense scores. The Static-99 varied similarly but to a lesser extent. That in itself is odd; normal error wouldn’t stack the deck in just one direction. But then, to put the effect into perspective, the researchers compared all the possible pairings of one defense expert with one prosecution expert, akin to the typical expert-versus-expert courtroom scenario. In any given pairing, they found, the prosecution expert scored more than six points higher than the defense expert nearly a full third of the time.

Statistically speaking, says Marcus Boccaccini, a psychologist at Sam Houston State University and study co-author, a six-point gap with the state higher than the defense should happen in fewer than three percent of the pairings with normal variation. “Not 29 percent. But it is 29 percent, and it’s happening in that direction.” A similar pattern emerged in the Static-99, albeit to a lesser extent, with state experts scoring more than two standard errors higher than defense experts in 18 to 20 percent of the cases. And for neither instrument did the differences correlate with the experts’ experience, specialty, or attitude toward sex offenders. They tracked only the team they played for.
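The pairing comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the researchers' actual analysis: the function name `fraction_pairs_exceeding` is hypothetical, and the scores are invented for the example rather than taken from the study.

```python
from itertools import product


def fraction_pairs_exceeding(prosecution_scores, defense_scores, gap):
    """Share of all prosecution-versus-defense pairings in which the
    prosecution score is more than `gap` points above the defense score."""
    pairs = list(product(prosecution_scores, defense_scores))
    if not pairs:
        raise ValueError("both groups need at least one score")
    hits = sum(1 for p, d in pairs if p - d > gap)
    return hits / len(pairs)


# Invented scores for one case file, for illustration only.
# These are NOT data from the study.
prosecution = [28, 31, 25, 30]
defense = [22, 27, 24, 20]

# Share of pairings where the prosecution expert is more than six points higher.
share_prosecution_higher = fraction_pairs_exceeding(prosecution, defense, gap=6)

# The same check in the opposite direction (defense more than six points higher).
share_defense_higher = fraction_pairs_exceeding(defense, prosecution, gap=6)

print(share_prosecution_higher, share_defense_higher)
```

If scoring error were unrelated to the retaining side, large gaps would show up about equally often in both directions. A lopsided split like the one in this toy example is the kind of pattern the article describes.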

The thing to keep in mind is that these measures are two of the most common and thoroughly researched risk assessment tools in psychology today. And they’re common because they’re considered so objective. In research settings, two experts scoring the same person for psychopathy on the PCL-R will produce scores within two points of each other 92 percent of the time.

“This is a watershed piece of research,” says Joel Dvoskin, a longtime forensic psychologist in Arizona. “It’s brought to the forefront things that are no surprise to many forensic psychologists and psychiatrists, which is that we’re human beings, and human beings have biases.”

Indeed, many have long assumed that bias in mental health testimony existed, particularly since the rest of forensic science, like fingerprint analysis, DNA, and eyewitness testimony, has come under fire for subjectivity in recent years. But without numbers and proof, it remained an open question that was easy to dismiss. “We have an uncanny ability to say it’s a problem, but it’s not a problem for me,” says Randy Otto, a forensic psychologist at the University of South Florida. “And they truly believe that the opinions they form are unaffected by how they’ve been retained and who’s retaining them.”

After the study was complete, Murrie and Boccaccini sent the scores to the participants so that each could see how they compared. Still the blinders stayed on. “And in fact,” says Murrie, “if you asked them, ‘Who do you think is most vulnerable to allegiance bias?,’ you would see the answers were typically opposite of them. People who worked in the public sector all thought that the people who would be vulnerable to allegiance would be private practitioners.” The older experts blamed the younger, and newcomers blamed the experienced. “Basically everybody thought it was somebody who was unlike them.”

It’s unknown how this adversarial allegiance affects real cases. The standard danger zone in a PCL-R rating is 30 or higher, so six points matter. But no mental health evaluation exists in a courtroom vacuum, which makes cause and effect hard to separate. Some research has suggested that a defendant labeled as psychopathic is seen as more deserving of harsh punishment. Certainly, such score discrepancies do exist in real life: Earlier work by Murrie and Boccaccini has documented lopsided results in actual court proceedings in Texas, and other work by them suggests that jurors in sexual predator cases are more skeptical of testimony from defense experts than of testimony from the state’s.

Future research can try to tease out such effects, as well as ways to counteract them and force experts to recognize their own shortcomings. “Whether it’s choosing the right test or looking to see if evaluators trained a certain way are less prone to this than others—we need to start thinking about moving in that direction,” says Boccaccini. “We now see the problem, well, let’s work to minimize it. We can do that. We’re people who study humans and human decision-making. We should rise to the challenge.”

For Murrie and Boccaccini, that may be easier said than done. “We’re pretty sure they’ll never believe us again,” Boccaccini says. “Even if we try to do a follow-up study where there’s no deception, nobody’s going to believe it.”