A Fix for Social Science

Drawing on their own experience, two young social scientists offer a way to cut down on mistakes in their field.

The public’s intense focus on mistakes in social science can make for an intimidating atmosphere for those in the field. “Researchers facing that situation are turning left and right and saying: ‘So, what’s my response to that? Maybe I should not conduct original research anymore, or maybe I should just do the boring stuff that doesn’t get cited in the media,’” says Raphael Silberzahn, an assistant professor at the IESE Business School in Barcelona. “It sounds awful, but if a single paper could ruin your career, then why go for the flashy results?”

Silberzahn and a few other researchers are finishing work on a project that aims to bring a little more security and reproducibility to the social sciences. He and his peers are still writing up their results, but in the meantime he has co-authored an essay describing the project, which appears today in the journal Nature. At a time when a lot of projects have pointed out problems in social science, Silberzahn and his colleagues offer one potential solution: They asked 29 research teams from around the world to analyze, in whatever way each thought best, the same set of data about the number of red cards that soccer referees gave to players of different skin colors. The project’s leaders also set up an elaborate system for the teams to read and rate one another’s analysis plans. The idea was that the 29 analyses together would offer more force and certainty than any single analysis could, and that researchers could get feedback on their plans before going through with an entire study.

Ultimately, about two-thirds of the teams found that referees were more likely to give red cards to darker-skinned players, at rates varying from “just barely more likely” to “three times as likely.” Based on that overall consensus, it seems probable that referees do indeed discriminate against players, perhaps unconsciously, based on their skin color. There’s a provisional feel to the conclusion, however, that a single study wouldn’t have. “We found that the overall group consensus was much more tentative than would be expected from a single-team analysis,” write Silberzahn and Eric Uhlmann, a business-school professor in Singapore. “The experience convinced us that bringing together many teams of skilled researchers can balance discussions, validate scientific findings, and better inform policymakers.”
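To make that spread of estimates concrete, here is a minimal sketch in Python. It is not the project’s code or data; the dataset is entirely synthetic, and the variable names, numbers, and modeling choices are hypothetical stand-ins. It shows how two teams analyzing the very same records can report different effect sizes simply by making different, defensible analytic choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000  # hypothetical player-referee pairings (synthetic, not the real dataset)

# Toy structure: darker-skinned players in this fake data are more often
# defenders, and defenders draw more red cards regardless of skin tone.
skin = rng.integers(0, 2, n)                                # 1 = darker skin tone
defender = (rng.random(n) < 0.3 + 0.3 * skin).astype(int)   # position confounder
rate = (0.02 + 0.04 * defender) * np.where(skin == 1, 1.3, 1.0)
red_cards = rng.poisson(rate)

# "Team A": a simple unadjusted rate ratio, darker vs. lighter.
team_a = red_cards[skin == 1].mean() / red_cards[skin == 0].mean()

# "Team B": adjust for position by comparing within each stratum, then average.
stratum_ratios = [
    red_cards[(defender == d) & (skin == 1)].mean()
    / red_cards[(defender == d) & (skin == 0)].mean()
    for d in (0, 1)
]
team_b = float(np.mean(stratum_ratios))

print(f"Team A (unadjusted) estimate:        {team_a:.2f}x")
print(f"Team B (position-adjusted) estimate: {team_b:.2f}x")
```

On this toy data, Team A’s unadjusted estimate comes out noticeably higher than Team B’s adjusted one, even though both teams looked at identical records. Multiply that by 29 teams and many more analytic decisions, and a range running from “just barely more likely” to “three times as likely” becomes less surprising.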

One thing that really comes through in Silberzahn and Uhlmann’s essay is how much more work their project entails compared to a single lab’s effort. We’re talking 28 additional research teams’ worth of time, attention, and salaries. It’s a lot of resources. Silberzahn proposes that such a full-court press be reserved for studies that test complex or controversial theories, or that answer questions especially relevant to lawmakers.

While many reports about reproducibility in social science focus on researchers double-checking one another’s work after papers are published, Silberzahn thinks there are benefits to the pre-publication check his project set up. “One aspect our project facilitated was to provide a space where researchers could have a second or third or fourth opinion on their approaches and actually learn something,” he says. “Now, that is different once a paper is out. You’re still eager to learn, but you’re quite invested in your claim, if you’ve been in the media stating that you found this or that effect.”

Silberzahn should know. In 2013, he and Uhlmann published a paper in the journal Psychological Science that found Germans with royal-sounding last names, such as König (king) or Kaiser (emperor), were more likely to be business managers than Germans with ordinary-sounding names, like Koch (cook) or Bauer (farmer). The pair soon got an email from Uri Simonsohn, a business-school professor at the University of Pennsylvania, who asked for their data so he could re-analyze it. Seven months later, Simonsohn, Uhlmann, and Silberzahn published a commentary saying that, because of biases in the pair’s original data set, there was no evidence of a relationship between Germans’ last names and their likelihood of achieving a managerial role.

The backtracking had no ill effect on Silberzahn’s and Uhlmann’s careers. So long as there’s no lying or plagiarizing, scientists expect that their work might sometimes be disproven by new data. Still, things were perhaps a little uncomfortable. “Personally, we didn’t have a problem with saying, ‘Yeah, he’s applied a different approach and that’s probably close to the truth, what he finds,’” Silberzahn says. “On the other hand, some media picked the story up again. It was kind of a prime target, saying, ‘Oh yeah, these researchers, who half a year ago said one thing is true, now say another thing is true.’”

But maybe researchers don’t even have to go through that if they have a safe space to check whether they’re wrong before anything gets published.
