
Linguistic Analysis Reveals Research Fraud

An examination of papers by the discredited Diederik Stapel finds linguistic differences between his legitimate and fraudulent studies.
(Photo: Protasov AN/Shutterstock)

Does the name Diederik Stapel ring a bell? He’s the prominent Dutch psychologist who, in 2011, was found to have engaged in research fraud on a massive scale. Much of his data, it now appears, was simply made up.

Could we—should we—have realized his too-good-to-be-true findings were, in fact, fiction? More importantly, can we spot the next guy whose provocative assertions are based on fraudulent data?

It’s a difficult task, to be sure. But David Markowitz and Jeffrey Hancock of Cornell University report that, in Stapel’s case, they were able to classify his research as legitimate or fraudulent “with above-chance accuracy” through careful linguistic analysis.

Looking at 49 papers he authored—24 fraudulent, 25 legitimate—they found telltale differences in his writing that indicated, at least to an extent, whether the research was real. They conclude that, even in “highly edited” scientific papers, “deception can be revealed.”

Using Wmatrix, a tool that provides linguistic analysis by investigating such variables as word frequency and grammar, Markowitz and Hancock found Stapel’s writing style varied in several ways when he described fake, rather than genuine, data.

Tellingly, “Stapel used nearly 3,000 fewer adjectives in his fake papers than his genuine papers,” they write in the online journal PLoS One. This pattern is consistent with the theory that “descriptive recalls of real experiences are more sensory and contextually driven.”

“Stapel also wrote with more certainty when describing his fake data,” the researchers add, “using nearly one-third more certainty terms than he did in the genuine articles. Words such as ‘profoundly,’ ‘extremely’ and ‘considerably’ frame the (false) findings as having a substantial and dramatic impact.”

In other words, when the results were real, he didn’t feel the need to be quite so emphatic about their importance.
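
For readers curious about what counting "certainty terms" might look like in practice, here is a minimal Python sketch. It is not the researchers' method and does not reproduce Wmatrix. The word list contains only the three words quoted above, the function name `certainty_rate` is hypothetical, and the two sample sentences were invented for illustration.

```python
import re

# Illustrative list only: the three words quoted in the article, not the
# full certainty category used by Markowitz and Hancock.
CERTAINTY_TERMS = {"profoundly", "extremely", "considerably"}


def certainty_rate(text, terms=CERTAINTY_TERMS):
    """Return the number of certainty terms per 1,000 words of text."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in terms)
    return 1000.0 * hits / len(words)


# Hypothetical sentences, invented for this example.
hedged = "The effect was small and appeared in one of two samples."
emphatic = "The effect was extremely large and profoundly changed behavior."

print(certainty_rate(hedged))
print(certainty_rate(emphatic))
```

Normalizing by length (a rate per 1,000 words rather than a raw count) is one simple way to compare papers of different sizes. A real analysis would need a much larger term list and a statistical test before drawing any conclusion.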

The number of experiments and references per paper did not differ significantly between the real and fake studies. However, the fraudulent papers had fewer authors, on average, than the genuine ones—no surprise, as it is “typically easier to deceive in the presence of a smaller group.”

Of course, other deceptive researchers may not leave precisely the same linguistic clues. But this study provides evidence that, even when dealing with the often-dry, heavily edited world of research papers, “language cues are important in deception detection.”

Markowitz and Hancock’s paper, by the way, is available free online. Feel free to count the number of adjectives they use.