Can We Really Detect Sarcasm With a Machine?

What was once the domain of literary critics has now become the business of the Secret Service.

Eyes rolled a couple of weeks ago when the Secret Service posted a work order for new social media analytics software. Not because it is odd for the law enforcement agency tasked with protecting the president and other federal officials to watch social media, but because among the 22 functionality requirements for the software was the “ability to detect sarcasm and false positives.”

Edwin M. Donovan, deputy assistant director of the Secret Service’s Office of Government and Public Affairs, emphasizes that sarcasm detection is only one of many requirements. “It’s so you don’t have to sift through thousands of tweets,” he says. “We want to streamline, to automate the social media monitoring process. Not just for sarcasm.”

Donovan says that assessing large volumes of messages has been a problem for the Secret Service, which in the past has borrowed the analytics tools of other agencies, like FEMA. “It’s an issue that’s come up before, and we don’t want to waste time sifting through messages.”

“Right now,” Donovan says, “if there were suddenly 200 tweets about something on D Street near the Capitol, we’d want to be able to synthesize that.” The agency wants to be able to monitor its own presence on social media, but also track trending topics and influential users.

“Remember the purple tunnel of doom?” Donovan asks, referring to the debacle at the 2009 presidential inauguration, when thousands of attendees holding purple tickets were stuck in the Third Street Tunnel, unable to cross I-395 to reach the event. “We weren’t monitoring Twitter that day, so we didn’t know. Since then, we’ve entered social media.”

The Secret Service (@SecretService) joined Twitter in February 2010. The account has posted only 572 tweets but has 110,000 followers. And it’s not the only government agency interested in assessing the tone of other users. Last summer, the BBC reported that the French company Spotter had provided such an analytics tool to the British Home Office, the European Commission, and Dubai Courts.

For around $1,675 per month, Spotter provides software that “uses a combination of linguistics, semantics and heuristics to create algorithms that generate reports about online reputation.” Determining whether users are genuinely complaining or sincerely threatening a government agency is difficult, but Spotter claims an 80 percent accuracy rate and offers the service in 29 languages.

But how does software accomplish a task that is difficult even for sentient beings? Even the most casual correspondent has had a text message or email misunderstood by its recipient. Without the aid of eye rolls or shoulder shrugs, sarcasm in text can be hard to spot. The science of sincerity, or of sarcasm if you’re a glass-half-empty type, is critical for law enforcement agencies, but it is also profitable for corporations seeking to improve their customer service. Knowing whether a reviewer sarcastically praised your product or honestly questioned your service matters just as much.

So what used to be the domain of literary critics has become a novel problem for software designers. A 2010 paper by three researchers at the Hebrew University presented a sarcasm-detection algorithm with 77 percent precision. Developed through an analysis of 66,000 Amazon product reviews, the algorithm could detect the situational irony of review titles (“[I] Love the Cover” for a book review and “Where Am I?” for a GPS device) as well as sarcastic patterns in wording and punctuation, such as sentence length, multiple exclamation or question marks, and the number of words written in all capital letters.
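Punctuation cues like these are simple to compute. The Python sketch below is a minimal illustration, assuming plain-text input; the function and field names are hypothetical, not taken from the paper. A real detector would feed counts like these, alongside other signals, into a trained classifier rather than rely on them alone.

```python
import re
from dataclasses import dataclass


@dataclass
class PunctuationFeatures:
    """Hypothetical container for the surface cues the article lists."""
    words: int           # sentence length, in words
    exclamations: int    # number of "!" characters
    questions: int       # number of "?" characters
    all_caps_words: int  # words written entirely in capitals, e.g. "LOVE"


def extract_features(sentence: str) -> PunctuationFeatures:
    # Split into word tokens (letters, digits, apostrophes); punctuation is counted separately.
    tokens = re.findall(r"[A-Za-z0-9']+", sentence)
    # Only count words of two or more characters, so a lone "I" or "A" is not treated as shouting.
    caps = [t for t in tokens if len(t) > 1 and t.isupper()]
    return PunctuationFeatures(
        words=len(tokens),
        exclamations=sentence.count("!"),
        questions=sentence.count("?"),
        all_caps_words=len(caps),
    )


print(extract_features("Where Am I?"))
# PunctuationFeatures(words=3, exclamations=0, questions=1, all_caps_words=0)
```

On their own, such counts say little; “Where Am I?” looks like an ordinary question until a system also knows it is the title of a GPS review.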

But Amazon reviews give an algorithm plenty of words to work with. In a 2011 paper, three researchers at Rutgers’ School of Communication & Information were much less successful at identifying sarcasm on Twitter, where posts are limited to 140 characters. The brevity of tweets and their lack of context made tone hard to assess, for humans as well as machines. Even with the help of Twitter tics like emoticons and hashtags, both the human judges and the machines reached only 70 percent accuracy when distinguishing sarcastic tweets from positive or negative ones.
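These headline numbers are not measuring the same thing: the Hebrew University result is a precision score, while the Twitter study and Spotter report accuracy. The Python sketch below uses made-up labels, not data from either study, to show how the two metrics can diverge; the function names are illustrative.

```python
from typing import Sequence


def precision(gold: Sequence[str], predicted: Sequence[str], label: str) -> float:
    """Of the items the system flagged as `label`, the fraction that truly are."""
    flagged = [g for g, p in zip(gold, predicted) if p == label]
    if not flagged:
        return 0.0  # convention: nothing flagged means no precision to report
    return sum(g == label for g in flagged) / len(flagged)


def accuracy(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of all items, across every class, that got the correct label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy labels for four tweets (illustrative only).
gold = ["sarcastic", "sarcastic", "positive", "negative"]
predicted = ["sarcastic", "positive", "positive", "positive"]
print(precision(gold, predicted, "sarcastic"))  # 1.0
print(accuracy(gold, predicted))                # 0.5
```

Here the system flags only one tweet as sarcastic and is right, so its precision is perfect, yet it mislabels half the tweets overall. Precision also ignores the sarcastic tweet it missed entirely.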

But that was 2011, and with companies like Spotter already reporting 80 percent accuracy, the Secret Service will likely find more bids than it needs. Donovan showed some sarcasm of his own when I asked how the agency will measure the actual accuracy of whichever software it buys. “That’s a great question,” Donovan says, “for the companies that make these claims.”