That was the question a group of researchers, including The New Yorker's Cartoon Editor, Bob Mankoff, asked in a study posted to arXiv and highlighted by MIT's Technology Review. And really, asking why robots can't tell jokes is a roundabout way of asking why humans find anything funny.


What is funniness? Can it be quantified? And why is it so difficult to define, much less to reproduce mechanically? These are questions that have plagued scientists for decades, and they're what brought together an unusually diverse group of authors on a new paper looking for answers: Bob Mankoff, University of Michigan computer scientist Dragomir Radev, and scientists from Yahoo! and Columbia.

Together, they set out to do something that sounds very simple: analyze which New Yorker captions computers think are the "funniest," then compare that ranking to the judgments of Amazon Mechanical Turk workers and to the winning captions actually selected by New Yorker editors. Their study, Humor in Collective Discourse: Unsupervised Funniness Detection in the New Yorker Cartoon Caption Contest, explains just how complicated dumb jokes can be.


Here's how it worked: they began by choosing 50 of the New Yorker's "unpublished" cartoon contests, each with some 5,000 different caption submissions, then analyzed each response to determine which kinds of syntactical elements it expressed. Take this contest, for example:

They created a “lexical network” for the responses, linking the top captions by structure, syntactical meaning, and theme. Here’s what that looked like for the cartoon above:
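The paper doesn't spell out its graph construction in this article, but the idea of a lexical network can be sketched simply: link any two captions whose word overlap passes a threshold. A minimal illustration, assuming a basic Jaccard word-overlap measure (the captions and threshold here are invented for the example, not taken from the study):

```python
# Illustrative sketch (not the paper's exact pipeline): build a "lexical
# network" by linking captions whose word overlap passes a threshold.

def tokenize(caption):
    """Lowercase a caption and split it into a set of words."""
    return set(caption.lower().split())

def build_lexical_network(captions, threshold=0.25):
    """Return an adjacency list linking caption indices whose Jaccard
    similarity (shared words / total distinct words) meets the threshold."""
    tokens = [tokenize(c) for c in captions]
    edges = {i: set() for i in range(len(captions))}
    for i in range(len(captions)):
        for j in range(i + 1, len(captions)):
            union = tokens[i] | tokens[j]
            if not union:
                continue
            jaccard = len(tokens[i] & tokens[j]) / len(union)
            if jaccard >= threshold:
                edges[i].add(j)
                edges[j].add(i)
    return edges

# Toy captions: the first two share most of their wording, the third doesn't.
captions = [
    "I told you the meeting was at noon",
    "I told you the meeting was cancelled",
    "Honey, the cat did it again",
]
network = build_lexical_network(captions)
```

In this toy graph, the two near-duplicate captions end up linked while the unrelated one stays isolated, which is the basic shape of the network pictured for the cartoon.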

So what is funny? Or funny "at least, to the readers of the New Yorker," as the authors crucially note?


The computational analysis turned up a few hints, but no solid answers. For one thing, the highest-ranked jokes are mostly negative in sentiment. They also usually refer to a person or being in the cartoon. And then there's something called "lexical centrality," visible above in the nodes with the most connections: captions built around something many entries share in common.

“More interestingly, we also showed that captions that reflect the collective wisdom of the contest participants outperformed semantic outliers,” they write. Using this network, they could find which elements the jokes had in common, and reach a kind of ranking.
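That notion of ranking by centrality can be sketched in miniature: captions that share structure with the most other captions score highest. A hedged illustration using simple degree centrality (the adjacency list is a toy example, not data from the paper, which may use a more sophisticated centrality measure):

```python
# Illustrative sketch: rank captions by degree centrality over a lexical
# network -- captions linked to the most others come first.

def rank_by_centrality(edges):
    """Sort node ids by number of connections, most central first."""
    return sorted(edges, key=lambda node: len(edges[node]), reverse=True)

# Toy network: caption 0 shares wording with three others, caption 4 with none.
toy_edges = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: set()}
ranking = rank_by_centrality(toy_edges)
# Caption 0, the most connected, lands at the top of the ranking.
```

Under this scheme, a caption that "reflects the collective wisdom of the contest participants" (i.e., resembles many other entries) outranks a semantic outlier, matching the authors' finding.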


But as MIT points out, what's interesting is that they don't really reach a conclusion, which illustrates why humor is such a difficult thing to study and replicate in machines. In short, evaluating New Yorker captions with computational analysis led to a few extremely broad findings about which captions humans find "funny": jokes that have a negative sentiment, jokes that involve people, and jokes built on a relatable theme that many people will "get."

Sure! But those are still extraordinarily broad insights, and they only apply to New Yorker captions, an extremely rarefied form of humor (one that some would argue doesn't qualify as humor at all) that certainly isn't universal to all humans when it comes to lols.

They do conclude that they’re going to keep studying the captions—and make them available for any other researchers—and that, the next time around, they’re going to dig into puns: “e.g., ‘Tell my wife I’ll be home in a minotaur.’”


[arXiv; h/t MIT's Technology Review; Image: AP Photo/Lilli Strauss]
