The second U.S. presidential debate of 2020 was more sedate than the first, with its whining, braying, and general frothing at the mouth. We all remember how the human moderator at the first rumble struggled to cut through to the actual issues that people care about, such as jobs, the environment, and stopping the pandemic. Could a cool and dispassionate artificial intelligence program do better?
Such an approach is already being attempted, in a limited way, by the event coordinators at Intelligence Squared U.S., a nonprofit group that holds debates on public policy topics ranging from nuclear power to the space race. Intelligence Squared has enlisted IBM’s Watson to use natural language processing to sift through thousands of audience questions, arguments, rants, and comments in its debates and distill the important ideas and issues people actually care about. Now that its debates have gone virtual, the group is using AI to manage what would otherwise be an unwieldy, raucous Q&A session of thousands of people in an online audience. And after just one debate earlier this month, entitled “It’s Time to Redistribute the Wealth,” it seems to be working.
“It takes a rational approach to figuring out what is on the mind of the audience,” John Donvan, who has been the moderator of the debates since 2008, told Gizmodo. “I really enjoy the live audience, but it’s very random. I can only call on a maximum of 8 or 9 people and I have no idea if their questions will be relevant at all.”
By contrast, IBM’s Watson analyzed 3,500 questions submitted online using a new capability called Key Point Analysis. It’s an AI-based summary developed by the IBM Research Project Debater team.
“Twenty percent of submissions argued that there is currently too much wealth inequality in the world,” intoned Watson, which was only used following the main portion of the debate to lead off the Q&A section. The disembodied male-sounding voice adopted for the event went on to break down salient issues, such as improving security for everyone vs. concerns about the lack of incentives for entrepreneurship and innovation. “Good luck to the human debaters,” concluded Watson.
There was no admonishment from the moderator trying to squelch an irrelevant rant or time wasted on the needless repetition of questions from clueless audience members. The points were rendered in an efficient, emotionless, and concise manner.
“We use a suite of algorithms applied to neural networks, and machine learning, as well as supervised and unsupervised learning,” explained Dakshi Agrawal, chief architect for AI at IBM, in an interview. Agrawal noted that the program essentially performs extractive summarization, condensing the material into pros and cons, but it cannot make some conceptual leaps. “If I say, I left my tea on the stove, we know I meant the pot, not literally the tea,” said Agrawal, noting that such nuances of language elude many AI technologies.
So Watson conducts its natural language processing within a given use case or context. The debate program has been trained on social topics; “it can’t do mathematical proofs,” said Agrawal. But it does not need to be trained on individual topics. And the more submissions it has to sort through, the more precisely it can assimilate comments. Out of thousands of inputs, the program is able to pick out the opinions that are clear and well expressed, he said.
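For readers curious how a statement like “twenty percent of submissions argued...” might be produced, here is a toy sketch in Python. It is not IBM’s Key Point Analysis, whose internals are not described here; it simply assumes a hand-written list of candidate key points and matches each submission to the one it shares the most words with. The function names, the sample sentences, and the `threshold` value are all invented for illustration.

```python
import re
from collections import Counter

def tokens(text):
    """Lowercase a sentence and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def key_point_shares(submissions, key_points, threshold=0.2):
    """Match each submission to its most similar key point and return
    the percentage of all submissions matched to each key point.
    Submissions that resemble no key point closely enough are ignored."""
    if not submissions:
        return {kp: 0.0 for kp in key_points}
    counts = Counter()
    kp_tokens = [(kp, tokens(kp)) for kp in key_points]
    for text in submissions:
        words = tokens(text)
        best, score = max(
            ((kp, jaccard(words, t)) for kp, t in kp_tokens),
            key=lambda pair: pair[1],
        )
        if score >= threshold:  # hypothetical cutoff for "close enough"
            counts[best] += 1
    total = len(submissions)
    return {kp: 100.0 * counts[kp] / total for kp in key_points}

# Made-up sample data, not taken from the actual debate.
key_points = [
    "there is too much wealth inequality",
    "redistribution reduces incentives for innovation",
]
submissions = [
    "There is far too much wealth inequality today",
    "Wealth inequality is too much to ignore",
    "Redistribution reduces incentives for innovation and entrepreneurship",
    "What time does the debate end?",
]

for point, share in key_point_shares(submissions, key_points).items():
    print(f"{share:.0f}% of submissions: {point}")
```

Word overlap is a crude stand-in for the trained language models a production system would use, which is exactly why the “tea on the stove” kind of nuance is hard: the off-topic question is filtered out here only because it happens to share no words with either key point.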
This is a far cry from the initial hype around AI about how it would replace doctors and detect cancer before any human oncologist could make the call. While some programs, such as FocalNet, are making progress in classifying prostate cancer, machine learning still has a significant distance to go before it reliably surpasses human expertise.
Indeed, deep learning techniques and statistical analysis fall short in one important respect when it comes to language: computers don’t understand what they are reading or hearing. To demonstrate this, researchers at the Allen Institute for Artificial Intelligence went beyond the typical test data set for natural language programs, the 273-question Winograd Schema Challenge, to a larger data set of 44,000 problems, which they dubbed WinoGrande. When the more challenging set of ambiguous statements was applied, accuracy rates of 90 percent on the original test dropped to between 59 percent and 79 percent for state-of-the-art AI programs. The assumption is that to be said to truly understand the semantics or meaning of a language, a program would have to approach the accuracy rate of humans, which is typically about 94 percent in such tests.
Researchers will doubtless continue to improve on those numbers, but there are still other issues to overcome, such as hackers looking to intentionally trick natural language AI programs.
A group of creative researchers at MIT’s Computer Science and Artificial Intelligence Laboratory has demonstrated just how easy that can be. They created TextFooler, an approach to attacking natural language processing programs. By changing as few as 10 percent of the words in a given text, it was able to take accuracy rates from 90 percent down to 20 percent. More worrisome, TextFooler was effective against one of the most popular open-source natural language models, BERT (Bidirectional Encoder Representations from Transformers), which many had promised would be able to better understand context.
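The general idea behind a word-substitution attack can be shown with a toy example in Python. This is not TextFooler itself, and the target is not BERT: the “classifier” below is a deliberately naive keyword counter, and the synonym table is hand-written and hypothetical. The real system selects its replacements far more carefully; the sketch only shows how swapping a few words for near-synonyms can flip a program’s verdict while a human reader sees the same meaning.

```python
# Toy sentiment "model": counts known positive and negative words.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "awful", "terrible"}

# Hypothetical synonym table, invented for this example. The replacements
# mean roughly the same thing to a person but are unknown to the toy model.
SYNONYMS = {"great": ["superb"], "good": ["fine"], "excellent": ["stellar"]}

def classify(words):
    """Label a list of words as 'positive' or 'negative'."""
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"

def attack(sentence):
    """Swap words for synonyms, left to right, until the label flips.
    Returns the altered sentence and the number of words changed."""
    words = sentence.lower().split()
    original = classify(words)
    changed = 0
    for i, word in enumerate(words):
        if classify(words) != original:
            break  # the model has already been fooled
        if word in SYNONYMS:
            words[i] = SYNONYMS[word][0]
            changed += 1
    return " ".join(words), changed

before = "the film was great and the cast was good"
after, changed = attack(before)
print(classify(before.split()), "->", classify(after.split()))
print(f"{changed} of {len(before.split())} words changed: {after}")
```

A modern language model is much harder to fool than a keyword counter, but the researchers’ result suggests the same weakness exists in a subtler form.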
Finally, critics point to the fundamental paradox of all artificial intelligence programs: While they are intended to remove bias and prejudice from decision-making by taking humans out of the equation, ultimately all the decisions are based on human judgments, namely those of the programmers and researchers who create the programs. So whether intentional or not, biases can creep into the programs and skew the results. Such so-called algorithmic bias has been demonstrated to exist in a variety of programs. As recently as late last year, for example, the National Institute of Standards and Technology revealed that extensive racial bias existed in popular facial recognition programs.
Some of these issues may be minimized by more extensive training and improved algorithms in the future. And it doesn’t mean that natural language processing couldn’t still be used to turn down the volume and turn up the relevance of public debates.
Others point to the fact that AI can handle a wider audience and thus increase the diversity of points of view in this kind of setting.
“I was surprised at how many people wanted to participate online,” Donvan told Gizmodo, “and so it wasn’t just those people who could get to a theater in New York City on a Tuesday night.”
“Decision makers need to be data driven but they also need a diversity of viewpoints,” said IBM’s Agrawal. “The goal is to enable better decisions.”
And maybe someday, more civil debates.
“In a decade, we may have the power to do it in real time,” said Agrawal, with enough computational capacity to distill all the comments and questions into a few critical points in seconds. And perhaps in the future the technology could act as an objective AI moderator, riding shotgun over insolent and unruly participants, maybe even replacing certain obstreperous participants?
“For now,” said Agrawal, “campaign debates are best left to the candidates.” And, presumably, a moderator with the ability to mute them.