IBM’s latest Grand Challenge: An expert computer debater

AI Business interviews Noam Slonim, the mind behind Project Debater.

Ben Wodecki, Jr. Editor

May 11, 2022

After IBM’s Deep Blue and Watson victories against human game opponents, the computer giant has been working on its next Grand Challenge: a computer that can debate expert human debaters.

Thus, Project Debater was born. The brainchild of Noam Slonim at IBM’s lab in Israel, Project Debater has been in development since 2012. It had its first live debate in 2019 in San Francisco.

Project Debater squared off against Harish Natarajan, who holds the world record for the most debate victories. The topic was “We should subsidize preschool.” Although Natarajan won the debate according to a poll of the audience, the event was a pivotal moment in computing: Project Debater was the first AI system to debate humans on complex topics.

AI Business recently caught up with Slonim, principal investigator of Project Debater, to find out what’s next.

Ben Wodecki: Noam, talk to us a little bit about Project Debater – the system has come a long way since your initial proposal back in 2012. Where is it today?

Noam Slonim: (Project Debater) started as a single-slide proposal. To be honest, it was a little bit far-fetched, and it continued to feel far-fetched over the years. We gradually became more convinced that we could actually make a meaningful demonstration along the lines of the original vision.

It was a non-standard journey because usually, you don't get the privilege of working on a single project with a relatively large team for such a long time. Three years on (from the live demonstration in San Francisco), we've continued to work but we're taking other directions as opposed to trying to further improve the live debate system.

Figure 1: Noam Slonim

Wodecki: How do you plan on expanding on Project Debater? Are there upgrade plans? How does the project adapt over time?

Slonim: The fact that we were given so many years [to work on the project] is in the tradition of IBM research in executing grand challenges in AI. We had Deep Blue vs. Garry Kasparov and Watson on Jeopardy. It was a week after Watson's appearance on Jeopardy that they asked all researchers what the next one should be, and we thought we would try. (See story on Deep Blue vs. Kasparov.)

We didn't know much about competitive debates at the time. I'm not a debater and I never really practiced that professionally. But I was interested in what natural language processing and generation can do − I like to argue and I think there was a good connection there.

It was one out of around 100 proposals and we questioned whether we'd get picked. Each time it was whittled down we asked, 'are they really going to support something so crazy for so many years?' We were really privileged in this regard that IBM was willing to make such an investment in such an exploratory research direction. It is a rare opportunity to have such a strong team of researchers to focus on something for so many years.

We had a clear goal: we knew from the first day what it is that we wanted to demonstrate. Think of it as a lighthouse in the dark: You're trying to navigate an ocean, and there's a lighthouse showing you the clear path.

Wodecki: What kinds of commercial and business aspects to this are you exploring? Can the underlying system be applied to something like smart assistants for example?

Slonim: Around a year ago, we published a paper that was featured on the cover of Nature magazine, and it was the first paper to describe the system in its entirety. This was two years after the demo. There is a big difference between making a demonstration versus sharing the results with the scientific community.

We published around 60 papers, but usually each highlighted a specific aspect of the system. In that Nature paper, we provided an empirical evaluation of the system across around 80 different debate topics. It was important for us to share the average performance of the system, not the performance in one or two debates.

We also took the path of sharing the technology with the scientific community. But we didn't think it made sense to share it as a monolithic live debate system. Instead, we broke it down into more modular, tangible services that reflect the various skills the system has. Now we have the Debater Early Access program: many academic teams have approached us, and we have provided free access to these skills for various research projects. We also used it on the commercial side with various IBM clients to explore the value of the technology.

Debater was about finding arguments from collections of hundreds of millions of articles and then composing a narrative out of that. But we always toyed with the idea of proposing a controversial topic to a group of individuals and asking them to contribute their arguments − can we use the same technology to create a compelling narrative? And this involves a lot of technical challenges, like automatically detecting the stance of a contributed argument, identifying paraphrases, understanding underlying themes and also automatically assessing argument quality.

We refer to this use case as 'Speech by Crowd.' We demoed this on several occasions. Beyond this use case, we are doing a sort of survey − posing a topic to a group of people and asking them to express opinions. And surveys are important, we run into them all the time. Speech by Crowd could be a communication channel between people making decisions and people who might be impacted by the decisions, like a client who would like feedback on a product or an employer that would like to get feedback from employees or even a government searching for feedback from citizens on a new policy being examined.

A survey to some extent is a debate; people are expressing their opinions on the questions they're being asked. This was the first step. Since then we have taken another step and now have a technology that is going to be very useful in this landscape, which we call 'key point analysis.'

The idea is to look at survey responses, automatically identify key points and then understand the prevalence of each key point. We can provide a decision-maker with a key points summary. We've started by applying this internally at IBM.
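Editor's note: the two steps described here, matching each response to a key point and then counting how often each key point occurs, can be illustrated with a toy sketch. This is not IBM's implementation. It assumes the key points are already given, and it swaps in simple word overlap (Jaccard similarity) for whatever matching model the real system uses. The function names, the threshold value, and the sample comments are all hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercase words in a text (a crude lexical view)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def key_point_prevalence(comments, key_points, threshold=0.2):
    """Assign each comment to its best-matching key point and count matches.

    Assumption: key points are supplied by hand. The threshold is an
    arbitrary illustrative value, not a tuned parameter.
    """
    if not key_points:
        raise ValueError("at least one key point is required")
    kp_tokens = [tokens(kp) for kp in key_points]
    counts = {kp: 0 for kp in key_points}
    unmatched = 0
    for comment in comments:
        comment_tokens = tokens(comment)
        scores = [jaccard(comment_tokens, kt) for kt in kp_tokens]
        best = max(range(len(key_points)), key=scores.__getitem__)
        if scores[best] >= threshold:
            counts[key_points[best]] += 1
        else:
            unmatched += 1  # comment does not clearly express any key point
    return counts, unmatched


# Hypothetical survey data, for illustration only.
sample_key_points = ["More flexible working hours", "Faster career growth"]
sample_comments = [
    "I would like more flexible working hours",
    "Flexible hours would help my family",
    "The office coffee is terrible",
    "Career growth is too slow here",
]
sample_counts, sample_unmatched = key_point_prevalence(
    sample_comments, sample_key_points
)
```

A decision-maker would read `sample_counts` as the key points summary: each key point paired with the number of responses that support it. Word overlap misses paraphrases that share no vocabulary, which is one reason a production system would need a learned matching model instead.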

We used key point analysis to analyze around 500,000 comments from IBM's employee surveys in 2020 and 2021, which gave very interesting results, far beyond what you can get with conventional technology. As a result, the partners that we have at IBM are integrating the technology into the solutions they build. It will become a routine technology within IBM, and we are also in the process of commercializing it beyond internal use.

(As for potential deployments in smart speakers,) there's some variation on that. It's worth acknowledging that debate has rules and structure that we were leaning on in order to make progress. While it's a challenge to participate in a debate, it's still a sterile environment with rules. This helped us to make progress. But when you go to the real world, things aren't as clear.

We're curious to think about applications along these lines. One of the areas that we're looking at is a chatbot or dialogue system that can encourage people to make a decision for the social good that they are hesitant about. A natural example that is relevant today is encouraging people to take the COVID vaccine.

It's a very complex issue but we started to work with colleagues at Johns Hopkins University on such a dialogue system. We're still at the beginning of this idea, and we're just scratching the surface in terms of what can be done. Imagine being able to encourage people to use public transport, or to recycle, or vote. There are so many cases where it could be very useful. It's a natural continuation of what we did with Debater, but still, it's very different.

The important thing to remember is this application isn't a debate. If all you're doing is proving them wrong, you'll probably get the opposite of what you're trying to achieve. You need to show empathy and understanding of people and why they're not taking the decision that you think they should take.

Wodecki: Who’s your dream match-up for the system? Presidential debate with Trump? Churchill in his prime? Who comes to mind?

Slonim: We had these discussions before the debate, and at some point, I was amused by the idea of having Garry Kasparov debate the system. This could have made for some very interesting closure to the epic chess competition (with Deep Blue). A rematch in a completely different field.

He's a fantastic speaker and a very interesting person, so I was toying with this thought, but ultimately, the decision was to go with a well-known expert debater. This idea stayed in a drawer – along with my dream to have (actress) Scarlett Johansson provide the voice for the system.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
