MIT Research: Debating Makes AI Bots Smarter

The 'Society of Minds' approach can reduce AI model hallucinations and improve results

Ben Wodecki, Jr. Editor

September 21, 2023

2 Min Read

At a Glance

  • MIT researchers found 'multiagent debate' between AIs improves accuracy and reasoning over a single model.
  • The debate approach reduces hallucinations and helps models strengthen their responses.

A team of MIT researchers found that having multiple AI systems debate answers to questions improves the accuracy of responses compared with using a single AI system.

In a paper titled "Improving Factuality and Reasoning in Language Models through Multiagent Debate," the researchers found that having multiple AI models generate and critique each other's responses helps correct factual errors and improve logical reasoning.

The MIT scientists, along with Google DeepMind researcher Igor Mordatch, dubbed the process a "society of minds" and found that it reduced hallucinations in generated output. The approach can even be applied to existing black-box models like OpenAI's ChatGPT.


The process runs over several rounds in which responses are generated and critiqued. Each model generates an answer to a given question and then incorporates feedback from the other agents to update its own response. The researchers found this process improves the final output, likening it to a group discussion in which individuals contribute responses to reach a unified conclusion.
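As a rough illustration, the loop might look like the following Python sketch. This is an assumption, not the paper's implementation: `query_model` is a hypothetical stand-in for any chat-style LLM API, and the prompt wording, agent count and round count are placeholders.

```python
from typing import Callable

Messages = list[dict[str, str]]

def multiagent_debate(
    question: str,
    query_model: Callable[[Messages], str],  # hypothetical wrapper around any chat LLM API
    num_agents: int = 3,
    num_rounds: int = 2,
) -> list[str]:
    """Run several agents, each revising its answer after reading the others'."""
    # Each agent keeps its own conversation history.
    histories: list[Messages] = [
        [{"role": "user", "content": question}] for _ in range(num_agents)
    ]
    answers = [query_model(h) for h in histories]

    for _ in range(num_rounds):
        new_answers = []
        for i, history in enumerate(histories):
            # Show agent i the other agents' latest answers and ask for a revision.
            others = "\n\n".join(a for j, a in enumerate(answers) if j != i)
            history.append({"role": "assistant", "content": answers[i]})
            history.append({
                "role": "user",
                "content": "Here are answers from the other agents:\n\n"
                           + others
                           + "\n\nUse them as additional input and give an updated answer.",
            })
            new_answers.append(query_model(history))
        answers = new_answers

    # Final-round answers; aggregate however you like (e.g., majority vote).
    return answers
```

Keeping a separate history per agent is what lets each model defend or revise its own position rather than collapsing into a single shared conversation.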

The method can also be used to combine different language models – the researchers pitted ChatGPT against Google Bard. While both models initially generated incorrect responses to the example prompt, through debate they were able to arrive at the correct final answer.
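Extending the sketch above to mixed models is mainly a matter of giving each agent its own backend. The two wrappers below are hypothetical placeholders, not the real ChatGPT or Bard client calls.

```python
from typing import Callable

Messages = list[dict[str, str]]

# Hypothetical wrappers; replace with real API calls for each model.
def query_chatgpt(messages: Messages) -> str:
    raise NotImplementedError("Call the ChatGPT API here.")

def query_bard(messages: Messages) -> str:
    raise NotImplementedError("Call the Bard API here.")

def heterogeneous_debate(
    question: str,
    backends: list[Callable[[Messages], str]],
    num_rounds: int = 2,
) -> list[str]:
    # Same structure as the loop above, but agent i always queries backends[i],
    # so different models end up critiquing each other's answers.
    histories: list[Messages] = [
        [{"role": "user", "content": question}] for _ in backends
    ]
    answers = [backend(h) for backend, h in zip(backends, histories)]

    for _ in range(num_rounds):
        new_answers = []
        for i, (backend, history) in enumerate(zip(backends, histories)):
            others = "\n\n".join(a for j, a in enumerate(answers) if j != i)
            history.append({"role": "assistant", "content": answers[i]})
            history.append({
                "role": "user",
                "content": "Another model answered:\n\n" + others
                           + "\n\nRevise your answer in light of this response.",
            })
            new_answers.append(backend(history))
        answers = new_answers

    return answers

# e.g. heterogeneous_debate("What is 13 * 17?", [query_chatgpt, query_bard])
```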


Using the multiagent debate approach, the MIT team was able to achieve superior results on various benchmarks spanning natural language processing, mathematics and puzzle solving.

For example, on the popular MMLU benchmark, multiple debating agents reached an accuracy of 71%, compared with 64% for a single agent.

“Our process enlists a multitude of AI models, each bringing unique insights to tackle a question. Although their initial responses may seem truncated or may contain errors, these models can sharpen and improve their own answers by scrutinizing the responses offered by their counterparts," said Yilun Du, an MIT Ph.D. student and the paper’s lead author.

"As these AI models engage in discourse and deliberation, they're better equipped to recognize and rectify issues, enhance their problem-solving abilities, and better verify the precision of their responses.”

You can access the code used in the multiagent project on GitHub.


About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
