Google’s Bard Just Beat ChatGPT's GPT-4 in Rankings
Bard's recent update to Gemini Pro propels it past GPT-4 and Claude, marking a significant shift in the chatbot landscape
At a Glance
- Google's Bard surpasses OpenAI's GPT-4 in the LMSYS Chatbot Arena Leaderboard with its new Gemini Pro version.
Google Bard just surpassed GPT-4 to become the second highest-scoring chatbot on the LMSYS Leaderboard, loosening the grip of OpenAI’s top models in the chatbot space.
It overtook GPT-4 and is closing in on GPT-4 Turbo, which retains its crown. GPT-4 Turbo and GPT-4 had held a vice-like grip on the top two spots for some time. Bard's surge follows its update to Google's new Gemini Pro large multimodal model.
The Chatbot Arena Leaderboard was created by LMSYS Org, which stands for Large Model Systems Organization, an open research group founded by the University of California, Berkeley in partnership with the University of California, San Diego and Carnegie Mellon University.
LMSYS, which built the Vicuna LLM, described Bard’s leap up the leaderboard as a “remarkable achievement.”
The Chatbot Arena is a benchmark platform for large language models that features "anonymous, randomized battles in a crowdsourced manner." The rankings are based on the Elo rating system, which is widely used in chess and other competitive games.
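In Elo, each model's rating rises or falls after every head-to-head "battle" based on how surprising the result was: beating a higher-rated opponent earns more points than beating a lower-rated one. A minimal sketch of a single Elo update is below; this is a simplified illustration of the general formula, not LMSYS's actual implementation, and the K-factor of 32 is an assumed value.

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """Apply one Elo update after a head-to-head battle.

    score_a is 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie.
    k controls how much a single result can move the ratings.
    """
    # Expected score for A given the current rating gap
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    # Ratings move in proportion to (actual - expected) outcome
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b


# Example: an underdog at 1150 beats a model rated 1250
a, b = elo_update(1150, 1250, 1.0)
```

Because the underdog's expected score was below 0.5, its rating gains more than half of K, which is how a string of upset wins can move a model up the leaderboard quickly.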
The Gemini Pro-powered Bard is only the second model on the board to achieve a score over 1200.
Bard’s rise comes as Google updates the underlying models powering the chatbot. Out is PaLM 2 and in comes Gemini, Google's most powerful model to date. Google unveiled Gemini last December, launching the initial Pro version for Bard, and expects to release the mammoth version, Gemini Ultra, soon.
Beats Claude, too
Bard also beat all versions of Claude, with the Gemini Pro Dev API version ranking higher than Anthropic’s Claude 2.1 and GPT-3.5 Turbo.
“The race is heating up like never before! Super excited to see what's next for Bard + Gemini Ultra release,” LMSYS wrote.
The rise up the leaderboard is a welcome reprieve for Google. After a shaky start, Bard has received routine updates, with integrations now spanning other Google apps such as YouTube and Docs.
Recently, Reddit users told Google they wanted Bard to be more like ChatGPT, after a Google product manager asked for their wish list. Users requested dedicated mobile apps, custom instructions and image generation, with some of those suggestions already in the works.
OpenAI’s GPT-4 has routinely topped model leaderboards. It firmly holds first place on Stanford’s HELM Leaderboard, with GPT-4 Turbo in second. PaLM 2, which previously powered Bard, did not do as well, being pipped by Palmyra X V3 from AI startup Writer as the highest-scoring non-OpenAI model on the HELM leaderboard.