January 31, 2024
Google's Bard has overtaken GPT-4 on the Chatbot Arena Leaderboard and is closing in on GPT-4 Turbo, which retains its crown. GPT-4 Turbo and GPT-4 have held a vice-like grip on the top two spots, respectively, for some time. Bard's surge follows its update to Google's new Gemini Pro large multimodal model.
The Chatbot Arena Leaderboard was created by LMSYS Org, which stands for Large Model Systems Organization, an open research group founded by the University of California, Berkeley in partnership with the University of California, San Diego and Carnegie Mellon University.
LMSYS, which built the Vicuna LLM, described Bard’s leap up the leaderboard as a “remarkable achievement.”
The Chatbot Arena is a benchmark platform for large language models that features "anonymous, randomized battles in a crowdsourced manner." The rankings are based on the Elo rating system, which is widely used in chess and other competitive games.
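For readers unfamiliar with Elo, here is a minimal sketch of the textbook update rule as used in chess. It is illustrative only: the function names and the K-factor of 32 are assumptions made for this example, and LMSYS's actual rating computation may differ in its details.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both new ratings after one battle.

    score_a is 1.0 if A wins, 0.5 for a tie, 0.0 if A loses.
    k (assumed to be 32 here) controls how far a single result moves a rating.
    """
    expected_a = expected_score(rating_a, rating_b)
    expected_b = 1.0 - expected_a
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - expected_b)
    return new_a, new_b


# Two hypothetical models start level at 1200; model A wins one battle.
print(update_elo(1200.0, 1200.0, 1.0))
```

A win over a higher-rated opponent moves a rating more than a win over a lower-rated one, which is why a run of crowdsourced victories against top models can push a newcomer up the board quickly.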
The Gemini Pro-powered Bard is only the second model on the board to achieve a score over 1200.
Bard’s rise comes as Google updates the underlying models powering the chatbot. Out is PaLM 2 and in comes Gemini, Google's most powerful model to date. It unveiled Gemini last December, launching the initial Pro version for Bard, and expects to release the mammoth version, Gemini Ultra, soon.
“The race is heating up like never before! Super excited to see what's next for Bard + Gemini Ultra release,” LMSYS said.
The rise up the leaderboard is a welcome reprieve for Google. After a shaky start, Bard has received routine updates, with integrations now spanning other Google apps such as YouTube and Docs.
Recently, Reddit users told Google they wanted Bard to be more like ChatGPT after a Google product manager asked for their wish list. Users requested dedicated mobile apps, custom instructions and image generation, and some of those suggestions are already in the works.
OpenAI’s GPT-4 has routinely topped model leaderboards. It firmly holds first place on Stanford’s HELM Leaderboard, with GPT-4 Turbo in second. PaLM 2, which previously powered Bard, did not do as well, being pipped by Palmyra X V3 from AI startup Writer as the highest-scoring non-OpenAI model on the HELM leaderboard.
Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.