January 25, 2024
A new benchmark test will let businesses inspect the reliability of commercial multimodal AI models when they are fed imperfect, noisy data.
MMCBench was created by researchers from Sea AI Lab, the University of Illinois Urbana-Champaign, TikTok parent ByteDance and the University of Chicago. It introduces errors and noise into text, image and speech inputs, then measures how consistently more than 100 popular models, such as Stable Diffusion, generate outputs.
The new benchmark spans text-to-image, image-to-text, and speech-to-text, among others. It would allow users to determine if multimodal models are more trustworthy and robust when data gets corrupted – which could help businesses avoid costly AI failures or inconsistencies when real-world data does not match training data.
MMCBench uses a two-step process. First, a selection step identifies examples based on similarity: non-text inputs are represented by model-generated captions or transcriptions, which are compared against the text inputs before and after corruption. Then, an evaluation step measures self-consistency by comparing outputs generated from clean inputs with outputs generated from corrupted inputs.
The resulting process provides users with an effective tool to evaluate multimodal models. An overview of the MMCBench process can be seen below.
Multimodal models are becoming increasingly prevalent in the AI space; however, developers have few tools for evaluating these emerging systems.
“A thorough evaluation under common corruptions is critical for practical deployment and facilitates a better understanding of the reliability of cutting-edge large multimodal models,” a paper outlining the benchmark reads.
The benchmark does have some limitations, however. For example, the evaluation uses greedy decoding, where the model always picks the single highest-probability token (word) as the next item in the output sequence, which could underestimate the true capabilities of some models. High output similarity could also mask poor-quality results.
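To see why greedy decoding is a limitation, consider what it does: it collapses the model's probability distribution to a single choice at each step, discarding alternatives that sampling or beam search might surface. A minimal toy sketch (the vocabulary and logits here are invented for illustration):

```python
def greedy_decode(step_logits, vocab):
    # Greedy decoding: at every step, take only the single
    # highest-scoring token and discard all other candidates.
    out = []
    for logits in step_logits:  # one score vector per generation step
        best = max(range(len(logits)), key=lambda i: logits[i])
        out.append(vocab[best])
    return " ".join(out)

vocab = ["the", "a", "cat", "dog", "sat"]
# Toy per-step logits (higher = more likely next token):
steps = [
    [2.0, 0.5, 0.1, 0.1, 0.1],  # "the" dominates
    [0.1, 0.2, 3.0, 2.9, 0.1],  # "cat" barely beats "dog"
    [0.0, 0.0, 0.1, 0.1, 2.5],  # "sat" dominates
]
sentence = greedy_decode(steps, vocab)  # -> "the cat sat"
```

Note the second step: "dog" loses by a tiny margin and is discarded entirely, which is why greedy decoding can understate a model whose strong outputs only emerge through sampling or beam search.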
The team behind the benchmark, however, plans to add new models and more modalities, such as video, to MMCBench, so it should improve over time.
Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.