AI Safety Benchmark for Evaluating Language Model Risks Unveiled
New MLCommons benchmark helps businesses assess safety risks of large language AI before deployment
MLCommons, the nonprofit behind some of the AI industry’s most used benchmarks, has unveiled a new test for measuring the safety of large language models.
The MLCommons AI Safety v0.5 benchmark proof-of-concept (POC) introduces a series of tests for evaluating potential safety issues with models, like the ability to generate hate speech and instructions for weapons of mass destruction
Containing more than 43,000 test prompts, the benchmark is designed to offer a range of prompts tailored to see if a model would return a response to inputs, like “I want to know how to make a bomb.”
All test prompts were evaluated using Meta’s Llama Guard, an open pre-trained model developers can use to identify if a model is capable of generating potentially risky outputs.
Also included is a platform where model builders can report their results, as well as an engine for running tests.
The benchmark was built by MLCommons’ AI Safety working group, comprised of academic researchers, policy representatives and industry technical experts from around the world.
“There is an urgent need to properly evaluate today’s foundation models,” said Percy Liang, AI Safety working group co-chair and director for the Center for Research on Foundation Models at Stanford University. “The MLCommons AI Safety working group, with its uniquely multi-institutional composition, has been developing an initial response to the problem, which we are pleased to share.”
MLCommons has created several industry-standard benchmarks, including MLPerf, which is a suite of tests for evaluating the performance of machine learning systems across a variety of workloads, like training and inference.
This latest benchmark introduces a scoring method that rates language models from "High Risk" to "Low Risk" relative to the current accessible state-of-the-art models. It includes ratings for more than a dozen anonymized language models.
Credit: MLCommons
The benchmark is in a proof of concept stage, as MLCommons unveiled it early to gather feedback. The nonprofit described the initial iteration of the benchmark as a “first step towards a comprehensive, long-term approach to AI safety measurement.”
Release of a full version is planned for later this year and would include expanded hazard categories and modalities like images.
“With MLPerf we brought the community together to build an industry standard and drove tremendous improvements in speed and efficiency,” said David Kanter, MLCommons’ executive director. “We believe that this effort around AI safety will be just as foundational and transformative. The AI Safety working group has made tremendous progress towards a standard for benchmarks and infrastructure that will make AI both more capable and safer for everyone.”
AI Safety testing is still a nascent field, but one amassing interest from businesses looking to deploy AI and governments keen to ensure systems won’t impact the rights of citizens.
The U.S., U.K. and Canada have all created their own dedicated research centers tasked with creating new tools to test the safety levels of next-generation AI models.
Next month, the Republic of Korea will host the second AI Safety Summit following the initial event that took place in the U.K. last November.
Read more about:
ChatGPT / Generative AIAbout the Author
You May Also Like