Amazon ML Expert: What Makes Models Truly Open Source

Just because it is free does not make it open source, says Julia Ferraioli at the State of Open Con event in London

Ben Wodecki, Jr. Editor

February 6, 2024

At a Glance

  • Amazon machine learning strategist Julia Ferraioli defined what constitutes true open source at the State of Open Con event in London.

The likes of Meta and Mistral have been touting their open source AI models as viable alternatives to proprietary systems from OpenAI.

But what makes an open source machine learning system truly open source? Consider Meta’s Llama 2 – while the model’s weights and evaluation code were made available, the company did not disclose its training data.

Amazon machine learning strategist Julia Ferraioli said that a system being free does not make it open.

Speaking at the State of Open Con event in London, Ferraioli cautioned that being able to view model checkpoints or weights does not explicitly define what it means to be an open source machine learning system.

“For [a machine learning system] to be open, I need to be able to question it,” Ferraioli said. She proposed her litmus test to determine whether a system is truly open source: whether a user can access the model, underlying data, code and metadata.

“As models are essentially just very large matrices, I need all of that other information to be open and disclosed,” Ferraioli said. “If yes, I can verify it. I can reproduce it. I can change it. And what's more, I can vehemently disagree with it, which is an important aspect of open source.”

The open source AI field is constantly evolving, with new systems emerging frequently amid the generative AI wave.

Ferraioli described machine learning as the foundation for the emergence of generative AI systems. But for scientists to be able to trust these generative systems, Ferraioli said experts need to know how a system was trained, what it was trained on and which tasks it is appropriate for.

Companies and community groups looking to open source their systems need to disclose a lot of information to make them truly open source.

The Amazon strategist said that while some may question whether open source machine learning needs all of these underlying aspects to be open, providing access to them is important for a system to be truly open source.

“Just because something is hard, does not mean you should not try,” she added. “By breaking things down into their component parts, and not boiling things down into an overly reductionist model, we can create a specification of open source machine learning that focuses on what is important.”

About the Author(s)

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
