Build Multilingual AI Solutions with Cohere’s New Aya Model

Cohere for AI's new open source project lets developers build AI applications that span over 100 languages

February 21, 2024

At a Glance

Cohere for AI unveils Aya: A new model and dataset combo for powering multilingual AI workloads.

English is one of the most essential languages used in business. But to serve a global audience more effectively, companies need to be multilingual. Enter Aya, a new AI model that supports 101 different languages. It is from Cohere for AI, the nonprofit research subsidiary of AI startup Cohere.

The Aya model is open source and can be used commercially under its Apache 2.0 license. Aya is also designed to cover languages that most advanced models largely ignore.

Aya could power customer support chatbots or virtual agents. The model could also be used to support content translation or localization of business websites or product marketing.

Cohere claims the model covers twice as many languages as existing open source models such as BLOOMZ and mT0. The company also said the model's natural language understanding, summarization and translation skills outperform those of rival models.

Cohere said Aya means 'fern' in the Twi language from Ghana and it is a symbol of "endurance and resourcefulness which captures the spirit of our own commitment to accelerate multilingual AI progress." The company pointed out that while only 5% of the world speaks English at home, 63.7% of the internet is in English. A lot of the data used to train AI models comes from the internet.

"Unless we address this disproportionate representation head-on, we risk perpetuating this divide and further widening the gap in language access of new technologies," Cohere said in a blog post.

You can access Aya via Hugging Face. You can also experiment with the model via the Cohere Playground. To join Cohere's efforts, connect to its Discord server for the Aya project.
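
For readers who want to try the Hugging Face route, here is a minimal sketch of what loading the model might look like with the `transformers` library. Several details are assumptions that the article does not confirm: the repository id `CohereForAI/aya-101`, the sequence-to-sequence model class, and the plain-English instruction format of the prompt. The helper `build_translation_prompt` is a hypothetical name used only for illustration. Check the model card before relying on any of this, and note that the download is large.

```python
# Sketch only: assumes the `transformers` and `torch` packages are installed
# and that the model is published under the repo id below (an assumption).
MODEL_ID = "CohereForAI/aya-101"


def build_translation_prompt(text: str, target_language: str) -> str:
    """Hypothetical helper: phrase a translation request as a plain instruction."""
    return f"Translate to {target_language}: {text}"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the model and tokenizer, then generate a response for one prompt."""
    # Imported lazily so the helper above can be used without the heavy dependency.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumes an encoder-decoder (seq2seq) architecture; verify on the model card.
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    prompt = build_translation_prompt("Where is the nearest pharmacy?", "Turkish")
    print(generate(prompt))
```

The prompt helper is kept separate from the model call so that the wording of the instruction can be adjusted per language or use case, such as the customer support and localization scenarios mentioned above, without touching the loading code.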

Massive multilingual dataset

Cohere has also released the underlying dataset used to train Aya. The dataset spans some 513 million prompts across 114 languages and includes annotations from native and fluent speakers.

The dataset includes language examples that cover dialect variations, which are meant to help Aya return responses that sound organic and natural.

The dataset can also be downloaded from Hugging Face and can power commercial applications.

Upon unveiling the project, Cohere said Aya and its dataset “can effectively serve a broad global audience that have had limited access to-date.”

Cohere joins other research labs trying to democratize AI to encompass underserved societal groups. Meta, for example, has its No Language Left Behind project to support low-resource language translation. And Google’s Universal Speech Model is powering multilingual capabilities in its product lines.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
