Build Multilingual AI Solutions with Cohere’s New Aya Model
Cohere for AI's new open source project lets developers build AI applications that span over 100 languages
At a Glance
- Cohere for AI unveils Aya: A new model and dataset combo for powering multilingual AI workloads.
English is one of the most essential languages used in business. But to serve a global audience more effectively, companies need to be multilingual. Enter Aya, a new AI model that supports 101 different languages. It is from Cohere for AI, the nonprofit research subsidiary of AI startup Cohere.
The Aya model is open source and can be used commercially under its Apache 2.0 license. Aya is also designed to cover languages that most advanced models largely ignore.
Aya could power customer support chatbots or virtual agents. The model could also be used to support content translation or localization of business websites or product marketing.
Cohere claims the model serves double the number of languages covered by existing open source models such as BLOOMZ and mT0. The company also said Aya's natural language understanding, summarization and translation skills outperform those of rival models.
Cohere said Aya means 'fern' in the Twi language from Ghana and it is a symbol of "endurance and resourcefulness which captures the spirit of our own commitment to accelerate multilingual AI progress." The company pointed out that while only 5% of the world speaks English at home, 63.7% of the internet is in English. A lot of the data used to train AI models comes from the internet.
"Unless we address this disproportionate representation head-on, we risk perpetuating this divide and further widening the gap in language access of new technologies," Cohere said in a blog post.
You can access Aya via Hugging Face. You can also experiment with the model via the Cohere Playground. To join Cohere's efforts, connect to its Discord server for the Aya project.
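For developers who want to try the Hugging Face route, here is a minimal sketch using the transformers library. It makes several assumptions that are not confirmed in this article: that the checkpoint is published under the identifier CohereForAI/aya-101, that it loads as a sequence-to-sequence model, and that PyTorch is installed. The build_prompt and generate helpers are hypothetical names used only for illustration. The model is large, so check the model card for hardware requirements before running it.

```python
# Assumed Hub identifier; confirm it on the model card before relying on it.
MODEL_ID = "CohereForAI/aya-101"


def build_prompt(instruction: str, text: str) -> str:
    """Join an instruction and the text it applies to into one prompt."""
    return f"{instruction.strip()}\n\n{text.strip()}"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Download the model (on first call) and generate a reply for one prompt."""
    # Imported here so build_prompt works even without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # Assumption: the checkpoint uses an encoder-decoder architecture.
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Example request: translate a short French phrase into English.
    prompt = build_prompt("Translate to English:", "Bonjour tout le monde")
    print(generate(prompt))
```

The same pattern could be used for the customer support or localization use cases mentioned above by changing the instruction passed to build_prompt.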
Massive multilingual dataset
Also made available is the underlying dataset used to train Aya. This dataset spans some 513 million prompts across 114 languages and includes annotations from native and fluent speakers.
The dataset includes examples of dialect variation, which helps Aya return responses that sound organic and natural.
The dataset can also be downloaded from Hugging Face and can power commercial applications.
Upon unveiling the project, Cohere said Aya and its dataset “can effectively serve a broad global audience that have had limited access to-date.”
Cohere joins other research labs trying to extend AI to underserved groups. Meta, for example, has its No Language Left Behind project to support low-resource language translation. And Google’s Universal Speech Model is powering multilingual capabilities in its product lines.