The lab that built Stable Diffusion’s dataset said BUD-E is an open source AI voice assistant that understands context

Ben Wodecki, Jr. Editor

February 13, 2024


At a Glance

  • BUD-E is a voice assistant that responds to user queries in a natural manner and in real time.

AI voice assistants have come a long way since Siri was introduced in February 2010. Now, the lab that built the dataset behind Stable Diffusion wants to build a next-gen voice assistant that responds to requests in real time with a natural voice.

German nonprofit research lab LAION unveiled BUD-E, which stands for Buddy for Understanding and Digital Empathy. It is designed to provide more immersive conversational experiences than current AI voice assistants.

LAION says current voice assistants respond in a “stilted, mechanical nature.” Also, “unlike human conversation partners, they often struggle with fully understanding and adapting to the nuanced, emotional, and contextually rich nature of human dialogue, leading to noticeable latencies and a disjointed conversational flow. Consequently, users often experience unsatisfactory exchanges.”

BUD-E sounds more natural than current systems and it also runs on consumer devices, the research lab said. Moreover, the system achieved latencies of 300 to 500 milliseconds, a fast response to user requests.

LAION, which built the underlying dataset for the text-to-image AI model Stable Diffusion, created BUD-E with the ELLIS Institute Tübingen, Collabora and the Tübingen AI Center.

It is still early days for BUD-E, with LAION dreaming of a voice assistant that can manage multi-speaker conversations with interruptions, affirmations and thinking pauses.

The current version of BUD-E uses Nvidia’s speech-to-text model FastConformer Streaming STT, Microsoft’s Phi-2 language model and the StyleTTS2 text-to-speech model.
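The three-stage design described above (speech-to-text, then a language model, then text-to-speech) can be illustrated with a minimal Python sketch. This is not BUD-E's code: the functions `transcribe`, `generate_reply`, `synthesize` and `run_turn` are hypothetical stand-ins that only show how the stages chain together and how the latency of one conversational turn might be measured.

```python
import time
from typing import Callable, Tuple

# All three stage functions are hypothetical placeholders, not BUD-E's real APIs.

def transcribe(audio: bytes) -> str:
    """Stand-in for the speech-to-text stage; pretends the audio is UTF-8 text."""
    return audio.decode("utf-8")

def generate_reply(prompt: str) -> str:
    """Stand-in for the language model stage; simply echoes the prompt."""
    return f"You said: {prompt}"

def synthesize(text: str) -> bytes:
    """Stand-in for the text-to-speech stage; returns bytes instead of audio."""
    return text.encode("utf-8")

def run_turn(
    audio: bytes,
    stt: Callable[[str], str] = transcribe,
    llm: Callable[[str], str] = generate_reply,
    tts: Callable[[str], bytes] = synthesize,
) -> Tuple[bytes, float]:
    """Run one turn through the three stages and return (speech, latency in ms)."""
    start = time.perf_counter()
    text = stt(audio)        # stage 1: speech to text
    reply = llm(text)        # stage 2: text to reply
    speech = tts(reply)      # stage 3: reply to speech
    latency_ms = (time.perf_counter() - start) * 1000.0
    return speech, latency_ms

if __name__ == "__main__":
    speech, latency_ms = run_turn(b"hello")
    print(speech, f"{latency_ms:.3f} ms")
```

Because each stage is passed in as a function, one model could in principle be swapped for another without changing the rest of the loop. The latency printed here reflects only the placeholder functions and says nothing about the figures LAION reported.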

However, LAION wants to scale the underlying models powering BUD-E, expressing confidence it could, in the future, produce responses with low latency using a larger model like the 30 billion parameter version of Meta’s Llama 2.

You can try BUD-E for yourself as all of its code is open source and available on GitHub.

You can also go one step further and contribute to the development of BUD-E. LAION has invited open source developers and researchers to help refine the voice assistant. Those interested can join the LAION Discord community or reach out at [email protected].


About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
