Meta ImageBind: An AI Model That Mimics Human Perception

Multimodal model takes one data type to generate other data types

Ben Wodecki, Jr. Editor

May 12, 2023

2 Min Read

At a Glance

  • Meta has published ImageBind – an AI model that takes one data type to generate other data types.
  • For example, ImageBind can generate images from audio, or be used to beef up other multimodal models.

Meta has unveiled a new AI model that seeks to mimic human perception by taking in one form of data and generating other data types. For example, it can take the sound of a bird chirping and generate an image of a bird.

Dubbed ImageBind, the open source model was built to copy how humans absorb information from the world around them - such as walking down a busy street and knowing there are cars nearby just by hearing their engines.

Using a multi-sensory approach, ImageBind learns from different information sources without the need for explicit supervision. It links together various data sources into a single representation, or ‘embedding space.’
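The idea of a single embedding space can be illustrated with a toy sketch: each modality gets its own encoder, but every encoder maps into vectors of the same dimension, so an audio clip and a photo of the same object land close together and can be compared directly with cosine similarity. Everything below - the concepts, the stand-in "encoders," and the retrieval helper - is invented for illustration and is not ImageBind's actual architecture or API.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding dimension

# One base vector per concept; a trained per-modality encoder would ideally
# map any input depicting that concept near its concept's base vector.
concepts = ["bird", "car", "rain"]
base = {c: rng.normal(size=DIM) for c in concepts}

def embed(concept, modality):
    """Stand-in for a per-modality encoder: the concept's base vector plus
    a small modality-specific perturbation, then unit-normalized (as
    contrastive embedding models typically do)."""
    noise_rng = np.random.default_rng(sum(modality.encode()))
    v = base[concept] + noise_rng.normal(scale=0.1, size=DIM)
    return v / np.linalg.norm(v)

def retrieve(query_vec, gallery):
    """Return the gallery key whose embedding has the highest cosine
    similarity to the query (dot product of unit vectors)."""
    return max(gallery, key=lambda k: float(query_vec @ gallery[k]))

# A gallery of "image" embeddings, one per concept.
images = {c: embed(c, "image") for c in concepts}

# Query with the embedding of a bird *audio* clip: because audio and image
# encoders share one space, the nearest image is the bird image.
query = embed("bird", "audio")
print(retrieve(query, images))  # prints: bird
```

The key property the sketch demonstrates is that retrieval never compares raw audio to raw pixels - only their coordinates in the shared space - which is what lets one modality serve as a query against any other.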

“ImageBind equips machines with a holistic understanding that connects objects in a photo with how they will sound, their 3D shape, how warm or cold they are and how they move,” Meta researchers said.

ImageBind could be combined with a pre-trained DALL-E 2 decoder or Make-A-Scene to better understand inputs

ImageBind could be used to improve existing AI models such as Meta’s Make-A-Scene, a multimodal generative AI method capable of generating photorealistic illustrations from text inputs.

Meta proposes using ImageBind to enable Make-A-Scene to create images from audio, such as creating an image based on the sounds of a rainforest or a bustling market.

Meta’s researchers claim the new AI model could be used to moderate content or boost creative design. Researchers could potentially use one modality as an input query in ImageBind to retrieve or generate outputs in other modalities.

Meta in a multimodal mood

The launch of ImageBind continues Meta’s efforts to create multimodal AI systems. The idea is to eventually use these concepts as part of its company-wide focus on creating metaverse experiences.

“ImageBind opens the floodgates for researchers to try to develop new, holistic systems, such as combining 3D and IMU sensors to design or experience immersive, virtual worlds,” Meta contends.

The code for ImageBind is accessible via GitHub under a non-commercial license, as the model is currently only available for research purposes.

ImageBind follows Toolformer, a Meta model published recently that uses a variety of external software tools for natural language processing use cases. And Meta’s open source AI language model LLaMA has formed the basis for a wide variety of popular AI models, including Alpaca and Vicuna.

But despite publishing an array of groundbreaking AI models, Meta was a notable absentee from a high-level meeting between Vice President Kamala Harris and other AI leaders, including ChatGPT maker OpenAI, Google, Microsoft and rising star Anthropic. The U.S. said these companies have agreed to let the public vet their AI models at a well-known hacker convention, Defcon 31.


About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
