Microsoft Makes Minuscule Model Multimodal Like ChatGPT's GPT-4V

Microsoft researchers working on smaller language models to make AI less compute-intensive

November 3, 2023

3 Min Read

Microsoft Phi 1.5 is just 1.3 billion parameters in size. (Image: AI Business using DALL-E 3)

At a Glance

Microsoft gave its diminutive 1.3 billion-parameter Phi 1.5 model the ability to understand images.

There’s no definitive answer to the size of GPT-4, OpenAI’s flagship large language model. Some believe it has trillions of parameters. GPT-3, an earlier iteration, has 175 billion parameters. But researchers at Microsoft have built a minuscule model of just 1.3 billion parameters, and it's multimodal.

Microsoft’s Phi 1.5 model was first revealed back in September. The open-source model is designed so users can deploy a large language model using less power, saving them money.

Now Phi 1.5 can also work with images: the model can understand images included in its inputs.

Microsoft researchers told Semafor that the update increased Phi 1.5's size only slightly, but that it could help broaden access to AI.

Sebastien Bubeck, a senior principal research manager at Microsoft Research, said the multimodal update to ChatGPT was “one of the big updates that OpenAI made” to its flagship chatbot.

ChatGPT can now interact with images and voice thanks to GPT-4V, a new underlying model added to the chatbot’s architecture that enables it to process multimodal inputs, such as an image with text written on it.

Bubeck told Semafor: “When we saw that, there was the question: Is this a capability of only the most humongous models or could we do something like that with our tiny Phi 1.5? And, to our amazement, yes, we can do it.”

OpenAI has since upgraded ChatGPT further, enabling it to interact with PDFs and other documents users upload.

Out with the big, in with the small

Research into smaller AI models has been growing. With AI GPUs like Nvidia’s H100 becoming scarce as organizations around the world snap them up, companies and academics want smaller models that they can run on existing hardware.

Several such models have already emerged, like Pythia-1b from EleutherAI and MPT-1b from the Databricks-owned MosaicML. Another, currently in training, is TinyLlama, a 1.1 billion-parameter Llama model that can run on a consumer-grade PC chip.

Microsoft researchers told Semafor that smaller models won’t replace the bigger foundation models like GPT-4, but represent cost-effective alternatives for smaller, more specific tasks, potentially including tasks run on edge devices.

Ece Kamar, a senior researcher in the adaptive systems and interaction group at Microsoft Research, said: “We are thinking about how do we build these systems responsibly so they work well in the real world. All of the work we are doing on small models is giving us interesting puzzle pieces to be able to build that ecosystem.”

Ahmed Awadallah, senior principal researcher at Microsoft Research, told the website that smaller models could even work in tandem with larger models: “You could also imagine the small model being deployed in a different regime. And then maybe, when it doesn’t have enough confidence in acting, it can go back to the big model.”

Microsoft’s researchers have already been working on a similar idea: AutoGen, an open-source library that enables large language models to work collaboratively, using multiple AI systems to generate an output instead of just one. This concept is known as a multi-agent approach.

Recent research from MIT and Google DeepMind proposed such a system, dubbing the concept a “Multiagent Society” and finding that it can reduce AI model hallucinations and improve results.

About the Author(s)

Ben Wodecki

Jr. Editor

Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor. He has written for The New Statesman, Intellectual Property Magazine, and The Telegraph India, among others. He holds an MSc in Digital Journalism from Middlesex University.
