Google's New Gemini 1.5 Can Handle 700,000 Words at a Time
The latest Google AI model can handle hefty multimodal files – but latency issues are expected upon initial launch
At a Glance
- Google unveiled Gemini 1.5, which can handle a context window of up to one million tokens. In contrast, OpenAI's GPT-4 Turbo tops out at 128,000 tokens.
- The first version to drop is Gemini 1.5 Pro, which Google said is as powerful as Gemini 1.0 Ultra but uses less compute.
- In a demo, Gemini 1.5 read through a 402-page transcript and accurately picked out specific moments and quotes.
The AI race is heating up: Google has unveiled the latest update to its powerful Gemini multimodal model, and it boasts a staggering one-million-token context window.
According to a Google blog post, Gemini 1.5 outperforms the initial version of Gemini and can handle up to one million multimodal tokens. That comes to around 700,000 words, one hour of video, 11 hours of audio or 30,000 lines of code at a time. For comparison, OpenAI's GPT-4 Turbo's context window stands at just 128,000 tokens.
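Those figures imply a ratio of roughly 0.7 words per token. A quick back-of-the-envelope check (the ratio is inferred from Google's numbers, not a published tokenizer specification):

```python
# Rough conversion implied by Google's figures; ~0.7 words per token
# is inferred from them, not an official tokenizer ratio.
CONTEXT_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.7

print(f"~{CONTEXT_TOKENS * WORDS_PER_TOKEN:,.0f} words")  # ~700,000 words
```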
The first version headed to users is Gemini 1.5 Pro, a mid-sized multimodal model. Developers and enterprise customers can try out the hefty context window via AI Studio and Vertex AI in a private preview. Google said Gemini 1.5 Pro is as powerful as Gemini 1.0 Ultra – the most powerful model in the Gemini family.
There is no cost to try the one-million-token version. However, Google said users should expect higher latency, as it is still an experimental feature.
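For developers who get preview access, a long-context request through Google's generative AI Python SDK might look something like the sketch below. The model identifier, file name and prompt are illustrative assumptions; the actual preview endpoints and quotas may differ.

```python
# Minimal sketch using the google-generativeai SDK (AI Studio).
# Assumes preview access; the model name "gemini-1.5-pro-latest" and
# the sample file are placeholders and may not match your account.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-1.5-pro-latest")

# With a one-million-token window, an entire document can go straight
# into the prompt instead of being chunked through a retrieval pipeline.
with open("apollo11_transcript.txt") as f:
    transcript = f.read()

response = model.generate_content(
    ["Find the moment the crew describes the lunar surface:", transcript]
)
print(response.text)
```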
Google CEO Sundar Pichai said in the blog post that Gemini 1.5’s beefed-up context window will “enable entirely new capabilities and help developers build much more useful models and applications.”
Google demoed Gemini 1.5 Pro's ability to extract information from a 402-page PDF transcript of the Apollo 11 moon landing, as well as from a series of multimodal prompts. The model cited near-exact timecodes for when quotes were said and even picked out a relevant quote based on a crude drawing included in the prompt.
The unveiling of Gemini 1.5 Pro comes a day after Nvidia overtook Google as the third most valuable company in the U.S., after Microsoft and Apple. At the close of trading on Wednesday, Nvidia was worth $1.81 trillion while Google parent Alphabet came in at $1.78 trillion.
But investors were not impressed. As of midday trading today, Alphabet shares slid 3.3% to $143.88.
Bigger but also better?
Despite its ability to handle huge swathes of input, Gemini 1.5 uses less computing power than Gemini 1.0 Ultra.
However, Demis Hassabis, CEO of Google DeepMind, said Google is working on optimizations to improve latency before rolling out the full one million token version of the model.
The new model was built using a combination of Transformer and Mixture of Experts (MoE) architecture: while a traditional Transformer functions as one large neural network, an MoE model is divided into smaller "expert" networks that are selectively activated depending on the input.
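To make the routing idea concrete, here is a minimal sketch of top-k expert routing. It is illustrative only: Google has not disclosed Gemini 1.5's expert count, gating scheme or dimensions, so every number and name below is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dimensions -- Google has not published Gemini 1.5's
# parameter count or expert configuration.
D_MODEL, N_EXPERTS, TOP_K = 16, 4, 2

# Each "expert" stands in for the feed-forward block it would
# replace inside a Transformer layer.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
gate_w = rng.standard_normal((D_MODEL, N_EXPERTS))

def moe_layer(x):
    """Route a token vector through its top-k experts and mix the results."""
    logits = x @ gate_w                # gating network scores every expert
    top = np.argsort(logits)[-TOP_K:]  # keep only the k highest-scoring ones
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts
    # Only the chosen experts actually run, which is how MoE models grow
    # total capacity without a proportional rise in compute per token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

print(moe_layer(rng.standard_normal(D_MODEL)).shape)  # (16,)
```

In a real Transformer-MoE, this routing happens per token inside each MoE layer, so most of the model's parameters sit idle on any given forward pass.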
Hassabis said that the innovative architecture allows Gemini 1.5 to learn complex tasks more quickly and maintain quality while being more efficient in training. The new model “represents a step change in our approach, building upon research and engineering innovations across nearly every part of our foundation model development and infrastructure,” he said.
Omdia Chief Analyst Lian Jye Su said the model’s differentiated approach is a harbinger of what’s to come. “A mixture of Transformer and MoE architecture points towards a future with different AI training and inference approaches, especially when smaller expert models potentially require fewer training resources.”
Alexander Harrowell, Omdia principal analyst in advanced computing for AI, said the new architecture “confirms that Mixture of Experts is now the research frontier across AI, what with GPT-4, Mixtral, and now this. Google has a long-standing research focus on large MoE models, back to at least 2017, and invented the idea of combining the Transformer architecture and the mixture-of-experts principle.”
He pointed out that Google called Gemini 1.5 a mid-sized model but did not disclose the number of parameters for Gemini 1.5 or how many experts are in the MoE. However, “this seems to confirm that again, mid-size is the place to be,” Harrowell added.
Su noted that the timing of Gemini 1.5's unveiling comes after OpenAI CEO Sam Altman teased the upcoming GPT-5's improved capabilities. "It shows that Google's R&D to commercialization and productization cycle is catching up to OpenAI's, though more benchmarks are required to make further comments about its performance," he said.