Google Expands Gemini Lineup With Large, Small Model Updates at Google I/O 2024
Google also unveiled a new open source vision-language model for generating image captions and image labels, as well as its latest small language model
Google has unveiled a host of powerful updates to its flagship Gemini model, spanning new small and large versions.
At last year’s I/O event, Google unveiled Gemini, a foundation model designed to power its applications across its entire range of services. This year, Google announced the current flagship version, Gemini 1.5 Pro, is being made available to all developers globally.
Previously reserved for a select group of developers, the model can now be used by businesses and is accessible from Google’s Gemini Advanced platform.
Google CEO Sundar Pichai said Gemini 1.5 Pro received several improvements across translation, coding and reasoning based on feedback from its initial rollout.
The model supports 35 languages and is multimodal, able to comprehend both text and images in prompts.
The current version has a context window of up to 1 million tokens. A context window represents how much text a model can process in a single input, meaning Gemini 1.5 Pro can handle the equivalent of around 1,500 pages of text.
At I/O, however, Google announced it would be raising Gemini’s context window even further.
Gemini 1.5 Pro’s already mammoth context window will double to 2 million tokens, or around 1.5 million words. By contrast, OpenAI's GPT-4 Turbo can handle only 128,000 tokens.
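The page and word counts above can be sanity-checked with a back-of-the-envelope conversion. The words-per-token and words-per-page ratios below are common rough rules of thumb for English text, not Google's published figures:

```python
# Rough illustration only: estimating how much text fits in a context
# window. Both constants are assumptions, not Google's tokenizer math.

WORDS_PER_TOKEN = 0.75   # common rough average for English text
WORDS_PER_PAGE = 500     # typical manuscript page

def context_capacity(tokens: int) -> tuple[int, int]:
    """Return an approximate (words, pages) capacity for a token budget."""
    words = int(tokens * WORDS_PER_TOKEN)
    pages = words // WORDS_PER_PAGE
    return words, pages

print(context_capacity(1_000_000))  # roughly 750,000 words, 1,500 pages
print(context_capacity(2_000_000))  # roughly 1.5 million words, 3,000 pages
```

Under these assumptions, the 1 million-token window works out to about 1,500 pages and the 2 million-token window to about 1.5 million words, consistent with the figures Google cited.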
Google presented interviews with developers who described feeding 1.5 Pro multiple lengthy research papers, a hefty input the model handled with ease.
The 2 million token version of Gemini 1.5 is only available to select developers in private preview.
“It’s amazing to look back and see just how much progress we have made in a few months,” said Pichai. “This represents the next step on our journey towards the ultimate goal of infinite context.
“[Multimodality and long context] are powerful on their own, but together they unlock deeper capabilities and more intelligence.”
New Smaller Gemini
Google also announced a smaller, more lightweight version of its flagship model, Gemini 1.5 Flash, designed to work in low-latency environments.
Sir Demis Hassabis, Google DeepMind CEO, made his first I/O appearance to unveil the small model.
He unveiled a model optimized for high-frequency tasks that require fast response times from an AI system, such as IoT devices and industrial robotics.
Despite being smaller than 1.5 Pro, Flash still boasts Gemini’s hefty context window.
“Flash is designed to be fast and cost-efficient to serve at scale while still featuring multimodal reasoning capabilities and breakthrough long context,” said Hassabis.
Gemini 1.5 Flash is available in Google AI Studio and Vertex AI.
Both Gemini 1.5 Flash and Gemini 1.5 Pro will be available in June.
Other Model Announcements
Google also unveiled PaliGemma, an open source vision-language model for generating image captions and image labels.
The lightweight model can handle both images and text as inputs, returning responses about images with detail.
Also unveiled at I/O was Gemma 2, Google’s latest small language model.
Launching in June, the model is designed to be more efficient for developers and businesses with limited infrastructure access. It’s able to run on a single TPU, Google’s custom hardware, through Vertex AI.
Gemma 2 is larger than Google’s previous Gemma models, standing at 27 billion parameters compared to the earlier 2 billion and 7 billion-parameter versions. The new version, however, outperforms models more than twice its size.