Google DeepMind Releases Gemma 4 Open Models
- Google DeepMind launches Gemma 4 family featuring four model sizes under Apache 2.0 license
- New 31B dense model ranks #3 globally among open models on Arena AI leaderboard
- Family introduces native support for agentic workflows, long context, and multimodal vision and audio
Gemma 4 represents a significant leap for open models, prioritizing "intelligence-per-parameter" to deliver frontier-level reasoning on standard consumer hardware. Because these models share the core architecture of Gemini 3, developers can build sophisticated applications without the massive infrastructure costs typically associated with high-performance systems.
The lineup is split between "Efficient" edge models (2B and 4B parameters) and larger high-capacity models (26B and 31B). The smaller variants are optimized for mobile and IoT devices, enabling real-time audio and vision processing to run locally on the device. The 26B model, meanwhile, uses a Mixture of Experts (MoE) approach, activating only a fraction of its total parameters for any single task, to deliver high speed without sacrificing reasoning quality.
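Gemma 4's internal MoE details are not spelled out here, but the sparse-activation idea is easy to illustrate: a small gating network scores every expert, and only the top-k highest-scoring experts actually run. The sketch below (expert count, dimensions, and top-k value are all illustrative assumptions, not Gemma 4 specifics) shows top-2 routing in plain NumPy.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their
    outputs, weighted by a softmax over the gate scores. Only top_k of
    len(experts) expert networks execute, so compute per token stays
    small even as total parameter count grows."""
    scores = x @ gate_w                       # one gate score per expert
    top = np.argsort(scores)[-top_k:]         # indices of the best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax renormalized over top_k
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy setup: 4 "experts", each a simple linear map on a 3-dim input.
rng = np.random.default_rng(0)
experts = [(lambda W: (lambda v: v @ W))(rng.standard_normal((3, 3)))
           for _ in range(4)]
gate_w = rng.standard_normal((3, 4))

out = moe_forward(rng.standard_normal(3), gate_w, experts)
print(out.shape)  # (3,)
```

Here only 2 of the 4 expert matrices are ever multiplied per input, which is the mechanism behind "a fraction of its total parameters" being active at once.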
Crucially, Gemma 4 is designed for agentic workflows, meaning the models can independently use external tools and produce structured data (JSON) to complete multi-step plans. This shift from simple chat interfaces to autonomous agents is bolstered by massive context windows, ranging from 128K to 256K tokens, which allow the models to process entire code repositories or complex documents in one go.
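The tool-use pattern described above follows a now-common shape, regardless of model: the model emits a structured JSON object naming a tool and its arguments, and a harness parses and dispatches it. The sketch below is a generic, model-agnostic illustration of that loop; the tool names, registry, and JSON shape are assumptions for the example, not Gemma 4's actual schema.

```python
import json

# Hypothetical tool registry: the model's JSON output names a tool
# and supplies its arguments; the harness looks the tool up and runs it.
TOOLS = {
    "get_weather": lambda city: f"18°C and clear in {city}",
    "add": lambda a, b: str(a + b),
}

def run_tool_call(model_output: str) -> str:
    """Parse a structured tool call emitted by the model and dispatch it."""
    call = json.loads(model_output)   # e.g. {"tool": "...", "args": {...}}
    tool = TOOLS[call["tool"]]
    return tool(**call["args"])

# A model tuned for agentic use emits JSON like this instead of free prose:
reply = '{"tool": "add", "args": {"a": 40, "b": 2}}'
print(run_tool_call(reply))  # 42
```

In a real multi-step plan, the tool's result is appended to the conversation and the model decides the next call, repeating until the task is done.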