Google Debuts Gemma 4 With Advanced Multimodal Capabilities
- Google releases Gemma 4, a suite of four open-weights, multimodal models ranging from 2B to 31B parameters.
- Gemma 4 31B improves reasoning by 29 points over its predecessor, rivaling larger, high-performance open models.
- The new series supports native image and video input, with audio on the smaller variants, and ships with increased efficiency under an Apache 2.0 license.
Google has officially expanded its open-weights lineup with the release of Gemma 4. This new series introduces four distinct model sizes, ranging from compact edge-optimized versions to a flagship 31-billion parameter model. The release marks a significant milestone for developers seeking powerful, flexible, and accessible tools that run natively across diverse hardware environments.
The standout feature of this generation is its enhanced multimodality. Unlike its predecessor, Gemma 4 is built to handle text, images, and video natively across the entire spectrum of its model sizes. The smaller variants, specifically the E4B and E2B models, even integrate audio processing. This leap allows developers to build sophisticated applications that can process visual and auditory information without needing to chain together multiple specialized models, significantly streamlining the development pipeline.
Performance benchmarks highlight a massive generational shift. The flagship 31B model exhibits a 29-point improvement in reasoning capabilities compared to the previous Gemma 3 iteration. Perhaps more impressively, it achieves these results while maintaining notable token efficiency—using significantly fewer output tokens to complete complex reasoning tasks compared to competitors at similar intelligence tiers. This efficiency is crucial for real-world applications where operational costs and speed are just as vital as raw reasoning power.
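The operational impact of token efficiency is easy to quantify. The back-of-envelope sketch below compares daily serving costs for a verbose model versus a more token-efficient one; every number (prices, request volume, token counts) is a hypothetical assumption for illustration, not a published figure.

```python
# Hypothetical cost comparison between two models at the same per-token
# price, where one completes reasoning tasks with fewer output tokens.
# All figures below are illustrative assumptions, not published data.
price_per_1k_output = 0.002          # USD per 1,000 output tokens (assumed)
requests_per_day = 100_000           # assumed traffic volume

def daily_cost(avg_output_tokens):
    """Daily spend given an average output length per request."""
    return requests_per_day * avg_output_tokens / 1000 * price_per_1k_output

cost_verbose = daily_cost(avg_output_tokens=1200)    # chattier model
cost_efficient = daily_cost(avg_output_tokens=700)   # more token-efficient
savings = cost_verbose - cost_efficient              # daily savings in USD
```

At identical per-token pricing, the entire saving comes from the shorter outputs, which is why token efficiency compounds at scale.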
The architecture also varies across the lineup. The 26B A4B model, for instance, uses a Mixture of Experts (MoE) approach: only a subset of the model's total parameters is activated for any given request. Because the majority of the network stays inactive during each inference step, the model delivers high-performance results while remaining computationally lighter than dense models of a similar size.
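The routing idea behind MoE can be sketched in a few lines. The article does not describe Gemma 4's actual gating mechanism, so the top-k router, expert count, and toy "experts" below are all hypothetical stand-ins for the general technique.

```python
import math

# Illustrative sketch of Mixture-of-Experts (MoE) routing with simple
# top-k gating. Every name and number here is a hypothetical; the real
# Gemma 4 A4B routing details are not public in this article.

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_weights, top_k=2):
    """Run only the top_k highest-scoring experts on input vector x."""
    # Router: one score per expert (a learned linear layer in real models).
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    probs = softmax(scores)
    # Select the top_k experts; all other experts stay inactive this step.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)  # each expert stands in for a feed-forward block
        out = [o + (probs[i] / norm) * v for o, v in zip(out, y)]
    return out, len(chosen)

# Four toy "experts" that just scale their input by different factors.
experts = [lambda x, k=k: [k * v for v in x] for k in (0.5, 1.0, 1.5, 2.0)]
router = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.05, 0.05]]

out, n_active = moe_forward([1.0, 0.5], experts, router, top_k=2)
```

Only two of the four experts run per input here, which is the source of the compute savings: total parameter count grows with the expert pool, while per-request work tracks the active subset.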
Accessibility is another core pillar of this launch. Released under the Apache 2.0 license, Gemma 4 removes many of the restrictions found in earlier versions. For edge computing enthusiasts, the 2B model is particularly noteworthy: designed for on-device deployment, it fits within 3GB of RAM at 4-bit quantization. That makes it a strong fit for background tasks, basic function calling, and local, private AI experiences on mobile or edge hardware, with no dependency on remote servers.
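The 3GB figure is easy to sanity-check with rough arithmetic. The sketch below estimates the weight footprint of a 2B-parameter model at 4-bit precision; these are approximations that ignore per-runtime details such as the exact KV cache size and quantization scale overhead.

```python
# Back-of-envelope check of the stated on-device footprint: a
# 2B-parameter model at 4-bit precision needs roughly 1 GB just for
# weights, leaving headroom inside a 3 GB budget for the KV cache,
# activations, and quantization scales. Rough estimates, not measurements.
params = 2_000_000_000
bits_per_param = 4
weights_gb = params * bits_per_param / 8 / 1e9   # bits -> bytes -> GB, ~1.0
ram_budget_gb = 3.0                              # the stated on-device limit
headroom_gb = ram_budget_gb - weights_gb         # ~2.0 GB left for runtime state
```

Roughly a third of the budget goes to weights, which is consistent with the claim that the model fits comfortably on phone-class hardware.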