Google DeepMind Launches On-Device Gemma 4 Models
- Google DeepMind releases Gemma 4, a new family of open-weights multimodal models for local device use.
- Models range from 2B to 31B parameters, supporting native text, image, and audio input processing.
- Architecture introduces Shared KV Cache and Per-Layer Embeddings to enhance efficiency and long-context performance.
Google DeepMind has just expanded its open-weights lineup with the release of Gemma 4, a suite of models specifically engineered for on-device performance. This means these powerful AI engines are designed to run locally on hardware like laptops or smartphones, rather than relying exclusively on massive, remote supercomputers. By optimizing for local execution, developers gain a significant advantage in privacy, latency, and offline capabilities.
The family includes four distinct sizes, ranging from compact 2B parameter versions to a dense 31B parameter powerhouse. What makes Gemma 4 particularly versatile is its multimodal nature. It moves beyond text-only interaction, allowing users to process visual data and audio natively. This opens doors for complex tasks such as object detection, video-based question answering, and accurate image captioning without requiring a connection to the cloud.
Technically, the team implemented specific architectural upgrades to maintain high performance in smaller packages. Innovations like the Shared KV Cache minimize the memory footprint during inference, while Per-Layer Embeddings allow the model to process information with greater granularity at each layer. By offering these models under an open Apache 2.0 license, Google is significantly lowering the barrier for developers building sophisticated, agentic applications that run directly on consumer hardware.
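To make the memory argument concrete, here is a rough back-of-the-envelope sketch of why sharing a KV cache across layers matters on-device. All shapes below (layer count, head count, head dimension, context length) are illustrative assumptions for a small model, not published Gemma 4 specifications:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len,
                   bytes_per_entry=2, shared_group=1):
    """Approximate bytes held by the key/value cache during decoding.

    shared_group=1: every layer keeps its own K/V tensors.
    shared_group=n: n consecutive layers share one cache (the
    'Shared KV Cache' idea), dividing the total cost by n.
    Hypothetical shapes only; bytes_per_entry=2 assumes fp16.
    """
    per_layer = 2 * kv_heads * head_dim * seq_len * bytes_per_entry  # K and V
    return layers * per_layer // shared_group

# Assumed small-model shape: 26 layers, 8 KV heads of dim 256,
# a 32k-token context, fp16 cache entries.
baseline = kv_cache_bytes(26, 8, 256, 32_768)
shared = kv_cache_bytes(26, 8, 256, 32_768, shared_group=2)

print(f"per-layer cache:  {baseline / 2**30:.2f} GiB")  # 6.50 GiB
print(f"shared (2/group): {shared / 2**30:.2f} GiB")    # 3.25 GiB
```

Under these assumed shapes, sharing halves a multi-gigabyte cache, which is the difference between fitting and not fitting a long context in a phone's RAM.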