Google releases Gemini 3 Flash to bridge the gap between speed and reasoning

Google’s Gemini 3 Flash is a fast, cost-effective AI model that outperforms Gemini 2.5 Pro. It is now the default model in Search and the Gemini app.

DQI Bureau

Google has expanded its Gemini 3 model family with the release of Gemini 3 Flash, a lightweight model designed to provide high-level reasoning at faster speeds and lower costs. Launched on 17 December 2025, the model aims to eliminate the traditional tradeoff between the intelligence of flagship models and the responsiveness required for real-time applications.

Performance benchmarks and technical specs

Gemini 3 Flash significantly outperforms its predecessor, Gemini 2.5 Flash, and even surpasses the previous generation's flagship, Gemini 2.5 Pro, in most categories. According to Google, the model is three times faster than Gemini 2.5 Pro while using approximately 30% fewer tokens for the same tasks.

Key technical specifications include:

  • Multimodality: Native processing of text, images, audio, video, and PDF inputs.

  • Context Window: A 1-million-token input capacity, allowing for the analysis of roughly 700,000 words or 11 hours of audio in a single prompt.

  • Reasoning Capability: Scores 90.4% on the GPQA Diamond benchmark (scientific knowledge) and 81.2% on MMMU-Pro (multimodal reasoning), nearly matching the more powerful Gemini 3 Pro.
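The capacity figures above imply some rough conversion ratios. As a back-of-the-envelope check, using only the approximate numbers quoted in this article (not official conversion rates):

```python
# Back-of-the-envelope capacity math for a 1-million-token context window,
# derived from the approximate figures quoted above.
CONTEXT_TOKENS = 1_000_000

# ~700,000 words per 1M tokens implies roughly 0.7 words per token.
WORDS_PER_TOKEN = 700_000 / CONTEXT_TOKENS

# ~11 hours of audio per 1M tokens implies roughly 25 tokens per second.
AUDIO_TOKENS_PER_SECOND = CONTEXT_TOKENS / (11 * 3600)

def words_capacity(tokens: int) -> int:
    """Approximate word capacity for a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

def audio_hours_capacity(tokens: int) -> float:
    """Approximate hours of audio for a given token budget."""
    return tokens / AUDIO_TOKENS_PER_SECOND / 3600

print(words_capacity(500_000))                          # half the window in words
print(round(audio_hours_capacity(CONTEXT_TOKENS), 1))   # full window in audio hours
```

These ratios are estimates only; actual token counts vary with language, audio encoding, and tokenizer behaviour.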

Dynamic thinking and new API features

A standout feature of Gemini 3 Flash is its Thinking Level parameter. Unlike older models that use a fixed amount of processing for every query, Gemini 3 Flash can dynamically adjust its "thinking" based on the complexity of the task.

  • Low/Minimal: Best for high-throughput tasks like chat or simple summaries, focusing on minimal latency.

  • High (Default): Maximizes reasoning depth for complex coding or agentic workflows.
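As a sketch of how a developer might dial this up or down per request, the snippet below builds a REST-style `generateContent` payload with a `thinking_level` field. The field name and payload shape are assumptions based on the parameter described above, not taken from official documentation; consult the Gemini API reference for the exact request schema.

```python
# Sketch: choosing a thinking level per task and building a REST-style
# request payload. The "thinking_level" field name is an assumption based
# on the parameter described above; check the Gemini API docs for the
# exact request shape before using this in practice.
import json

def build_request(prompt: str, complex_task: bool) -> dict:
    """Build a generateContent-style payload, dialing thinking up or down."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            # "low" keeps latency minimal for chat and simple summaries;
            # "high" (the default) maximizes reasoning depth.
            "thinking_level": "high" if complex_task else "low",
        },
    }

payload = build_request("Summarise this paragraph.", complex_task=False)
print(json.dumps(payload["generationConfig"]))
```

The point of the pattern is that latency and cost become a per-request decision rather than a property of the model you picked.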

The model also introduces multimodal function responses and code execution for visual inputs, which enables the AI to zoom into, count, or edit specific elements within an image.

Economic impact and developer pricing

The release is positioned as a "value disruption" in the AI market. While slightly more expensive than Gemini 2.5 Flash, it offers Pro-level intelligence at a fraction of the cost of the standard Pro tier.

  • Input Tokens: USD 0.50 per 1 million tokens.

  • Output Tokens: USD 3.00 per 1 million tokens.

  • Savings: Context caching can reduce costs by up to 90% for applications with repeated token use, while the Batch API offers an additional 50% discount for non-urgent tasks.
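To see how the list prices and discounts combine, here is a simple estimator using the figures above. The 90% caching saving and 50% batch discount are applied as flat multipliers; real billing is more nuanced (cached tokens are metered separately, for instance), so treat this as an illustration only.

```python
# Rough cost estimator using the list prices above (USD per 1M tokens).
# The caching and batch discounts are modelled as simple multipliers,
# which is a simplification of actual billing.
INPUT_PER_M = 0.50
OUTPUT_PER_M = 3.00

def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_fraction: float = 0.0, batch: bool = False) -> float:
    """Estimate request cost in USD under the simplified discount model."""
    input_cost = input_tokens / 1e6 * INPUT_PER_M
    # Context caching: up to 90% off the repeated portion of the input.
    input_cost *= 1 - 0.9 * cached_fraction
    output_cost = output_tokens / 1e6 * OUTPUT_PER_M
    total = input_cost + output_cost
    # Batch API: an additional 50% discount for non-urgent tasks.
    return total * 0.5 if batch else total

# 1M input + 100k output tokens: $0.50 + $0.30 at list price.
print(round(estimate_cost(1_000_000, 100_000), 2))              # no discounts
print(round(estimate_cost(1_000_000, 100_000, batch=True), 2))  # batched
```

Even without discounts, the same workload on a typical Pro-tier price sheet would cost several times more, which is the "value disruption" the article refers to.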

Availability

Gemini 3 Flash is now the default model for the Gemini app globally and the AI Mode in Google Search, replacing the older 2.5 Flash model for millions of users at no extra cost. Developers can access it through the Gemini API in Google AI Studio, Vertex AI, and Google’s new agentic development environment, Antigravity.
