Gemini 3.5 Flash Costs 5.5x More Than Its Predecessor

Manaal KhanMay 20, 2026 at 4:43 PM5 min read

Key Takeaways

Gemini 3.5 Flash costs 5.5x more to run than its predecessor in benchmark testing
Token prices tripled: $1.50/$9.00 per million tokens, up from $0.50/$3.00
High token consumption on agent tasks pushes total costs 75% above Gemini 3.1 Pro

Google DeepMind has released Gemini 3.5 Flash, and the price tag is raising eyebrows. The new model costs 5.5 times more to run than Gemini 3 Flash in benchmark testing, according to an analysis by Artificial Analysis. That makes it nearly twice as expensive as the Pro model Gemini 3.1.

The sticker shock gets worse when you look at agent-based tasks. Gemini 3.5 Flash burns through so many tokens that total costs end up 75 percent higher than Gemini 3.1 Pro, the model it was supposed to undercut.

5.5x

Cost increase of Gemini 3.5 Flash over Gemini 3 Flash in benchmark testing

Token Prices Triple Across the Board

Google now charges $1.50 per million input tokens and $9.00 per million output tokens for Gemini 3.5 Flash. That's up from $0.50 and $3.00 for Gemini 3 Flash. A straight tripling.

On paper, that's still cheaper per token than Gemini 3.1 Pro at $2.00 and $12.00. But the math flips in practice. Gemini 3.5 Flash consumes far more tokens to complete agent-based tasks, wiping out the per-token savings.

The context window stays at one million tokens. So developers get the same input capacity but pay more for what comes out.

Google Joins an Industry-Wide Price Hike

Google isn't alone in making newer models pricier. Anthropic's Opus 4.7 had a hidden price increase of roughly 30 to 40 percent over its predecessor due to higher token consumption. OpenAI's GPT 5.5 jumped 50 to 90 percent over GPT 5.4. In OpenAI's case, token consumption went down but base prices went up. Google raised both.

For developers and companies, this means raw token price is becoming a less useful metric on its own. What matters now is efficiency: how many tokens a model actually needs to finish a job. A cheaper per-token rate means nothing if the model uses three times as many tokens.

Performance Gains Come With a Catch

Gemini 3.5 Flash does deliver meaningful improvements. It scores 55 on the Artificial Analysis Intelligence Index, nine points above Gemini 3 Flash. That puts it ahead of Grok 4.3 (53) and Claude Sonnet 4.6 (52).

The model pushes out more than 280 output tokens per second, making it the fastest in its intelligence class. Gains show up across nearly every category tested, with the strongest improvements in agentic and multimodal tasks.

But there's a notable weakness. In programming tasks, Gemini 3.5 Flash falls clearly behind competitors like GPT-5.5 and Claude Opus 4.7. For teams building coding assistants or developer tools, this could be a deal-breaker.

Hallucinations Remain a Problem

Despite the intelligence gains, Gemini 3.5 Flash still struggles with hallucinations. The model shows improvement in knowledge accuracy, but factual reliability remains an issue that developers will need to account for in production systems.

Gemini 3.5 Flash verbessert sich deutlich bei Wissensgenauigkeit und Halluzinationsreduktion, liegt aber bei der Halluzinationsrate noch hinter Spitzenmodellen. | Bild: Artificial Analysis

What This Means for Your AI Budget

How much the price hike stings depends on your application. Simple query-response use cases might absorb the increase without much pain. But agent-based workflows, the kind that loop through multiple reasoning steps, will see costs balloon.

The Flash line was long positioned as the cheaper, faster alternative to Google's Pro models. That value proposition is now murkier. Teams will need to benchmark their specific use cases rather than relying on published per-token rates.

Logicity's Take

Frequently Asked Questions

How much more expensive is Gemini 3.5 Flash than Gemini 3 Flash?

Gemini 3.5 Flash costs 5.5 times more to run in benchmark testing. Token prices tripled from $0.50/$3.00 to $1.50/$9.00 per million input/output tokens.

Is Gemini 3.5 Flash cheaper than Gemini Pro?

Per token, yes. But in practice, Gemini 3.5 Flash uses so many more tokens on agent tasks that total costs end up 75% higher than Gemini 3.1 Pro.

What is Gemini 3.5 Flash best at?

Agentic and multimodal tasks. It scores 55 on the Artificial Analysis Intelligence Index and outputs 280+ tokens per second. However, it lags behind competitors in programming tasks.

Are other AI companies raising prices too?

Yes. Anthropic's Opus 4.7 increased costs 30-40% through higher token consumption. OpenAI's GPT 5.5 jumped 50-90% over GPT 5.4 through higher base prices.

What is the context window for Gemini 3.5 Flash?

One million tokens, unchanged from the previous version.

ℹ️

Need Help Implementing This?

Source: The Decoder / Matthias Bastian

Google I/O 2026: Gemini Omni, Spark Assistant, and XR Hardware

The new article introduces several major announcements from Google I/O 2026 not mentioned in the first piece, including the 'Gemini Omni' model and the 'Spark' agentic personal assistant. It also covers broader developments such as the AI integration in Google Search and the debut of Android XR glasses.

Jeff Bezos is closing a $10 billion funding round for Project Prometheus, an AI lab focused on physics-based AI for manufacturing and engineering. With a $38 billion valuation and backing from JPMorgan and BlackRock, this signals a major shift in enterprise AI investment toward industrial applications.

21 Apr 2026

AI & Machine Learning·7 min

Kimi K2.6 Open-Weight AI: 300 Agents at a Fraction of the Cost

Moonshot AI's Kimi K2.6 matches GPT-5.4 and Claude Opus 4.6 on coding benchmarks while running 300 parallel agents. For businesses locked into expensive API contracts, this open-weight model could slash AI infrastructure costs while delivering enterprise-grade automation.

20 Apr 2026

AI & Machine Learning·7 min

AI Vendor Lock-In Risk: Anthropic Suspensions Hit Fintech

A Latin American fintech lost access to 60+ Claude accounts overnight with no warning, exposing dangerous single-vendor dependencies. The incident offers critical lessons for any business building AI into core operations.

20 Apr 2026

Also Read

Boring Company seeks $4B at $20B valuation, per WSJ

Startups & Innovation·6 min