Gemini 3.5 Pro and Flash: What Builders Should Know
Google shipped Gemini 3.5 Flash at I/O 2026 and held Pro for June. Here's what builders should know about each tier and how to route them.
Guys, you know, I/O 2026 did something I didn’t expect from a Gemini release. The flash model shipped, the flagship didn’t, and the flash model beat last quarter’s flagship anyway. That’s the whole story of Gemini 3.5 Pro in one sentence — except Pro isn’t out yet, so half of what people are saying about it is guessing. This piece sorts the shipped facts from the announced-but-unverified ones, and tells you what to route where while you wait.
I’ve been moving production traffic between Gemini tiers since the 3.x line started, so I’ll be specific about model IDs and prices. Where I don’t know something — and there’s a lot I don’t, because Pro hasn’t landed — I’ll say so.
What Gemini 3.5 actually shipped at Google I/O 2026

Google ran the I/O keynote on May 19, 2026. Two things happened with the 3.5 family: one model went live, one got a date and nothing else.
3.5 Flash: GA on May 19 with stable API id gemini-3.5-flash
Gemini 3.5 Flash shipped generally available the same day as the keynote. It’s not a preview, not an experimental alias — the model is stable and callable as gemini-3.5-flash. That matters for anyone who got burned migrating off preview IDs before. Per the official Gemini 3.5 Flash model page from Google DeepMind, it handles understanding across text, audio, images, code, and video. It rolled out across the Gemini app, AI Mode in Search, the Gemini API in Google AI Studio, Vertex AI, and Antigravity 2.0 on launch day.
The spec sheet, for reference: 1,048,576-token input window, 65,536-token max output, January 2026 knowledge cutoff. Dynamic thinking is on by default — the model decides how much compute to spend per problem rather than waiting for you to set a budget.
3.5 Pro: announced for June, no API ID yet
Pro got a sentence on stage. Sundar Pichai said it’s in internal testing and ships “next month.” That’s June 2026. The I/O 2026 roundup from 9to5Google confirms the same framing: Pro is in testing, available next month, nothing more concrete. The live audience reportedly groaned at the delay — which tells you Pro was the part people came for.
No API model ID. No pricing. No exact date. If you’re building against Pro right now, you’re building against a press release.
Where the previous tier hierarchy got inverted
Here’s the part worth slowing down on. The old mental model was simple: Pro for hard problems, Flash for throughput. 3.5 Flash collapses that. It beats Gemini 3.1 Pro — the flagship from February 2026 — across most of the benchmark suite, while costing less and running faster. The “lightweight” tier now outperforms last generation’s premium tier.
So the question Google handed every builder is uncomfortable: does paying for a Pro model still make sense when the next-gen Flash already passed your old Pro? For a lot of workloads, the honest answer right now is no. I’ll come back to when it still does.
What Gemini 3.5 Flash brings to production

Specs are one thing. What it costs and where it actually helps is another.
Pricing and latency profile vs. 3.1 Pro
Gemini 3.5 Flash pricing is $1.50 per 1M input tokens and $9.00 per 1M output tokens on the standard tier. Cached input runs $0.15 per 1M. Google states 3.5 Flash outputs tokens roughly 4x faster than other frontier models at its tier.
One honest flag: this is not a cheap upgrade if you’re coming from Flash-Lite. Moving from the $0.25 / $1.50 Flash-Lite rate to $1.50 / $9.00 is roughly a 6x jump on the output side. You’re paying for agentic and multimodal lift, not for a drop-in cost cut. If your task is plain extraction or routing, keep it on a cheaper route. (The price went up. Pretending it didn’t would be dishonest.)
Agent and coding benchmark results
The gemini 3.5 benchmark numbers Google published, taken at face value: 76.2% on Terminal-Bench 2.1 (coding), 1656 Elo on GDPval-AA (agentic task performance), 83.6% on MCP Atlas (tool-use reliability at scale), 84.2% on CharXiv Reasoning (multimodal understanding).
Standard benchmark caveat applies: per-task results vary by workload, prompt strategy, and token mix. A leaderboard number is a starting hypothesis, not your production result. Run your own eval before you trust the headline.
Multimodal understanding (text, image, audio, video input)

Flash accepts text, image, audio, video, and PDF as input, and you can combine them in one request. The official Gemini 3.5 Flash docs in Google AI Studio cover the migration details — including that Google Search, URL context, code execution, and custom functions can run in the same call. If you were doing chain-of-thought prompt gymnastics to force reasoning, the docs say drop it and use thinking_level instead.
What it does not generate (image/video/audio output limits)
This is the line I see people get wrong most often, so read it twice. Gemini 3.5 Flash takes multimodal input and produces text output. It does not generate images. It does not generate video. It does not generate audio. Multimodal understanding is not multimodal generation.
If you need generated video, that’s Gemini Omni — a separate model family Google announced at the same event, not a 3.5 variant. Computer Use isn’t supported on 3.5 Flash either; Google says stay on Gemini 3 Flash Preview for that. Route output-generation and browser-control tasks elsewhere. 3.5 Flash is a reasoning-and-understanding engine, full stop.
What’s known and not known about Gemini 3.5 Pro

People keep asking what is Gemini 3.5 Pro going to do. Most answers online are extrapolation. Here’s the split.
Confirmed: June launch window, multimodal input
What Google actually committed to: Pro ships in June 2026, it’s in internal use now, and it sits above Flash in the 3.5 family as the deep-reasoning tier. The MacRumors I/O 2026 roundup records the same — testing internally, coming next month. That’s the confirmed set. It’s short.
Not confirmed: pricing, API ID, exact release date
Everything builders actually need to integrate is unconfirmed. No pricing. No API model ID. No specific release date beyond “June.” No published benchmark numbers for Pro specifically — anything you see comparing 3.5 Pro to other models is inference, not Google data. If a post quotes a 3.5 Pro price or a 2M-token context figure as fact, treat it as a guess wearing a confident face.
How Google’s phased rollout typically works (Ultra → Pro → free)
Based on how the 3.x line rolled out, here’s my read — flagged as a pattern, not a promise. Google tends to land higher tiers and paid surfaces first, then widen access downward over weeks. So Pro will likely appear in paid Gemini app tiers and Vertex/AI Studio paid API before it shows up on any free quota, if it reaches free at all. Whether the free tier includes Pro is genuinely unknown right now. I wouldn’t plan a free-tier Pro product around a maybe.
How builders should route 3.5 Flash vs. 3.5 Pro
You can’t route to a model that isn’t out. So this is really: what to run on Flash today, and what to hold for Pro.
When Flash is enough (latency-sensitive agent work)
For most agent and coding work, Flash is enough — that’s the whole point of the tier inversion. If your workload is multi-step tool use, coding loops, document-heavy assistants, or search-grounded pipelines, and you care about latency, 3.5 Flash covers it. The 4x output speed shows up most when you’re running long agent loops, not single calls. One fewer slow step per loop sounds small. At scale it adds up fast.
When Pro is worth waiting for (deep reasoning, long context)
Hold for Pro when the task is genuinely reasoning-bound and latency-tolerant: deep analytical chains, very long context where recall quality matters more than speed, problems where a wrong answer costs more than a slow one. I want to be careful here — I’m describing the intended role of a Pro tier, because I can’t benchmark a model I haven’t run. If Flash already clears your accuracy bar in testing, waiting for Pro buys you nothing but a bigger bill.
Fallback patterns across tiers
The pattern I’d build today: default to Flash, keep a cheaper route (Flash-Lite or 2.5 Flash) for extractive and routing tasks, and leave a config slot for Pro you can flip when it lands and after you’ve eval’d it. Don’t hardcode a single model. The 3.5 release just proved the hierarchy can flip in a quarter — your routing layer should treat model choice as a variable, not a constant.
Where Gemini 3.5 fits in a multimodal generation stack
If you’re building anything that touches image or video output, this section is the one that saves you a wrong architecture.
Decision layer vs. execution layer separation
3.5 Flash is a decision layer, not an execution layer for media. It reasons, plans, calls tools, parses inputs across modalities, and decides what should happen. It does not render the pixels or the frames. Keep those two jobs separate in your architecture: let Gemini 3.5 handle the routing, prompting, and quality judgment; let a dedicated generation model do the producing. Collapsing them is how you end up asking a text-output model to make a video and wondering why it can’t.
Pairing Gemini 3.5 with image / video generation models
The clean pattern: Gemini 3.5 ingests the brief, the reference image, the audio track — whatever the input mix is — reasons about what to generate, and emits structured instructions or prompts. A generation model downstream takes those and produces the asset.
FAQ

When is Gemini 3.5 Pro available?
June 2026, per Google’s I/O announcement. No exact date has been published yet. It remains in internal testing.
What’s the API model ID for Gemini 3.5 Flash?
gemini-3.5-flash. This is the stable, production GA identifier (live since May 19, 2026).
Does Gemini 3.5 Pro generate images or video?
Unlikely. The entire 3.5 family supports multimodal input (text, image, audio, video) but outputs text only. Image/video/audio generation belongs to separate models like Gemini Omni.
Is Gemini 3.5 Flash cheaper than 3.1 Pro?
Yes on a per-token basis ($1.50/$9.00 vs previous Pro tier), and it’s faster. However, if migrating from older Flash-Lite models, output costs rise significantly (~6x).
Can I access Gemini 3.5 through model aggregation platforms?
Yes for Flash (already available on platforms like OpenRouter at standard pricing). Pro is not yet released, so aggregation support will depend on the platform’s rollout timing.
Previous Posts:
- Gemini 3.5 Flash vs Omni Flash vs Veo vs Sora: Where Each Model Fits
- What Is Veo 4 and Why Google’s Video Stack Matters
- Google Veo 4 API Prep: What Builders Should Do Before Access Arrives
- Best AI Video Generator 2026: What Actually Holds Up in Production
- June 2026 AI Launch Wave: Why Model Tiers Are Shifting Faster Than Teams Expect