What Is Omni Flash? Capabilities, Access & Builder Guide

Google's Omni Flash launches video generation in Gemini App and Flow. What builders need to know about access, limits, and the API timeline.

Hi, I’m Dora. I spent the morning of I/O 2026 reading the rollout post and pricing pages, then opening the Gemini app to see what actually shipped versus what’s still labeled “coming weeks.” This is the notes version of that — for builders and product teams deciding whether Omni Flash changes anything in their pipeline.

Short version up top: the model is real and live in consumer surfaces today. The developer API is not. That gap matters more than the demos.

What Omni Flash Actually Is (Google’s first Omni-series model)

So what is Omni Flash, concretely? Gemini Omni Flash is the first model in Google’s new Omni family, announced May 19, 2026 at I/O. Google DeepMind frames it as “create anything from any input — starting with video.” The “starting with” matters: the long-term roadmap covers any-to-any modality routing, but what shipped today is multimodal input producing video output. Image and audio output are on the public roadmap, not in the product.

Position in the Gemini Omni roadmap

Omni is positioned as a family, not a single model. Flash is the consumer-grade first step. A higher-tier Omni Pro has been confirmed by Google DeepMind, with no release date — Nicole Brichtova told TechCrunch that Pro arrives “when we feel like we’re at a point where we have a step change above Flash.” Read that as: not soon.

Why Google describes it as “video version of Nano Banana”

Nano Banana, Google’s image generation and editing model launched in 2025, set the template for what Omni is trying to be for video: conversational editing, identity persistence across iterations, and low friction for non-technical users. The official Google blog post introducing Omni draws the lineage explicitly. Architecturally, Omni Flash reasons across modalities in a single forward pass rather than relaying between specialized systems. Whether that translates into meaningfully better outputs than a Veo-plus-audio-pipeline approach is something I’ll watch. The demos are curated. Real workflows aren’t.

Capabilities Confirmed at Launch

These are the Omni Flash capabilities confirmed in the product today, not what was teased.

Multimodal input (text, image, video, audio)

You can combine any of these as inputs in a single prompt. The model treats them as a unified scene description rather than concatenated assets. This is the cleanest part of the announcement — and what distinguishes it from Veo’s text-to-video pipeline.

Up to 10-second video output with native audio

Clips cap at 10 seconds. Brichtova described this as a deployment decision, not a model ceiling: it is a way to control compute demand while access widens. Audio is generated in sync with the video, not bolted on afterward. In the marble-bouncing demo that Google’s CTO Koray Kavukcuoglu showed reporters, the model produced impact and ringing sounds automatically. Worth flagging: independent testers told TechTimes that raw generation quality may trail ByteDance’s Seedance 2.0 and Alibaba’s Wan 2.7, even if the editing layer is stronger.

Conversational editing via natural language

Each instruction builds on the last. “Make the sculpture out of bubbles” — applied, state preserved, next instruction operates on the new state. This is the workflow shift, and the part most likely to save time in production: fewer prompt rewrites, fewer re-runs from scratch.

Likeness insertion and scene consistency

The Avatar feature lets you create a digital version of yourself (onboarding requires speaking a sequence of numbers on camera — a deepfake check borrowed loosely from OpenAI’s discontinued Sora Cameos). Once stored, the avatar persists across generations.

SynthID watermarking and safety constraints

Every output carries an invisible SynthID watermark, verifiable via the Gemini app, Chrome, and Google Search. SynthID has now marked over 100 billion AI-generated images and videos. Open editing of voice and likeness is held back — Google’s stated reason is responsible deployment.

Where You Can Access It Today

Three surfaces, different ceilings.

| Surface | Who gets it | Compute budget |
|---|---|---|
| Gemini App | AI Plus, Pro, Ultra subscribers globally | Compute-based weekly limits (new model) |
| Google Flow | AI Plus / Pro / Ultra | 200 / 1,000 / 10,000–25,000 Flow credits per month |
| YouTube Shorts & Create App | Free users | Rolling out this week |

Gemini App (paid tiers only)

Free users don’t get the model in the Gemini app. The free entry point is YouTube. Paid tiers start at AI Plus ($7.99/month).

Google Flow (Pro/Ultra credit allocations)

Flow is where the real workflow surfaces live — multi-clip composition, ingredient libraries, custom voices, edit-on-existing-video. The Google Flow support documentation lists features exclusive to this model: 10-second clips (vs 4s/6s/8s on lower models), uploaded-video editing, custom voice creation. Per-action credit costs vary by clip length and edit type — I’ll cover credit economics in a separate piece. For this brief, 200 credits (Plus) is exploratory; serious iteration needs Pro or higher.
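Per-action costs aren't pinned down here, so the most I'd do today is a back-of-envelope check: divide each tier's monthly allocation by an assumed cost per generation. The allocations come from the table above. The 20-credit figure and the `generations_per_month` helper are hypothetical placeholders, not Google's pricing.

```python
# Monthly Flow credit allocations per tier, as listed in this post.
# Ultra is quoted as a range, so every tier is stored as (low, high).
FLOW_CREDITS = {
    "AI Plus": (200, 200),
    "AI Pro": (1_000, 1_000),
    "AI Ultra": (10_000, 25_000),
}


def generations_per_month(tier: str, credits_per_generation: int) -> tuple[int, int]:
    """Whole generations a tier's monthly credits cover, as (low, high).

    credits_per_generation is a HYPOTHETICAL input: real Flow costs vary
    by clip length and edit type and are not reproduced here.
    """
    if credits_per_generation <= 0:
        raise ValueError("credits_per_generation must be positive")
    low, high = FLOW_CREDITS[tier]
    return low // credits_per_generation, high // credits_per_generation


if __name__ == "__main__":
    # 20 credits per generation is a placeholder, not a published price.
    for tier in FLOW_CREDITS:
        print(tier, generations_per_month(tier, 20))
```

Once you've checked real per-action costs in Flow, swap them in. With mixed clip lengths and edit types, a single average cost is only a rough proxy.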

YouTube Shorts and YouTube Create

The surprise distribution play. Free access to a frontier model — even constrained — is unusual. The strategic logic: OpenAI pulled Sora back to API-only earlier in 2026, leaving the consumer video space less crowded. Google is filling it with reach rather than peak quality.

What’s Not Yet Available

Developer API on Vertex AI (announced, not GA)

As of May 2026, the developer API is not generally available. Google’s blog says API access for developers and enterprise customers will roll out “in the coming weeks.” VentureBeat’s enterprise breakdown puts it directly: until the Vertex AI API is GA, Omni is effectively a consumer and prosumer tool. If you’re scoping an integration, treat the API as a Q3 2026 planning item, not a current option.

Longer-duration generation

10 seconds is the public ceiling. Google says longer durations are in the pipeline. No timeline.
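Until longer durations ship, anything past 10 seconds means composing multiple clips, which Flow's multi-clip composition supports. Below is a minimal planning helper. It assumes you fill 10-second clips first and let the last clip take the remainder. `plan_segments` is a hypothetical name, and nothing here guarantees visual continuity across the seams.

```python
import math

MAX_CLIP_SECONDS = 10  # current public ceiling per clip, per this post


def plan_segments(total_seconds: float, max_clip: float = MAX_CLIP_SECONDS) -> list[float]:
    """Split a target duration into clip lengths of at most max_clip seconds.

    Full-length clips come first; the final clip takes whatever is left
    (a full clip when the total divides evenly).
    """
    if total_seconds <= 0 or max_clip <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / max_clip)
    full = [float(max_clip)] * (count - 1)
    return full + [float(total_seconds - max_clip * (count - 1))]


if __name__ == "__main__":
    print(plan_segments(25))  # [10.0, 10.0, 5.0]
    print(plan_segments(8))   # [8.0]
```

An even split (three 8.33-second clips for 25 seconds) would be an equally reasonable policy. Both produce the same number of clips. The trade-off is a short tail clip versus uniform pacing.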

Open editing of voice and likeness

You can use your own avatar. You cannot freely edit arbitrary voices or likenesses in uploaded videos. This is a deliberate safety boundary, not a capability gap.

A few other things circulating in launch coverage that Google has not officially confirmed: a 720p output cap, 60–90 second generation times, named avatar template packs. Treat those as unverified.

How It Sits in the Video Generation Landscape

Replacement of Veo in some product surfaces

Multiple outlets have reported that Google Omni Flash effectively replaces Veo in Flow and the Gemini app. Veo is not deprecated — Veo 3.1 still has API access, and for pure text-to-video at API-grade reliability, it’s the production option today. But within Google’s own consumer surfaces, Omni is reportedly the new default. The migration story Google is selling: ship with Veo now, plan the move when GA arrives.

Conversational editing vs prompt-only generation

This is the architectural bet. Most current video models — Veo included — treat each generation as a new pass. Omni’s edits are stateful. For workflows that involve iteration (most professional ones), that changes the math on credit-per-final-clip. Whether the math actually works depends on how well the model preserves intent across edits. I haven’t tested it long enough to say.
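To make the credit-per-final-clip point concrete, here is a minimal sketch. It compares a prompt-only workflow, where every attempt is a fresh generation, with a conversational one: a single base generation plus edits. Both prices in `CreditModel` are hypothetical, as is the assumption that an edit costs less than a full generation. The `restarts` parameter stands in for edits that drift from intent and force a fresh start.

```python
from dataclasses import dataclass


@dataclass
class CreditModel:
    # HYPOTHETICAL prices; real Flow costs vary by clip length and edit type.
    generation_cost: int  # credits for a full generation from scratch
    edit_cost: int        # credits for one conversational edit on an existing clip


def stateless_cost(m: CreditModel, attempts: int) -> int:
    """Prompt-only workflow: every attempt is a fresh full generation."""
    return attempts * m.generation_cost


def stateful_cost(m: CreditModel, edits: int, restarts: int = 0) -> int:
    """Conversational workflow: base generation(s) plus edits layered on top.

    `restarts` counts times an edit drifted from intent and you regenerated
    from scratch; `edits` is the total number of edit calls across all runs.
    """
    return (1 + restarts) * m.generation_cost + edits * m.edit_cost


if __name__ == "__main__":
    m = CreditModel(generation_cost=20, edit_cost=10)
    print(stateless_cost(m, attempts=5))          # 100
    print(stateful_cost(m, edits=4))              # 60
    print(stateful_cost(m, edits=6, restarts=1))  # 100
```

With these made-up numbers, four clean edits beat five regenerations (60 vs 100 credits). A single restart with six edits, however, lands at break-even. That is the sense in which the economics hinge on intent preservation.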

What Builders and Product Teams Should Watch

API timing and pricing signals

The developer API is the gating factor for any production integration. Two things to monitor: the Gemini API documentation for the actual SKU appearing, and the Vertex AI pricing page for per-token or per-second billing structure. Token-based pricing — Google’s standard for the Gemini family — would make this easier to forecast than per-clip pricing.

Likely arrival on aggregation platforms

Once the API lands, expect the model to show up on unified-access platforms within weeks. If you’re already integrated against a multi-model API layer, migration cost from Veo 3.1 should be small. If you’re directly integrated to a single provider, the case for adding an aggregation layer gets stronger every quarter — this launch is one more data point in that direction.
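Reduced to its core, an aggregation layer means your code talks to one interface and a router picks whichever backend is actually available. Here is a minimal sketch. The backend labels (`omni-flash`, `veo-3.1`) and the stub adapters are hypothetical. No real Google SDK call appears here; a production adapter would wrap the provider's actual client.

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Protocol


class VideoBackend(Protocol):
    """The one interface application code depends on."""
    name: str

    def available(self) -> bool: ...

    def generate(self, prompt: str, seconds: int) -> str: ...


@dataclass
class StubBackend:
    """Hypothetical stand-in for a provider adapter (makes no network calls)."""
    name: str
    is_ga: bool

    def available(self) -> bool:
        return self.is_ga

    def generate(self, prompt: str, seconds: int) -> str:
        # A real adapter would call the provider's SDK here and return a clip handle.
        return f"{self.name}: {seconds}s clip for {prompt!r}"


def pick_backend(preferred: Sequence[VideoBackend]) -> VideoBackend:
    """Return the first available backend, in priority order."""
    for backend in preferred:
        if backend.available():
            return backend
    raise RuntimeError("no video backend is available")


if __name__ == "__main__":
    omni = StubBackend(name="omni-flash", is_ga=False)  # API not GA yet
    veo = StubBackend(name="veo-3.1", is_ga=True)
    print(pick_backend([omni, veo]).name)  # veo-3.1
    omni.is_ga = True                      # flip once the API ships
    print(pick_backend([omni, veo]).name)  # omni-flash
```

In this model, flipping `is_ga` on the Omni stub is the whole migration. In practice, the work lives in the adapter: mapping prompts, multimodal inputs, and clip lengths onto whatever the real API exposes.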

FAQ

Is the Omni Flash API available for developers yet?

No. As of May 2026, the developer API is not generally available. Google says it will roll out via the Gemini API and Vertex AI “in the coming weeks.” Until then, programmatic access is not possible.

What’s the maximum video length Omni Flash can generate?

10 seconds. Google DeepMind has stated this is a deployment decision rather than an architectural limit of the model. Longer durations are planned, with no public timeline.

Does Omni Flash replace Google’s Veo model entirely?

No. Veo 3.1 remains available with API access for text-to-video workloads. Within Google’s own consumer surfaces (Gemini app, Flow), the new model is reportedly the default. For production API integrations today, Veo is the working option.

Can I use Omni Flash output commercially?

It depends on Google’s Generative AI Prohibited Use Policy and the terms of your subscription tier. Commercial use is generally permitted within paid tiers, but specific scenarios (likeness-bearing content, third-party IP, regulated industries) need verification against current Google policy. Don’t take a blanket yes from anyone.

Does Omni Flash watermark every generated video?

Yes. All outputs carry an imperceptible SynthID watermark, verifiable through the Gemini app, Chrome, and Google Search. There is no opt-out.

Is Omni Flash available outside Google’s own apps?

Not yet. Current access is limited to the Gemini app, Google Flow, YouTube Shorts, and the YouTube Create app. Once the developer API ships, expect availability through Vertex AI and likely on third-party aggregation platforms shortly after.

Bottom Line

For most product teams, the practical answer this week is: nothing changes yet. Keep shipping with Veo 3.1. The decision point is the API GA — when it lands, the conversational-editing primitive is worth a real evaluation, especially if your pipeline already pays the cost of multi-pass video generation.

For consumer experimentation, Gemini app and Flow are the entry points on paid tiers; YouTube Shorts is the free path. Worth half an hour of hands-on time to calibrate your own quality expectations against the demos.

One disambiguation note: this is Google’s Gemini Omni Flash. There’s a separately named Qwen3.5-Omni-Flash from Alibaba — different vendor, different roadmap. Don’t conflate them.

That’s what I have today. I’ll revisit when the API ships.
