Gemini Omni Demos Just Leaked — Here's What Google's New Video Model Actually Does
Eight days after the original UI-string leak, the first Gemini Omni sample videos surfaced. Strong on chat-driven editing, behind Seedance 2.0 on raw fidelity, and burning ~43% of an AI Pro daily quota per clip. Here's the honest read a week before I/O 2026.
When we wrote about the initial Omni leak on May 3, the entire story was a single UI string. Eight days later, the picture has filled in considerably. The Gemini mobile app surfaced actual sample videos generated by the model, the internal model ID leaked (bard_eac_video_generation_omni), and enough hands-on impressions are now public to make some early calls.
The short version: Omni is real, it’s almost certainly a new model rather than a Veo 3.1 rename, and on the dimensions that matter to people building AI video products — fidelity, editing, cost — it has very different strengths and weaknesses than the leaderboard leaders. Seven days before Google I/O 2026 (May 19–20), here’s what’s now known.
What surfaced this week
On May 11, 2026, TestingCatalog and X user @Thomas16937378 pulled fresh samples out of the Gemini mobile app’s video generation flow. The model card text moved on from a placeholder (“Powered by Omni”) to a full product description:
Create with Gemini Omni: meet our new video model. Remix your videos, edit directly in chat, try a template, and more.
Three concrete details came with it:
- Internal model ID: bard_eac_video_generation_omni. "Bard EAC" is the Gemini app's internal namespace for experimental features; the _omni suffix confirms this is treated as a distinct model rather than a Veo variant.
- 10-second cap on generated clips at the current preview tier. Veo 3.1 caps at 8s natively and 16s with extend; Omni currently sits between them with no extend pathway visible yet.
- New usage-limits tab in Gemini settings, indicating a credit-metered rollout rather than a per-month subscription quota — consistent with how Google has been releasing higher-cost agentic features (Deep Research, Notebook Plus).
That’s a meaningful upgrade in evidence quality. The May 3 leak was UI text alone. This is UI text + working endpoint + observable outputs + a billing surface.
The two sample videos people have seen
Both samples came from the Gemini app, surfaced by users with AI Pro access who invoked the model before a presumed rollback. They're worth describing in detail because they hint at which model lineage Omni belongs to.
Sample 1 — “A professor writing a mathematical proof for trigonometric identities on a traditional chalkboard.” Reviewers said the text rendering was handled “remarkably well” — the chalk equations were legible and looked mathematically plausible rather than the symbol soup earlier video models produced. Hand and arm motion read as natural. The Chrome Unboxed write-up still flagged “obvious AI tells in the final output” without specifying which — likely some combination of unnatural microsaccades, hand mesh artifacts, and slightly drifting chalk geometry.
Sample 2 — “Two men eating spaghetti at an upscale restaurant.” Described as “fairly realistic.” The pasta-twirling test has been an informal benchmark for a year now because it stresses everything that goes wrong in latent-space video: utensil-food contact, fluid-like motion, and consistent face identity through occlusion. Omni handled it well enough to draw comment, but with the qualifier that the floor for “passable” has risen this year — Seedance 2.0 and Wan 2.7 both clear that bar reliably.
Two samples is not a benchmark. But two samples in two different difficulty regimes (text-in-frame and contact physics), both with reviewers noting strong-but-not-flawless results, are enough to place Omni in the same tier as Veo 3.1 — not above it on raw fidelity, and clearly below Seedance 2.0.
Where Omni actually leads: chat-driven editing
The interesting result from the week’s hands-on coverage is that Omni’s standout capability isn’t generation quality. It’s editing. Specifically:
- Watermark removal from input clips, performed via natural-language chat instructions
- Object replacement within a scene (“swap the red car for a blue one”)
- Scene rewrites through conversational turn-taking — describe what should change, the model returns an edited version, iterate
This is a meaningfully different surface area than what Seedance 2.0 Video-Edit or Wan 2.7 Edit currently expose. Those models are excellent at command-style instruction edits (“remove the earphones,” “change the woman’s coat to red”) but they don’t sustain a multi-turn editing conversation against a single source clip. The closest analogue today is Kling Omni Video O1’s natural-language edit flow, which we wrote about in detail when it shipped.
If Omni does ship as a chat-first video editor — not just another text-to-video endpoint — that’s the unique-value-proposition story. Google has the LLM stack to make multi-turn correction work natively in a way most pure video-model vendors don’t.
The cost story
The single most striking data point: one tester reported that two video prompts consumed 86% of their daily AI Pro quota. That’s roughly 43% of a Pro day per clip — a cost profile in line with frontier video models, not Flash-tier image generation.
A few implications:
- The preview model running in the Gemini app is almost certainly the Pro/full tier, not Flash. TestingCatalog speculates a Flash variant will land alongside, but the samples we’ve seen aren’t from it.
- Per-clip credit burn at this rate maps to something like $0.30–$0.50 per 10s clip in retail equivalence, well under Veo 3.1's preview pricing ($0.50/s, about $5 for a comparable clip) though still pricier than Seedance 2.0 Fast. The arithmetic is worked through in the sketch after this list.
- Google will almost certainly introduce explicit usage tiers at the I/O reveal — the new usage-limits tab is a tell. Expect a Flash-cost tier for casual users and a metered pay-as-you-go tier in AI Studio for builders.
What we now think Omni actually is
Eight days ago there were three plausible readings: Veo rebrand, separate Gemini video model, or full omni-modality model. The May 11 evidence narrows that:
- Separate model ID (_omni suffix, not _veo) rules out a straight Veo rebrand. Google doesn’t usually rename existing model endpoints during preview rollouts.
- Editing-first product framing — “remix, edit directly in chat” — is not the language Google has used for Veo, which has always been pitched as text-to-video + extend. This reads more like a separate model with a different training objective.
- No image-output evidence in any leaked sample. If this were the unified omni-modality model the name suggests, you’d expect image generation to surface from the same endpoint. So far, every leak has been video-only.
The most likely read at this point: Omni is a new Gemini-trained video model, sitting alongside Veo rather than replacing it, with an editing-first product positioning. Nano Banana shows Google is willing to brand-separate within the same modality (text-to-image runs under both the Nano Banana and Gemini 3 Flash Image names). Omni and Veo coexisting would parallel that pattern.
The fully unified omni-modality dream that the name suggests is probably still a future generation. What’s shipping next week — if it ships next week — is a competitive video editor with Google’s LLM-native chat surface bolted on.
What this changes for evaluation
If you’re building anything that touches AI video, three things shift in the next two weeks:
- Add an editing benchmark to your eval suite. Most video model evals are text-to-video only. If Omni’s pitch is chat-driven editing, your comparison can’t be just generation fidelity — you need a battery of “edit this clip” prompts that test multi-turn coherence, object identity preservation through edits, and instruction adherence in the second and third turns; see the sketch after this list.
- Treat the Seedance 2.0 / Wan 2.7 / Omni triangle as the working set. Sora 2 and Veo 3.1 are now best understood as previous-generation references against this triangle. Each of the three has a distinct strength: Seedance leads on fidelity, Wan leads on multi-modal reference inputs, Omni (provisionally) leads on chat editing.
- Budget for Pro-tier pricing. The 43%-of-daily-quota data point is the loudest signal of the week. If your workflow involves generating clips at scale, the Flash-tier release will matter more than the Pro tier. Track that announcement specifically.
The week ahead
Google I/O opens May 19, 2026. The Tuesday keynote slot is where Gemini and DeepMind announcements traditionally land. A pre-keynote leak this controlled, this complete — model card text, sample videos, billing surface, all in one week — is consistent with a launch that’s already cleared internal review and is waiting on the calendar.
The four things to watch on the day:
- Is there a Flash tier, and what does it cost?
- Is the editing pitch real, or was that one-sample noise? Specifically, does Google show multi-turn editing live on stage?
- What’s the API path? AI Studio? Vertex? Both?
- Audio sync: none of the leaked samples address whether Omni generates synchronized audio the way Veo 3.1 does. If it doesn’t, that’s a real gap.
Try the current alternatives on WaveSpeedAI
Until Omni ships, the rest of the 2026 video-gen field is live on WaveSpeedAI under one API:
- Seedance 2.0 — current SOTA on raw fidelity, with Fast variants for low-latency work
- Wan 2.7 — Alibaba’s reference-rich video model
- Kling V3.0 Pro — Kuaishou’s high-fidelity option
- Kling Omni Video O1 Edit — natural-language video editing, the closest current analogue to what Omni is being pitched as
- Sora 2 — OpenAI’s offering
- Veo 3.1 — current Google video model
When Gemini Omni lands publicly, expect to compare it under the same API within days.
Sources: TestingCatalog, 9to5Google, Chrome Unboxed, OfficeChai.
