Nano Banana 2 Leak: A Glimpse Into Google's Next-Gen AI Image Model

WaveSpeedAI

A few months ago, Nano Banana became known for creating hyper-realistic AI figures with collectible-style aesthetics. Now, it is back in the spotlight — this time for an unexpected reason.

On November 10, an early preview build of Google’s next-generation image model, Nano Banana 2 (NB 2.0), briefly appeared on the third-party platform Media.io. The build was removed within hours, but that was long enough for screenshots and test results to circulate widely online.

The short-lived leak has already sparked intense discussion across the AI community. So what did people actually see, and how far does Nano Banana 2 push the boundaries of generative imaging?

First Impressions from the Leak

Users who managed to test the model before it was taken down shared a series of eye-catching examples. Although unofficial, these early results suggest a model with a much deeper understanding of light, material, and context.

“AI that Understands Physics”

Two early benchmarks, informally dubbed the “Wine Glass Test” and the “Glass Burger Challenge,” demonstrated how precisely Nano Banana 2 can handle transparency and refraction.

In the wine glass example, the angle at which light refracts through the glass and liquid reportedly deviated by less than three degrees from the physically expected value, an impressive level of physical realism for a generative model. The “Glass Burger” test pushed similar boundaries, combining transparency, reflection, and realistic surface texture in a single image. Another demo, the “Pink Ocean,” showcased accurate color diffusion and light reflection across a stylized water surface.
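For readers who want a sense of what a three-degree tolerance means, here is a minimal sketch of the underlying physics. It assumes the deviation is measured against the angle predicted by Snell's law, which the leak reports do not confirm. The refractive indices are typical textbook values, and the function names are made up for illustration.

```python
import math

# Typical textbook refractive indices (assumed values, not taken from the leak).
N_AIR, N_WATER, N_GLASS = 1.000, 1.333, 1.52


def refraction_angle_deg(incidence_deg: float, n1: float, n2: float) -> float:
    """Angle of the refracted ray, in degrees from the surface normal (Snell's law)."""
    s = n1 * math.sin(math.radians(incidence_deg)) / n2
    if abs(s) > 1.0:
        raise ValueError("total internal reflection: no refracted ray")
    return math.degrees(math.asin(s))


def within_tolerance(measured_deg: float, incidence_deg: float,
                     n1: float, n2: float, tolerance_deg: float = 3.0) -> bool:
    """True if a measured refraction angle is within tolerance of the Snell's-law value."""
    expected = refraction_angle_deg(incidence_deg, n1, n2)
    return abs(measured_deg - expected) < tolerance_deg


# A ray hitting glass at 45 degrees should bend to a little under 28 degrees.
expected = refraction_angle_deg(45.0, N_AIR, N_GLASS)
print(f"expected angle in glass: {expected:.1f} degrees")

# A hypothetical angle measured from a generated image, checked against the 3-degree bound.
print(within_tolerance(29.0, 45.0, N_AIR, N_GLASS))
```

In practice the hard part is measuring the angle in a rendered image at all, so a check like this only illustrates the scale of the claimed accuracy.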

Wine glass and clock test
Glass burger
Pink Ocean

Faster Generation and High-Fidelity Text

Speed appears to be one of the model’s strong suits: complex 4K scenes reportedly rendered in around 10 seconds.

More surprising is the accuracy of text rendering. Early testers claim Nano Banana 2 can generate full UI mockups, complete with readable menus, URLs, and even timestamp overlays — tasks that have traditionally challenged diffusion-based models.

Precision Comic Translation
AI-generated browser interface
AI-generated human portraits and surveillance footage

Logical and Mathematical Reasoning

Perhaps the most intriguing capability shown in the leaked tests was visual reasoning. Given a photo of a handwritten math problem, Nano Banana 2 could not only interpret the question but also generate a step-by-step derivation as if written on a digital whiteboard.

Visual math reasoning demo

This hints at a more integrated multimodal understanding — the ability to combine text, math, and image reasoning in one output.

Comparing Nano Banana 1 and 2: From Visual Realism to Cognitive Coherence

To understand the scale of the upgrade, let us look at side-by-side comparisons between Nano Banana (V1) and Nano Banana 2 (V2) across several categories.

Prompt Fidelity

Prompt: “Have the girl turn around.”

Prompt fidelity comparison
(From left to right) Original image, Nano Banana, Nano Banana 2

While the first model could adjust pose, it often lost the original art style. In contrast, Nano Banana 2 preserved the source’s cel-shaded aesthetic and line work while performing the transformation accurately. The result feels more like a true edit than a re-creation.

Physical Consistency

Prompt: “Passed the clock & wine glass benchmark flawlessly — 11:15 on the clock, wine glass filled to the brim.”

Physical consistency comparison
(From left to right) Nano Banana, Nano Banana 2

V2 followed the prompt almost literally, with correct lighting, time, and reflections. V1 captured the general scene but missed key details — a sign of the older model’s more limited scene understanding.
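To judge whether a generated clock really shows 11:15, it helps to know where the hands should sit. The short sketch below works out the expected hand positions with ordinary clock arithmetic. The function name is hypothetical, and nothing here reflects how the testers actually scored the images.

```python
def clock_hand_angles(hour: int, minute: int) -> tuple[float, float]:
    """Return (hour_hand, minute_hand) angles in degrees, clockwise from 12 o'clock."""
    minute_angle = minute * 6.0                      # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5   # 30 degrees per hour, plus drift
    return hour_angle, minute_angle


hour_angle, minute_angle = clock_hand_angles(11, 15)
print(hour_angle, minute_angle)  # 337.5 90.0

# Smaller angle between the two hands.
gap = abs(hour_angle - minute_angle)
print(min(gap, 360.0 - gap))  # 112.5
```

At 11:15 the minute hand points straight at the 3, and the hour hand sits a quarter of the way from 11 to 12. An image with the hour hand pointing exactly at 11 would already be slightly off.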

Text Rendering and UI Simulation

Nano Banana (V1)
Nano Banana 2 (V2)

When asked to generate a screenshot of a Windows 11 desktop showing DeepMind’s Gemini 3 webpage, Nano Banana 2 produced a layout nearly indistinguishable from an actual browser screenshot. The text, icons, and interface elements were all sharp and legible.

By comparison, V1 rendered the same prompt with distorted or unreadable text — a common limitation of earlier diffusion models.

Visual Reasoning

Prompt: “Solve this question and show step-by-step derivation.”

Visual reasoning comparison
(From left to right) Original image, Nano Banana, Nano Banana 2

Here, the improvement goes beyond visual quality. V1’s solution looked plausible but was mathematically incorrect because it mistranscribed parts of the problem. V2, by contrast, read the problem correctly and derived the right answer, an early sign of what looks like genuine symbolic reasoning in a visual model.

WaveSpeedAI Confirms Integration

The leaked preview on Media.io has since been officially closed, but the model’s future release is already on the horizon.

WaveSpeedAI has confirmed plans to integrate Nano Banana 2 once it becomes publicly available. Early access will be provided through a whitelist program for testing and feedback.

In the meantime, users can still explore Nano Banana (V1) directly through WaveSpeedAI’s platform — a good way to appreciate how far the model has come before V2’s official debut.

Final Thoughts

If the leaked results are authentic, Nano Banana 2 represents more than just an incremental upgrade — it points toward a new phase of AI image modeling where visual reasoning, physics simulation, and multimodal understanding converge.

Whether the final release matches these early impressions remains to be seen, but one thing is clear: the next generation of AI image synthesis is arriving faster, and smarter, than anyone expected.

© 2025 WaveSpeedAI. All rights reserved.