Nano Banana 2 Leak: A Glimpse Into Google's Next-Gen AI Image Model
A few months ago, Nano Banana became known for creating hyper-realistic AI figures with collectible-style aesthetics. Now, it is back in the spotlight — this time for an unexpected reason.
On November 10, an early preview build of Google’s next-generation image model, Nano Banana 2 (NB 2.0), briefly appeared on the third-party platform Media.io. The build was removed within hours, but that was long enough for screenshots and test results to circulate widely online.
The short-lived leak has already sparked intense discussion across the AI community. So what did people actually see, and how far does Nano Banana 2 push the boundaries of generative imaging?
First Impressions from the Leak
Users who managed to test the model before it was taken down shared a series of eye-catching examples. Although unofficial, these early results suggest a model with a much deeper understanding of light, material, and context.
“AI that Understands Physics”
Two early benchmarks, informally dubbed the “Wine Glass Test” and the “Glass Burger Challenge,” demonstrated how precisely Nano Banana 2 can handle transparency and refraction.
In the wine glass example, the refraction angle of light through glass and liquid reportedly deviated from the physically expected value by less than three degrees, an impressive level of physical realism for a generative model. The “Glass Burger” test pushed similar boundaries, combining transparency, reflection, and realistic surface texture in a single image. Another demo, the “Pink Ocean,” showcased accurate color diffusion and light reflection across a stylized water surface.
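For readers curious what a “less than three degrees” claim would mean in practice, the expected refraction angle can be computed from Snell's law and compared against an angle measured in a generated image. The leak does not describe how testers made their measurement, so the sketch below is only an illustration: the function names, the 3-degree default tolerance, and the refractive indices (typical textbook values for air, glass, and water) are assumptions, not details from the leaked tests.

```python
import math

# Typical textbook refractive indices (assumed values, not from the leak).
N_AIR, N_GLASS, N_WATER = 1.00, 1.50, 1.33


def refraction_angle_deg(incidence_deg, n_from, n_to):
    """Refracted angle in degrees via Snell's law: n_from*sin(a) = n_to*sin(b).

    Returns None when total internal reflection occurs (no refracted ray).
    """
    s = n_from * math.sin(math.radians(incidence_deg)) / n_to
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))


def within_tolerance(measured_deg, incidence_deg, n_from, n_to, tol_deg=3.0):
    """Check whether a measured angle is within tol_deg of the Snell's-law value."""
    expected = refraction_angle_deg(incidence_deg, n_from, n_to)
    if expected is None:
        return False
    return abs(measured_deg - expected) < tol_deg


if __name__ == "__main__":
    # Light entering glass from air at 30 degrees of incidence.
    expected = refraction_angle_deg(30.0, N_AIR, N_GLASS)
    print(f"expected refraction angle: {expected:.2f} degrees")
    # A hypothetical angle measured from a generated image.
    print("within 3 degrees:", within_tolerance(21.0, 30.0, N_AIR, N_GLASS))
```

A real evaluation would also need a reliable way to estimate ray angles from a rendered image, which this sketch does not attempt.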
Faster Generation and High-Fidelity Text
Speed appears to be one of the model’s strong suits: complex 4K scenes reportedly rendered in around 10 seconds.
More surprising is the accuracy of text rendering. Early testers claim Nano Banana 2 can generate full UI mockups, complete with readable menus, URLs, and even timestamp overlays — tasks that have traditionally challenged diffusion-based models.



Logical and Mathematical Reasoning
Perhaps the most intriguing capability shown in the leaked tests was visual reasoning. Given a photo of a handwritten math problem, Nano Banana 2 could not only interpret the question but also generate a step-by-step derivation as if written on a digital whiteboard.

This hints at a more integrated multimodal understanding — the ability to combine text, math, and image reasoning in one output.
Comparing Nano Banana 1 and 2: From Visual Realism to Cognitive Coherence
To understand the scale of the upgrade, let us look at side-by-side comparisons between Nano Banana (V1) and Nano Banana 2 (V2) across several categories.
Prompt Fidelity
Prompt: “Have the girl turn around.”

While the first model could adjust pose, it often lost the original art style. In contrast, Nano Banana 2 preserved the source’s cel-shaded aesthetic and line work while performing the transformation accurately. The result feels more like a true edit than a re-creation.
Physical Consistency
Prompt: “Passed the clock & wine glass benchmark flawlessly — 11:15 on the clock, wine glass filled to the brim.”

V2 followed the prompt almost literally, with correct lighting, time, and reflections. V1 captured the general scene but missed key details — a sign of the older model’s more limited scene understanding.
Text Rendering and UI Simulation


When asked to generate a screenshot of a Windows 11 desktop showing DeepMind’s Gemini 3 webpage, Nano Banana 2 produced a layout nearly indistinguishable from an actual browser screenshot. The text, icons, and interface elements were all sharp and legible.
By comparison, V1 rendered the same prompt with distorted or unreadable text — a common limitation of earlier diffusion models.
Visual Reasoning
Prompt: “Solve this question and show step-by-step derivation.”

Here, the improvement goes beyond visual quality. V1’s solution appeared logical but was mathematically incorrect due to transcription errors. V2, however, correctly interpreted the problem and derived the right answer — a glimpse of genuine symbolic reasoning in a visual model.
WaveSpeedAI Confirms Integration
The leaked preview on Media.io has since been officially closed, but the model’s future release is already on the horizon.
WaveSpeedAI has confirmed plans to integrate Nano Banana 2 once it becomes publicly available. Early access will be provided through a whitelist program for testing and feedback.
In the meantime, users can still explore Nano Banana (V1) directly through WaveSpeedAI’s platform — a good way to appreciate how far the model has come before V2’s official debut.
Final Thoughts
If the leaked results are authentic, Nano Banana 2 represents more than just an incremental upgrade — it points toward a new phase of AI image modeling where visual reasoning, physics simulation, and multimodal understanding converge.
Whether the final release matches these early impressions remains to be seen, but one thing is clear: the next generation of AI image synthesis is arriving faster, and smarter, than anyone expected.
© 2025 WaveSpeedAI. All rights reserved.
