Veo 3.1 is coming: perfect consistency

Sora 2.0 is here.

While waiting for the Veo 3.1 model, we recommend trying the closest comparable model: Sora 2.

Give Sora 2 a try

Text to Video
Image to Video
Text to Video Pro
Image to Video Pro

Possible features for Veo 3.1

1080p Parity in Portrait

1080p is already supported (with caveats), and vertical (9:16) landed recently; a minor revision could unify 1080p quality across formats and clean up edge-case limits.


Tighter Audio–Lip-Sync & Mix Controls

Veo 3 already does native audio; a point release could improve speech sync, ambience balance, and per-track levels without a model re-architecture.


Storyboard / Shot-List Controls

More explicit multi-shot prompts or timeline handles (scene beats, transitions) feel like a 0.1-level control feature rather than a major model jump. (Rumor class; watch API docs.)


Longer Clips (10–12s vs 8s)

A small duration bump is a classic “.1” change. Today, Veo 3 is documented at ~8-second clips; 3.1 could raise this modestly.


Sora 2 Key Features

A Smarter World Simulator

Sora 2 understands how the real world works. Unlike older models that bend reality, it obeys physics: when a player misses a shot, the ball rebounds naturally rather than magically teleporting. The model also marks a major advance in control, following complex, multi-shot directions while maintaining a consistent, coherent world state.


Prompt

A gymnast flips on a balance beam.


Superb Physical Accuracy That Reflects the Real World

Traditional AI video models often fall short when it comes to motion realism — think warped hands, physics-defying limbs, and floating objects. Sora 2 changes that. It delivers strikingly realistic motion, accurately simulating the way people move, how objects interact, and how momentum carries through a scene. No floating, no glitches — just physics that feels real.


Prompt

A man does a backflip on a paddleboard.
A dalmatian deftly walks, runs, and hops his way through a complex obstacle course in Burano, Italy.
A man rides a horse which is on another horse.
In the style of a Studio Ghibli anime, a boy and his dog run up a grassy, scenic mountain with gorgeous clouds, overlooking a village in the distant background.


Video Meets Sound — True Multimodal Generation

Sora 2 doesn't stop at visuals; it listens too. It generates synchronized video and audio, covering dialogue, ambient sound, and music. Using only simple, natural-language prompts, users can generate multi-scene stories with full control over camera movement, lighting, and transitions. It also supports sophisticated background sound design. In the examples below, compare the results with other models to see the newest effects.


Prompt

Underwater scuba diver, sounds of the coral reef.
Two mountain explorers in bright technical shells, ice-crusted faces, eyes narrowed with urgency, shout in the snow, one at a time.


Use cases

Movie Trailer: Vikings Go To War — North Sea Launch (10.0s, Winter cool daylight / early medieval)...

2D Animation: In the style of a Japanese anime, the hero with white hair awakens his dormant powers. His body is enveloped in a blue and black fiery aura, and markings grow to cover his face and body as a deep, ancient power finally awakens... Ghibli and Makoto Shinkai art styles.

Clay animation: A claymation conductor conducts a claymation orchestra

Lecture: An old professor talks in English, then German

Documentary: Underwater scuba diver, sounds of the coral reef

Sports shorts: A skateboarder does a kickflip

Q&A

What is Sora 2? What’s the most fundamental change vs the previous generation?
Sora 2 is OpenAI’s latest video+audio generation model. It emphasizes more accurate physical behavior, higher realism, and stronger controllability, and for the first time synchronizes dialogue/SFX with visuals by default. That’s a real shift from the previous “silent video” paradigm.
Can Sora 2 output videos with dialogue right away?
Yes. Audio and video are generated together. If you include dialogue lines, sound effects, or ambience in your prompt (for example, the shouting mountain explorers above), the model will attempt to realize them.
What inputs are supported? Can I do image-to-video or video-to-video?
Generation can start from text or images, covering both the text-to-video and image-to-video modes.
How does Sora 2 Pro differ from the regular version?
Pro targets higher fidelity and harder shots, with potentially longer generation times. ChatGPT Pro and API users will be onboarded gradually.
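
For developers, here is a rough sketch of what generating a Sora 2 clip through the OpenAI API can look like. The model names (sora-2, sora-2-pro) and the asynchronous create/poll/download flow follow OpenAI's published video API, but treat the exact method and parameter names below as assumptions and confirm them against the current API reference.

```python
# Sketch only: generating a Sora 2 clip with the OpenAI Python SDK.
# Model names and the create/retrieve/download flow follow OpenAI's video
# API docs; verify exact signatures against the current API reference.
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Start an asynchronous video job. Audio is generated together with the
# visuals, so dialogue, SFX, and ambience written into the prompt are
# realized in the same pass.
video = client.videos.create(
    model="sora-2",  # "sora-2-pro" targets higher fidelity, slower renders
    prompt="Underwater scuba diver, sounds of the coral reef.",
)

# Rendering is a long-running job: poll until it finishes.
while video.status in ("queued", "in_progress"):
    time.sleep(5)
    video = client.videos.retrieve(video.id)

if video.status == "completed":
    # Download the finished MP4, synchronized audio track included.
    content = client.videos.download_content(video.id)
    content.write_to_file("reef_diver.mp4")
else:
    print(f"Generation ended with status: {video.status}")
```

Image-to-video follows the same flow: the job is created with a reference image alongside the text prompt (check the docs for the exact field name, since it is not shown here).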