Veo 3.1: OpenAI's Sora 2 Rival Is Coming

Veo 3 opened a new chapter in AI video generation by making videos with sound a practical reality. After the launch of OpenAI’s Sora 2, Google is moving fast with its next step. Veo 3.1 is now available on WaveSpeedAI, featuring reference-based video generation, smooth frame interpolation, and high-resolution 1080p output, so creators can produce more consistent, lifelike videos with synchronized sound.

What Is Veo?

Veo is Google’s family of AI video models that turn text or images into short videos with sound, including music, ambient noise, and dialogue. There are two versions of Veo 3:

  • Veo 3 (Standard): for high-quality, cinematic results.
  • Veo 3 Fast: optimized for faster generation and testing.

What’s New in Veo 3.1

Compared with Veo 3, the 3.1 update represents a foundation model upgrade — combining higher-fidelity visual realism with context-aware, synchronized audio generation.

It’s the closest yet to a true “text-to-scene” filmmaking engine.

Smarter Visual-Audio Fusion

The new foundation model in Veo 3.1 brings video and audio reasoning closer than ever.

Prompt: cinematic POV video, hyper-realistic, 8k, a thrilling first-person ride on a vintage wooden roller coaster in Japan, front row seat, completely unobstructed view. The scene is set at golden hour sunset, casting dramatic, warm light. In the distance, a majestic snow-capped Mount Fuji ……

In Veo 3, a roller-coaster scene looked smooth but felt ‘silent’ inside; the tension just wasn’t there.

Now, Veo 3.1 captures every scream, rush of wind, and metallic rattle in perfect sync with motion, pulling you right into the ride.

Subject-Referenced Generation (R2V): Keep Faces and Objects

Unlike Veo 3.0, the new Veo 3.1 allows you to upload 1–3 reference images, enabling the model to preserve visual consistency across every frame.

It keeps faces, movements, and environments aligned, eliminating character drift or awkward transitions over longer clips.

Prompt: A man with a beard, wearing a beanie and safety glasses, is drilling into a wooden wall. The drill bit has just broken through the wall, revealing a vibrant, sunlit field of blooming wildflowers on the other side. The man paused drilling, his expression transformed into one of awe and delight. He has released the drill and is now standing with his arms outstretched, facing the beautiful flower field, as if embracing the new world he has just uncovered. The light from the flower field illuminates his face and the edges of the wall.
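
If you script your generations instead of using the playground, the 1–3 reference image limit described above is easy to check before anything is sent. The snippet below is only a minimal sketch: the function name and the "prompt" and "images" field names are hypothetical placeholders, not the documented WaveSpeedAI or Google request schema, so check the official API reference for the real field names.

```python
from typing import Dict, List

MIN_REFERENCES = 1
MAX_REFERENCES = 3  # upper bound taken from the article ("1-3 reference images")


def build_reference_request(prompt: str, reference_images: List[str]) -> Dict[str, object]:
    """Build a hypothetical request body for reference-based generation.

    The keys "prompt" and "images" are illustrative assumptions only.
    """
    cleaned_prompt = prompt.strip()
    if not cleaned_prompt:
        raise ValueError("prompt must not be empty")
    if not MIN_REFERENCES <= len(reference_images) <= MAX_REFERENCES:
        raise ValueError(
            f"expected {MIN_REFERENCES}-{MAX_REFERENCES} reference images, "
            f"got {len(reference_images)}"
        )
    # Copy the list so later changes by the caller do not alter the request.
    return {"prompt": cleaned_prompt, "images": list(reference_images)}
```

Calling build_reference_request with four images, or with none, raises a ValueError, so the mistake surfaces locally rather than as a rejected generation.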

Frame-Controlled Generation: Start, End, and Everything Between

You can now lock in your first and final frames, and Veo 3.1 will smoothly fill everything in between.

Prompt: A young man in a sharp grey suit, carrying a brown leather briefcase, is confidently walking down a sunlit city street with classic architecture……
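
For scripted workflows, frame-controlled generation comes down to three inputs: a prompt, a first frame, and a final frame. Here is a rough sketch of how those inputs might be bundled. The function and the "first_frame" and "last_frame" keys are hypothetical names chosen for illustration; they are assumptions, not the actual WaveSpeedAI parameters.

```python
from typing import Dict


def build_frame_request(prompt: str, first_frame: str, last_frame: str) -> Dict[str, str]:
    """Bundle a prompt with the two locked frames (hypothetical field names)."""
    fields = {
        "prompt": prompt.strip(),
        "first_frame": first_frame.strip(),
        "last_frame": last_frame.strip(),
    }
    # Every input is required: the model fills in the motion between the two frames.
    missing = [name for name, value in fields.items() if not value]
    if missing:
        raise ValueError(f"missing required input(s): {', '.join(missing)}")
    return fields
```

Keeping both frames mandatory in the helper mirrors the feature as described: the start and end are fixed, and only the in-between is generated.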

Who Is Veo 3.1 For?

  • 🎥 Digital Presenters & Avatars: Corporate training, news, and entertainment.
  • 🤖 Customer Service Agents: Realistic, conversational video responses.
  • 📚 Education & E-learning: Delivering long-form lecture content.
  • 🌍 Content Localization: Scalable dubbing with precise lip-sync.

Try Veo 3.1 on WaveSpeedAI Today

Start creating with Veo 3.1 on WaveSpeedAI now! Visit the playground, enter your prompt, upload an image if the mode you picked needs one, and click Generate. In just a few seconds, your video will be ready for editing.

🔗text-to-video
🔗text-to-video-fast
🔗image-to-video
🔗image-to-video-fast
🔗reference-to-video
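
Not sure which of the five modes above fits your inputs? The choice can be sketched as a small decision helper. This is an illustration only: the mode names are copied from the list above, but the function, its parameters, and the rule that reference images take priority over a single input image are assumptions made for this example.

```python
# Mode names as listed in the article; everything else here is hypothetical.
AVAILABLE_MODES = {
    "text-to-video",
    "text-to-video-fast",
    "image-to-video",
    "image-to-video-fast",
    "reference-to-video",
}


def pick_mode(has_image: bool = False, reference_count: int = 0, fast: bool = False) -> str:
    """Return the listed mode name that matches the given inputs."""
    if reference_count > 0:
        # Assumption: reference images take priority over a single input image.
        if fast:
            raise ValueError("no fast variant of reference-to-video is listed")
        mode = "reference-to-video"
    else:
        base = "image-to-video" if has_image else "text-to-video"
        mode = f"{base}-fast" if fast else base
    # Guard against drifting away from the names the article actually lists.
    if mode not in AVAILABLE_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    return mode
```

For example, pick_mode(has_image=True, fast=True) selects image-to-video-fast, which suits quick test runs before switching to the standard variant for a final render.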

© 2025 WaveSpeedAI. All rights reserved.