Veo3 | Powerful Text-to-Video API

Google Veo 3 is Google's flagship text-to-video model with built-in audio: it produces synchronized video and sound from text prompts. It is offered as a ready-to-use REST inference API with high performance, no cold starts, and affordable pricing.
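As a rough illustration of calling a REST inference API like this one, here is a minimal Python sketch that builds (but does not send) a JSON POST request using only the standard library. The endpoint URL, the `prompt` body field, and Bearer-token authentication are all assumptions: the URL is a placeholder on example.com, so check the provider's API reference for the real endpoint and field names before sending anything.

```python
import json
import urllib.request

# HYPOTHETICAL placeholder endpoint: substitute the URL from the provider's API docs.
API_URL = "https://api.example.com/v1/google/veo3"


def build_veo3_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build a JSON POST request for a text-to-video job without sending it.

    Assumptions: the body field is named "prompt" and auth uses a Bearer token.
    """
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_veo3_request("Close-up shot of melting icicles on a frozen rock wall", "YOUR_API_KEY")
# To actually submit (requires a real endpoint and key): urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the payload easy to inspect and test offline.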

text-to-video

$3.20 per run

Examples

A breaking news ident, followed by a TV news presenter excitedly telling us: We interrupt this programme to bring you some breaking news... Veo 3 is now live on WaveSpeedAI. Then she shouts: Let's go! The TV presenter is an epic and cool punk with pink and green hair and a t-shirt that says 'Veo 3 on WaveSpeedAI'

A lone astronaut in a dark silver suit walks through a narrow corridor inside a damaged space station. Sparks fall from the ceiling, flickering lights cast dynamic shadows on the metallic walls. The camera slowly dollies forward, then cuts to a side close-up as the astronaut turns their head, revealing a cracked helmet visor. A subtle glow from Earth is visible through a distant window.

A man walking alone on a forest trail in autumn, leaves crunching underfoot. Birds chirp in the distance. He murmurs: "Feels good to be away from everything for a while."

Two best friends lying in a grassy field under the stars, pointing at constellations. One says: "Do you ever think about how big the universe is?" The other replies: "All the time... and it still amazes me."

A detective and a witness talking in a quiet diner at night. Rain falls outside. The detective asks: "What did you see that night?" The witness hesitates: "I saw someone... but not their face."

A teenage girl sketching in a notebook at a café window, rain tapping on the glass. Lo-fi music plays faintly. She whispers: "I just want to draw forever..."

Two lovers sitting at a riverside bench, sun setting behind them. She asks: "Would you still love me if we never met again?" He responds: "I'd find you in every lifetime."

A table covered with colorful fruits. A slow, precise hand picks up a kiwi, peels it with a small knife. Subtle slicing sounds. Whispered voice: "Today we're doing kiwi and starfruit... hear that soft skin peeling?"

An alchemist in a glowing lab pouring colored liquids into flasks, small pops and fizzing sounds. He mutters: "A drop of dragonroot... two of nightshade... let's see what happens..."

A close-up of hands gently slicing a ripe mango on a bamboo cutting board. The knife glides through the soft flesh with a wet, sticky sound. Juice slowly pools on the surface. The room is silent except for the smooth, squishy slice.

A chilled green apple being sliced into paper-thin wedges. Each cut makes a sharp, clean snap against the wooden board. Ambient sound of a quiet room, no other noise.

A wide-angle shot of a cyberpunk city at dusk, neon pink and cyan holographic billboards glitching, hovercars zipping through smog with magenta exhaust trails. A rogue hacker with a cybernetic arm taps a holographic keyboard, rain dripping off their hood. Cinematic lighting, shallow focus on the hacker’s glowing arm, dolly shot moving forward. Ambient audio: synthwave music, hovercar hums, rain patter. Dialogue: ‘We’re close to breaking the mainframe…’

A two-shot of a young girl and her fluffy golden retriever sitting on a blanket in a meadow, wildflowers blooming around them. She laughs as the dog licks her cheek, sunset casting warm orange light. Slow pan shot from left to right, shallow focus on their interaction. Ambient audio: wind rustling grass, dog’s happy panting. Dialogue: ‘You’re the best friend ever, Buddy!’

Related Models

veo3.1-fast/reference-to-video (image-to-video)
nano-banana-pro/edit (image-to-image)
nano-banana-2/edit (image-to-image)
nano-banana-pro/edit-ultra (image-to-image)
nano-banana-2/edit-fast (image-to-image)
veo3.1/image-to-video (image-to-video)

README

Google Veo 3 — Text-to-Video AI Generator

Veo 3 is Google DeepMind’s next-generation text-to-video model. It produces cinematic, high-fidelity videos directly from natural-language prompts, with native audio generation, lip-synced dialogue, and physics-aware motion.

🌟 Why it stands out

Text → Image → Video pipeline: Generate stunning visuals and extend them into smooth, cinematic video sequences.
Native Audio Generation: Automatically adds ambient sound, effects, and dialogue synchronized with the visuals, with no post-production required.
Dialogue & Lip-Sync: Characters speak your script with accurate lip synchronization, enabling AI filmmaking and animated storytelling.
Physics-Aware Motion & Spatial Understanding: Veo 3 models depth, space, and motion, making it well suited to dynamic scenes, game environments, and realistic interactions.
High Prompt Accuracy: Enhanced natural-language understanding keeps the generated video semantically aligned with the prompt and its context.
Cinematic Lighting & Quality: Delivers professional-grade output with authentic lighting, depth of field, and motion consistency.

🧠 Built by Google DeepMind

Developed by Google DeepMind’s world-class research team, Veo 3 empowers creators, developers, and studios to push the limits of AI-driven storytelling and visual production.

✍️ Prompting Tips

Use clear, cinematic descriptions for best results:

Shot Composition: close-up, two-shot, over-the-shoulder
Lens & Focus: macro lens, shallow focus, wide-angle lens
Genre & Style: sci-fi, romantic comedy, action movie
Camera Motion: zoom shot, dolly shot, tracking shot, pan shot
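The four categories above can be combined mechanically into a single prompt. The sketch below assumes a simple comma-separated prompt style; `compose_prompt` is a hypothetical helper for illustration, not part of any API.

```python
from typing import Optional


def compose_prompt(
    subject: str,
    *,
    shot: Optional[str] = None,    # Shot Composition, e.g. "close-up shot"
    lens: Optional[str] = None,    # Lens & Focus, e.g. "shallow focus"
    style: Optional[str] = None,   # Genre & Style, e.g. "sci-fi"
    camera: Optional[str] = None,  # Camera Motion, e.g. "dolly shot"
) -> str:
    """Join a subject with optional cinematic terms, skipping any that are unset."""
    # Order: shot type, subject, lens, genre/style, camera motion.
    parts = [shot, subject, lens, style, camera]
    return ", ".join(p for p in parts if p)


prompt = compose_prompt(
    "melting icicles on a frozen rock wall",
    shot="close-up shot",
    lens="shallow focus",
    camera="zoom shot",
)
# prompt == "close-up shot, melting icicles on a frozen rock wall, shallow focus, zoom shot"
```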

🎬 Example Prompt

Close-up shot of melting icicles on a frozen rock wall with cool blue tones, zoomed in to capture the dripping water detail in cinematic lighting and shallow focus.

⚙️ Technical Overview

Property	Description
Type	Text-to-Video (with Audio)
Resolution	Up to 1080p
Max Duration	8 seconds
Output Format	MP4 + Stereo Audio
Audio	Native ambient, dialogue, SFX, and music
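Client-side checks against the limits in this table can catch bad settings before a paid run. In this sketch, the 8-second cap comes from the table and the 720p/1080p set is taken from the pricing section; the 1-second minimum is an assumption, and the service may accept only specific durations, so treat `validate_settings` (a hypothetical helper) as illustrative.

```python
MAX_DURATION_S = 8                 # "Max Duration: 8 seconds"
RESOLUTIONS = {"720p", "1080p"}    # resolutions named in the pricing section


def validate_settings(duration_s: int, resolution: str) -> list[str]:
    """Return a list of problems with the requested settings (empty if valid)."""
    errors = []
    # Assumed lower bound of 1 second; the upper bound is from the table.
    if not (1 <= duration_s <= MAX_DURATION_S):
        errors.append(f"duration must be 1-{MAX_DURATION_S} s, got {duration_s}")
    if resolution not in RESOLUTIONS:
        errors.append(f"unsupported resolution {resolution!r}")
    return errors
```

Returning all problems at once, rather than raising on the first, lets a form or CLI report every invalid field in one pass.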

💰 Pricing

Each run with audio costs $3.20, at both 720p and 1080p.

Each run without audio costs $1.20.
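Because the price is flat per run regardless of resolution, batch cost is a simple multiplication. This sketch works in integer cents to avoid floating-point rounding; `estimate_cost_usd` is a hypothetical helper, and the prices are the ones listed above.

```python
AUDIO_PRICE_CENTS = 320   # $3.20 per run with audio (720p or 1080p)
SILENT_PRICE_CENTS = 120  # $1.20 per run without audio


def estimate_cost_usd(runs: int, with_audio: bool = True) -> str:
    """Return the total cost of `runs` generations as a dollar string."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    cents = runs * (AUDIO_PRICE_CENTS if with_audio else SILENT_PRICE_CENTS)
    # Format integer cents as dollars without float arithmetic.
    return f"${cents // 100}.{cents % 100:02d}"
```

For example, ten clips cost $32.00 with audio and $12.00 without.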

✅ Commercial use allowed

🚀 How to Use

Write Your Prompt: Describe the scene you want to create. Include subjects, actions, lighting, camera movement, and mood.

Example: “A close-up of a young woman standing in the rain, soft cinematic lighting, slow tracking shot.”

Add Optional Elements:

Dialogue → Use quotation marks " " for spoken lines.
Reference Image → Upload one or more images to keep visual consistency across clips.
Camera Direction → Add terms like zoom in, pan right, tracking shot for cinematic movement.
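Dialogue is easiest to keep unambiguous when each line is quoted and attributed to a speaker, as in the examples at the top of this page. A minimal sketch, assuming a plain `Speaker verb: "line"` pattern; `add_dialogue` is a hypothetical helper.

```python
def add_dialogue(scene: str, speaker: str, line: str, verb: str = "says") -> str:
    """Append a quoted, attributed spoken line to a scene description."""
    # Swap inner double quotes for single quotes so the spoken line stays one quoted span.
    clean = line.replace('"', "'")
    return f'{scene.rstrip()} {speaker} {verb}: "{clean}"'


prompt = add_dialogue(
    "A man walking alone on a forest trail in autumn, leaves crunching underfoot.",
    "He",
    "Feels good to be away from everything for a while.",
    verb="murmurs",
)
```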

Choose Video Settings: Select the duration (up to 8 seconds) and resolution (up to 1080p).
Generate the Video: Submit your prompt. Veo 3 automatically generates both the video and native audio (dialogue, ambient sounds, music).
Preview & Download: Review the clip, refine the prompt if needed, then download the final MP4 file.
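If the API processes jobs asynchronously (an assumption here), a client submits the prompt and then polls for the result before downloading. The sketch below shows a generic polling loop: the `status` values (`completed`, `failed`) and the `error` field are assumed response shapes, and `fetch_status` stands in for whatever HTTP call retrieves job status. The sleep function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_result(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status() until the job completes, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        state = status.get("status")  # assumed field name
        if state == "completed":
            return status
        if state == "failed":
            # e.g. a prompt rejected by the safety checker: revise and resubmit
            raise RuntimeError(status.get("error", "generation failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        sleep(interval_s)
```

A `failed` result is the point at which to revise the prompt and try again, as the Notes below suggest.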

💡 Tip: For best results, keep each prompt focused on a single scene or emotional moment. Avoid mixing multiple time periods or locations in one request.

📝 Notes

Optimized for short-form storytelling, advertising, and creative video experiments.
Audio is generated natively and currently supports only stereo output.
For best clarity, describe the main subject, scene, and lighting precisely.
Make sure your prompts follow Google’s Safety Guidelines — if an error appears, revise your prompt and try again.
