Google Veo3 is Google's flagship text-to-video model with built-in audio, producing synchronized video and sound from text prompts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
$3.20 per run
A breaking news ident, followed by a TV news presenter excitedly telling us: We interrupt this programme to bring you some breaking news... Veo 3 is now live on WaveSpeedAI. Then she shouts: Let's go! The TV presenter is an epic and cool punk with pink and green hair and a t-shirt that says 'Veo 3 on WaveSpeedAI'
A lone astronaut in a dark silver suit walks through a narrow corridor inside a damaged space station. Sparks fall from the ceiling, flickering lights cast dynamic shadows on the metallic walls. The camera slowly dollies forward, then cuts to a side close-up as the astronaut turns their head, revealing a cracked helmet visor. A subtle glow from Earth is visible through a distant window.
A man walking alone on a forest trail in autumn, leaves crunching underfoot. Birds chirp in the distance. He murmurs: "Feels good to be away from everything for a while."
Two best friends lying in a grassy field under the stars, pointing at constellations. One says: "Do you ever think about how big the universe is?" The other replies: "All the time... and it still amazes me."
A detective and a witness talking in a quiet diner at night. Rain falls outside. The detective asks: "What did you see that night?" The witness hesitates: "I saw someone... but not their face."
A teenage girl sketching in a notebook at a café window, rain tapping on the glass. Lo-fi music plays faintly. She whispers: "I just want to draw forever..."
Two lovers sitting at a riverside bench, sun setting behind them. She asks: "Would you still love me if we never met again?" He responds: "I'd find you in every lifetime."
A table covered with colorful fruits. A slow, precise hand picks up a kiwi, peels it with a small knife. Subtle slicing sounds. Whispered voice: "Today we're doing kiwi and starfruit... hear that soft skin peeling?"
An alchemist in a glowing lab pouring colored liquids into flasks, small pops and fizzing sounds. He mutters: "A drop of dragonroot... two of nightshade... let's see what happens..."
A close-up of hands gently slicing a ripe mango on a bamboo cutting board. The knife glides through the soft flesh with a wet, sticky sound. Juice slowly pools on the surface. The room is silent except for the smooth, squishy slice.
A chilled green apple being sliced into paper-thin wedges. Each cut makes a sharp, clean snap against the wooden board. Ambient sound of a quiet room, no other noise.
A wide-angle shot of a cyberpunk city at dusk, neon pink and cyan holographic billboards glitching, hovercars zipping through smog with magenta exhaust trails. A rogue hacker with a cybernetic arm taps a holographic keyboard, rain dripping off their hood. Cinematic lighting, shallow focus on the hacker’s glowing arm, dolly shot moving forward. Ambient audio: synthwave music, hovercar hums, rain patter. Dialogue: ‘We’re close to breaking the mainframe…’
A two-shot of a young girl and her fluffy golden retriever sitting on a blanket in a meadow, wildflowers blooming around them. She laughs as the dog licks her cheek, sunset casting warm orange light. Slow pan shot from left to right, shallow focus on their interaction. Ambient audio: wind rustling grass, dog’s happy panting. Dialogue: ‘You’re the best friend ever, Buddy!’
Veo 3 is Google DeepMind’s next-generation text-to-video model, capable of producing cinematic, high-fidelity videos directly from natural-language prompts. With native audio generation, dialogue lip-sync, and deep physical reasoning, Veo 3 redefines what’s possible in multimodal AI video creation.
Text → Image → Video pipeline: Generate stunning visuals and extend them into smooth, cinematic video sequences.
Native Audio Generation: Automatically adds ambient sound, effects, and dialogue synchronized with the visuals, with no post-production required.
Dialogue & Lip-Sync: Characters can speak your script with accurate lip synchronization, enabling AI filmmaking and animation storytelling.
Physics-Aware Motion & Spatial Understanding: Veo 3 understands depth, space, and motion, making it ideal for dynamic scenes, game environments, and realistic interactions.
High Prompt Accuracy: Enhanced natural-language understanding ensures semantic alignment and context-aware video generation.
Cinematic Lighting & Quality: Delivers professional-grade output with authentic lighting, depth of field, and motion consistency.
Developed by Google DeepMind’s world-class research team, Veo 3 empowers creators, developers, and studios to push the limits of AI-driven storytelling and visual production.
Use clear, cinematic descriptions for best results:
Shot composition: close-up, two-shot, over-the-shoulder
Lens and focus: macro lens, shallow focus, wide-angle lens
Genre and style: sci-fi, romantic comedy, action movie
Camera movement: zoom shot, dolly shot, tracking shot, pan shot
Example: Close-up shot of melting icicles on a frozen rock wall with cool blue tones, zoomed in to capture the dripping water detail in cinematic lighting and shallow focus.
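The cues above can be combined mechanically. The following is a minimal sketch, not part of any WaveSpeedAI or Google tooling: `build_prompt` and its parameter names are hypothetical, and it only joins a subject with optional framing, lens, movement, and style cues into one sentence.

```python
def build_prompt(subject, framing=None, lens=None, movement=None, style=None):
    """Join a subject with optional cinematic cues into one prompt string.

    Hypothetical helper for illustration; the model accepts any free text.
    """
    parts = []
    # Framing leads the sentence, e.g. "Close-up of <subject>".
    if framing:
        parts.append(f"{framing.capitalize()} of {subject}")
    else:
        parts.append(subject[:1].upper() + subject[1:])
    # Remaining cues are appended as comma-separated modifiers.
    for cue in (lens, movement, style):
        if cue:
            parts.append(cue)
    return ", ".join(parts) + "."


prompt = build_prompt(
    subject="melting icicles on a frozen rock wall",
    framing="close-up",
    lens="shallow focus",
    movement="slow dolly shot",
    style="cinematic lighting with cool blue tones",
)
print(prompt)
```

Keeping each cue in its own argument makes it easy to vary one element (for example the camera movement) while holding the rest of the scene fixed.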
| Property | Description |
|---|---|
| Type | Text-to-Video (with Audio) |
| Resolution | Up to 1080p |
| Max Duration | 8 seconds |
| Output Format | MP4 + Stereo Audio |
| Audio | Native ambient, dialogue, SFX, and music |
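
The limits in the table can be checked locally before a request is submitted. This is a rough sketch under stated assumptions: the 8-second cap comes from the table, while the set of accepted resolutions (`720p`, `1080p`) is inferred from this page and may not match the full API schema. `validate_settings` is a hypothetical helper, not a WaveSpeedAI function.

```python
MAX_DURATION_SECONDS = 8                 # "Max Duration" row of the table
ALLOWED_RESOLUTIONS = ("720p", "1080p")  # assumption: only these two are offered


def validate_settings(duration, resolution):
    """Return a list of problems; an empty list means the settings look fine."""
    problems = []
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(duration, int) or isinstance(duration, bool) or duration <= 0:
        problems.append("duration must be a positive whole number of seconds")
    elif duration > MAX_DURATION_SECONDS:
        problems.append(
            f"duration {duration}s exceeds the {MAX_DURATION_SECONDS}s maximum"
        )
    if resolution not in ALLOWED_RESOLUTIONS:
        problems.append(
            f"resolution {resolution!r} is not one of {ALLOWED_RESOLUTIONS}"
        )
    return problems


print(validate_settings(8, "720p"))
print(validate_settings(12, "4k"))
```

Returning a list rather than raising on the first error lets a caller report every problem at once.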
Each run costs $3.20 at both 720p and 1080p.
Without audio, each run costs $1.20.
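
As a worked example of the arithmetic, the sketch below estimates the cost of a batch from the two per-run prices listed above. It assumes the prices are flat per run regardless of resolution, as stated here; `estimate_cost` is a hypothetical helper, and the live price shown in the playground remains the authoritative figure.

```python
from decimal import Decimal

# Per-run prices in USD as listed on this page (assumed flat per run).
PRICE_WITH_AUDIO = Decimal("3.20")
PRICE_WITHOUT_AUDIO = Decimal("1.20")


def estimate_cost(runs, generate_audio=True):
    """Estimate the total cost in USD for a number of runs."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    unit = PRICE_WITH_AUDIO if generate_audio else PRICE_WITHOUT_AUDIO
    return unit * runs


print(estimate_cost(10))                        # ten runs with audio
print(estimate_cost(10, generate_audio=False))  # ten runs without audio
```

`Decimal` is used instead of `float` so that currency amounts such as 3.20 are represented exactly.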
✅ Commercial use allowed
Example: “A close-up of a young woman standing in the rain, soft cinematic lighting, slow tracking shot.”
Choose Video Settings: Select the duration (up to 8 seconds) and resolution (up to 1080p).
Generate the Video: Submit your prompt; Veo 3 will automatically generate both video and native audio (dialogue, ambient sounds, music).
Preview & Download: Review the clip, make prompt refinements if needed, then download the final MP4 file.
💡 Tip: For best results, keep each prompt focused on a single scene or emotional moment. Avoid mixing multiple time periods or locations in one request.
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/google/veo3 with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Veo3 below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/google/veo3" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WAVESPEED_API_KEY" \
-d '{
"prompt": "A cinematic shot of a city at sunset, soft golden light",
"aspect_ratio": "16:9",
"duration": 8,
"resolution": "720p",
"generate_audio": true,
"negative_prompt": "blurry, low quality, distorted",
"seed": 0
}'
# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
-H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules have no top-level await, so run the call inside an async function.
async function main() {
  const result = await client.run("google/veo3", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "720p",
    "generate_audio": true,
    "negative_prompt": "blurry, low quality, distorted",
    "seed": 0
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

# pip install wavespeed
import wavespeed
output = wavespeed.run(
"google/veo3",
{
"prompt": "A cinematic shot of a city at sunset, soft golden light",
"aspect_ratio": "16:9",
"duration": 8,
"resolution": "720p",
"generate_audio": True,
"negative_prompt": "blurry, low quality, distorted",
"seed": 0
}
)
print(output["outputs"][0])  # → URL of the generated output

Veo3 is a Google model for video generation, exposed as a REST API on WaveSpeedAI. It has built-in audio and produces synchronized video and sound from text prompts. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/google/google-veo3.
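
The submit-then-poll flow described above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the official client: `wait_for_output` and `http_fetch_result` are hypothetical names, the `status` field is assumed to sit next to `outputs` under `data`, and the `"failed"` status and `error` field are guesses that should be checked against the API reference. The result URL is the one used in the cURL example on this page.

```python
import json
import os
import time
import urllib.request


def http_fetch_result(request_id):
    """Fetch the parsed JSON body of the prediction result endpoint."""
    url = f"https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"
    headers = {"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_output(fetch_result, request_id, interval=5.0, timeout=600.0,
                    sleep=time.sleep):
    """Poll until the prediction completes, then return the first output URL.

    fetch_result is any callable that takes a request id and returns the
    result endpoint's JSON body as a dict (http_fetch_result in real use).
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_result(request_id).get("data", {})
        status = data.get("status")  # assumption: status lives under "data"
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumption: failures are reported this way
            raise RuntimeError(
                f"prediction {request_id} failed: {data.get('error')}"
            )
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"prediction {request_id} not completed after {timeout}s"
            )
        sleep(interval)
```

In real use the call would look like `wait_for_output(http_fetch_result, request_id)`, with `request_id` taken from the submission response. Passing the fetch function as an argument keeps the polling loop testable without network access, and the default interval and timeout are arbitrary choices rather than documented values.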
Veo3 starts at $3.20 per run. That figure is the base price; the final charge scales with the parameters you set in the form, so a higher-quality or larger output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `aspect_ratio`, `resolution`, `duration`, `seed`, `negative_prompt`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/google/google-veo3.
Average end-to-end generation time on WaveSpeedAI is around 130 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Google). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.