
Veo3

Google Veo3 is Google's flagship text-to-video model with built-in audio, producing synchronized video and sound from text prompts. It is available as a ready-to-use REST inference API with no cold starts and affordable pricing.

text-to-video

$3.20 per run

Examples

A breaking news ident, followed by a TV news presenter excitedly telling us: We interrupt this programme to bring you some breaking news... Veo 3 is now live on WaveSpeedAI. Then she shouts: Let's go! The TV presenter is an epic and cool punk with pink and green hair and a t-shirt that says 'Veo 3 on WaveSpeedAI'

A lone astronaut in a dark silver suit walks through a narrow corridor inside a damaged space station. Sparks fall from the ceiling, flickering lights cast dynamic shadows on the metallic walls. The camera slowly dollies forward, then cuts to a side close-up as the astronaut turns their head, revealing a cracked helmet visor. A subtle glow from Earth is visible through a distant window.

A man walking alone on a forest trail in autumn, leaves crunching underfoot. Birds chirp in the distance. He murmurs: "Feels good to be away from everything for a while."

Two best friends lying in a grassy field under the stars, pointing at constellations. One says: "Do you ever think about how big the universe is?" The other replies: "All the time... and it still amazes me."

A detective and a witness talking in a quiet diner at night. Rain falls outside. The detective asks: "What did you see that night?" The witness hesitates: "I saw someone... but not their face."

A teenage girl sketching in a notebook at a café window, rain tapping on the glass. Lo-fi music plays faintly. She whispers: "I just want to draw forever..."

Two lovers sitting at a riverside bench, sun setting behind them. She asks: "Would you still love me if we never met again?" He responds: "I'd find you in every lifetime."

A table covered with colorful fruits. A slow, precise hand picks up a kiwi, peels it with a small knife. Subtle slicing sounds. Whispered voice: "Today we're doing kiwi and starfruit... hear that soft skin peeling?"

An alchemist in a glowing lab pouring colored liquids into flasks, small pops and fizzing sounds. He mutters: "A drop of dragonroot... two of nightshade... let's see what happens..."

A close-up of hands gently slicing a ripe mango on a bamboo cutting board. The knife glides through the soft flesh with a wet, sticky sound. Juice slowly pools on the surface. The room is silent except for the smooth, squishy slice.

A chilled green apple being sliced into paper-thin wedges. Each cut makes a sharp, clean snap against the wooden board. Ambient sound of a quiet room, no other noise.

A wide-angle shot of a cyberpunk city at dusk, neon pink and cyan holographic billboards glitching, hovercars zipping through smog with magenta exhaust trails. A rogue hacker with a cybernetic arm taps a holographic keyboard, rain dripping off their hood. Cinematic lighting, shallow focus on the hacker’s glowing arm, dolly shot moving forward. Ambient audio: synthwave music, hovercar hums, rain patter. Dialogue: ‘We’re close to breaking the mainframe…’

A two-shot of a young girl and her fluffy golden retriever sitting on a blanket in a meadow, wildflowers blooming around them. She laughs as the dog licks her cheek, sunset casting warm orange light. Slow pan shot from left to right, shallow focus on their interaction. Ambient audio: wind rustling grass, dog’s happy panting. Dialogue: ‘You’re the best friend ever, Buddy!’

README

Google Veo 3 — Text-to-Video AI Generator

Veo 3 is Google DeepMind’s next-generation text-to-video model, capable of producing cinematic, high-fidelity videos directly from natural-language prompts. With native audio generation, dialogue lip-sync, and deep physical reasoning, Veo 3 redefines what’s possible in multimodal AI video creation.

🌟 Why it stands out

  • Text → Image → Video pipeline: Generate stunning visuals and extend them into smooth, cinematic video sequences.

  • Native Audio Generation: Automatically adds ambient sound, effects, and dialogue synchronized with the visuals, with no post-production required.

  • Dialogue & Lip-Sync: Characters can speak your script with accurate lip synchronization, enabling AI filmmaking and animation storytelling.

  • Physics-Aware Motion & Spatial Understanding: Veo 3 understands depth, space, and motion, which makes it well suited to dynamic scenes, game environments, and realistic interactions.

  • High Prompt Accuracy: Enhanced natural-language understanding keeps the video semantically aligned with the prompt and its context.

  • Cinematic Lighting & Quality: Delivers professional-grade output with authentic lighting, depth of field, and motion consistency.

🧠 Built by Google DeepMind

Developed by Google DeepMind’s world-class research team, Veo 3 empowers creators, developers, and studios to push the limits of AI-driven storytelling and visual production.

✍️ Prompting Tips

Use clear, cinematic descriptions for best results:

  • Shot Composition: close-up, two-shot, over-the-shoulder
  • Lens & Focus: macro lens, shallow focus, wide-angle lens
  • Genre & Style: sci-fi, romantic comedy, action movie
  • Camera Motion: zoom shot, dolly shot, tracking shot, pan shot

🎬 Example Prompt

Close-up shot of melting icicles on a frozen rock wall with cool blue tones, zoomed in to capture the dripping water detail in cinematic lighting and shallow focus.
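
One way to keep prompts consistent is to assemble them from the categories listed in the Prompting Tips. The sketch below is only an illustration: `compose_prompt` and its parameter names are hypothetical and not part of any WaveSpeedAI or Google SDK. The model itself accepts any free-form text.

```python
def compose_prompt(subject, shot=None, lens=None, style=None, camera_motion=None):
    """Join a subject with optional cinematic descriptors into one prompt string.

    Hypothetical helper: the parameter names mirror the tip categories
    (shot composition, lens & focus, style, camera motion), not API fields.
    """
    parts = []
    if shot:
        parts.append(f"{shot} of {subject}")
    else:
        parts.append(subject)
    # Append the remaining descriptors in a fixed order, skipping any left unset.
    for extra in (lens, style, camera_motion):
        if extra:
            parts.append(extra)
    return ", ".join(parts)


prompt = compose_prompt(
    "melting icicles on a frozen rock wall",
    shot="Close-up shot",
    lens="shallow focus",
    style="cinematic lighting",
    camera_motion="slow zoom shot",
)
print(prompt)
```

Keeping the descriptors as separate arguments makes it easy to vary one element, such as the camera motion, while holding the rest of the scene fixed.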

⚙️ Technical Overview

| Property | Description |
| --- | --- |
| Type | Text-to-Video (with Audio) |
| Resolution | Up to 1080p |
| Max Duration | 8 seconds |
| Output Format | MP4 + Stereo Audio |
| Audio | Native ambient, dialogue, SFX, and music |
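
The limits above can be checked on the client before a request is submitted. The following is a minimal sketch under stated assumptions: `validate_settings` is a hypothetical helper, the allowed resolutions are assumed to be the two values named on this page (720p and 1080p), and the API may accept only specific durations rather than every whole number up to 8.

```python
MAX_DURATION_SECONDS = 8                 # "Max Duration" from the table above
ASSUMED_RESOLUTIONS = ("720p", "1080p")  # assumption: the only values named on this page


def validate_settings(duration, resolution):
    """Return a list of problems; an empty list means the settings look acceptable."""
    problems = []
    if not isinstance(duration, int) or duration < 1:
        problems.append("duration must be a positive whole number of seconds")
    elif duration > MAX_DURATION_SECONDS:
        problems.append(f"duration {duration}s exceeds the {MAX_DURATION_SECONDS}s maximum")
    if resolution not in ASSUMED_RESOLUTIONS:
        problems.append(f"resolution {resolution!r} is not one of {ASSUMED_RESOLUTIONS}")
    return problems


print(validate_settings(8, "720p"))  # no problems expected
print(validate_settings(12, "4k"))   # both settings are out of range
```

Returning a list of problems rather than raising on the first one lets a form or script report every issue at once.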

💰 Pricing

Each run with audio costs $3.20, at both 720p and 1080p.

Each run without audio costs $1.20.
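
For budgeting a batch of generations, the two per-run prices can be turned into a quick estimate. This is a sketch only: `estimate_cost` is a hypothetical helper, and it assumes the price depends solely on whether audio is generated, as stated above. The charge recorded on each prediction remains the authoritative figure.

```python
from decimal import Decimal

PRICE_WITH_AUDIO = Decimal("3.20")     # per run, 720p or 1080p
PRICE_WITHOUT_AUDIO = Decimal("1.20")  # per run without audio


def estimate_cost(runs, generate_audio=True):
    """Estimate the total charge in USD for a number of runs."""
    if runs < 0:
        raise ValueError("runs must not be negative")
    price = PRICE_WITH_AUDIO if generate_audio else PRICE_WITHOUT_AUDIO
    return price * runs


print(estimate_cost(10))                        # 10 runs with audio
print(estimate_cost(10, generate_audio=False))  # 10 runs without audio
```

`Decimal` is used instead of `float` so that currency amounts are not affected by binary rounding.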

✅ Commercial use allowed

🚀 How to Use

  1. Write Your Prompt: Describe the scene you want to create, including subjects, actions, lighting, camera movement, and mood.

     Example: “A close-up of a young woman standing in the rain, soft cinematic lighting, slow tracking shot.”

  2. Add Optional Elements:
     • Dialogue → Use quotation marks " " for spoken lines.
     • Reference Image → Upload one or more images to keep visual consistency across clips.
     • Camera Direction → Add terms like zoom in, pan right, tracking shot for cinematic movement.

  3. Choose Video Settings: Select the duration (up to 8 seconds) and resolution (up to 1080p).

  4. Generate the Video: Submit your prompt. Veo 3 will automatically generate both video and native audio (dialogue, ambient sounds, music).

  5. Preview & Download: Review the clip, make prompt refinements if needed, then download the final MP4 file.

💡 Tip: For best results, keep each prompt focused on a single scene or emotional moment. Avoid mixing multiple time periods or locations in one request.
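
The dialogue convention from the optional elements above (spoken lines in quotation marks) can be applied programmatically when prompts are generated from a script. The helper below is a hypothetical sketch, not part of any SDK. It assumes each spoken line is introduced by a short speaker cue, as in the example prompts on this page.

```python
def with_dialogue(scene, lines):
    """Append spoken lines to a scene description, each wrapped in double quotes.

    `lines` is a list of (speaker_cue, text) pairs, for example
    ("She whispers", "I just want to draw forever...").
    """
    sentences = [scene.rstrip()]
    for speaker_cue, text in lines:
        # Strip quotes the caller may already have added so they are not doubled.
        clean = text.strip().strip('"')
        sentences.append(f'{speaker_cue}: "{clean}"')
    return " ".join(sentences)


prompt = with_dialogue(
    "A detective and a witness talking in a quiet diner at night. Rain falls outside.",
    [
        ("The detective asks", "What did you see that night?"),
        ("The witness hesitates", "I saw someone... but not their face."),
    ],
)
print(prompt)
```

In line with the tip above, the sketch keeps every line inside one scene description rather than spreading dialogue across several locations.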

📝 Notes

  • Optimized for short-form storytelling, advertising, and creative video experiments.
  • Audio is generated natively and currently supports only stereo output.
  • For best clarity, describe the main subject, scene, and lighting precisely.
  • Make sure your prompts follow Google’s Safety Guidelines — if an error appears, revise your prompt and try again.

Veo3 API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/google/veo3 with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Veo3 below.

HTTP example
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/google/veo3" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "720p",
    "generate_audio": true,
    "negative_prompt": "blurry, low quality, distorted",
    "seed": 0
}'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
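
The polling step can be wrapped in a small loop. The sketch below uses only the Python standard library and the result URL shown above. Several details are assumptions rather than documented behavior: that `status` sits next to `outputs` under `data`, that a failed prediction reports the status `"failed"` with an `error` field, and the interval and timeout values, which are arbitrary. `fetch_result` and `wait_for_output` are hypothetical names.

```python
import json
import os
import time
import urllib.request

RESULT_URL = "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"


def fetch_result(request_id):
    """GET the prediction result and return the parsed JSON body."""
    request = urllib.request.Request(
        RESULT_URL.format(request_id=request_id),
        headers={"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_output(request_id, fetch=fetch_result, interval=5.0,
                    timeout=600.0, sleep=time.sleep):
    """Poll until the prediction completes, then return data.outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch(request_id).get("data", {})
        status = data.get("status")  # assumption: status is nested under "data"
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumption: name of the failure status
            raise RuntimeError(f"prediction {request_id} failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} not completed after {timeout}s")
        sleep(interval)
```

Calling `wait_for_output(request_id)` with the id returned by the POST request yields the output URL once the prediction completes. The `fetch` and `sleep` parameters exist so the loop can be exercised without network access.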
Node.js example
// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not support top-level await, so run inside an async function.
async function main() {
  const result = await client.run("google/veo3", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "720p",
    "generate_audio": true,
    "negative_prompt": "blurry, low quality, distorted",
    "seed": 0
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);
Python example
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "google/veo3",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "aspect_ratio": "16:9",
        "duration": 8,
        "resolution": "720p",
        "generate_audio": True,  # Python boolean, not JSON's lowercase true
        "negative_prompt": "blurry, low quality, distorted",
        "seed": 0,
    },
)

print(output["outputs"][0])  # → URL of the generated output

Veo3 API — Frequently asked questions

What is the Veo3 API?

Veo3 is Google's flagship text-to-video model with built-in audio, exposed as a REST API on WaveSpeedAI. It produces synchronized video and sound from text prompts. You can call it programmatically or try it from the playground above.

How do I call the Veo3 API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/google/google-veo3.

How much does Veo3 cost per run?

Veo3 costs $3.20 per run with audio, at both 720p and 1080p, and $1.20 per run without audio. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Veo3 accept?

Key inputs: `prompt`, `aspect_ratio`, `resolution`, `duration`, `generate_audio`, `seed`, `negative_prompt`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/google/google-veo3.

How long does Veo3 take to generate?

Average end-to-end generation time on WaveSpeedAI is around 130 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.

Can I use Veo3 outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (Google). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.