Google Veo3

Google Veo3 is Google’s flagship text-to-video model with built-in audio, producing synchronized video and sound from text prompts. WaveSpeedAI offers it through a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

Google Veo 3 — Text-to-Video AI Generator

Veo 3 is Google DeepMind’s next-generation text-to-video model, capable of producing cinematic, high-fidelity videos directly from natural-language prompts. With native audio generation, dialogue lip-sync, and deep physical reasoning, Veo 3 redefines what’s possible in multimodal AI video creation.


🌟 Why it stands out

  • Text → Image → Video pipeline: Generate stunning visuals and extend them into smooth, cinematic video sequences.

  • Native Audio Generation: Automatically adds ambient sound, effects, and dialogue synchronized with the visuals, with no post-production required.

  • Dialogue & Lip-Sync: Characters can speak your script with accurate lip synchronization, enabling AI filmmaking and animation storytelling.

  • Physics-Aware Motion & Spatial Understanding: Veo 3 understands depth, space, and motion, which makes it well suited to dynamic scenes, game environments, and realistic interactions.

  • High Prompt Accuracy: Enhanced natural-language understanding ensures semantic alignment and context-aware video generation.

  • Cinematic Lighting & Quality: Delivers professional-grade output with authentic lighting, depth of field, and motion consistency.


🧠 Built by Google DeepMind

Developed by Google DeepMind’s world-class research team, Veo 3 empowers creators, developers, and studios to push the limits of AI-driven storytelling and visual production.


✍️ Prompting Tips

Use clear, cinematic descriptions for best results:

  • Shot Composition: close-up, two-shot, over-the-shoulder
  • Lens & Focus: macro lens, shallow focus, wide-angle lens
  • Genre & Style: sci-fi, romantic comedy, action movie
  • Camera Motion: zoom shot, dolly shot, tracking shot, pan shot

🎬 Example Prompt

Close-up shot of melting icicles on a frozen rock wall with cool blue tones, zoomed in to capture the dripping water detail in cinematic lighting and shallow focus.


⚙️ Technical Overview

| Property | Description |
| --- | --- |
| Type | Text-to-Video (with Audio) |
| Resolution | Up to 1080p |
| Max Duration | 8 seconds |
| Output Format | MP4 + Stereo Audio |
| Audio | Native ambient, dialogue, SFX, and music |

💰 Pricing

Each run with audio costs $3.20, at both 720p and 1080p.

Each run without audio costs $1.20.
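
As a rough budgeting aid, the per-run prices above can be turned into a small estimator. This is only a sketch: it assumes the price is flat per run regardless of duration or resolution, as the figures above suggest, and the function name `estimate_cost` is made up for illustration. Prices may change, so treat the constants as placeholders.

```python
from decimal import Decimal

# Per-run prices in USD, copied from the Pricing section (may change).
PRICE_WITH_AUDIO = Decimal("3.20")
PRICE_WITHOUT_AUDIO = Decimal("1.20")


def estimate_cost(runs: int, generate_audio: bool = True) -> Decimal:
    """Return the estimated total cost in USD for a number of runs.

    Assumes a flat per-run price; `estimate_cost` is a hypothetical helper,
    not part of any WaveSpeedAI SDK.
    """
    if runs < 0:
        raise ValueError("runs must be non-negative")
    price = PRICE_WITH_AUDIO if generate_audio else PRICE_WITHOUT_AUDIO
    return price * runs
```

For example, `estimate_cost(5)` prices five runs with audio, and `estimate_cost(5, generate_audio=False)` prices the same five runs without audio.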

✅ Commercial use allowed


🚀 How to Use

  1. Write Your Prompt: Describe the scene you want to create, including subjects, actions, lighting, camera movement, and mood.

    Example: “A close-up of a young woman standing in the rain, soft cinematic lighting, slow tracking shot.”

  2. Add Optional Elements:

    • Dialogue → Use quotation marks (“ ”) for spoken lines.
    • Reference Image → Upload one or more images to keep visual consistency across clips.
    • Camera Direction → Add terms like zoom in, pan right, tracking shot for cinematic movement.

  3. Choose Video Settings: Select the duration (4, 6, or 8 seconds) and resolution (720p or 1080p).

  4. Generate the Video: Submit your prompt; Veo 3 automatically generates both video and native audio (dialogue, ambient sounds, music).

  5. Preview & Download: Review the clip, refine the prompt if needed, then download the final MP4 file.


💡 Tip: For best results, keep each prompt focused on a single scene or emotional moment. Avoid mixing multiple time periods or locations in one request.
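
The prompt-writing steps above can be sketched as a small helper that joins a scene description, a camera direction, and a quoted line of dialogue into one prompt string. The function `compose_prompt` and the "says:" phrasing are assumptions made for illustration; the guidance above only states that spoken lines should be placed in quotation marks.

```python
def compose_prompt(scene: str, camera: str = "", dialogue: str = "",
                   speaker: str = "") -> str:
    """Join scene, camera direction, and quoted dialogue into one prompt.

    Hypothetical helper: the exact wording is an assumption, not a format
    required by Veo 3.
    """
    parts = [scene.strip().rstrip(".")]
    if camera:
        parts.append(camera.strip().rstrip("."))
    prompt = ", ".join(parts) + "."
    if dialogue:
        who = speaker.strip() or "The character"
        # Spoken lines go inside quotation marks, as suggested in step 2.
        prompt += f' {who} says: "{dialogue.strip()}"'
    return prompt
```

Keeping one scene per call, as the tip above recommends, means each prompt describes a single moment and camera move.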


📝 Notes

  • Optimized for short-form storytelling, advertising, and creative video experiments.
  • Audio is generated natively and currently supports only stereo output.
  • For best clarity, describe the main subject, scene, and lighting precisely.
  • Make sure your prompts follow Google’s Safety Guidelines — if an error appears, revise your prompt and try again.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


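# Submit the task ("prompt" is required; the other fields are optional)
curl --location --request POST "https://api.wavespeed.ai/api/v3/google/veo3" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "prompt": "Close-up shot of melting icicles on a frozen rock wall with cool blue tones, zoomed in to capture the dripping water detail in cinematic lighting and shallow focus.",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "720p",
    "generate_audio": true
}'

# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"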
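
The same submit-then-query flow can be sketched in Python using only the standard library. The two URLs and the `Authorization` header come from the curl commands above; everything else is an assumption made for illustration: the helper names (`build_submit_request`, `fetch_result`, `wait_for_result`, `run`), the polling interval, and the poll limit. The sketch also assumes the response body follows the shape listed under Response Parameters (`data.id`, `data.status`, `data.outputs`, `data.error`).

```python
import json
import os
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(prompt: str, api_key: str, **options) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a Veo 3 task."""
    payload = {"prompt": prompt, **options}
    return urllib.request.Request(
        f"{API_BASE}/google/veo3",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def fetch_result(request_id: str, api_key: str) -> dict:
    """GET the current result document for a task."""
    req = urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(request_id, fetch, interval=2.0, max_polls=150, sleep=time.sleep):
    """Poll fetch(request_id) until the task is completed or failed.

    The interval and max_polls defaults are arbitrary assumptions.
    """
    for _ in range(max_polls):
        data = fetch(request_id)["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        sleep(interval)
    raise TimeoutError(f"task {request_id} not finished after {max_polls} polls")


def run(prompt: str, **options) -> list:
    """Submit a task and block until its output URLs are available.

    Makes real, billable API calls; requires WAVESPEED_API_KEY to be set.
    """
    api_key = os.environ["WAVESPEED_API_KEY"]
    with urllib.request.urlopen(build_submit_request(prompt, api_key, **options)) as resp:
        task_id = json.load(resp)["data"]["id"]
    return wait_for_result(task_id, lambda rid: fetch_result(rid, api_key))
```

Passing the fetch function into `wait_for_result` keeps the polling logic independent of the network, so it can be exercised with canned responses before any paid request is made. A real call would look like `run("A slow tracking shot of a lighthouse at dusk", duration=8, resolution="720p")`.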

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| prompt | string | Yes | - | | Positive text prompt for the generation. |
| aspect_ratio | string | No | 16:9 | 16:9, 9:16 | Aspect ratio of the video. |
| duration | integer | No | 8 | 4, 6, 8 | The duration of the generated media in seconds. |
| resolution | string | No | 720p | 720p, 1080p | Video resolution. |
| generate_audio | boolean | No | true | - | Whether to generate audio. |
| negative_prompt | string | No | - | | Negative prompt for the generation. |
| seed | integer | No | - | -1 ~ 2147483647 | The random seed to use for the generation. |
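
To catch invalid values before a request is sent, the ranges in the table above can be checked client-side. The following is a minimal sketch: `build_payload` is a hypothetical helper, the defaults and allowed values are copied from the table, and the API itself may apply further validation that is not documented here.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
ALLOWED_DURATIONS = {4, 6, 8}
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
SEED_MIN, SEED_MAX = -1, 2147483647


def build_payload(prompt, aspect_ratio="16:9", duration=8, resolution="720p",
                  generate_audio=True, negative_prompt=None, seed=None) -> dict:
    """Validate request parameters against the documented ranges.

    Returns a dict ready to be JSON-encoded; raises ValueError otherwise.
    """
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")

    payload = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": duration,
        "resolution": resolution,
        "generate_audio": bool(generate_audio),
    }
    # Optional fields are only sent when the caller provides them.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        if not SEED_MIN <= seed <= SEED_MAX:
            raise ValueError(f"seed must be between {SEED_MIN} and {SEED_MAX}")
        payload["seed"] = seed
    return payload
```

Omitting optional fields rather than sending empty values lets the server apply its own defaults.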

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
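
The response fields above can be collected into a small typed record for easier handling. This sketch assumes the JSON body nests the fields under `data` exactly as the table lists them; `TaskResult` and `parse_response` are illustrative names, not part of any official client.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskResult:
    task_id: str
    status: str
    outputs: List[str] = field(default_factory=list)
    error: str = ""
    flagged_nsfw: bool = False
    inference_ms: Optional[int] = None

    @property
    def done(self) -> bool:
        # "completed" and "failed" are the two terminal statuses listed above.
        return self.status in ("completed", "failed")


def parse_response(body: dict) -> TaskResult:
    """Extract the commonly used fields from a decoded response body."""
    data = body.get("data") or {}
    timings = data.get("timings") or {}
    return TaskResult(
        task_id=data.get("id", ""),
        status=data.get("status", ""),
        outputs=list(data.get("outputs") or []),
        error=data.get("error") or "",
        # True if any output was flagged by NSFW detection.
        flagged_nsfw=any(data.get("has_nsfw_contents") or []),
        inference_ms=timings.get("inference"),
    )
```

Missing or null fields fall back to empty values, so the same parser works for responses received while the task is still in the created or processing state.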

Result Request Parameters

© 2025 WaveSpeedAI. All rights reserved.