Google Veo3

Google Veo3 is Google’s flagship text-to-video model with built-in audio, producing synchronized video and sound from text prompts. The model is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

Google Veo 3 — Text-to-Video AI Generator

Veo 3 is Google DeepMind’s next-generation text-to-video model, capable of producing cinematic, high-fidelity videos directly from natural-language prompts. With native audio generation, dialogue lip-sync, and deep physical reasoning, Veo 3 redefines what’s possible in multimodal AI video creation.

🌟 Why it stands out

Text → Image → Video pipeline: Generate stunning visuals and extend them into smooth, cinematic video sequences.
Native Audio Generation: Automatically adds ambient sound, effects, and dialogue synchronized with the visuals, with no post-production required.
Dialogue & Lip-Sync: Characters can speak your script with accurate lip synchronization, enabling AI filmmaking and animation storytelling.
Physics-Aware Motion & Spatial Understanding: Veo 3 understands depth, space, and motion, which suits dynamic scenes, game environments, and realistic interactions.
High Prompt Accuracy: Enhanced natural-language understanding ensures semantic alignment and context-aware video generation.
Cinematic Lighting & Quality: Delivers professional-grade output with authentic lighting, depth of field, and motion consistency.

🧠 Built by Google DeepMind

Developed by Google DeepMind’s world-class research team, Veo 3 empowers creators, developers, and studios to push the limits of AI-driven storytelling and visual production.

✍️ Prompting Tips

Use clear, cinematic descriptions for best results:

Shot Composition: close-up, two-shot, over-the-shoulder
Lens & Focus: macro lens, shallow focus, wide-angle lens
Genre & Style: sci-fi, romantic comedy, action movie
Camera Motion: zoom shot, dolly shot, tracking shot, pan shot

🎬 Example Prompt

Close-up shot of melting icicles on a frozen rock wall with cool blue tones, zoomed in to capture the dripping water detail in cinematic lighting and shallow focus.
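
The tips above can also be applied programmatically when prompts are assembled from structured inputs. The following is a minimal sketch, not part of the Veo 3 API: `build_prompt` and its parameter names are hypothetical, and the comma-joined layout is just one reasonable way to combine the descriptors.

```python
def build_prompt(subject, shot=None, lens=None, style=None, camera_motion=None):
    """Join a subject with optional cinematic descriptors into one prompt string.

    Hypothetical helper: the model only sees the resulting text.
    """
    # Shot composition leads the sentence, e.g. "Close-up of ...".
    lead = f"{shot.capitalize()} of {subject}" if shot else subject
    # The remaining descriptors are appended as comma-separated phrases.
    extras = [part for part in (lens, style, camera_motion) if part]
    return ", ".join([lead] + extras) + "."


prompt = build_prompt(
    subject="a robot repairing a watch",
    shot="close-up",
    lens="macro lens",
    style="sci-fi",
    camera_motion="dolly shot",
)
print(prompt)
```

Keeping each descriptor in its own argument makes it easy to vary one element (for example, the camera motion) while holding the rest of the scene constant.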

⚙️ Technical Overview

Property	Description
Type	Text-to-Video (with Audio)
Resolution	Up to 1080p
Max Duration	8 seconds
Output Format	MP4 + Stereo Audio
Audio	Native ambient, dialogue, SFX, and music

💰 Pricing

Each run with audio costs $3.20, at either 720p or 1080p.

Each run without audio costs $1.20.

✅ Commercial use allowed
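
For budgeting, the per-run prices in this section can be turned into a quick estimate. This is a rough sketch that assumes a flat per-run price with no volume discounts or other fees; `estimate_cost` is a hypothetical helper, not an API call.

```python
from decimal import Decimal

# Per-run prices quoted in this section; the same price applies to 720p and 1080p.
PRICE_WITH_AUDIO = Decimal("3.2")
PRICE_WITHOUT_AUDIO = Decimal("1.2")


def estimate_cost(runs, generate_audio=True):
    """Return the total cost in USD for a number of runs (flat per-run pricing assumed)."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    price = PRICE_WITH_AUDIO if generate_audio else PRICE_WITHOUT_AUDIO
    return price * runs


print(estimate_cost(5))                        # five runs with audio
print(estimate_cost(5, generate_audio=False))  # five runs without audio
```

`Decimal` is used instead of `float` so that currency totals do not pick up binary rounding errors.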

🚀 How to Use

1. Write Your Prompt: Describe the scene you want to create, including subjects, actions, lighting, camera movement, and mood.
   Example: “A close-up of a young woman standing in the rain, soft cinematic lighting, slow tracking shot.”
2. Add Optional Elements:
   - Dialogue → Use quotation marks (“ ”) for spoken lines.
   - Reference Image → Upload one or more images to keep visual consistency across clips.
   - Camera Direction → Add terms like zoom in, pan right, or tracking shot for cinematic movement.
3. Choose Video Settings: Select the duration (4, 6, or 8 seconds) and resolution (720p or 1080p).
4. Generate the Video: Submit your prompt. Veo 3 generates both the video and native audio (dialogue, ambient sounds, music).
5. Preview & Download: Review the clip, refine the prompt if needed, then download the final MP4 file.

💡 Tip: For best results, keep each prompt focused on a single scene or emotional moment. Avoid mixing multiple time periods or locations in one request.
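
As an illustration of the dialogue convention described in this section, the sketch below appends a quoted spoken line to a scene description. The helper name `add_dialogue` and the "says:" phrasing are assumptions made for the example; the only documented requirement is that spoken lines appear in quotation marks.

```python
def add_dialogue(scene, speaker, line):
    """Append a spoken line, wrapped in double quotes, to a scene description.

    Hypothetical helper; the "<speaker> says:" wording is an assumption.
    """
    # Remove quotes the caller may already have added so they are not doubled.
    spoken = line.strip().strip('"“”')
    # Normalize the scene so it ends with exactly one period before the dialogue.
    return f'{scene.rstrip(". ")}. {speaker} says: "{spoken}"'


scene = "A close-up of a young woman standing in the rain, slow tracking shot."
print(add_dialogue(scene, "She", "I knew you would come."))
```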

📝 Notes

Optimized for short-form storytelling, advertising, and creative video experiments.
Audio is generated natively and currently supports only stereo output.
For best clarity, describe the main subject, scene, and lighting precisely.
Make sure your prompts follow Google’s Safety Guidelines — if an error appears, revise your prompt and try again.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task ("prompt" is required; see Request Parameters)
curl --location --request POST "https://api.wavespeed.ai/api/v3/google/veo3" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "prompt": "Close-up shot of melting icicles on a frozen rock wall with cool blue tones",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "720p",
    "generate_audio": true
}'

# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
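
The same submit-then-query flow can be sketched in Python using only the standard library. This is an illustrative outline rather than an official client: the function names are hypothetical, the polling interval and attempt limit are arbitrary choices, and it assumes the result endpoint returns the same `data.status` field as the submit response.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key, payload):
    """Build (but do not send) the POST request for the google/veo3 endpoint."""
    return urllib.request.Request(
        f"{API_BASE}/google/veo3",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key, request_id):
    """Build the GET request that fetches the result of a submitted task."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request):
    """Send a request and decode the JSON body (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(fetch, interval=2.0, max_attempts=150):
    """Call fetch() until the task status is `completed` or `failed`."""
    for _ in range(max_attempts):
        data = fetch()["data"]
        if data["status"] in ("completed", "failed"):
            return data
        time.sleep(interval)
    raise TimeoutError("task did not finish within the polling budget")


def generate(api_key, payload):
    """Submit a task and block until it finishes; returns the final `data` object."""
    task = send(build_submit_request(api_key, payload))
    request_id = task["data"]["id"]
    return wait_for_result(lambda: send(build_result_request(api_key, request_id)))
```

A call such as `generate(os.environ["WAVESPEED_API_KEY"], {"prompt": "..."})` would then return the final task data, with the video URLs in `outputs` once the status is `completed`. Passing the fetch step to `wait_for_result` as a callable keeps the polling loop independent of the HTTP layer, which makes it straightforward to test without network access.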

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
prompt	string	Yes		-	Positive text prompt describing the video to generate.
aspect_ratio	string	No	16:9	16:9, 9:16	Aspect ratio of the video.
duration	integer	No	8	4, 6, 8	Duration of the generated video in seconds.
resolution	string	No	720p	720p, 1080p	Video resolution.
generate_audio	boolean	No	true	-	Whether to generate audio.
negative_prompt	string	No		-	Negative prompt for the generation.
seed	integer	No	-	-1 ~ 2147483647	The random seed to use for the generation.
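
Validating a request body on the client side before submitting it can catch mistakes early. The sketch below encodes the defaults and allowed values from the table above; `build_payload` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
# Allowed values and defaults copied from the request parameter table.
ALLOWED = {
    "aspect_ratio": {"16:9", "9:16"},
    "duration": {4, 6, 8},
    "resolution": {"720p", "1080p"},
}
DEFAULTS = {
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "720p",
    "generate_audio": True,
}
SEED_MIN, SEED_MAX = -1, 2147483647


def build_payload(prompt, **options):
    """Merge options over the documented defaults and validate the result."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")
    known = set(DEFAULTS) | {"negative_prompt", "seed"}
    unknown = set(options) - known
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"prompt": prompt, **DEFAULTS, **options}
    for name, allowed in ALLOWED.items():
        if payload[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed, key=str)}")
    if not isinstance(payload["generate_audio"], bool):
        raise ValueError("generate_audio must be a boolean")
    seed = payload.get("seed")
    if seed is not None and not SEED_MIN <= seed <= SEED_MAX:
        raise ValueError(f"seed must be between {SEED_MIN} and {SEED_MAX}")
    return payload


print(build_payload("A fox running through snow", duration=6, seed=42))
```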

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
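
To show how these fields might be consumed, the sketch below maps a decoded response body onto a small dataclass. The `Prediction` class and `parse_response` function are hypothetical, and the sample body is made up to match the field names in the table above; it is not captured from a real API call.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Prediction:
    id: str
    status: str
    outputs: List[str] = field(default_factory=list)
    error: str = ""
    inference_ms: int = 0

    @property
    def done(self):
        """True once the task has reached a terminal status."""
        return self.status in ("completed", "failed")


def parse_response(body):
    """Extract the commonly used fields from a decoded JSON response."""
    if body.get("code") != 200:
        raise RuntimeError(f"API error {body.get('code')}: {body.get('message')}")
    data = body["data"]
    return Prediction(
        id=data["id"],
        status=data["status"],
        outputs=list(data.get("outputs") or []),
        error=data.get("error") or "",
        inference_ms=(data.get("timings") or {}).get("inference", 0),
    )


# Hypothetical response body; all values are invented for illustration.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "example-task-id",
        "model": "google/veo3",
        "outputs": ["https://example.com/output.mp4"],
        "status": "completed",
        "error": "",
        "timings": {"inference": 12345},
    },
}

prediction = parse_response(sample)
print(prediction.status, prediction.outputs)
```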

Result Request Parameters
