Google Veo 3.1 Text-to-Video
Google Veo 3.1 introduces native 1080p resolution, delivering enhanced quality and flexibility for creators.
Features
🎥 Google Veo 3.1 — Text-to-Video (T2V) Model
Veo 3.1 T2V is the latest text-to-video model from Google DeepMind, designed to bring cinematic storytelling to life through text. It generates high-fidelity 1080p videos with synchronized, context-aware audio, realistic motion, and narrative consistency — making it one of the most advanced generative video systems ever released.
🌟 Why it stands out
- 🎬 Cinematic Realism: Produces natural lighting, smooth camera transitions, and accurate perspective for film-like motion.
- 🔊 Native Audio Generation: Generates ambient sound, dialogue, and music synchronized with the visuals.
- 🗣️ Dialogue & Lip-Sync: Supports speaking characters and realistic facial expressions, which suits storytelling, marketing, and short-form content.
- 🧠 Subject Consistency (R2V): Maintains a character’s or object’s identity across frames using 1–3 reference images.
- 🎞️ Video Interpolation: Animates the transition between two given frames, which suits smooth start-to-end storytelling.
- 📐 Flexible Output: Supports 720p and 1080p at 24 FPS, durations of 4, 6, or 8 seconds, and both 16:9 (landscape) and 9:16 (portrait) aspect ratios.
⚙️ Key Parameters
- prompt: Describe your scene or story (e.g., “A drone shot flying over Las Vegas, transitioning from day to night with soft jazz in the background”).
- duration: Video length in seconds (4, 6, or 8).
- resolution: 720p or 1080p.
- aspect_ratio: Landscape (16:9) or portrait (9:16).
💰 Pricing (Preview Stage)
Model | Description | Input Type | Output | Price |
---|---|---|---|---|
Veo 3.1 (Video + Audio) | Generate videos with synchronized sound | Text / Image | Video + Audio | $0.40 / sec |
Veo 3.1 (Video only) | Generate high-quality silent videos | Text / Image | Video | $0.20 / sec |
💡 Example cost: ~$3.20 for an 8-second clip with audio (8 s × $0.40/s). At the listed rates, a 4-second video-only clip costs ~$0.80.
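The per-second rates in the table make the price of a clip easy to estimate before submitting. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, and it assumes billing is strictly linear in duration with no resolution surcharge, which the table suggests but does not state.

```python
from decimal import Decimal

# Per-second rates copied from the pricing table (preview-stage pricing).
RATE_WITH_AUDIO = Decimal("0.40")
RATE_VIDEO_ONLY = Decimal("0.20")
ALLOWED_DURATIONS = (4, 6, 8)


def estimate_cost(duration_seconds: int, generate_audio: bool) -> Decimal:
    """Return the estimated price in USD for a single clip.

    Assumption: price = rate * duration, independent of resolution.
    """
    if duration_seconds not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    rate = RATE_WITH_AUDIO if generate_audio else RATE_VIDEO_ONLY
    return rate * duration_seconds
```

For instance, `estimate_cost(8, True)` reproduces the $3.20 figure quoted for an 8-second clip with audio.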
🚀 How to Use
- ✍️ Write a Prompt: Describe the desired motion, camera style, lighting, and sound. Example: “A cinematic sunset over the ocean, waves glimmering as seagulls fly across the horizon.”
- ⚙️ Adjust Parameters: Select duration, resolution (720p/1080p), and aspect ratio.
- ▶️ Generate: Submit your request. Veo 3.1 will render motion, lighting, and synchronized audio.
- 💾 Preview & Download: Review your video, refine your prompt if needed, then download the final MP4.
💡 Pro Tips
- Keep prompts focused on one main action or subject for better coherence.
- Use camera verbs like “tracking,” “zoom out,” or “handheld” for cinematic control.
- Mention lighting and mood cues (e.g., “under soft moonlight,” “golden-hour glow”).
- Use R2V for character-based storytelling; Interpolation for smooth transitions.
- Avoid conflicting instructions (e.g., “fast zoom” and “slow motion” together).
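One way to apply these tips consistently is to assemble the prompt from separate cues and check it for contradictory instructions before submitting. The helpers below are hypothetical and are not part of any Veo or WavespeedAI tooling. The conflict list holds only the single pair mentioned in the tips and is not exhaustive.

```python
from typing import List, Optional, Tuple

# Cue pairs that contradict each other; illustrative only.
CONFLICTING_CUES: List[Tuple[str, str]] = [("fast zoom", "slow motion")]


def compose_prompt(subject: str,
                   camera: Optional[str] = None,
                   lighting: Optional[str] = None,
                   sound: Optional[str] = None) -> str:
    """Join one main subject with optional camera, lighting, and sound cues."""
    cues = (subject, camera, lighting, sound)
    parts = [cue.strip() for cue in cues if cue and cue.strip()]
    return ", ".join(parts)


def find_conflicts(prompt: str) -> List[Tuple[str, str]]:
    """Return every known cue pair whose two members both appear in the prompt."""
    text = prompt.lower()
    return [pair for pair in CONFLICTING_CUES if all(cue in text for cue in pair)]
```

A prompt for which `find_conflicts` returns a non-empty list is worth rewording before it is submitted.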
🧾 Notes & Limitations
- Generation time: ~2–3 minutes for an 8-second 1080p clip.
- Frame rate fixed at 24 FPS.
- Advanced controls (R2V, I2V, Interpolation) are mutually exclusive — only one per generation.
- If your prompt is blocked, rewrite it and resubmit (safety thresholds may adjust during preview).
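Because a clip can take a few minutes to render, a client typically polls for the result with a timeout rather than waiting on a single request. The sketch below shows that loop with the status check injected as a callable, so it is independent of any HTTP library. `wait_for_result` is a hypothetical name, `fetch` is assumed to return the `data` object of the result response, and the 300-second timeout and 5-second interval are assumptions, not documented values.

```python
import time
from typing import Callable, Dict

# Statuses after which polling should stop.
TERMINAL_STATUSES = ("completed", "failed")


def wait_for_result(fetch: Callable[[], Dict],
                    timeout_s: float = 300.0,
                    interval_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Call `fetch` until the task is completed or failed, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch()
        if data.get("status") in TERMINAL_STATUSES:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not reach a terminal status in time")
        sleep(interval_s)
```

A `failed` result is returned rather than raised, so the caller can inspect the error message and decide whether to rewrite the prompt and resubmit.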
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task (prompt is required)
curl --location --request POST "https://api.wavespeed.ai/api/v3/google/veo3.1/text-to-video" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "prompt": "A cinematic sunset over the ocean, waves glimmering as seagulls fly across the horizon.",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "1080p",
    "generate_audio": false
}'
# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
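The same two calls can be made from Python using only the standard library. This is a minimal sketch, not an official client: the function names are hypothetical, the URLs and headers mirror the curl commands, and error handling (HTTP errors, retries) is left out.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request that submits a text-to-video task."""
    return urllib.request.Request(
        f"{API_BASE}/google/veo3.1/text-to-video",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key: str, request_id: str) -> urllib.request.Request:
    """Build the GET request that queries a task by the id returned on submit."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A typical flow would read the key from the `WAVESPEED_API_KEY` environment variable, call `send(build_submit_request(key, payload))`, take `data.id` from the response, and pass it to `build_result_request`.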
Parameters
Task Submission Parameters
Request Parameters
Parameter | Type | Required | Default | Range | Description |
---|---|---|---|---|---|
prompt | string | Yes | - | - | Positive text prompt for the generation. |
aspect_ratio | string | No | 16:9 | 16:9, 9:16 | Aspect ratio of the video. |
duration | integer | No | 8 | 4, 6, 8 | The duration of the generated media in seconds. |
resolution | string | No | 1080p | 720p, 1080p | Video resolution. |
generate_audio | boolean | No | false | - | Whether to generate audio. |
negative_prompt | string | No | - | - | Negative prompt for the generation. |
seed | integer | No | - | -1 to 2147483647 | The random seed to use for the generation. |
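The ranges in this table can be checked on the client before a request is sent, which avoids a round trip for obviously invalid input. The validator below is a hypothetical sketch built only from the table's values. The server remains the authority on what it accepts, and its actual error messages are not documented here.

```python
from typing import Any, Dict, List

# Allowed values copied from the request parameter table.
ALLOWED_VALUES = {
    "aspect_ratio": ("16:9", "9:16"),
    "duration": (4, 6, 8),
    "resolution": ("720p", "1080p"),
}
SEED_MIN, SEED_MAX = -1, 2147483647


def validate_request(params: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt is required and must be a non-empty string")
    for name, allowed in ALLOWED_VALUES.items():
        if name in params and params[name] not in allowed:
            errors.append(f"{name} must be one of {list(allowed)}")
    if "generate_audio" in params and not isinstance(params["generate_audio"], bool):
        errors.append("generate_audio must be a boolean")
    if "seed" in params:
        seed = params["seed"]
        is_int = isinstance(seed, int) and not isinstance(seed, bool)
        if not is_int or not SEED_MIN <= seed <= SEED_MAX:
            errors.append(f"seed must be an integer from {SEED_MIN} to {SEED_MAX}")
    return errors
```

Optional parameters that are omitted are not flagged, since the API applies the defaults listed in the table.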
Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data.id | string | Unique identifier for the prediction (task ID) |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: created, processing, completed, or failed |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
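A client usually needs only three things from this response: whether the task finished, the output URLs, and the error message on failure. The sketch below reads those fields by the names listed in the table. `extract_outputs` and `GenerationFailed` are hypothetical names, and the field layout is assumed to match the table exactly.

```python
from typing import Any, Dict, List


class GenerationFailed(Exception):
    """Raised when the task status is 'failed'."""


def extract_outputs(response: Dict[str, Any]) -> List[str]:
    """Return output URLs for a completed task.

    Returns an empty list while the task is still created or processing,
    and raises GenerationFailed with data.error when the task has failed.
    """
    data = response.get("data") or {}
    status = data.get("status")
    if status == "failed":
        raise GenerationFailed(data.get("error") or "unknown error")
    if status != "completed":
        return []
    return list(data.get("outputs") or [])
```

Returning an empty list for unfinished tasks lets the caller keep polling without special-casing the intermediate statuses.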