WaveSpeedAI APIGoogleGoogle Veo3.1 Text To Video

Google Veo3.1 Text To Video

Google Veo3.1 Text To Video

Playground

Try it on WavespeedAI!

Google Veo 3.1 introduces native 1080p resolution, delivering enhanced quality and flexibility for creators.

Features

🎥 Google Veo 3.1 — Text-to-Video (T2V) Model

Veo 3.1 T2V is the latest text-to-video model from Google DeepMind, designed to bring cinematic storytelling to life through text. It generates high-fidelity 1080p videos with synchronized, context-aware audio, realistic motion, and narrative consistency — making it one of the most advanced generative video systems ever released.


🌟 Why it stands out

  • 🎬 Cinematic Realism Produces natural lighting, smooth camera transitions, and accurate perspective for film-like motion.

  • 🔊 Native Audio Generation Generates synchronized ambient sound, dialogue, and music directly aligned with the visuals.

  • 🗣️ Dialogue & Lip-Sync Supports speaking characters and realistic facial expressions — perfect for storytelling, marketing, or short-form content.

  • 🧠 Subject Consistency (R2V) Maintains a character’s or object’s identity across frames using 1–3 reference images.

  • 🎞️ Video Interpolation Seamlessly animates transitions between two given frames — ideal for smooth start-to-end storytelling.

  • 📐 Flexible Output Supports both 720p and 1080p, at 24 FPS, duration for 4s, 6s, 8s, and in both 16:9 (landscape) and 9:16 (portrait) formats.


⚙️ Key Parameters

  • prompt — Describe your scene or story (e.g., “A drone shot flying over Las Vegas, transitioning from day to night with soft jazz in the background”).
  • durationSeconds — Choose video length (4s, 6s, or 8s).
  • resolution — 720p or 1080p.
  • aspectRatio — Landscape (16:9) or Portrait (9:16).

💰 Pricing (Preview Stage)

ModelDescriptionInput TypeOutputPrice
Veo 3.1 (Video + Audio)Generate videos with synchronized soundText / ImageVideo + Audio$0.40 / sec
Veo 3.1 (Video only)Generate high-quality silent videosText / ImageVideo$0.20 / sec

💡 Minimum cost: ~$3.20 per clip (based on 8s @ 1080p).


🚀 How to Use

  1. ✍️ Write a Prompt Describe the desired motion, camera style, lighting, and sound.

    Example: “A cinematic sunset over the ocean, waves glimmering as seagulls fly across the horizon.”

  2. ⚙️ Adjust Parameters Select duration, resolution (720p/1080p), and aspect ratio.

  3. ▶️ Generate Submit your request — Veo 3.1 will render motion, lighting, and synchronized audio.

  4. 💾 Preview & Download Review your video, refine your prompt if needed, then download the final MP4.


💡 Pro Tips

  • Keep prompts focused on one main action or subject for better coherence.
  • Use camera verbs like “tracking,” “zoom out,” or “handheld” for cinematic control.
  • Mention lighting and mood cues (e.g., “under soft moonlight,” “golden-hour glow”).
  • Use R2V for character-based storytelling; Interpolation for smooth transitions.
  • Avoid conflicting instructions (e.g., “fast zoom” and “slow motion” together).

🧾 Notes & Limitations

  • Generation time: ~2–3 minutes for an 8-second 1080p clip.
  • Frame rate fixed at 24 FPS.
  • Advanced controls (R2V, I2V, Interpolation) are mutually exclusive — only one per generation.
  • If your prompt is blocked, rewrite it and resubmit (safety thresholds may adjust during preview).

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/google/veo3.1/text-to-video" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "1080p",
    "generate_audio": false
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"

Parameters

Task Submission Parameters

Request Parameters

ParameterTypeRequiredDefaultRangeDescription
promptstringYes-Text prompt for generation; Positive text prompt.
aspect_ratiostringNo16:916:9, 9:16Aspect ratio of the video.
durationintegerNo88, 4, 6The duration of the generated media in seconds.
resolutionstringNo1080p720p, 1080pVideo resolution.
generate_audiobooleanNofalse-Whether to generate audio.
negative_promptstringNo-Negative prompt for the generation.
seedintegerNo--1 ~ 2147483647The random seed to use for the generation.

Response Parameters

ParameterTypeDescription
codeintegerHTTP status code (e.g., 200 for success)
messagestringStatus message (e.g., “success”)
data.idstringUnique identifier for the prediction, Task Id
data.modelstringModel ID used for the prediction
data.outputsarrayArray of URLs to the generated content (empty when status is not completed)
data.urlsobjectObject containing related API endpoints
data.urls.getstringURL to retrieve the prediction result
data.has_nsfw_contentsarrayArray of boolean values indicating NSFW detection for each output
data.statusstringStatus of the task: created, processing, completed, or failed
data.created_atstringISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.errorstringError message (empty if no error occurred)
data.timingsobjectObject containing timing details
data.timings.inferenceintegerInference time in milliseconds

Result Request Parameters

© 2025 WaveSpeedAI. All rights reserved.