Google Veo 3.1 Text-to-Video


Google Veo 3.1 converts text prompts into videos with synchronized audio at native 1080p. It is served through a ready-to-use REST inference API with no cold starts and per-second pricing.

Features

Veo 3.1 T2V is the latest text-to-video model from Google DeepMind, designed to bring cinematic storytelling to life through text. It generates high-fidelity 1080p videos with synchronized, context-aware audio, realistic motion, and narrative consistency.

🌟 Why it stands out

🎬 Cinematic Realism: Produces natural lighting, smooth camera transitions, and accurate perspective for film-like motion.
🔊 Native Audio Generation: Generates ambient sound, dialogue, and music synchronized with the visuals.
🗣️ Dialogue & Lip-Sync: Supports speaking characters with realistic facial expressions, suited to storytelling, marketing, or short-form content.
🧠 Subject Consistency (R2V): Maintains a character’s or object’s identity across frames using 1–3 reference images.
🎞️ Video Interpolation: Animates the transition between two given frames, useful for smooth start-to-end storytelling.
📐 Flexible Output: 720p or 1080p resolution at 24 FPS, durations of 4, 6, or 8 seconds, and 16:9 (landscape) or 9:16 (portrait) aspect ratios.

⚙️ Key Parameters

prompt: Describe your scene or story (e.g., “A drone shot flying over Las Vegas, transitioning from day to night with soft jazz in the background”).
duration: Video length in seconds (4, 6, or 8).
resolution: 720p or 1080p.
aspect_ratio: 16:9 (landscape) or 9:16 (portrait).
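Checking values locally before submitting keeps a typo from costing a request. The sketch below uses a hypothetical `validate_params` helper (not part of any WaveSpeed tooling) that accepts only the options listed in this section. The request-parameter table further down also lists `4k` for `resolution`, so extend that pattern if you rely on it.

```bash
#!/usr/bin/env bash
# Sketch: check request values against the options listed under "Key Parameters".
# The allowed values are copied from this page; the API remains the authority.

# validate_params DURATION RESOLUTION ASPECT_RATIO
# Prints a message to stderr and returns 1 on the first invalid value.
validate_params() {
  local duration="$1" resolution="$2" aspect_ratio="$3"

  case "$duration" in
    4|6|8) ;;
    *) printf 'invalid duration: %s (use 4, 6, or 8)\n' "$duration" >&2; return 1 ;;
  esac

  case "$resolution" in
    720p|1080p) ;;
    *) printf 'invalid resolution: %s (use 720p or 1080p)\n' "$resolution" >&2; return 1 ;;
  esac

  case "$aspect_ratio" in
    16:9|9:16) ;;
    *) printf 'invalid aspect_ratio: %s (use 16:9 or 9:16)\n' "$aspect_ratio" >&2; return 1 ;;
  esac
}

# Example: an 8-second 1080p portrait clip passes; a 5-second clip is rejected.
validate_params 8 1080p 9:16 && echo "ok"
validate_params 5 1080p 16:9 || echo "rejected"
```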

💰 Pricing (Preview Stage)

Model	Description	Input Type	Output	Price
Veo 3.1 (Video + Audio)	Generate videos with synchronized sound	Text / Image	Video + Audio	$0.40 / sec
Veo 3.1 (Video only)	Generate high-quality silent videos	Text / Image	Video	$0.20 / sec

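💡 Example cost: an 8-second 1080p clip costs $3.20 with audio ($0.40 × 8 s) or $1.60 without audio ($0.20 × 8 s). At the same rates, a 4-second clip costs $1.60 with audio or $0.80 without.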
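Each figure is the per-second price multiplied by the clip length. A minimal sketch, assuming the preview prices in the table above still apply; `estimate_cost_usd` is a hypothetical helper that works in cents so bash integer arithmetic stays exact.

```bash
#!/usr/bin/env bash
# Sketch: estimate the cost of one clip from the preview prices on this page.
# Prices are kept in cents so integer arithmetic is exact; rates may change.

# estimate_cost_usd DURATION_SECONDS GENERATE_AUDIO(true|false)
estimate_cost_usd() {
  local seconds="$1" with_audio="$2" cents_per_second
  if [ "$with_audio" = "true" ]; then
    cents_per_second=40   # Veo 3.1 (Video + Audio): $0.40 / sec
  else
    cents_per_second=20   # Veo 3.1 (Video only): $0.20 / sec
  fi
  local total_cents=$(( seconds * cents_per_second ))
  # Format cents as dollars with two decimals, e.g. 320 -> 3.20.
  printf '%d.%02d\n' $(( total_cents / 100 )) $(( total_cents % 100 ))
}

for d in 4 6 8; do
  printf '%ss: with audio $%s, video only $%s\n' \
    "$d" "$(estimate_cost_usd "$d" true)" "$(estimate_cost_usd "$d" false)"
done
```

For 4, 6, and 8 seconds this prints $1.60/$0.80, $2.40/$1.20, and $3.20/$1.60 (with audio / video only).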

🚀 How to Use

✍️ Write a Prompt: Describe the desired motion, camera style, lighting, and sound.

> Example: “A cinematic sunset over the ocean, waves glimmering as seagulls fly across the horizon.”

⚙️ Adjust Parameters: Select duration, resolution (720p/1080p), and aspect ratio.
▶️ Generate: Submit your request; Veo 3.1 renders the motion, lighting, and synchronized audio.
💾 Preview & Download: Review your video, refine your prompt if needed, then download the final MP4.
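When the prompt and parameters come from variables rather than a fixed heredoc, the prompt text must be JSON-escaped. A minimal sketch, assuming the field names from the request-parameter table; `json_escape` and `build_request_body` are hypothetical helpers that only handle backslash, double quote, newline, carriage return, and tab. If jq is installed (the API example below already uses it), `jq -n --arg prompt "$PROMPT" '{prompt: $prompt}'` escapes arbitrary text more robustly.

```bash
#!/usr/bin/env bash
# Sketch: turn a prompt plus parameters into a JSON body for the endpoint.
# Field names follow the request-parameter table on this page.

# json_escape STRING: escape \, ", newline, CR, and tab. Other control
# characters are NOT handled; use a real JSON tool for arbitrary input.
json_escape() {
  local s="$1" bs='\' dq='"'
  s=${s//"$bs"/"$bs$bs"}   # backslash first, so later escapes are not doubled
  s=${s//"$dq"/"$bs$dq"}
  s=${s//$'\n'/"$bs"n}
  s=${s//$'\r'/"$bs"r}
  s=${s//$'\t'/"$bs"t}
  printf '%s' "$s"
}

# build_request_body PROMPT DURATION RESOLUTION ASPECT_RATIO GENERATE_AUDIO
build_request_body() {
  printf '{"prompt":"%s","duration":%d,"resolution":"%s","aspect_ratio":"%s","generate_audio":%s}\n' \
    "$(json_escape "$1")" "$2" "$3" "$4" "$5"
}

REQUEST_BODY=$(build_request_body \
  'A cinematic sunset over the ocean, waves glimmering as seagulls fly across the horizon.' \
  8 1080p 16:9 true)
printf '%s\n' "$REQUEST_BODY"
```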

💡 Pro Tips

Keep prompts focused on one main action or subject for better coherence.
Use camera verbs like “tracking,” “zoom out,” or “handheld” for cinematic control.
Mention lighting and mood cues (e.g., “under soft moonlight,” “golden-hour glow”).
Use R2V for character-based storytelling and Interpolation for smooth transitions.
Avoid conflicting instructions (e.g., “fast zoom” and “slow motion” together).
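When prompts are generated by scripts, the tips can be applied mechanically. This sketch uses two hypothetical helpers: `compose_prompt` joins one subject/action with a camera verb, a lighting cue, and a sound cue, and `has_conflict` flags only the single conflicting pair named above. A real check would need a much longer list; this only illustrates the idea.

```bash
#!/usr/bin/env bash
# Sketch: build a focused prompt from separate cues and flag one known conflict.
# The conflict list is illustrative only (the "fast zoom" + "slow motion" example).

# compose_prompt PART...: join the non-empty arguments with ", ".
compose_prompt() {
  local part out=""
  for part in "$@"; do
    [ -z "$part" ] && continue
    if [ -z "$out" ]; then out="$part"; else out="$out, $part"; fi
  done
  printf '%s' "$out"
}

# has_conflict PROMPT: return 0 if the prompt asks for both a fast zoom and
# slow motion (case-insensitive), 1 otherwise.
has_conflict() {
  local p
  p=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  [[ "$p" == *"fast zoom"* && "$p" == *"slow motion"* ]]
}

PROMPT=$(compose_prompt \
  "A lone surfer riding a wave at dawn" \
  "tracking shot" \
  "golden-hour glow" \
  "crashing surf and distant gulls")
printf '%s\n' "$PROMPT"
if has_conflict "$PROMPT"; then
  echo "warning: conflicting camera instructions" >&2
fi
```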

🧾 Notes & Limitations

Generation time: ~2–3 minutes for an 8-second 1080p clip.
Frame rate fixed at 24 FPS.
Advanced controls (R2V, I2V, Interpolation) are mutually exclusive — only one per generation.
If your prompt is blocked, rewrite it and resubmit (safety thresholds may adjust during preview).
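Because an 8-second 1080p clip takes roughly 2–3 minutes, a polling loop should give up after a few minutes instead of running forever. A minimal sketch, assuming the status names from the response table (`created`, `processing`, `completed`, `failed`); `poll_until_done` is a hypothetical helper, and any status other than `created`/`processing`/`completed` is treated as terminal.

```bash
#!/usr/bin/env bash
# Sketch: poll a status command until it reports "completed" or a deadline passes.

# poll_until_done TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# COMMAND must print the current status. Returns 0 on "completed",
# 1 on any other terminal or unexpected status, 2 if the deadline passes.
poll_until_done() {
  local timeout="$1" interval="$2" status
  shift 2
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    status=$("$@")
    case "$status" in
      completed) return 0 ;;
      created|processing) sleep "$interval" ;;
      *) printf 'prediction ended with status: %s\n' "$status" >&2; return 1 ;;
    esac
  done
  printf 'gave up after %s seconds\n' "$timeout" >&2
  return 2
}

# Example wiring (assumes RESULT_URL and WAVESPEED_API_KEY as in the API example):
# get_status() {
#   curl --silent --show-error --fail-with-body "$RESULT_URL" \
#     -H "Authorization: Bearer ${WAVESPEED_API_KEY}" |
#     jq -r 'if type == "object" and has("data") then .data.status else .status end'
# }
# poll_until_done 300 2 get_status
```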

Authentication

For authentication details, please refer to the Authentication Guide.
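The API example on this page authenticates with a bearer token in the `Authorization` header, read from `WAVESPEED_API_KEY`. A small sketch that fails early when the key is empty; `auth_header` is a hypothetical helper, and the Authentication Guide remains the authoritative reference.

```bash
#!/usr/bin/env bash
# Sketch: build the bearer-token header used by the API example, or fail
# with a clear message if WAVESPEED_API_KEY is unset or empty.

auth_header() {
  if [ -z "${WAVESPEED_API_KEY:-}" ]; then
    printf 'WAVESPEED_API_KEY is not set; see the Authentication Guide\n' >&2
    return 1
  fi
  printf 'Authorization: Bearer %s' "$WAVESPEED_API_KEY"
}

# Usage: capture first, so a missing key stops the script before curl runs.
#   HEADER=$(auth_header) || exit 1
#   curl -H "$HEADER" ...
```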

API Endpoints

Submit Task & Query Result

```bash
#!/usr/bin/env bash
# Submit a text-to-video task, poll until it finishes, then print its outputs.
# Requires curl (7.76+ for --fail-with-body) and jq.
set -euo pipefail

# Replace the placeholder, or export WAVESPEED_API_KEY before running.
export WAVESPEED_API_KEY="${WAVESPEED_API_KEY:-your-api-key}"

REQUEST_BODY=$(cat <<'JSON'
{
  "prompt": "A cinematic ocean wave at sunrise, highly detailed",
  "aspect_ratio": "16:9",
  "duration": 8,
  "resolution": "1080p",
  "generate_audio": true
}
JSON
)

# 1. Submit the prediction.
SUBMIT_RESPONSE=$(curl --silent --show-error --fail-with-body \
  -X POST "https://api.wavespeed.ai/api/v3/google/veo3.1/text-to-video" \
  -H "Authorization: Bearer ${WAVESPEED_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "${REQUEST_BODY}")

# Task fields may be wrapped in a "data" envelope; unwrap it if present.
TASK=$(printf '%s' "${SUBMIT_RESPONSE}" | jq 'if type == "object" and has("data") then .data else . end')
PREDICTION_ID=$(printf '%s' "${TASK}" | jq -r '.id // empty')
if [ -z "${PREDICTION_ID}" ]; then
  printf 'Submission response did not contain a prediction id\n' >&2
  exit 1
fi
# Prefer the polling URL returned by the API; fall back to the result endpoint.
RESULT_URL=$(printf '%s' "${TASK}" | jq -r '.urls.get // empty')
if [ -z "${RESULT_URL}" ]; then RESULT_URL="https://api.wavespeed.ai/api/v3/predictions/${PREDICTION_ID}/result"; fi

# 2. Poll every 2 seconds until the prediction finishes.
while true; do
  RESPONSE=$(curl --silent --show-error --fail-with-body \
    "${RESULT_URL}" \
    -H "Authorization: Bearer ${WAVESPEED_API_KEY}")
  RESULT=$(printf '%s' "${RESPONSE}" | jq 'if type == "object" and has("data") then .data else . end')
  STATUS=$(printf '%s' "${RESULT}" | jq -r '.status // empty')

  case "${STATUS}" in
    completed) printf '%s\n' "${RESULT}" | jq '.outputs'; break ;;
    failed|cancelled|timeout) printf '%s\n' "${RESULT}" | jq . >&2; exit 1 ;;
    created|processing) sleep 2 ;;
    *) printf 'Unexpected status: %s\n' "${STATUS}" >&2; exit 1 ;;
  esac
done
```
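The script prints the `outputs` array; the final step in "How to Use" is downloading the MP4. A minimal sketch, assuming the outputs are plain URL strings (the usual case per the response tables); `output_filename` and `download_outputs` are hypothetical helpers, and non-URL outputs are skipped.

```bash
#!/usr/bin/env bash
# Sketch: save each URL from `outputs` to a file in the current directory.
# Requires curl. Object or text outputs are skipped, not parsed.

# output_filename URL: last path segment with any ?query or #fragment removed.
output_filename() {
  local path="${1%%[?#]*}"
  printf '%s' "${path##*/}"
}

# download_outputs URL...: download each http(s) URL, naming files after the URL.
download_outputs() {
  local url name
  for url in "$@"; do
    case "$url" in
      http://*|https://*) ;;
      *) printf 'skipping non-URL output: %s\n' "$url" >&2; continue ;;
    esac
    name=$(output_filename "$url")
    [ -n "$name" ] || name="output.mp4"   # fallback when the URL ends in "/"
    curl --silent --show-error --fail --location -o "$name" "$url"
    printf 'saved %s\n' "$name"
  done
}

# Example wiring with the script above (one URL per line via jq's `strings`):
#   mapfile -t URLS < <(printf '%s' "${RESULT}" | jq -r '.outputs[] | strings')
#   download_outputs "${URLS[@]}"
```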

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
prompt	string	Yes		-	Positive text prompt describing the video to generate.
aspect_ratio	string	No	16:9	16:9, 9:16	Aspect ratio of the video.
duration	integer	No	8	4, 6, 8	Duration of the generated video in seconds.
resolution	string	No	1080p	720p, 1080p, 4k	Video resolution.
generate_audio	boolean	No	true	-	Whether to generate audio.
negative_prompt	string	No		-	Negative prompt for the generation.
seed	integer	No	-	-	The random seed to use for the generation.

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Output values, usually URL strings; some models return text strings or structured result objects (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID, returned as `data.id` by the submission request
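The API example falls back to a URL built from this ID when `data.urls.get` is missing. A small sketch that wraps that pattern; `result_url` is a hypothetical helper, and the URL template is the one used in the example script on this page.

```bash
#!/usr/bin/env bash
# Sketch: build the result URL for a task ID, using the same pattern as the
# fallback in the API example. Fails if the ID is empty.

result_url() {
  local id="$1"
  if [ -z "$id" ]; then
    printf 'missing task id\n' >&2
    return 1
  fi
  printf 'https://api.wavespeed.ai/api/v3/predictions/%s/result' "$id"
}

# Usage:
#   curl --silent --show-error --fail-with-body "$(result_url "$PREDICTION_ID")" \
#     -H "Authorization: Bearer ${WAVESPEED_API_KEY}"
```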

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction
data.model	string	Model ID used for the prediction
data.outputs	array<string | object>	Array of generated outputs (empty when status is not `completed`). Items are usually URL strings, but may be text strings or structured result objects, depending on the model.
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to poll for the prediction result
data.status	string	Status: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
