Google Veo 3.1 converts text prompts into videos with synchronized audio at native 1080p for high-quality outputs. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
$3.20 per run
Two person street interview in New York City. Sample Dialogue: Host: "Did you hear the news?" Person: "Yes! Veo 3.1 is now available on WaveSpeedAI. If you want to see it, go check their website."
Action: A man adjusts his suit cuff before entering an evening gala. His watch glints under chandelier light. Camera: Close-up of ticking hands, reflections on glass, slow pull-back to reveal his composed face. Ambient Sound: Subtle ticking, low ambient orchestral hum, murmuring crowd. Dialogue: Voice-over (deep, calm): “Time doesn’t wait. But it remembers those who dare.” Logo fade-in: “Chronos — Measure Your Moments.”
This is a 3-beat prompt. Convert each beat into a distinct story. [Beat 1] A detective in a rainy alley whispers to himself: 'The clues don't add up.' Thunder rumbles. [Beat 2] He picks up a clue, rain pattering on pavement. [Beat 3] Close-up: He says 'Gotcha!' with dramatic music swell.
{ "description": "A mother and daughter share a magical moment reading a Japanese folktale that comes to life from their book in a cozy evening setting.", "Shots": { "Shot_1": { "Camera_Angle": { "shot_size": "medium two-shot", "angle": "front view", "focus": "mother and daughter with book", "details": "warm intimate framing showing both characters on sofa" }, "Camera_Movement": { "movement": "gentle glide", "Description": "Slow, emotional pacing with subtle movement to enhance the intimate atmosphere" }, "Transition": { "type": "continuous glide", "target": "daughter's face" }, "Background": { "use": "Japanese living room" }, "Action": { "character": "Mother", "action": "sits on sofa with daughter", "action": "arm gently wrapped behind daughter", "action": "looking down at open book on their laps", "action": "wavy hair catches the light shimmer" }, "Action": { "character": "Daughter", "action": "sits close to mother on sofa", "action": "looks down at book with wonder", "action": "watches as magical scene unfolds" }, "Action": { "character": "Book", "action": "fills lower frame", "action": "radiates soft golden light", "action": "pages pulse and ripple", "action": "pop-up world begins to unfold with Urashima Tarō riding sea turtle" } }, "Shot_2": { "Camera_Angle": { "shot_size": "close-up", "angle": "left-side perspective", "focus": "daughter's profile", "details": "mother blurred in background" }, "Camera_Movement": { "movement": "static", "Description": "Held frame capturing daughter's wonder" }, "Dialogue": { "character": "Daughter", "line": "Mama, look! He's moving!", "timin
Action: A young woman sits by a window at dawn, sipping coffee while sunlight cuts across the table. Steam rises from her cup as she smiles. Camera: Slow pan from her hand to her face, then to the glowing skyline outside. Ambient Sound: Soft morning breeze, faint café chatter, gentle acoustic guitar. Dialogue: Woman (voice-over): “Every sunrise is a blank page. Start bold. Start with Brew&Co.”
Action: Inside a dimly lit recording studio, two producers argue over a mix. One sits at the console, the other paces back and forth. Camera: Handheld motion, circling around them as lights flicker from monitors. Ambient Sound: Low bass hum, clicking buttons, faint static from speakers. Dialogue: Maya (frustrated): “You keep chasing perfection, but the track’s already alive.” Zane (snaps): “Alive isn’t enough — it has to move people.”
Action: Two friends sit on a rooftop ledge overlooking a glowing city skyline at sunset. One is relaxed, the other deep in thought. Camera: Slow dolly-in toward the two as warm sunlight flares through the lens. Ambient Sound: City traffic far below, soft wind, faint indie guitar music playing from a phone speaker. Dialogue: Emma (softly): “You ever think about leaving all this behind?” Ryan (smirks): “Only when I forget how beautiful it looks from up here.”
A vibrant young woman in her early 20s runs toward the camera in Times Square at night, ecstatic and wide-eyed, shouting passionately into a black microphone. She wears a neon green windbreaker and black headphones around her neck. She yells: “Yo, Veo3.1 just dropped on WaveSpeedAI — sound and texture are next level, try it right now!” Wet reflective streets, glowing blue-white-magenta billboards, blurred pedestrians, dynamic handheld follow-shot, sharp face focus, shallow depth of field. 4K UHD, saturated colors, viral UGC style.
Action: A performer stands before a mirror backstage, adjusting costume and makeup under bright bulbs. Camera: Slow zoom-in toward the reflection, emphasizing tension in the eyes. Ambient Sound: Distant applause, creaking floorboards, faint orchestra tuning. Dialogue: Performer (to self): “This is it. One more chance to mean it.”
Scene 1: Action: Inside a modern coworking office, two young entrepreneurs rehearse a startup pitch in front of a whiteboard filled with diagrams. Dialogue: Anna (determined): “We’re not selling a product. We’re selling a vision.” Leo (nodding): “Then let’s make them believe it.” Ambient Sound: Office air hum, typing in the background, occasional distant city noise. Scene 2: Action: Cut to the real pitch — same outfits, gestures, and tone. The lighting shifts to bright stage lights as they present confidently. Continuity: Smooth cross-scene identity retention; their confidence builds visually and emotionally.
Scene 1: Action: A cozy downtown café at sunrise. Steam rises from two coffee mugs. Characters: A young woman (Emma) and her friend (Jake) sit by the window, sunlight filtering through. Dialogue: Emma (warmly): “I can’t believe it’s been a year since we started this.” Jake (smiling): “Yeah, and we’re still chasing the same dream.” Ambient Sound: Soft background jazz, light chatter, the sound of a coffee machine. Scene 2: Action: The camera slowly pans as they laugh together, then fades into a flashback of their first meeting at the same café — same seats, same sunlight. Continuity: Same faces, gestures, and lighting tone across both shots.
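Several of the sample prompts above share a labeled layout (Action, Camera, Ambient Sound, Dialogue). The sketch below shows one way to assemble such a prompt from parts. The helper `compose_prompt` and its argument names are hypothetical and not part of any WaveSpeedAI SDK; the model accepts free-form text, so this layout is a convention taken from the samples, not a requirement.

```python
from typing import List, Optional, Tuple


def compose_prompt(action: str,
                   camera: Optional[str] = None,
                   ambient_sound: Optional[str] = None,
                   dialogue: Optional[List[Tuple[str, str, str]]] = None) -> str:
    """Join labeled sections into one prompt string.

    `dialogue` is a list of (speaker, tone, line) tuples.
    """
    parts = [f"Action: {action}"]
    if camera:
        parts.append(f"Camera: {camera}")
    if ambient_sound:
        parts.append(f"Ambient Sound: {ambient_sound}")
    if dialogue:
        # Render each line as: Speaker (tone): "text"
        lines = [f'{speaker} ({tone}): "{line}"' for speaker, tone, line in dialogue]
        parts.append("Dialogue: " + " ".join(lines))
    return " ".join(parts)


prompt = compose_prompt(
    action="A performer stands before a mirror backstage.",
    camera="Slow zoom-in toward the reflection.",
    ambient_sound="Distant applause, faint orchestra tuning.",
    dialogue=[("Performer", "to self", "This is it.")],
)
print(prompt)
```

Keeping the sections separate makes it easy to vary one element, such as the camera move, while holding the rest of the scene constant.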
Veo 3.1 T2V is the latest text-to-video model from Google DeepMind, designed for cinematic storytelling from text. It generates high-fidelity 1080p videos with synchronized, context-aware audio, realistic motion, and narrative consistency.
🎬 Cinematic Realism: Produces natural lighting, smooth camera transitions, and accurate perspective for film-like motion.
🔊 Native Audio Generation: Generates synchronized ambient sound, dialogue, and music aligned with the visuals.
🗣️ Dialogue & Lip-Sync: Supports speaking characters and realistic facial expressions, which suits storytelling, marketing, and short-form content.
🧠 Subject Consistency (R2V): Maintains a character's or object's identity across frames using 1–3 reference images.
🎞️ Video Interpolation: Animates the transition between two given frames, which suits smooth start-to-end storytelling.
📐 Flexible Output: Supports 720p and 1080p at 24 FPS, durations of 4, 6, or 8 seconds, and both 16:9 (landscape) and 9:16 (portrait) aspect ratios.
| Model | Description | Input Type | Output | Price |
|---|---|---|---|---|
| Veo 3.1 (Video + Audio) | Generate videos with synchronized sound | Text / Image | Video + Audio | $0.40 / sec |
| Veo 3.1 (Video only) | Generate high-quality silent videos | Text / Image | Video | $0.20 / sec |
💡 Cost example: an 8-second clip at 1080p costs about $3.20 with audio (8 s × $0.40/s) and $1.60 without audio (8 s × $0.20/s).
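The arithmetic behind these figures can be sketched as a small helper. The per-second rates come from the pricing table; the function name `estimate_cost` is hypothetical, and the sketch assumes the charge is strictly linear in duration and independent of resolution, which this page does not confirm for shorter clips. The live price shown next to the Generate button remains the authoritative figure.

```python
# Per-second rates in USD, taken from the pricing table on this page.
RATE_WITH_AUDIO = 0.40
RATE_VIDEO_ONLY = 0.20

# Durations the page lists as supported.
ALLOWED_DURATIONS = (4, 6, 8)


def estimate_cost(duration_s: int, generate_audio: bool = True) -> float:
    """Estimate the charge in USD, assuming linear per-second billing."""
    if duration_s not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 4, 6, or 8 seconds")
    rate = RATE_WITH_AUDIO if generate_audio else RATE_VIDEO_ONLY
    # Round to cents to hide floating-point noise.
    return round(duration_s * rate, 2)


print(estimate_cost(8))         # 3.2
print(estimate_cost(8, False))  # 1.6
```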
Example: “A cinematic sunset over the ocean, waves glimmering as seagulls fly across the horizon.”
⚙️ Adjust Parameters: Select duration, resolution (720p/1080p), and aspect ratio.
▶️ Generate: Submit your request. Veo 3.1 will render motion, lighting, and synchronized audio.
💾 Preview & Download: Review your video, refine your prompt if needed, then download the final MP4.
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/google/veo3.1/text-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Veo3.1 Text To Video below.
```bash
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/google/veo3.1/text-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "1080p",
    "generate_audio": true,
    "negative_prompt": "blurry, low quality, distorted",
    "seed": 0
  }'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
```

```javascript
// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules have no top-level await, so wrap the call in an async function.
async function main() {
  const result = await client.run("google/veo3.1/text-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "1080p",
    "generate_audio": true,
    "negative_prompt": "blurry, low quality, distorted",
    "seed": 0
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);
```

```python
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "google/veo3.1/text-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "aspect_ratio": "16:9",
        "duration": 8,
        "resolution": "1080p",
        "generate_audio": True,
        "negative_prompt": "blurry, low quality, distorted",
        "seed": 0,
    },
)
print(output["outputs"][0])  # → URL of the generated output
```

Veo3.1 Text To Video is a Google model for video generation, exposed as a REST API on WaveSpeedAI. It converts text prompts into videos with synchronized audio at native 1080p. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/google/google-veo3.1-text-to-video.
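The submit-then-poll flow described above can be sketched as a small loop. This is a minimal illustration, not the platform's client: `wait_for_output` and `fetch_result` are hypothetical names, the response is assumed to carry `status` and `outputs` under a top-level `data` key, and the `"processing"` and `"failed"` status names are assumptions. Only `"completed"` and `data.outputs[0]` come from this page.

```python
import time
from typing import Callable


def wait_for_output(fetch_result: Callable[[], dict],
                    interval_s: float = 2.0,
                    timeout_s: float = 300.0) -> str:
    """Poll until the prediction completes, then return the first output URL.

    `fetch_result` is a hypothetical callable that performs the GET on the
    prediction result endpoint and returns the parsed JSON body.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch_result().get("data", {})
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed name for the failure status
            raise RuntimeError(f"prediction failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not complete in time")
        time.sleep(interval_s)


# Canned responses stand in for the real HTTP call in this demo.
responses = iter([
    {"data": {"status": "processing", "outputs": []}},
    {"data": {"status": "completed", "outputs": ["https://example.com/video.mp4"]}},
])
print(wait_for_output(lambda: next(responses), interval_s=0.0))
```

Passing the fetch step in as a callable keeps the loop independent of any HTTP library, so the same function works with `requests`, `urllib`, or a test stub.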
Veo3.1 Text To Video starts at $3.20 per run. That figure is the base price — the final charge scales with the parameters you set in the form (output size, length, count, references, or whatever knobs this model exposes), so a higher-quality or larger output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `aspect_ratio`, `resolution`, `duration`, `seed`, `negative_prompt`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/google/google-veo3.1-text-to-video.
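A request body can be checked locally before it is sent, as in the sketch below. The allowed values for `aspect_ratio`, `resolution`, and `duration` come from the feature list on this page; the helper name `build_payload`, its default values, and the choice to omit unset optional fields are assumptions. The API reference linked above remains the authority on types and defaults.

```python
import json

# Allowed values as listed in the feature summary on this page.
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
ALLOWED_RESOLUTIONS = ("720p", "1080p")
ALLOWED_DURATIONS = (4, 6, 8)


def build_payload(prompt, aspect_ratio="16:9", resolution="1080p",
                  duration=8, generate_audio=True,
                  negative_prompt=None, seed=None):
    """Validate inputs locally and return a JSON-ready request body."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")

    payload = {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "duration": duration,
        "generate_audio": generate_audio,
    }
    # Optional fields are sent only when set (an assumption about the API).
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    return payload


body = build_payload("A cinematic shot of a city at sunset", duration=6)
print(json.dumps(body, indent=2))
```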
Average end-to-end generation time on WaveSpeedAI is around 95 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Google). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.