Veo3.1 Text to Video | Powerful Text-to-Video API


Google Veo 3.1 converts text prompts into high-quality videos with synchronized audio at native 1080p. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-video

Input

prompt (required). Example value:

```json
{
  "description": "A mother and daughter share a magical moment reading a Japanese folktale that comes to life from their book in a cozy evening setting.",
  "Shots": {
    "Shot_1": {
      "Camera_Angle": {
        "shot_size": "medium two-shot",
        "angle": "front view",
        "focus": "mother and daughter with book",
        "details": "warm intimate framing showing both characters on sofa"
      },
      "Camera_Movement": {
        "movement": "gentle glide",
        "Description": "Slow, emotional pacing with subtle movement to enhance the intimate atmosphere"
      },
      "Transition": {
        "type": "continuous glide",
        "target": "daughter's face"
      },
      "Background": {
        "use": "Japanese living room"
      },
      "Action": {
        "character": "Mother",
        "action": "sits on sofa with daughter",
        "action": "arm gently wrapped behind daughter",
        "action": "looking down at open book on their laps",
        "action": "wavy hair catches the light shimmer"
      },
      "Action": {
        "character": "Daughter",
        "action": "sits close to mother on sofa",
        "action": "looks down at book with wonder",
        "action": "watches as magical scene unfolds"
      },
      "Action": {
        "character": "Book",
        "action": "fills lower frame",
        "action": "radiates soft golden light",
        "action": "pages pulse and ripple",
        "action": "pop-up world begins to unfold with Urashima Tarō riding sea turtle"
      }
    },
    "Shot_2": {
      "Camera_Angle": {
        "shot_size": "close-up",
        "angle": "left-side perspective",
        "focus": "daughter's profile",
        "details": "mother blurred in background"
      },
      "Camera_Movement": {
        "movement": "static",
        "Description": "Held frame capturing daughter's wonder"
      },
      "Dialogue": {
        "character": "Daughter",
        "line": "Mama, look! He's moving!",
        "timin
```

aspect_ratio

duration

resolution

generate_audio: whether to generate audio.

negative_prompt

seed

Enable Safety Checker

Price: $3.20 per run

Examples

Two-person street interview in New York City. Sample Dialogue: Host: "Did you hear the news?" Person: "Yes! Veo 3.1 is now available on WaveSpeedAI. If you want to see it, go check their website."


Action: A man adjusts his suit cuff before entering an evening gala. His watch glints under chandelier light. Camera: Close-up of ticking hands, reflections on glass, slow pull-back to reveal his composed face. Ambient Sound: Subtle ticking, low ambient orchestral hum, murmuring crowd. Dialogue: Voice-over (deep, calm): “Time doesn’t wait. But it remembers those who dare.” Logo fade-in: “Chronos — Measure Your Moments.”

This is a 3-beat prompt. Convert each beat into a distinct story. [Beat 1] A detective in a rainy alley whispers to himself: 'The clues don't add up.' Thunder rumbles. [Beat 2] He picks up a clue, rain pattering on pavement. [Beat 3] Close-up: He says 'Gotcha!' with dramatic music swell.

Action: A young woman sits by a window at dawn, sipping coffee while sunlight cuts across the table. Steam rises from her cup as she smiles. Camera: Slow pan from her hand to her face, then to the glowing skyline outside. Ambient Sound: Soft morning breeze, faint café chatter, gentle acoustic guitar. Dialogue: Woman (voice-over): “Every sunrise is a blank page. Start bold. Start with Brew&Co.”

Action: Inside a dimly lit recording studio, two producers argue over a mix. One sits at the console, the other paces back and forth. Camera: Handheld motion, circling around them as lights flicker from monitors. Ambient Sound: Low bass hum, clicking buttons, faint static from speakers. Dialogue: Maya (frustrated): “You keep chasing perfection, but the track’s already alive.” Zane (snaps): “Alive isn’t enough — it has to move people.”

Action: Two friends sit on a rooftop ledge overlooking a glowing city skyline at sunset. One is relaxed, the other deep in thought. Camera: Slow dolly-in toward the two as warm sunlight flares through the lens. Ambient Sound: City traffic far below, soft wind, faint indie guitar music playing from a phone speaker. Dialogue: Emma (softly): “You ever think about leaving all this behind?” Ryan (smirks): “Only when I forget how beautiful it looks from up here.”

A vibrant young woman in her early 20s runs toward the camera in Times Square at night, ecstatic and wide-eyed, shouting passionately into a black microphone. She wears a neon green windbreaker and black headphones around her neck. She yells: “Yo, Veo3.1 just dropped on WaveSpeedAI — sound and texture are next level, try it right now!” Wet reflective streets, glowing blue-white-magenta billboards, blurred pedestrians, dynamic handheld follow-shot, sharp face focus, shallow depth of field. 4K UHD, saturated colors, viral UGC style.

Action: A performer stands before a mirror backstage, adjusting costume and makeup under bright bulbs. Camera: Slow zoom-in toward the reflection, emphasizing tension in the eyes. Ambient Sound: Distant applause, creaking floorboards, faint orchestra tuning. Dialogue: Performer (to self): “This is it. One more chance to mean it.”

Scene 1: Action: Inside a modern coworking office, two young entrepreneurs rehearse a startup pitch in front of a whiteboard filled with diagrams. Dialogue: Anna (determined): “We’re not selling a product. We’re selling a vision.” Leo (nodding): “Then let’s make them believe it.” Ambient Sound: Office air hum, typing in the background, occasional distant city noise. Scene 2: Action: Cut to the real pitch — same outfits, gestures, and tone. The lighting shifts to bright stage lights as they present confidently. Continuity: Smooth cross-scene identity retention; their confidence builds visually and emotionally.

Scene 1: Action: A cozy downtown café at sunrise. Steam rises from two coffee mugs. Characters: A young woman (Emma) and her friend (Jake) sit by the window, sunlight filtering through. Dialogue: Emma (warmly): “I can’t believe it’s been a year since we started this.” Jake (smiling): “Yeah, and we’re still chasing the same dream.” Ambient Sound: Soft background jazz, light chatter, the sound of a coffee machine. Scene 2: Action: The camera slowly pans as they laugh together, then fades into a flashback of their first meeting at the same café — same seats, same sunlight. Continuity: Same faces, gestures, and lighting tone across both shots.

Related Models

nano-banana-2-lite/text-to-image (text-to-image)
nano-banana-2-lite/edit (image-to-image)
gemini-omni-flash/text-to-video (text-to-video)
gemini-omni-flash/image-to-video (image-to-video)
gemini-omni-flash/reference-to-video (image-to-video)
gemini-omni-flash/video-edit (video-to-video)

README

🎥 Google Veo 3.1 — Text-to-Video (T2V) Model

Veo 3.1 T2V is the latest text-to-video model from Google DeepMind, designed to bring cinematic storytelling to life through text. It generates high-fidelity videos at up to 1080p with synchronized, context-aware audio, realistic motion, and narrative consistency.

🌟 Why it stands out

🎬 Cinematic Realism: Produces natural lighting, smooth camera transitions, and accurate perspective for film-like motion.
🔊 Native Audio Generation: Generates synchronized ambient sound, dialogue, and music aligned with the visuals.
🗣️ Dialogue & Lip-Sync: Supports speaking characters with realistic facial expressions, suited to storytelling, marketing, and short-form content.
🧠 Subject Consistency (R2V): Maintains a character's or object's identity across frames using 1–3 reference images.
🎞️ Video Interpolation: Animates the transition between two given frames, useful for smooth start-to-end storytelling.
📐 Flexible Output: Supports 720p and 1080p output at 24 FPS, clip durations of 4, 6, or 8 seconds, and both 16:9 (landscape) and 9:16 (portrait) aspect ratios.

⚙️ Key Parameters

prompt — Describe your scene or story (e.g., “A drone shot flying over Las Vegas, transitioning from day to night with soft jazz in the background”).
durationSeconds — Choose video length (4s, 6s, or 8s).
resolution — 720p or 1080p.
aspectRatio — Landscape (16:9) or Portrait (9:16).
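The parameters above can be collected into a request body as in the following Python sketch. It checks each option against the values this README lists (4, 6, or 8 s; 720p or 1080p; 16:9 or 9:16). The field names follow the playground form (`aspect_ratio`, `duration`, `generate_audio`, `negative_prompt`, `seed`) rather than the camelCase `durationSeconds`/`aspectRatio` shown here; which spelling the endpoint actually accepts, the value types, and the helper name `build_t2v_payload` are assumptions, so confirm them against the API reference before sending anything.

```python
import json
from typing import Optional

# Option sets taken from the README: durations, resolutions, aspect ratios.
ALLOWED_DURATIONS = {4, 6, 8}  # seconds
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}


def build_t2v_payload(
    prompt: str,
    duration: int = 8,
    resolution: str = "1080p",
    aspect_ratio: str = "16:9",
    generate_audio: bool = True,
    negative_prompt: Optional[str] = None,
    seed: Optional[int] = None,
) -> dict:
    """Validate options and return a request body (hypothetical field layout)."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")

    payload = {
        "prompt": prompt,
        "duration": duration,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
    }
    # Optional fields are only included when the caller sets them.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    return payload


if __name__ == "__main__":
    body = build_t2v_payload(
        "A drone shot flying over Las Vegas, transitioning from day to night "
        "with soft jazz in the background",
        duration=8,
        resolution="1080p",
        aspect_ratio="16:9",
    )
    print(json.dumps(body, indent=2))
```

Validating locally means an unsupported value such as a 5-second duration fails before any paid request is made.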

💰 Pricing (Preview Stage)

| Model | Description | Input Type | Output | Price |
| --- | --- | --- | --- | --- |
| Veo 3.1 (Video + Audio) | Generate videos with synchronized sound | Text / Image | Video + Audio | $0.40 / sec |
| Veo 3.1 (Video only) | Generate high-quality silent videos | Text / Image | Video | $0.20 / sec |

💡 Example cost: $3.20 for an 8-second clip at 1080p with audio (8 s × $0.40/s), or $1.60 without audio (8 s × $0.20/s).
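The per-second rates in the pricing table turn into a per-clip estimate by simple multiplication. The sketch below assumes billing is strictly duration × rate for every supported length; the table implies this, but the page does not state it explicitly for 4- and 6-second clips, and preview pricing may change. The function name `clip_cost` is illustrative.

```python
# Per-second rates from the pricing table (preview stage; may change).
RATE_WITH_AUDIO = 0.40  # USD per second, video + audio
RATE_VIDEO_ONLY = 0.20  # USD per second, silent video


def clip_cost(duration_s: int, with_audio: bool = True) -> float:
    """Estimate the price of one clip as duration x per-second rate."""
    if duration_s not in (4, 6, 8):
        raise ValueError("Veo 3.1 clips are 4, 6, or 8 seconds long")
    rate = RATE_WITH_AUDIO if with_audio else RATE_VIDEO_ONLY
    # Round to cents to hide binary floating-point noise (e.g. 6 * 0.4).
    return round(duration_s * rate, 2)


for seconds in (4, 6, 8):
    print(
        f"{seconds}s: ${clip_cost(seconds):.2f} with audio, "
        f"${clip_cost(seconds, with_audio=False):.2f} without"
    )
```

Under that assumption, 4-, 6-, and 8-second clips come to $1.60, $2.40, and $3.20 with audio, and $0.80, $1.20, and $1.60 without; the 8-second figure with audio matches the $3.20 per-run price shown in the playground.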

🚀 How to Use

✍️ Write a Prompt: Describe the desired motion, camera style, lighting, and sound.

Example: “A cinematic sunset over the ocean, waves glimmering as seagulls fly across the horizon.”

⚙️ Adjust Parameters: Select duration, resolution (720p/1080p), and aspect ratio.
▶️ Generate: Submit your request; Veo 3.1 will render motion, lighting, and synchronized audio.
💾 Preview & Download: Review your video, refine your prompt if needed, then download the final MP4.

💡 Pro Tips

Keep prompts focused on one main action or subject for better coherence.
Use camera verbs like “tracking,” “zoom out,” or “handheld” for cinematic control.
Mention lighting and mood cues (e.g., “under soft moonlight,” “golden-hour glow”).
Use R2V for character-based storytelling; Interpolation for smooth transitions.
Avoid conflicting instructions (e.g., “fast zoom” and “slow motion” together).
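One way to apply these tips consistently is to assemble prompts from labeled parts, mirroring the "Action: … Camera: … Ambient Sound: … Dialogue: …" layout used in the examples on this page, and to scan the result for contradictory phrases. This is a minimal sketch: the `Lighting:` label is an addition not used in the examples, and `ScenePrompt`, `find_conflicts`, and the `CONFLICTS` list (seeded only with the "fast zoom" / "slow motion" pair from the last tip) are hypothetical helpers, not part of any Veo or WaveSpeedAI API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical pairs of instructions that should not appear together.
CONFLICTS: List[Tuple[str, str]] = [("fast zoom", "slow motion")]


@dataclass
class ScenePrompt:
    action: str
    camera: str = ""
    lighting: str = ""
    ambient_sound: str = ""
    dialogue: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Join the non-empty parts in Action/Camera/Lighting/Sound/Dialogue order."""
        parts = [f"Action: {self.action}"]
        if self.camera:
            parts.append(f"Camera: {self.camera}")
        if self.lighting:
            parts.append(f"Lighting: {self.lighting}")
        if self.ambient_sound:
            parts.append(f"Ambient Sound: {self.ambient_sound}")
        if self.dialogue:
            parts.append("Dialogue: " + " ".join(self.dialogue))
        return " ".join(parts)


def find_conflicts(prompt: str) -> List[Tuple[str, str]]:
    """Return every configured pair whose two phrases both occur in the prompt."""
    text = prompt.lower()
    return [(a, b) for a, b in CONFLICTS if a in text and b in text]


scene = ScenePrompt(
    action="A young woman sits by a window at dawn, sipping coffee.",
    camera="Slow pan from her hand to her face, then to the skyline outside.",
    lighting="Golden-hour glow across the table.",
    ambient_sound="Soft morning breeze, gentle acoustic guitar.",
    dialogue=['Woman (voice-over): "Every sunrise is a blank page."'],
)
prompt_text = scene.render()
print(prompt_text)
print(find_conflicts(prompt_text))  # this prompt contains no configured conflict
```

Keeping one `action` per scene also nudges prompts toward the single main subject the first tip recommends.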

🧾 Notes & Limitations

Generation time: ~2–3 minutes for an 8-second 1080p clip.
Frame rate fixed at 24 FPS.
Advanced controls (R2V, I2V, Interpolation) are mutually exclusive — only one per generation.
If your prompt is blocked, rewrite it and resubmit (safety thresholds may adjust during preview).
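Because an 8-second 1080p clip takes roughly 2–3 minutes, a client usually submits the job and then polls for the result. The helper below is a generic sketch under that assumption: `fetch_status` is a caller-supplied function (for example, one that queries whatever result endpoint your API exposes), and the `status` key with `completed`/`failed` values are hypothetical placeholders, not documented WaveSpeedAI fields, which is why they are parameters. The `sleep` and `clock` hooks let the loop be tested without real waiting.

```python
import time
from typing import Callable, Dict, Iterable


def wait_for_result(
    fetch_status: Callable[[], Dict],
    interval_s: float = 10.0,
    timeout_s: float = 600.0,  # generous margin over the ~2-3 min typical run
    done_states: Iterable[str] = ("completed",),  # hypothetical status values
    failed_states: Iterable[str] = ("failed",),
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict:
    """Poll fetch_status() until it reports a terminal state or time runs out."""
    done, failed = set(done_states), set(failed_states)
    deadline = clock() + timeout_s
    while True:
        result = fetch_status()
        status = result.get("status")
        if status in done:
            return result
        if status in failed:
            # e.g. a blocked prompt: rewrite it and resubmit rather than retrying as-is
            raise RuntimeError(f"generation failed: {result}")
        if clock() >= deadline:
            raise TimeoutError(f"no result after {timeout_s} s (last status: {status})")
        sleep(interval_s)
```

Polling every few seconds is wasteful for a multi-minute job; a 10-second interval keeps request volume low while adding at most a few seconds of latency.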

Note: This website uses AI models provided by third parties.