Home/Explore/Google Models/google/veo3.1/text-to-video

text-to-video

google/veo3.1/text-to-video

Google Veo 3.1 introduces native 1080p resolution, delivering enhanced quality and flexibility for creators.

Whether to generate audio.

Idle

Your request will cost $3.2 per run.

One more thing:

ExamplesView all

README

🎥 Google Veo 3.1 — Text-to-Video (T2V) Model

Veo 3.1 T2V is the latest text-to-video model from Google DeepMind, designed to bring cinematic storytelling to life through text. It generates high-fidelity 1080p videos with synchronized, context-aware audio, realistic motion, and narrative consistency — making it one of the most advanced generative video systems ever released.

🌟 Why it stands out

  • 🎬 Cinematic Realism Produces natural lighting, smooth camera transitions, and accurate perspective for film-like motion.

  • 🔊 Native Audio Generation Generates synchronized ambient sound, dialogue, and music directly aligned with the visuals.

  • 🗣️ Dialogue & Lip-Sync Supports speaking characters and realistic facial expressions — perfect for storytelling, marketing, or short-form content.

  • 🧠 Subject Consistency (R2V) Maintains a character’s or object’s identity across frames using 1–3 reference images.

  • 🎞️ Video Interpolation Seamlessly animates transitions between two given frames — ideal for smooth start-to-end storytelling.

  • 📐 Flexible Output Supports both 720p and 1080p, at 24 FPS, duration for 4s, 6s, 8s, and in both 16:9 (landscape) and 9:16 (portrait) formats.

⚙️ Key Parameters

  • prompt — Describe your scene or story (e.g., “A drone shot flying over Las Vegas, transitioning from day to night with soft jazz in the background”).
  • durationSeconds — Choose video length (4s, 6s, or 8s).
  • resolution — 720p or 1080p.
  • aspectRatio — Landscape (16:9) or Portrait (9:16).

💰 Pricing (Preview Stage)

ModelDescriptionInput TypeOutputPrice
Veo 3.1 (Video + Audio)Generate videos with synchronized soundText / ImageVideo + Audio$0.40 / sec
Veo 3.1 (Video only)Generate high-quality silent videosText / ImageVideo$0.20 / sec

💡 Minimum cost: ~$3.20 per clip (based on 8s @ 1080p).

🚀 How to Use

  1. ✍️ Write a Prompt Describe the desired motion, camera style, lighting, and sound.

    Example: “A cinematic sunset over the ocean, waves glimmering as seagulls fly across the horizon.”

  2. ⚙️ Adjust Parameters Select duration, resolution (720p/1080p), and aspect ratio.

  3. ▶️ Generate Submit your request — Veo 3.1 will render motion, lighting, and synchronized audio.

  4. 💾 Preview & Download Review your video, refine your prompt if needed, then download the final MP4.

💡 Pro Tips

  • Keep prompts focused on one main action or subject for better coherence.
  • Use camera verbs like “tracking,” “zoom out,” or “handheld” for cinematic control.
  • Mention lighting and mood cues (e.g., “under soft moonlight,” “golden-hour glow”).
  • Use R2V for character-based storytelling; Interpolation for smooth transitions.
  • Avoid conflicting instructions (e.g., “fast zoom” and “slow motion” together).

🧾 Notes & Limitations

  • Generation time: ~2–3 minutes for an 8-second 1080p clip.
  • Frame rate fixed at 24 FPS.
  • Advanced controls (R2V, I2V, Interpolation) are mutually exclusive — only one per generation.
  • If your prompt is blocked, rewrite it and resubmit (safety thresholds may adjust during preview).