
Veo3 Image to Video


Google Veo 3 is Google's flagship image-to-video model that creates audio-enabled videos from images. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


Examples

News anchor mid-action, looking straight at the camera. A vintage 1950s black-and-white television broadcast. A serious female news presenter sits at a desk, facing directly toward the audience, with a large old-school microphone in front. She wears a crisp suit, narrow tie, side-parted hair, and wireframe glasses. The presenter moves naturally: leans slightly forward, gestures with one hand, and maintains eye contact with the camera. Her lips are synced to say, "Breaking news: Google Veo 3 is now available on WaveSpeedAI." Contrast, sharp shadows, authentic grainy texture, classic black-and-white 1950s broadcast aesthetic. Vintage TV atmosphere.

A cinematic close-up of a barista crafting latte art in a bustling coffee shop. The scene alternates between her focused, skilled hands and customers watching appreciatively, highlighting the artistry and dedication in everyday routines.

A young woman standing at a balcony at sunrise, overlooking a quiet city. Wind gently rustles her hair. She speaks softly into a mic: 'Another day begins... I wonder what today will bring.'

A man walking alone on a forest trail in autumn, leaves crunching underfoot. Birds chirp in the distance. He murmurs: 'Feels good to be away from everything for a while.'

A close-up of a pair of hands carefully slicing a ripe mango on a wooden cutting board, golden sunlight streaming through a nearby window. The camera slowly zooms in as the knife glides smoothly through the juicy flesh, juice glistening. Soft lo-fi music plays in the background. Natural lighting, ASMR-style, cinematic depth of field.

A young woman walks alone under a transparent umbrella in a quiet alley during light rain, soft city lights reflecting on the wet pavement. Her pace is calm and thoughtful. The camera follows slowly behind her, occasional droplets hitting the lens. Subtle piano music plays, evoking a melancholic but peaceful mood. Dreamy, cinematic, slightly slow motion.

Static shot. Video. 90s sitcom living room scene. Two people mid-conversation in a colorful, cozy set. The woman smiles and gestures animatedly as she speaks, lips synced to: "Veo 3 generates sound. Dialogues, music, everything!" The man listens attentively, nodding slightly, holding a coffee mug.

Natural light. Field reporter mid-action in an open field, looking directly at the camera, tornado in the background. A reporter, in a muted dark raincoat (gray or navy), stands firmly in a wide, grassy field. The wind pulls at his coat and hair, but he keeps his gaze steady, looking directly into the camera. Behind him, a tornado swirls menacingly under an overcast sky. He speaks clearly, lips synced to: "A tornado is coming, please be safe." Slight handheld movement, unsteady framing, and minor shakes typical of field news footage. Natural, flat daylight with no stylization.

A cinematic documentary-style interview scene. An elderly Asian woman in a warmly lit study full of books and vintage lamps. The woman turns slightly to face the camera directly and says with a voice full of awe and sincerity, "I miss my husband so much." Her voice is aged, raspy yet gentle, filled with emotion and wonder. The lighting is soft and moody, with a shallow depth of field. No subtitles, no titles or overlays. The atmosphere is quiet, respectful, and emotionally powerful, like a heartfelt moment in a serious documentary.

A vibrant 2D animation of a young skateboarder in a colorful outfit performing tricks through a lively city park. Bold lines and bright hues create an energetic, playful atmosphere as the skateboarder maneuvers around obstacles.


README

Google Veo 3 — Image-to-Video (I2V) Model

Veo 3 I2V is the standard image-to-video version of Google DeepMind’s Veo 3 generative model. It brings still images to life, creating cinematic 1080p videos with smooth, realistic motion, consistent lighting, and synchronized native audio.

🎬 Why it stands out

  • From Image to Motion: Transform a single image into a natural, dynamic video sequence while preserving its original composition and style.

  • Cinematic Realism: Produces high-fidelity motion with natural lighting, accurate perspective, and fluid camera transitions.

  • Native Audio Generation: Automatically generates synchronized sound (ambient noise, effects, and light music) aligned with the visuals.

  • Dialogue & Lip-Sync: Enables speaking characters or realistic expressions, ideal for storytelling, marketing, and short-form content.

  • Consistent Subject & Style: Retains the identity, color tone, and visual integrity of your input image throughout the motion sequence.

⚙️ Limits and Performance

Property       | Description
Input          | Single image + text prompt
Max Duration   | 8 seconds
Resolution     | Up to 1080p
Audio          | Native synchronized dialogue, ambient sound, and music
Output Format  | MP4 with stereo audio
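
The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `build_payload` is a hypothetical helper, the field names mirror the request examples later on this page, and the allowed resolution values ("720p" and "1080p") are an assumption based on the pricing note. Only the 8-second upper bound on duration comes from the table; the minimum duration is not stated, so the sketch only requires a positive value.

```python
# Hypothetical client-side check; not an official WaveSpeedAI helper.
MAX_DURATION_SECONDS = 8                  # from the limits table
ALLOWED_RESOLUTIONS = {"720p", "1080p"}   # assumption: the two values named under Pricing


def build_payload(prompt, image_url, duration=8, resolution="720p", generate_audio=True):
    """Return a request body dict, raising ValueError for out-of-range settings."""
    if not 0 < duration <= MAX_DURATION_SECONDS:
        raise ValueError(f"duration must be between 1 and {MAX_DURATION_SECONDS} seconds")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    return {
        "prompt": prompt,
        "image": image_url,
        "duration": duration,
        "resolution": resolution,
        "generate_audio": generate_audio,
    }


payload = build_payload("Slow pan across the skyline", "https://example.com/your-input.jpg")
print(payload["duration"], payload["resolution"])
```

Rejecting bad settings locally avoids paying for, or waiting on, a request that the service would refuse anyway.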

💰 Pricing

Each run costs $3.20 with audio, at either 720p or 1080p.

Without audio, each run costs $1.20.
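
As a rough sketch of the arithmetic, the two per-run prices listed above ($3.20 with audio, $1.20 without) can be turned into a batch estimate. `estimate_cost` is a hypothetical helper, and the sketch assumes a flat per-run price with no other factors affecting the charge.

```python
from decimal import Decimal

# Per-run prices taken from the pricing note above.
PRICE_WITH_AUDIO = Decimal("3.20")
PRICE_WITHOUT_AUDIO = Decimal("1.20")


def estimate_cost(runs, generate_audio=True):
    """Estimate the total cost in USD for a number of runs (hypothetical helper)."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    unit_price = PRICE_WITH_AUDIO if generate_audio else PRICE_WITHOUT_AUDIO
    return unit_price * runs


print(estimate_cost(10))                        # 32.00
print(estimate_cost(10, generate_audio=False))  # 12.00
```

`Decimal` is used instead of floats so that totals stay exact to the cent.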

✅ Commercial use allowed

🚀 How to Use

  1. Upload an Image: Choose a clear, high-quality still image. It defines the subject, framing, and overall style.

  2. Write a Prompt: Describe the desired motion, mood, and camera movement. Example: “Slow cinematic zoom out as wind moves through the trees and sunlight flickers across the leaves.”

  3. Adjust Settings: Select the video duration (up to 8 seconds) and output resolution (up to 1080p).

  4. Generate the Video: Submit your prompt and image. Veo 3 I2V automatically creates motion, lighting, and audio.

  5. Preview & Download: Review the result, refine the prompt if needed, and download the final MP4.

💡 Pro Tips

  • Use bright, high-contrast images for clearer motion and lighting.
  • Keep prompts focused on a single subject or action for best stability.
  • Add camera directions like “tracking shot,” “slow pan,” or “handheld style” to control movement.
  • Specify lighting and mood (e.g., bright daylight, soft sunset glow).
  • Avoid conflicting motion requests to maintain smooth results.
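
One way to apply these tips consistently is to assemble each prompt from separate parts: one action, an optional camera direction, and an optional lighting or mood note. The sketch below is only an illustration; `compose_prompt` is a hypothetical helper, and the model does not require prompts in this shape.

```python
def compose_prompt(action, camera=None, lighting=None):
    """Join one action with optional camera and lighting notes into a single prompt."""
    parts = [action]
    if camera:
        parts.append(camera)
    if lighting:
        parts.append(lighting)
    # Normalize each part so the result has exactly one period between sentences.
    cleaned = [part.strip().rstrip(".") for part in parts]
    return ". ".join(cleaned) + "."


print(compose_prompt(
    "Wind moves through the trees",
    camera="Slow pan",
    lighting="Soft sunset glow",
))
# Wind moves through the trees. Slow pan. Soft sunset glow.
```

Keeping the action as a single required argument nudges each prompt toward one subject and one motion, in line with the tips above.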

📝 Notes

  • Actual processing time depends on queue load and resolution.
  • Optimized for cinematic shorts, ads, and social media clips.
  • Ensure your uploaded image is clear, accessible, and legally usable.
  • Please ensure your prompts comply with Google’s Safety Guidelines — if an error occurs, revise your prompt and try again.

Veo3 Image To Video API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/google/veo3/image-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Veo3 Image To Video below.

HTTP example
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/google/veo3/image-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "720p",
    "generate_audio": true,
    "negative_prompt": "blurry, low quality, distorted",
    "seed": 0
}'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
Node.js example
// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not allow top-level await, so the call is wrapped in an async function.
async function main() {
  const result = await client.run("google/veo3/image-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "aspect_ratio": "16:9",
    "duration": 8,
    "resolution": "720p",
    "generate_audio": true,
    "negative_prompt": "blurry, low quality, distorted",
    "seed": 0
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
Python example
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "google/veo3/image-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "image": "https://example.com/your-input.jpg",
        "aspect_ratio": "16:9",
        "duration": 8,
        "resolution": "720p",
        "generate_audio": True,
        "negative_prompt": "blurry, low quality, distorted",
        "seed": 0,
    },
)

print(output["outputs"][0])  # → URL of the generated output

Veo3 Image To Video API — Frequently asked questions

What is the Veo3 Image To Video API?

Veo3 Image To Video is a Google image-to-video model, exposed as a REST API on WaveSpeedAI. It creates audio-enabled videos from still images. You can call it programmatically or try it from the playground above.

How do I call the Veo3 Image To Video API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/google/google-veo3-image-to-video.
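
The poll step described above can be sketched in Python with only the standard library. The result URL follows the cURL example on this page; everything else is an assumption made for illustration: the response is taken to look like `{"data": {"status": ..., "outputs": [...]}}`, the failure status is assumed to be `"failed"` with an optional `error` field, and `fetch_result` and `wait_for_output` are hypothetical helper names. Check the API reference for the actual response shape.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def fetch_result(request_id, api_key):
    """GET the prediction result and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_output(request_id, fetch, interval=5.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch(request_id) until the prediction completes; return the first output URL."""
    deadline = clock() + timeout
    while True:
        data = fetch(request_id)["data"]   # assumed response envelope
        status = data["status"]
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":             # assumed failure status name
            raise RuntimeError(f"prediction failed: {data.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"prediction {request_id} not completed after {timeout}s")
        sleep(interval)
```

A call might look like `wait_for_output(request_id, lambda rid: fetch_result(rid, api_key))`. Passing the fetch function in as an argument keeps the loop testable without network access, and the explicit deadline stops the loop from polling forever if a prediction never finishes.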

How much does Veo3 Image To Video cost per run?

Veo3 Image To Video is listed at $3.20 per run with audio, at either 720p or 1080p; the README above lists $1.20 per run without audio. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Veo3 Image To Video accept?

Key inputs: `prompt`, `image`, `aspect_ratio`, `resolution`, `duration`, `seed`. The request examples above also pass `generate_audio` and `negative_prompt`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/google/google-veo3-image-to-video.

How long does Veo3 Image To Video take to generate?

Average end-to-end generation time on WaveSpeedAI is around 120 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.

Can I use Veo3 Image To Video outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (Google). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.