WAN 2.6 converts text or images into videos (720p/1080p) with synced audio, faster and more affordable than Google Veo3. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
$0.50 per run · ~20 runs per $10
5-second cinematic shot of a lone post-apocalyptic soldier in armor and gas mask walking straight toward the camera along a cracked, dusty road between rows of rusted abandoned cars, holding a rifle and tattered flag, slow forward camera movement, smoke and dust drifting from ruined buildings in the background, warm orange sunset light, debris blowing slightly in the wind, realistic gritty atmosphere.
dynamic video of two professional fencers in white uniforms dueling on a metallic piste inside a bright old sports hall, sunlight streaming through tall windows onto the wooden floor and a small crowd of spectators in the background. Starting from a still mid-lunge, both fencers continue with quick footwork and agile attacks, blades clashing and flickering, with subtle handheld camera motion and realistic footsteps and metal sounds.
short energetic video in a cozy living room, filmed from a low camera on a fluffy rug. a french bulldog puppy repeatedly runs toward the camera and almost bumps the lens, sniffing and pawing playfully before darting back. in the background, a woman in a blue jumpsuit sits on the floor laughing and clapping, reacting to the dog’s zoomies, while a second dog wanders past and an orange treat bag sits on the rug. warm indoor light, handheld feel with small natural camera shakes, fast but clear motion, no text or logos.
cinematic video of three friends enjoying a slow, sunny picnic in a green field by a lake. the camera starts close on a young woman in a light brown shirt and shorts sitting on the grass, laughing as she slowly pops a strawberry into her mouth, sunlight on her face and her smartwatch catching the light. in the background, one friend in a beige top chats and gestures while sitting on the picnic blanket, and another in a light blue top lounges and stretches happily on the blanket. gentle breeze moving their hair and the grass, snacks on the blanket, soft handheld camera moves and relaxed pacing, no text or logos.
The woman is drinking her coffee, the dog is watching her, and barking
WAN 2.6 Image-to-Video is Alibaba's latest WanXiang 2.6 image-to-video model. Give it a single image plus a prompt and it generates a 5–15s cinematic clip, with support for multi-shot storytelling and up to 1080p resolution.
image* – Required. The keyframe or base image to animate (URL or upload).
audio (optional) – Reserved field; can be used for advanced workflows that align motion with an external audio track. For normal use you can leave this empty.
prompt* – Describe the motion, story beats, camera moves, and style.
negative_prompt – Things to avoid (e.g. “watermark, text, distortion, extra limbs”).
resolution – One of:
720p
1080p
duration – One of 5, 10, or 15 seconds; the request examples on this page pass it as an integer (e.g. 5).
shot_type – One of:
single → single-shot clip.
multi → when prompt expansion is on, the model can break your prompt into multiple shots for a richer narrative.
enable_prompt_expansion – If enabled, WAN 2.6 will expand shorter prompts into a more detailed internal script before generating.
seed – Controls reproducibility. Set to -1 for a random seed, or to a fixed integer to lock the layout and motion pattern across runs.
Output: an MP4 video at the chosen resolution tier.
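The parameter list above can be checked client-side before a request is sent. The following is a minimal sketch, assuming the allowed values listed above are exhaustive. `build_payload` is a hypothetical helper, not part of any WaveSpeedAI SDK; its defaults mirror the request examples on this page rather than documented defaults, and rejecting `multi` without prompt expansion is an assumption drawn from the `shot_type` description.

```python
ALLOWED_RESOLUTIONS = {"720p", "1080p"}
ALLOWED_DURATIONS = {5, 10, 15}  # seconds
ALLOWED_SHOT_TYPES = {"single", "multi"}


def build_payload(image, prompt, *, negative_prompt=None, audio=None,
                  resolution="720p", duration=5, shot_type="single",
                  enable_prompt_expansion=False, seed=-1):
    """Hypothetical helper: validate inputs and return a request body dict."""
    if not image:
        raise ValueError("image is required")
    if not prompt:
        raise ValueError("prompt is required")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
    if shot_type not in ALLOWED_SHOT_TYPES:
        raise ValueError(f"shot_type must be one of {sorted(ALLOWED_SHOT_TYPES)}")
    if shot_type == "multi" and not enable_prompt_expansion:
        # Assumption: multi-shot only takes effect when prompt expansion is on.
        raise ValueError("shot_type='multi' requires enable_prompt_expansion=True")
    if not isinstance(seed, int):
        raise ValueError("seed must be an integer (-1 for random)")

    payload = {
        "image": image,
        "prompt": prompt,
        "resolution": resolution,
        "duration": duration,
        "shot_type": shot_type,
        "enable_prompt_expansion": enable_prompt_expansion,
        "seed": seed,
    }
    # Optional fields are only included when a value was provided.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if audio:
        payload["audio"] = audio
    return payload
```

The returned dict can be serialized as the JSON body of the request; failing early on a bad `duration` or `resolution` avoids paying for a round trip that the API would reject anyway.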
| Resolution | 5 s | 10 s | 15 s |
|---|---|---|---|
| 720p | $0.50 | $1.00 | $1.50 |
| 1080p | $0.75 | $1.50 | $2.25 |
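The table works out to a flat per-second rate in each tier: $0.10 per second at 720p and $0.15 per second at 1080p. Below is a small sketch that looks up a price from the table values. `estimate_cost` is a hypothetical name used only for illustration, and the prices are copied from the table above, so they may change.

```python
from decimal import Decimal

# Prices in USD, copied from the pricing table: (resolution, seconds) -> price.
PRICE_USD = {
    ("720p", 5): Decimal("0.50"),
    ("720p", 10): Decimal("1.00"),
    ("720p", 15): Decimal("1.50"),
    ("1080p", 5): Decimal("0.75"),
    ("1080p", 10): Decimal("1.50"),
    ("1080p", 15): Decimal("2.25"),
}


def estimate_cost(resolution, duration, runs=1):
    """Hypothetical helper: total price for `runs` clips at the given settings."""
    try:
        unit_price = PRICE_USD[(resolution, duration)]
    except KeyError:
        raise ValueError(f"no listed price for {resolution} at {duration}s") from None
    return unit_price * runs


print(estimate_cost("1080p", 10))         # 1.50
print(estimate_cost("720p", 5, runs=20))  # 10.00
```

`Decimal` is used instead of `float` so that totals stay exact to the cent when multiplied across many runs.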
kwaivgi/kling-video-o1/image-to-video High-quality AI image-to-video generator from Kwaivgi, ideal for cinematic character shots, smooth camera motion, and social-ready short clips.
/wan-2.5/image-to-video WAN 2.5 image-to-video model, designed for fast, coherent animation of still images into ads, product demos, and story-style videos.
openai/sora-2/image-to-video OpenAI Sora 2, a cutting-edge AI video generator that turns images into long, detailed, physics-aware scenes for filmic concepts and high-end content.
google/veo3.1/image-to-video Google Veo 3.1 image-to-video, optimized for crisp, cinematic motion and clean compositions, perfect for marketing visuals, trailers, and creative storytelling.
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/image-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Wan 2.6 Image To Video below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/image-to-video" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WAVESPEED_API_KEY" \
-d '{
"prompt": "A cinematic shot of a city at sunset, soft golden light",
"image": "https://example.com/your-input.jpg",
"audio": "https://example.com/your-audio.mp3",
"negative_prompt": "blurry, low quality, distorted",
"resolution": "720p",
"duration": 5,
"shot_type": "single",
"enable_prompt_expansion": false,
"seed": -1
}'
# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
-H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// Top-level await is not available in CommonJS, so the call is wrapped in an async function.
async function main() {
  const result = await client.run("alibaba/wan-2.6/image-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "audio": "https://example.com/your-audio.mp3",
    "negative_prompt": "blurry, low quality, distorted",
    "resolution": "720p",
    "duration": 5,
    "shot_type": "single",
    "enable_prompt_expansion": false,
    "seed": -1
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

# pip install wavespeed
import wavespeed
output = wavespeed.run(
"alibaba/wan-2.6/image-to-video",
{
"prompt": "A cinematic shot of a city at sunset, soft golden light",
"image": "https://example.com/your-input.jpg",
"audio": "https://example.com/your-audio.mp3",
"negative_prompt": "blurry, low quality, distorted",
"resolution": "720p",
"duration": 5,
"shot_type": "single",
"enable_prompt_expansion": false,
"seed": -1
}
)
print(output["outputs"][0])  # → URL of the generated output

Wan 2.6 Image To Video is an Alibaba model for video generation from images, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.6-image-to-video.
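The polling step described above can be sketched with the Python standard library alone. The result URL pattern is the one shown in the cURL example on this page. The location of the status field (`data.status`), the `failed` status value, and the `error` field are assumptions, and `fetch_result` and `wait_for_output` are hypothetical helper names; check the API reference for the exact response shape.

```python
import json
import os
import time
import urllib.request

RESULT_URL = "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"


def fetch_result(request_id):
    """GET the prediction result and return the parsed JSON body."""
    req = urllib.request.Request(
        RESULT_URL.format(request_id=request_id),
        headers={"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_output(request_id, fetch=fetch_result, interval=5.0, timeout=600.0):
    """Poll until the prediction completes, then return the first output URL."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch(request_id).get("data", {})
        status = data.get("status")  # assumed field location
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed failure status value
            raise RuntimeError(f"prediction {request_id} failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} not finished after {timeout}s")
        time.sleep(interval)
```

Passing `fetch` as a parameter keeps the loop testable without network access. The 600-second default timeout is an arbitrary choice for this sketch; pick a value that leaves headroom over the generation times you actually observe.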
Wan 2.6 Image To Video starts at $0.50 per run. That figure is the base price for a 5-second 720p clip; the final charge scales with resolution and duration, up to $2.25 for a 15-second 1080p clip. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `image`, `audio`, `resolution`, `duration`, `seed`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.6-image-to-video.
Average end-to-end generation time on WaveSpeedAI is around 148 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Alibaba). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.