WAN 2.6 Text-to-Video turns plain prompts into coherent, cinematic clips with crisp detail, stable motion, and strong instruction-following—great for ads, explainers, and social posts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
$0.50 per run · about 20 runs for $10
A stylish young male artist is spray-painting a colorful mural of flowers on a brick wall in a sunny city alleyway. Suddenly, the painted flowers magically detach from the wall and transform into glowing, semi-transparent 3D butterflies. The artist looks surprised and then delighted, reaching out his hand to let one butterfly land on his finger. The scene is bathed in warm natural sunlight, dust motes dancing in the air. Vibrant colors, smooth motion, magical realism, award-winning cinematography.
Cinematic sci-fi scene. Medium shot of a weary astronaut in a dusty, abandoned spaceship corridor. The camera slowly pushes in towards his face. He looks shocked. Rack focus from his face to his gloved hand, revealing he is holding a small, glowing green plant sprout. Blue emergency lights flickering in the background. High suspense, emotional moment.
First-person POV shot (camera movement forward). Moving quickly through a dark, textured rock tunnel towards a blindingly bright exit. As the camera bursts out of the tunnel, the view instantly widens to reveal a breathtaking, sunny alpine meadow with snow-capped mountains and blooming wildflowers. Exposure adjusts rapidly from dark to light. Epic scale, cinematic transition, immersive experience.
A lone female cyborg walks through a neon-soaked cyberpunk city at night. Reflections ripple across puddles as holograms flicker overhead. The camera follows her from behind in a slow tracking shot, drifting upward toward the glowing skyscrapers. Soft rain falls, lights refract across her metallic skin, cinematic atmosphere, ultra-detailed, high-contrast, futuristic mood.
A glowing blue fox runs across a bioluminescent forest at night. Mushrooms pulse with soft light as particles float in the air. The camera follows close behind the fox, weaving between trees. Magical atmosphere, vibrant colors, fantasy cinematic style, sense of wonder and discovery.
WAN 2.6 Text-to-Video is Alibaba’s WanXiang 2.6 model. It turns a text prompt (optionally with an audio track) into a 5–15 s cinematic clip. It supports multi-shot storytelling, vertical or landscape formats, and resolutions up to 1080p, making it a strong fit for ads, trailers, and social content.
prompt (required) – Main description of the video: scene, characters, motion, camera moves, style.
negative_prompt – Things to avoid (e.g. watermark, text, distortion, extra limbs).
audio (optional) – URL or file of an audio track; reserved for advanced workflows where you want to align motion with existing sound.
size – Resolution presets (the API examples pass these as strings such as "1280*720"):
  720p tier: 1280×720 (landscape), 720×1280 (vertical)
  1080p tier: 1920×1080 (landscape), 1080×1920 (vertical)
duration – Clip length in seconds: 5, 10, or 15.
shot_type – How the clip is cut:
  single → one continuous shot.
  multi → a multi-shot sequence; takes effect when combined with enable_prompt_expansion.
enable_prompt_expansion – If enabled, WAN 2.6 first expands your prompt into an internal, more detailed script before generating.
seed – Random seed; set to -1 for different results each time or use a fixed integer for reproducible motion/layout.
Output: an MP4 video at the chosen resolution and orientation.
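The parameter list above can be turned into a small client-side check before a request is sent. The following is a minimal sketch, not part of any WaveSpeedAI SDK: `build_input` and the `ALLOWED_*` constants are hypothetical names, the size strings follow the `"1280*720"` form used in the API examples on this page, and rejecting `multi` without prompt expansion is a design choice of this sketch that may be stricter than the API itself.

```python
# Allowed values are taken from the parameter list; the helper itself is hypothetical.
ALLOWED_SIZES = {"1280*720", "720*1280", "1920*1080", "1080*1920"}
ALLOWED_DURATIONS = {5, 10, 15}
ALLOWED_SHOT_TYPES = {"single", "multi"}


def build_input(prompt, size="1280*720", duration=5, shot_type="single",
                enable_prompt_expansion=False, seed=-1,
                negative_prompt=None, audio=None):
    """Validate the documented options and return a JSON-ready dict."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 5, 10, or 15")
    if shot_type not in ALLOWED_SHOT_TYPES:
        raise ValueError("shot_type must be 'single' or 'multi'")
    if shot_type == "multi" and not enable_prompt_expansion:
        # The list says multi-shot output relies on prompt expansion;
        # failing early here is this sketch's choice, not documented API behavior.
        raise ValueError("shot_type='multi' needs enable_prompt_expansion=True")

    payload = {
        "prompt": prompt.strip(),
        "size": size,
        "duration": duration,
        "shot_type": shot_type,
        "enable_prompt_expansion": enable_prompt_expansion,
        "seed": seed,  # -1 for a new result each time, fixed integer to reproduce
    }
    # Optional fields are only included when provided.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if audio:
        payload["audio"] = audio
    return payload
```

For example, `build_input("A fox runs through a forest", size="1080*1920", duration=10, seed=42)` returns a dict for a vertical 1080p clip with a fixed seed, while `duration=7` raises `ValueError`.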
Pricing depends on duration and resolution tier:
| Resolution | 5 s | 10 s | 15 s |
|---|---|---|---|
| 720p | $0.50 | $1.00 | $1.50 |
| 1080p | $0.75 | $1.50 | $2.25 |
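The table can be encoded as a lookup to estimate cost before submitting. This is a sketch under stated assumptions: the function names are hypothetical, the prices are copied from the table, and the tier is inferred from the shorter side of a size string such as `"1280*720"`. The live price shown in the playground remains the authoritative figure.

```python
from decimal import Decimal

# Prices in USD copied from the table, keyed by (tier, duration in seconds).
PRICE_TABLE = {
    ("720p", 5): Decimal("0.50"),
    ("720p", 10): Decimal("1.00"),
    ("720p", 15): Decimal("1.50"),
    ("1080p", 5): Decimal("0.75"),
    ("1080p", 10): Decimal("1.50"),
    ("1080p", 15): Decimal("2.25"),
}


def tier_for_size(size):
    """Map a size string such as "1280*720" to its pricing tier."""
    width, height = (int(part) for part in size.split("*"))
    short_side = min(width, height)
    if short_side == 720:
        return "720p"
    if short_side == 1080:
        return "1080p"
    raise ValueError(f"unsupported size: {size}")


def price_usd(size, duration):
    """Return the table price for one run at the given size and duration."""
    return PRICE_TABLE[(tier_for_size(size), duration)]


def runs_for_budget(budget_usd, size, duration):
    """Return how many whole runs fit into a budget at the table price."""
    return int(Decimal(str(budget_usd)) // price_usd(size, duration))
```

With these numbers, `runs_for_budget(10, "1280*720", 5)` gives 20, which matches the "about 20 runs for $10" figure quoted at the top of the page.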
-1 for variation) and click Run to generate your clip.
kwaivgi/kling-video-o1/text-to-video – Kwaivgi’s cinematic text-to-video model, great for character-driven scenes, smooth camera moves, and short-form storytelling.
/wan-2.5/text-to-video – WAN 2.5 prompt-to-video engine, focused on fast, coherent ads, explainers, and product demos.
google/veo3.1/text-to-video – Google Veo 3.1 text-to-video, tuned for crisp compositions, filmic motion, and marketing-ready visuals.
openai/sora-2/text-to-video – OpenAI Sora 2, a high-end text-to-video generator for long, detailed, physics-aware scenes and premium creative content.
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/text-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Wan 2.6 Text To Video below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/text-to-video" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WAVESPEED_API_KEY" \
-d '{
"prompt": "A cinematic shot of a city at sunset, soft golden light",
"negative_prompt": "blurry, low quality, distorted",
"audio": "https://example.com/your-audio.mp3",
"size": "1280*720",
"duration": 5,
"shot_type": "single",
"enable_prompt_expansion": false,
"seed": -1
}'
# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
-H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules cannot use top-level await, so run inside an async function.
async function main() {
  const result = await client.run("alibaba/wan-2.6/text-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "negative_prompt": "blurry, low quality, distorted",
    "audio": "https://example.com/your-audio.mp3",
    "size": "1280*720",
    "duration": 5,
    "shot_type": "single",
    "enable_prompt_expansion": false,
    "seed": -1
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

# pip install wavespeed
import wavespeed
output = wavespeed.run(
"alibaba/wan-2.6/text-to-video",
{
"prompt": "A cinematic shot of a city at sunset, soft golden light",
"negative_prompt": "blurry, low quality, distorted",
"audio": "https://example.com/your-audio.mp3",
"size": "1280*720",
"duration": 5,
"shot_type": "single",
"enable_prompt_expansion": false,
"seed": -1
}
)
print(output["outputs"][0])  # → URL of the generated output

Wan 2.6 Text To Video is an Alibaba model for video generation, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.6-text-to-video.
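The submit-then-poll flow described here can be sketched with the standard library alone. Several details are assumptions rather than documented behavior: the response is taken to carry `status` and `outputs` under a top-level `data` key (the page only says to read `data.outputs[0]`), `"failed"` is an assumed name for the error status, and `wait_for_output` and `make_fetcher` are hypothetical helpers. The result URL pattern is the one shown in the cURL example on this page.

```python
import json
import os
import time
import urllib.request


def wait_for_output(fetch_result, poll_interval=5.0, timeout=900.0,
                    sleep=time.sleep):
    """Poll until the prediction completes, then return the first output URL.

    fetch_result is any zero-argument callable that returns the decoded JSON
    body of the prediction result endpoint.
    """
    waited = 0.0
    while True:
        data = fetch_result().get("data", {})
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed name of the failure status
            raise RuntimeError(data.get("error", "prediction failed"))
        if waited >= timeout:
            raise TimeoutError(f"no result after {waited:.0f} s")
        sleep(poll_interval)
        waited += poll_interval


def make_fetcher(request_id):
    """Build a fetch_result callable for one prediction id."""
    url = f"https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"
    headers = {"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"}

    def fetch():
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch
```

A call would look like `wait_for_output(make_fetcher(request_id))`. The 900-second default timeout is an arbitrary choice for this sketch, set comfortably above the average generation time of about 414 seconds reported on this page.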
Wan 2.6 Text To Video starts at $0.50 per run. That figure is the base price for a 5-second 720p clip; the final charge scales with duration and resolution tier, up to $2.25 for a 15-second 1080p clip. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `audio`, `duration`, `size`, `seed`, `negative_prompt`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.6-text-to-video.
Average end-to-end generation time on WaveSpeedAI is around 414 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Alibaba). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.