
Wan 2.6 Text to Video

alibaba

WAN 2.6 Text-to-Video turns plain prompts into coherent, cinematic clips with crisp detail, stable motion, and strong instruction-following—great for ads, explainers, and social posts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


Examples

A stylish young male artist is spray-painting a colorful mural of flowers on a brick wall in a sunny city alleyway. Suddenly, the painted flowers magically detach from the wall and transform into glowing, semi-transparent 3D butterflies. The artist looks surprised and then delighted, reaching out his hand to let one butterfly land on his finger. The scene is bathed in warm natural sunlight, dust motes dancing in the air. Vibrant colors, smooth motion, magical realism, award-winning cinematography.

Cinematic sci-fi scene. Medium shot of a weary astronaut in a dusty, abandoned spaceship corridor. The camera slowly pushes in towards his face. He looks shocked. Rack focus from his face to his gloved hand, revealing he is holding a small, glowing green plant sprout. Blue emergency lights flickering in the background. High suspense, emotional moment.

First-person POV shot (camera movement forward). Moving quickly through a dark, textured rock tunnel towards a blindingly bright exit. As the camera bursts out of the tunnel, the view instantly widens to reveal a breathtaking, sunny alpine meadow with snow-capped mountains and blooming wildflowers. Exposure adjusts rapidly from dark to light. Epic scale, cinematic transition, immersive experience.

A lone female cyborg walks through a neon-soaked cyberpunk city at night. Reflections ripple across puddles as holograms flicker overhead. The camera follows her from behind in a slow tracking shot, drifting upward toward the glowing skyscrapers. Soft rain falls, lights refract across her metallic skin, cinematic atmosphere, ultra-detailed, high-contrast, futuristic mood.

A glowing blue fox runs across a bioluminescent forest at night. Mushrooms pulse with soft light as particles float in the air. The camera follows close behind the fox, weaving between trees. Magical atmosphere, vibrant colors, fantasy cinematic style, sense of wonder and discovery.


WAN 2.6 Text-to-Video

WAN 2.6 Text-to-Video is Alibaba’s WanXiang 2.6 model. It turns a pure text prompt (optionally with audio) into a 5–15 second cinematic clip. It supports multi-shot storytelling, vertical or landscape formats, and resolutions up to 1080p, making it a strong fit for ads, trailers, and social content.

🚀 Highlights

  • Prompt-only video generation – No reference image required: describe the scene and WAN 2.6 builds the entire sequence.
  • Multi-shot narratives – With enable_prompt_expansion turned on and shot_type set to multi, the model can split your idea into several shots while preserving key characters and style.
  • 5–15 second clips – Enough room for intros, reveals, and full micro-stories.
  • Flexible sizes – Horizontal and vertical presets in the 720p and 1080p tiers.
  • Prompt-aware consistency – Keeps identities, outfits, and scene semantics coherent across the whole clip.

🧩 Parameters

  • prompt (required) – Main description of the video: scene, characters, motion, camera moves, style.

  • negative_prompt – Things to avoid (e.g. watermark, text, distortion, extra limbs).

  • audio (optional) – URL or file of an audio track; reserved for advanced workflows where you want to align motion with existing sound.

  • size – Resolution presets, sent to the API as a "width*height" string (e.g. "1280*720"):

      • 720p tier: 1280×720 (landscape) or 720×1280 (vertical)

      • 1080p tier: 1920×1080 (landscape) or 1080×1920 (vertical)

  • duration – Clip length in seconds: 5, 10, or 15 (sent as an integer, e.g. 5).

  • shot_type – One of:

      • single → single continuous shot.

      • multi → when combined with enable_prompt_expansion, lets the model create a multi-shot sequence.

  • enable_prompt_expansion – If enabled, WAN 2.6 first expands your prompt into an internal, more detailed script before generating.

  • seed – Random seed; set to -1 for different results each time or use a fixed integer for reproducible motion/layout.

Output: an MP4 video at the chosen resolution and orientation.
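The constraints in the parameter list can be checked on the client before spending a run. A minimal sketch, assuming the four size strings use the "width*height" form from the API examples in the quick start and that the listed values are the only accepted ones; the server's own validation may differ, and `check_input` is a hypothetical helper, not part of any WaveSpeedAI SDK.

```python
# Hypothetical client-side check for Wan 2.6 text-to-video inputs.
# Allowed values are copied from the parameter list; the API's own rules may differ.

ALLOWED_SIZES = {"1280*720", "720*1280", "1920*1080", "1080*1920"}  # "width*height"
ALLOWED_DURATIONS = {5, 10, 15}  # seconds
ALLOWED_SHOT_TYPES = {"single", "multi"}


def check_input(params: dict) -> list:
    """Return a list of issues; an empty list means nothing obviously wrong."""
    issues = []

    if not str(params.get("prompt") or "").strip():
        issues.append("prompt is required")

    size = params.get("size")
    if size is not None and size not in ALLOWED_SIZES:
        issues.append(f"size {size!r} is not one of {sorted(ALLOWED_SIZES)}")

    duration = params.get("duration")
    if duration is not None and duration not in ALLOWED_DURATIONS:
        issues.append(f"duration {duration!r} must be 5, 10, or 15")

    shot_type = params.get("shot_type")
    if shot_type is not None and shot_type not in ALLOWED_SHOT_TYPES:
        issues.append(f"shot_type {shot_type!r} must be 'single' or 'multi'")
    if shot_type == "multi" and not params.get("enable_prompt_expansion", False):
        # The docs say multi-shot output needs prompt expansion as well.
        issues.append("shot_type 'multi' only takes effect with enable_prompt_expansion")

    seed = params.get("seed")
    if seed is not None and (isinstance(seed, bool) or not isinstance(seed, int)):
        issues.append("seed must be an integer (-1 for a random seed)")

    return issues
```

For example, `check_input({"prompt": "a fox", "duration": 7})` reports only the duration problem; optional fields that are absent are not checked.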

💰 Pricing

Pricing depends on duration and resolution tier:

| Resolution | 5 s | 10 s | 15 s |
|---|---|---|---|
| 720p | $0.50 | $1.00 | $1.50 |
| 1080p | $0.75 | $1.50 | $2.25 |
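Because the price depends only on resolution tier and duration, it can be looked up directly from the table. A small sketch, assuming the tier follows from the size string as listed in the parameters; it works in integer cents so amounts like $0.75 stay exact. The helper names are illustrative.

```python
# Illustrative price lookup built from the pricing table (values in US cents).
PRICE_CENTS = {
    "720p": {5: 50, 10: 100, 15: 150},
    "1080p": {5: 75, 10: 150, 15: 225},
}

# Map each API size string to its resolution tier.
SIZE_TIER = {
    "1280*720": "720p",
    "720*1280": "720p",
    "1920*1080": "1080p",
    "1080*1920": "1080p",
}


def run_cost_cents(size: str, duration: int) -> int:
    """Price of one run in cents; raises KeyError for unlisted sizes or durations."""
    return PRICE_CENTS[SIZE_TIER[size]][duration]


def format_usd(cents: int) -> str:
    """Render cents as a dollar string, e.g. 225 -> '$2.25'."""
    return f"${cents // 100}.{cents % 100:02d}"
```

`format_usd(run_cost_cents("1080*1920", 15))` gives "$2.25", matching the table; vertical and landscape sizes in the same tier cost the same. At 50 cents per run, a $10 budget covers 1000 // 50 = 20 five-second 720p clips.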

✅ How to Use

  1. Write your prompt – Describe what happens, who appears, how the camera moves, and the visual style.
  2. (Optional) Add a negative_prompt to suppress artifacts or unwanted elements.
  3. (Optional) Provide an audio track if your workflow requires it.
  4. Choose a size (one of the 720p / 1080p presets, landscape or vertical).
  5. Set duration to 5 / 10 / 15 seconds.
  6. For richer, multi-shot storytelling, set enable_prompt_expansion to true and shot_type to multi.
  7. Set a seed (or leave -1 for variation) and click Run to generate your clip.
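The steps above map onto a single JSON request body. A minimal sketch, assuming the field names and "width*height" size strings used in the quick-start examples; `build_request` and its keyword arguments are hypothetical, and its defaults are illustrative choices rather than the API's documented defaults.

```python
from typing import Optional

# (tier, orientation) -> API size string, from the size presets.
SIZES = {
    ("720p", "landscape"): "1280*720",
    ("720p", "vertical"): "720*1280",
    ("1080p", "landscape"): "1920*1080",
    ("1080p", "vertical"): "1080*1920",
}


def build_request(
    prompt: str,                            # step 1
    negative_prompt: Optional[str] = None,  # step 2 (optional)
    audio_url: Optional[str] = None,        # step 3 (optional)
    tier: str = "720p",                     # step 4
    orientation: str = "landscape",
    duration: int = 5,                      # step 5
    multi_shot: bool = False,               # step 6
    expand_prompt: bool = False,
    seed: int = -1,                         # step 7: -1 for variation
) -> dict:
    """Assemble a request body; raises ValueError/KeyError on unsupported choices."""
    if duration not in (5, 10, 15):
        raise ValueError("duration must be 5, 10, or 15 seconds")
    body = {
        "prompt": prompt,
        "size": SIZES[(tier, orientation)],
        "duration": duration,
        "shot_type": "multi" if multi_shot else "single",
        # Multi-shot output relies on prompt expansion, so enable them together.
        "enable_prompt_expansion": expand_prompt or multi_shot,
        "seed": seed,
    }
    # Optional fields are only sent when provided.
    if negative_prompt:
        body["negative_prompt"] = negative_prompt
    if audio_url:
        body["audio"] = audio_url
    return body
```

Passing the result to `json.dumps` gives a body for the cURL call in the quick start; the same dict can be handed to the SDK `run` calls shown there.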

💡 Prompt Tips

  • Start with a clear setting + subject + action: “Cyberpunk city street at night, rain on the ground, a lone biker rides through neon fog, cinematic camera tracking shot.”
  • For multi-shot stories, hint at structure: “Shot 1: wide city skyline at dawn; Shot 2: hero walks across rooftop; Shot 3: close-up as they put on helmet.”
  • Keep negative prompts short and focused (e.g. blurry, watermark, extra limbs) instead of full sentences.
  • Match size to platform: vertical (720×1280 / 1080×1920) for Shorts/Reels/TikTok, landscape for YouTube and web.
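The tips above can be encoded as small helpers for scripted prompt generation. A sketch, assuming the "Shot 1: ...; Shot 2: ..." convention from the multi-shot tip and treating Shorts, Reels, and TikTok as the vertical platforms; the function names are hypothetical.

```python
# Hypothetical prompt helpers based on the prompt tips.
VERTICAL_PLATFORMS = {"shorts", "reels", "tiktok"}  # everything else -> landscape


def multishot_prompt(shots: list) -> str:
    """Join shot descriptions as 'Shot 1: ...; Shot 2: ...', ending with a period."""
    cleaned = [s.strip().rstrip(";.") for s in shots if s.strip()]
    if not cleaned:
        raise ValueError("need at least one shot description")
    return "; ".join(f"Shot {i}: {s}" for i, s in enumerate(cleaned, start=1)) + "."


def negative_prompt(terms: list) -> str:
    """Keep negatives short: a comma-separated list of terms, not sentences."""
    return ", ".join(t.strip() for t in terms if t.strip())


def size_for_platform(platform: str, tier: str = "1080p") -> str:
    """Vertical size for short-form platforms, landscape otherwise."""
    vertical = platform.strip().lower() in VERTICAL_PLATFORMS
    sizes = {
        "720p": ("720*1280", "1280*720"),
        "1080p": ("1080*1920", "1920*1080"),
    }
    portrait, landscape = sizes[tier]
    return portrait if vertical else landscape
```

With the three shots from the multi-shot tip, `multishot_prompt` reproduces that tip's prompt string exactly.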

More Models to Try

  • kwaivgi/kling-video-o1/text-to-video – Kwaivgi’s cinematic text-to-video model, great for character-driven scenes, smooth camera moves, and short-form storytelling.

  • /wan-2.5/text-to-video – Alibaba’s WAN 2.5 prompt-to-video engine, focused on fast, coherent ads, explainers, and product demos.

  • google/veo3.1/text-to-video – Google Veo 3.1 text-to-video, tuned for crisp compositions, filmic motion, and marketing-ready visuals.

  • openai/sora-2/text-to-video – OpenAI Sora 2, a high-end text-to-video generator for long, detailed, physics-aware scenes and premium creative content.


Wan 2.6 Text To Video API — Quick start

Grab a WaveSpeedAI API key, then send your input as JSON to POST https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/text-to-video. The endpoint returns a prediction id; poll GET https://api.wavespeed.ai/api/v3/predictions/{request_id}/result until status flips to completed, then read the output URL from data.outputs[0]. Examples in cURL, Node.js, and Python follow.

HTTP example

```bash
# Submit the prediction ("audio" is optional; drop it unless you have a real track URL)
curl -X POST "https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/text-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "negative_prompt": "blurry, low quality, distorted",
    "audio": "https://example.com/your-audio.mp3",
    "size": "1280*720",
    "duration": 5,
    "shot_type": "single",
    "enable_prompt_expansion": false,
    "seed": -1
  }'

# Response includes a prediction id. Poll for the result
# (replace {request_id} with that id):
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
```
Node.js example

```javascript
// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS scripts cannot use top-level await, so run the call inside an async function.
async function main() {
  const result = await client.run("alibaba/wan-2.6/text-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "negative_prompt": "blurry, low quality, distorted",
    "audio": "https://example.com/your-audio.mp3", // optional; omit unless you have a real track
    "size": "1280*720",
    "duration": 5,
    "shot_type": "single",
    "enable_prompt_expansion": false,
    "seed": -1
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);
```
Python example

```python
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "alibaba/wan-2.6/text-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "negative_prompt": "blurry, low quality, distorted",
        "audio": "https://example.com/your-audio.mp3",  # optional; omit unless you have a real track
        "size": "1280*720",
        "duration": 5,
        "shot_type": "single",
        "enable_prompt_expansion": False,  # Python boolean, not JSON `false`
        "seed": -1,
    },
)

print(output["outputs"][0])  # → URL of the generated output
```
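The SDK calls above wrap the submit-then-poll protocol described at the start of this quick start. For environments without the SDK, here is a standard-library sketch of the same loop. It assumes, beyond what this page states, that the submit response carries the prediction id at `data.id` and that a `data.status` of "failed" is terminal; the 900-second default timeout is an arbitrary margin over the roughly 414-second average reported in the FAQ. The `sleep` and `clock` hooks exist only so the loop can be tested offline.

```python
# Standard-library sketch of submit-and-poll for Wan 2.6 text-to-video.
# Assumed (not documented here): submit returns data.id; "failed" is a terminal status.
import json
import os
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "https://api.wavespeed.ai/api/v3"
MODEL = "alibaba/wan-2.6/text-to-video"


def _request(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Send one authenticated JSON request and return the parsed response."""
    req = urllib.request.Request(
        url,
        data=None if body is None else json.dumps(body).encode("utf-8"),
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def submit(params: dict) -> str:
    """POST the input and return the prediction id (assumed to be data.id)."""
    return _request("POST", f"{BASE_URL}/{MODEL}", params)["data"]["id"]


def fetch_result(request_id: str) -> dict:
    """GET the prediction result endpoint shown in the cURL example."""
    return _request("GET", f"{BASE_URL}/predictions/{request_id}/result")


def wait_for_output(
    fetch: Callable[[str], dict],
    request_id: str,
    interval: float = 5.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until status is "completed", then return data.outputs[0]."""
    deadline = clock() + timeout
    while True:
        data = fetch(request_id)["data"]
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":
            raise RuntimeError(f"prediction {request_id} failed: {data.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"prediction {request_id} still {status!r} after {timeout} s")
        sleep(interval)
```

Typical use is `wait_for_output(fetch_result, submit({"prompt": "...", "size": "1280*720", "duration": 5}))`, which returns the video URL once the run completes.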

Wan 2.6 Text To Video API — Frequently asked questions

What is the Wan 2.6 Text To Video API?

Wan 2.6 Text To Video is an Alibaba model for video generation, exposed as a REST API on WaveSpeedAI. It turns plain text prompts into 5–15 second cinematic clips at up to 1080p, with stable motion and strong instruction-following, which suits ads, explainers, and social posts. You can call it programmatically or try it from the playground above.

How do I call the Wan 2.6 Text To Video API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.6-text-to-video.

How much does Wan 2.6 Text To Video cost per run?

Wan 2.6 Text To Video starts at $0.50 per run, the price of a 5-second 720p clip. The final charge scales with resolution tier and duration, up to $2.25 for a 15-second 1080p clip (see the pricing table above). The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Wan 2.6 Text To Video accept?

Key inputs: `prompt`, `negative_prompt`, `audio`, `size`, `duration`, `shot_type`, `enable_prompt_expansion`, `seed`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.6-text-to-video.

How long does Wan 2.6 Text To Video take to generate?

Average end-to-end generation time on WaveSpeedAI is around 414 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.

Can I use Wan 2.6 Text To Video outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (Alibaba). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.