Wan 2.6 Text to Video

WAN 2.6 Text-to-Video

WAN 2.6 Text-to-Video is Alibaba’s WanXiang 2.6 model that turns a text prompt (optionally with audio) into a 5, 10, or 15 second cinematic clip. It supports multi-shot storytelling, vertical or landscape formats, and resolutions up to 1080p, which makes it a strong fit for ads, trailers, and social content.

🚀 Highlights

Prompt-only video generation – No reference image required: describe the scene and WAN 2.6 builds the entire sequence.
Multi-shot narratives – With enable_prompt_expansion turned on and shot_type set to multi, the model can split your idea into several shots while preserving key characters and style.
5–15 second clips – Enough room for intros, reveals, and full micro-stories.
Flexible sizes – Horizontal and vertical presets across the 720p and 1080p tiers.
Prompt-aware consistency – Keeps identities, outfits, and scene semantics coherent across the whole clip.

🧩 Parameters

prompt (required) – Main description of the video: scene, characters, motion, camera moves, style.
negative_prompt – Things to avoid (e.g. watermark, text, distortion, extra limbs).
audio (optional) – URL or file of an audio track; reserved for advanced workflows where you want to align motion with existing sound.
size – Resolution presets (the quick-start examples below pass these as width*height strings, e.g. "1280*720"):
    720p tier
        1280×720 (landscape)
        720×1280 (vertical)
    1080p tier
        1920×1080 (landscape)
        1080×1920 (vertical)
duration – One of 5, 10, or 15 seconds.
shot_type – Shot structure of the clip:
    single → one continuous shot.
    multi → when combined with enable_prompt_expansion, lets the model create a multi-shot sequence.
enable_prompt_expansion – If enabled, WAN 2.6 first expands your prompt into an internal, more detailed script before generating.
seed – Random seed; set to -1 for different results each time or use a fixed integer for reproducible motion/layout.

Output: an MP4 video at the chosen resolution and orientation.
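
The parameter list above can be turned into a small client-side check before a request is sent. The following is a minimal sketch, not part of any official SDK: `build_payload` is a hypothetical helper, and the `width*height` string form of `size` is an assumption taken from the quick-start examples further down.

```python
# Allowed values are copied from the parameter list above. The "W*H" string
# form of size mirrors the quick-start examples and is otherwise an assumption.
ALLOWED_SIZES = {"1280*720", "720*1280", "1920*1080", "1080*1920"}
ALLOWED_DURATIONS = {5, 10, 15}
ALLOWED_SHOT_TYPES = {"single", "multi"}


def build_payload(prompt, size="1280*720", duration=5, shot_type="single",
                  enable_prompt_expansion=False, seed=-1,
                  negative_prompt=None, audio=None):
    """Hypothetical helper: validate inputs and return a request body dict."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if size not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {sorted(ALLOWED_SIZES)}")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError("duration must be 5, 10, or 15")
    if shot_type not in ALLOWED_SHOT_TYPES:
        raise ValueError("shot_type must be 'single' or 'multi'")

    payload = {
        "prompt": prompt.strip(),
        "size": size,
        "duration": duration,
        "shot_type": shot_type,
        "enable_prompt_expansion": enable_prompt_expansion,
        "seed": seed,
    }
    # Optional fields are only included when a value was provided.
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if audio:
        payload["audio"] = audio
    return payload
```

For example, `build_payload("A city at sunset", size="1080*1920", duration=10)` returns a dict for a vertical 1080p clip, while `duration=7` raises `ValueError` locally instead of costing a round trip to the API.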

💰 Pricing

Pricing depends on duration and resolution tier:

Resolution	5 s	10 s	15 s
720p	$0.50	$1.00	$1.50
1080p	$0.75	$1.50	$2.25
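
The table works out to $0.10 per second at 720p and $0.15 per second at 1080p. As a rough sketch of how a client could estimate cost before submitting, the lookup below copies the table values; `estimate_cost` and `SIZE_TO_TIER` are hypothetical names, and the `width*height` size strings are assumed from the quick-start examples.

```python
# Prices in USD, copied from the pricing table above.
PRICE_TABLE = {
    "720p": {5: 0.50, 10: 1.00, 15: 1.50},
    "1080p": {5: 0.75, 10: 1.50, 15: 2.25},
}

# Map each size preset to its pricing tier (size strings are an assumption).
SIZE_TO_TIER = {
    "1280*720": "720p",
    "720*1280": "720p",
    "1920*1080": "1080p",
    "1080*1920": "1080p",
}


def estimate_cost(size, duration):
    """Return the listed price in USD for one run at this size and duration."""
    tier = SIZE_TO_TIER[size]          # KeyError for an unknown size
    return PRICE_TABLE[tier][duration]  # KeyError for an unsupported duration


if __name__ == "__main__":
    # Three vertical 1080p clips of 10 seconds each.
    print(3 * estimate_cost("1080*1920", 10))
```

The live price shown in the playground remains the authoritative figure; this lookup only reflects the table as published here.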

✅ How to Use

Write your prompt – Describe what happens, who appears, how the camera moves, and the visual style.
(Optional) Add a negative_prompt to suppress artifacts or unwanted elements.
(Optional) Provide an audio track if your workflow requires it.
Choose a size (one of the 720p / 1080p presets, landscape or vertical).
Set duration to 5 / 10 / 15 seconds.
Turn on enable_prompt_expansion and set shot_type to multi if you want richer, multi-shot storytelling.
Set a seed (or leave -1 for variation) and click Run to generate your clip.

💡 Prompt Tips

Start with a clear setting + subject + action: “Cyberpunk city street at night, rain on the ground, a lone biker rides through neon fog, cinematic camera tracking shot.”
For multi-shot stories, hint at structure: “Shot 1: wide city skyline at dawn; Shot 2: hero walks across rooftop; Shot 3: close-up as they put on helmet.”
Keep negative prompts short and focused (e.g. blurry, watermark, extra limbs) instead of full sentences.
Match size to platform: vertical (720×1280 / 1080×1920) for Shorts/Reels/TikTok, landscape for YouTube and web.
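
The multi-shot and platform tips above can be captured in two small helpers. This is only an illustrative sketch: `multi_shot_prompt` and `size_for` are hypothetical names, the platform keys are made up for the example, and the `width*height` strings are assumed from the quick-start examples.

```python
def multi_shot_prompt(shots):
    """Join shot descriptions into the 'Shot N: ...' structure from the tips."""
    return "; ".join(
        f"Shot {number}: {text.strip()}"
        for number, text in enumerate(shots, start=1)
    )


# Orientation per platform, following the tip above (keys are illustrative).
PLATFORM_ORIENTATION = {
    "shorts": "vertical",
    "reels": "vertical",
    "tiktok": "vertical",
    "youtube": "landscape",
    "web": "landscape",
}

SIZES = {
    ("vertical", "720p"): "720*1280",
    ("vertical", "1080p"): "1080*1920",
    ("landscape", "720p"): "1280*720",
    ("landscape", "1080p"): "1920*1080",
}


def size_for(platform, tier="1080p"):
    """Pick a size preset for a target platform and resolution tier."""
    orientation = PLATFORM_ORIENTATION[platform.lower()]
    return SIZES[(orientation, tier)]


if __name__ == "__main__":
    print(multi_shot_prompt([
        "wide city skyline at dawn",
        "hero walks across rooftop",
        "close-up as they put on helmet",
    ]))
    print(size_for("TikTok"))
```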

More Models to Try

kwaivgi/kling-video-o1/text-to-video Kwaivgi’s cinematic text-to-video model, great for character-driven scenes, smooth camera moves, and short-form storytelling.
/wan-2.5/text-to-video Alibaba’s WAN 2.5 prompt-to-video engine, focused on fast, coherent ads, explainers, and product demos.
google/veo3.1/text-to-video Google Veo 3.1 text-to-video, tuned for crisp compositions, filmic motion, and marketing-ready visuals.
openai/sora-2/text-to-video OpenAI Sora 2, a high-end text-to-video generator for long, detailed, physics-aware scenes and premium creative content.

Wan 2.6 Text To Video API — Quick start

Grab a WaveSpeedAI API key, then send a POST request to https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/text-to-video with your input as JSON. The endpoint returns a prediction ID. Poll the result endpoint about every 2 seconds, increase the interval for long-running tasks, and stop on any terminal status (completed, failed, cancelled, or timeout in the examples below). When the status is completed, read the output values from data.outputs. Examples for each language follow.

HTTP example

set -euo pipefail

: "${WAVESPEED_API_KEY:?Set WAVESPEED_API_KEY}"

REQUEST_BODY=$(cat <<'JSON'
{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "size": "1280*720",
    "duration": 5,
    "shot_type": "single",
    "enable_prompt_expansion": false,
    "seed": -1
}
JSON
)

# 1. Submit the prediction.
SUBMIT_RESPONSE=$(curl --silent --show-error --fail-with-body \
  -X POST "https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/text-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d "$REQUEST_BODY")

TASK=$(printf '%s' "$SUBMIT_RESPONSE" | jq 'if has("data") then .data else . end')
PREDICTION_ID=$(printf '%s' "$TASK" | jq -r '.id')
if [ -z "$PREDICTION_ID" ] || [ "$PREDICTION_ID" = "null" ]; then
  printf 'Submission response did not contain a prediction id\n' >&2
  exit 1
fi
RESULT_URL=$(printf '%s' "$TASK" | jq -r '.urls.get // empty')
if [ -z "$RESULT_URL" ]; then
  RESULT_URL="https://api.wavespeed.ai/api/v3/predictions/$PREDICTION_ID/result"
fi

# 2. Poll until the prediction finishes.
while true; do
  RESPONSE=$(curl --silent --show-error --fail-with-body "$RESULT_URL" \
    -H "Authorization: Bearer $WAVESPEED_API_KEY")
  RESULT=$(printf '%s' "$RESPONSE" | jq 'if has("data") then .data else . end')
  STATUS=$(printf '%s' "$RESULT" | jq -r '.status')
  case "$STATUS" in
    completed) printf '%s\n' "$RESULT" | jq '.outputs'; break ;;
    failed|cancelled|timeout) printf '%s\n' "$RESULT" | jq . >&2; exit 1 ;;
    created|processing) sleep 2 ;;
    *) printf 'Unexpected status: %s\n' "$STATUS" >&2; exit 1 ;;
  esac
done

Node.js example

const submitUrl = "https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/text-to-video";
const apiKey = process.env.WAVESPEED_API_KEY;
if (!apiKey) throw new Error('Set WAVESPEED_API_KEY');

async function requestJson(url, options = {}) {
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(await response.text());
  return response.json();
}

// 1. Submit the prediction.
const body = await requestJson(submitUrl, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "size": "1280*720",
    "duration": 5,
    "shot_type": "single",
    "enable_prompt_expansion": false,
    "seed": -1,
  }),
});
const task = body.data ?? body;
if (!task.id) throw new Error("Submission response did not contain a prediction id");
const resultUrl = task.urls?.get ||
  `https://api.wavespeed.ai/api/v3/predictions/${task.id}/result`;

// 2. Poll until the prediction finishes.
while (true) {
  const resultBody = await requestJson(resultUrl, {
    headers: { "Authorization": `Bearer ${apiKey}` },
  });
  const result = resultBody.data ?? resultBody;
  if (result.status === "completed") {
    console.log(result.outputs);
    break;
  }
  if (["failed", "cancelled", "timeout"].includes(result.status)) throw new Error(JSON.stringify(result));
  if (!["created", "processing"].includes(result.status)) throw new Error("Unexpected status: " + result.status);
  await new Promise(resolve => setTimeout(resolve, 2000));
}

Python example

import json
import os
import time
from urllib.request import Request, urlopen

api_key = os.environ["WAVESPEED_API_KEY"]
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
payload = {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "size": "1280*720",
    "duration": 5,
    "shot_type": "single",
    "enable_prompt_expansion": False,
    "seed": -1
}

def request_json(url, data=None):
    request = Request(url, data=data, headers=headers, method="POST" if data else "GET")
    with urlopen(request) as response:
        return json.load(response)

# 1. Submit the prediction.
body = request_json("https://api.wavespeed.ai/api/v3/alibaba/wan-2.6/text-to-video", json.dumps(payload).encode())
task = body.get("data", body)
if not task.get("id"):
    raise RuntimeError("Submission response did not contain a prediction id")
result_url = task.get("urls", {}).get("get") or f"https://api.wavespeed.ai/api/v3/predictions/{task['id']}/result"

# 2. Poll until the prediction finishes.
while True:
    result_body = request_json(result_url)
    result = result_body.get("data", result_body)
    status = result.get("status")
    if status == "completed":
        print(result.get("outputs", []))
        break
    if status in {"failed", "cancelled", "timeout"}:
        raise RuntimeError(result)
    if status not in {"created", "processing"}:
        raise RuntimeError(f"Unexpected status: {status}")
    time.sleep(2)
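
The Python example above polls at a fixed 2-second interval, while the quick-start text suggests lengthening the interval for long-running tasks. One way to do that is sketched below. The growth factor, the 15-second cap, and the 10-minute deadline are arbitrary assumptions, and `fetch_status` is a hypothetical callable that would wrap something like `request_json(result_url)` and return the unwrapped result dict.

```python
import time

# Terminal statuses as handled in the examples above.
TERMINAL = {"completed", "failed", "cancelled", "timeout"}


def backoff_intervals(start=2.0, factor=1.5, cap=15.0):
    """Yield sleep intervals that grow geometrically up to a cap."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def poll(fetch_status, deadline_s=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it reports a terminal status.

    Raises TimeoutError if the next sleep would pass the local deadline.
    """
    started = clock()
    for delay in backoff_intervals():
        result = fetch_status()
        if result.get("status") in TERMINAL:
            return result
        if clock() - started + delay > deadline_s:
            raise TimeoutError("prediction did not finish before the deadline")
        sleep(delay)
```

The caller still has to inspect the returned status, since `failed`, `cancelled`, and `timeout` are returned rather than raised here. Passing `sleep` and `clock` as arguments keeps the loop testable without real waiting.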

Wan 2.6 Text To Video API — Frequently asked questions

What is the Wan 2.6 Text To Video API?

Wan 2.6 Text To Video is an Alibaba model for video generation, exposed as a REST API on WaveSpeedAI. It turns plain prompts into coherent, cinematic clips with crisp detail, stable motion, and strong instruction-following, which suits ads, explainers, and social posts. WaveSpeedAI serves it as a ready-to-use REST inference API with no cold starts. You can call it programmatically or try it from the playground above.

How do I call the Wan 2.6 Text To Video API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID. Poll the result endpoint starting around every 2 seconds, increase the interval for long-running tasks, and stop on any terminal status. The playground generates production-oriented Python, JavaScript, and cURL examples with timeouts, transient-error handling, and safe GET retries. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.6-text-to-video.
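
As a rough illustration of the "safe GET retries" idea mentioned above, the sketch below retries only the polling request and leaves the POST submission alone, since resubmitting may create a second prediction. The set of retryable status codes, the attempt count, and the delays are assumptions, and `do_get` is a hypothetical zero-argument callable that performs the GET (for instance with `urlopen(request, timeout=30)`).

```python
import time
from urllib.error import HTTPError, URLError

# Assumption: these HTTP status codes are treated as transient.
RETRYABLE_HTTP = {429, 500, 502, 503, 504}


def get_with_retries(do_get, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry an idempotent GET with exponential backoff.

    Intended for the result endpoint only; do not wrap the POST submission.
    """
    for attempt in range(attempts):
        last_attempt = attempt == attempts - 1
        try:
            return do_get()
        except HTTPError as err:
            # Non-transient HTTP errors (e.g. 401, 404) are raised immediately.
            if err.code not in RETRYABLE_HTTP or last_attempt:
                raise
        except URLError:
            # Connection-level failures such as DNS errors or refused sockets.
            if last_attempt:
                raise
        sleep(base_delay * 2 ** attempt)
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first for the status-code check to apply.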

How much does Wan 2.6 Text To Video cost per run?

Wan 2.6 Text To Video starts at $0.50 per run, which is the price of a 5-second clip at 720p. The final charge scales with duration and resolution tier, up to $2.25 for a 15-second clip at 1080p (see the Pricing table above). The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Wan 2.6 Text To Video accept?

Key inputs: `prompt`, `negative_prompt`, `audio`, `size`, `duration`, `shot_type`, `enable_prompt_expansion`, `seed`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.6-text-to-video.

How long does Wan 2.6 Text To Video take to generate?

Median end-to-end generation time on WaveSpeedAI is around 56 seconds per request, based on recent successful runs. Queue time varies with global demand; live status is visible in the prediction record.

Can I use Wan 2.6 Text To Video outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (Alibaba). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.
