WAN 2.5 generates 480p to 1080p video from text or images with synced audio, and is faster and more affordable than Google Veo3. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
$0.25 per run · ~40 runs per $10
A vibrant young woman in her early 20s runs toward the camera in Times Square at night, ecstatic and wide-eyed, shouting passionately into a black microphone. She wears a neon green windbreaker and black headphones around her neck. She yells: “Yo, Wan2.5 just dropped on WaveSpeedAI — sound and texture are next level, try it right now!” Wet reflective streets, glowing blue-white-magenta billboards, blurred pedestrians, dynamic handheld follow-shot, sharp face focus, shallow depth of field. 4K UHD, saturated colors, viral UGC style.
Cinematic shot, close-up, a rainy night in a cyberpunk city, neon lights reflect dazzling spots on the wet streets. A detective in a trench coat leans against an alley wall, rain dripping from the brim of his hat as he exhales a weary breath of white vapor. The camera slowly pushes in, focusing on his determined gaze. Sound: Continuous heavy rain, the distinct sounds of raindrops hitting metal and pavement, distant sirens fading in and out, the faint electrical "buzz" from neon signs, the protagonist's heavy breathing, a suspenseful synthwave track as background music.
National Geographic style, ultra-wide-angle shot, at sunrise, golden light pierces through the morning mist, illuminating a tranquil, ancient forest. A sika deer cautiously approaches a crystal-clear stream to drink. The camera pans slowly from a low angle, showcasing the vastness and vitality of the forest. Sound: Crisp birdsong, the gentle babbling of the stream, the rustling of wind through leaves, the subtle sounds of the deer drinking and swallowing water, a few distant deer calls, with an ethereal and soothing instrumental score in the background.
Studio Ghibli anime style, a bustling ancient Chinese market, streets are crowded with people, vendors are shouting their wares, and children are chasing each other playfully. The background features traditional architecture and waving banners. The camera moves through the crowd in a first-person perspective. Sound: A cacophony of human voices, including vendor calls, customer haggling, and children's laughter. In the background, there are sounds of gongs, distant opera music, and the general din of footsteps and objects. The background music is a lively and festive traditional Chinese folk tune.
A realistic bar filled with a cognac selection, with the man from the attached image in a sophisticated bartending uniform holding a Louis VIII bottle and highlighting the look of the beautiful bottle, while the video looks like it is filming around the realistic scene in 90-second video coverage
Dynamic full body shot, a stylish anime girl with neon pink hair and glowing cybernetic eyes, performing an energetic K-pop dance on a futuristic Tokyo stage, surrounded by holographic displays and dazzling lens flares, vibrant neon color palette, detailed anime art style by GUWEIZ, motion lines, perfect composition.
Heavy armored Gundam in black and gold wielding a blue laser sword and assault rifle, photorealistic, in space with planets in the background, dynamic lighting, epic pose, Unreal Engine.
Photorealistic image of a Mercedes-Benz G-Class in a dense jungle, surrounded by lush green foliage. The vehicle is parked on a muddy trail, its sleek, rugged design contrasting with the natural surroundings. The jungle is rich with tropical plants, vines, and trees, with dappled sunlight filtering through the thick canopy above. The lighting is cinematic, with dramatic shadows and highlights emphasizing the car's details, such as the glossy paint and the textures of the tires and metal. The scene is hyper-realistic, with intricate details in the jungle environment from the misty air to the textures of the leaves and bark. The overall atmosphere feels adventurous and dynamic, showcasing the power and elegance of the Mercedes-Benz in this wild, untamed setting.
A beautiful woman in camouflage military attire with long flowing hair and a warm smile. She stands up gracefully, keeping her smile, and begins walking to the side. As she walks, she glances back over her shoulder with a confident, relaxed expression. The setting transitions into the wide concrete runway of a military base, with hangars and faint silhouettes of aircraft in the background. The camera tracks smoothly with her movement, cinematic style, natural lighting, professional film look, realistic motion.
A 3D animated, anthropomorphic badger wearing a brown leather vest is angrily sweeping yellow autumn leaves from the doorway of his rustic wooden cabin. The style is reminiscent of a Pixar film, with detailed fur and expressive animation. Sunny day, lush green meadow with a forest in the background.
A low-angle panning shot of a concrete wall under a highway overpass at night. Graffiti of a young man comes to life and starts rapping. The style is a dynamic blend of 2D street art animation on a realistic, dark, cinematic background. Cityscape is visible in the distance.
A middle-aged man sitting at a wooden desk in a cozy study room, surrounded by bookshelves and a warm lamp glow. He opens an old book and reads aloud with a calm, deep voice: 'History teaches us more than just facts… it shows us who we are.' The room has subtle background sounds: pages turning, the faint ticking of a clock, and distant rain against the window.
A young man in his early 30s sits in a modern studio, wearing a navy blazer and white shirt. Soft lighting illuminates his face. He speaks directly to the camera, his lips moving naturally as he says: “Welcome to today’s interview. We’re going to explore how AI is changing our daily lives.” His gestures are subtle, occasionally raising his hands for emphasis, creating a professional and engaging tone.
A cinematic opening sequence of a sci-fi movie: a spaceship travels across the galaxy, and the movie title “星河远征 · Galactic Odyssey” emerges in golden 3D letters, with flawless kerning and no distortion, floating stably in space as the camera rotates.
A handsome, muscular man with well-defined abs is catching his breath after an intense workout. Sweat drips down his torso. He is shirtless, wearing only black athletic shorts, and is leaning against gym equipment. The lighting comes from the upper side, highlighting the contours of his chest and arms. The scene is filled with a raw, masculine energy, hyper-realistic, high-contrast lighting.
A graceful ballerina with her hair in a messy bun, performing a powerful and emotional contemporary ballet routine. She is in a minimalist, dark art studio. Abstract patterns of light and shadow, projected from a hidden source, dance across her body and the surrounding walls, constantly shifting with her movements. The camera focuses on the tension in her muscles and the expressive gestures of her hands. A single, dramatic slow-motion shot captures her mid-air leap, with the light patterns swirling around her like a galaxy. Moody, artistic, high contrast.
A young couple sitting on a park bench during sunset. The woman leans her head on the man’s shoulder. He whispers softly: 'No matter where we go, I’ll always be here with you.' The sound includes the rustling of leaves, distant laughter of children playing, and the gentle hum of cicadas in the evening air.
WAN 2.5 is an advanced text-to-video model provided by Alibaba Cloud's DashScope platform. This model generates high-quality 480p/720p/1080p videos from text prompts.
| Resolution | Price per second |
|---|---|
| 480p | $0.05 |
| 720p | $0.10 |
| 1080p | $0.15 |
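A rough way to estimate the charge for a clip is sketched below. It assumes the cost is simply the per-second rate from the table multiplied by the clip length; the page does not state the billing formula, and `estimate_cost` is a hypothetical helper, not part of any WaveSpeedAI SDK.

```python
from decimal import Decimal

# Per-second rates copied from the table above (USD).
PRICE_PER_SECOND = {
    "480p": Decimal("0.05"),
    "720p": Decimal("0.10"),
    "1080p": Decimal("0.15"),
}


def estimate_cost(resolution: str, duration_s: int) -> Decimal:
    """Estimate the charge for one run, assuming billing is rate x seconds."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return PRICE_PER_SECOND[resolution] * duration_s


if __name__ == "__main__":
    # Print the estimated cost of a 5-second clip at each resolution.
    for res in PRICE_PER_SECOND:
        print(res, estimate_cost(res, 5))
```

Under this assumption a 5-second 480p clip comes to $0.25, which is consistent with the starting price quoted on this page.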
Audio limits
Over-limit handling
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/alibaba/wan-2.5/text-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Wan 2.5 Text To Video below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/alibaba/wan-2.5/text-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "negative_prompt": "blurry, low quality, distorted",
    "audio": "https://example.com/your-audio.mp3",
    "size": "1280*720",
    "duration": 5,
    "enable_prompt_expansion": false,
    "seed": -1
  }'
# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not allow top-level await, so wrap the call in an async function.
(async () => {
  const result = await client.run("alibaba/wan-2.5/text-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "negative_prompt": "blurry, low quality, distorted",
    "audio": "https://example.com/your-audio.mp3",
    "size": "1280*720",
    "duration": 5,
    "enable_prompt_expansion": false,
    "seed": -1
  });
  console.log(result.outputs[0]); // → URL of the generated output
})();

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "alibaba/wan-2.5/text-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "negative_prompt": "blurry, low quality, distorted",
        "audio": "https://example.com/your-audio.mp3",
        "size": "1280*720",
        "duration": 5,
        "enable_prompt_expansion": False,  # Python boolean, serialized as JSON false
        "seed": -1,
    },
)
print(output["outputs"][0])  # → URL of the generated output

Wan 2.5 Text To Video is an Alibaba model for video generation, exposed as a REST API on WaveSpeedAI. It generates 480p to 1080p video with synced audio from text or image prompts. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.5-text-to-video.
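The submit-then-poll flow described above can be sketched with only the Python standard library. The endpoint URLs and the `data.outputs[0]` location come from this page; the location of the prediction id (`data.id`), the `data.status` field name, the `"failed"` status value, and the helper names `http_json` and `run_prediction` are assumptions for illustration, so check them against the API reference before relying on them.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "alibaba/wan-2.5/text-to-video"


def http_json(method, url, api_key, body=None):
    """Send one JSON request with the bearer token and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_prediction(payload, api_key, request=http_json, sleep=time.sleep,
                   interval_s=2.0, timeout_s=600.0):
    """Submit a job, poll until it completes, and return the first output URL."""
    submitted = request("POST", f"{API_BASE}/{MODEL_PATH}", api_key, payload)
    request_id = submitted["data"]["id"]  # assumed location of the prediction id
    result_url = f"{API_BASE}/predictions/{request_id}/result"
    waited = 0.0
    while waited <= timeout_s:
        data = request("GET", result_url, api_key)["data"]
        status = data.get("status")  # assumed field name
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed failure value
            raise RuntimeError(f"prediction {request_id} failed: {data.get('error')}")
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"prediction {request_id} not completed after {timeout_s}s")
```

The `request` and `sleep` parameters are injectable so the loop can be exercised without network access. The 600-second default timeout is an arbitrary choice that leaves headroom over the roughly 102-second average generation time reported on this page.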
Wan 2.5 Text To Video starts at $0.25 per run. That figure is the base price: the final charge scales with the parameters you set in the form. For this model the price table lists a per-second rate for each resolution, so a longer or higher-resolution output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `audio`, `duration`, `size`, `seed`, `negative_prompt`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/alibaba/alibaba-wan-2.5-text-to-video.
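As an illustration of how these inputs fit together, the sketch below builds a request body from them. `Wan25Input` is a hypothetical helper, its defaults mirror the example request on this page rather than the documented schema defaults, and treating `negative_prompt` and `audio` as optional is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Wan25Input:
    """Hypothetical container for the key inputs listed above."""
    prompt: str
    negative_prompt: Optional[str] = None  # assumed optional
    audio: Optional[str] = None  # URL of an audio file, assumed optional
    width: int = 1280
    height: int = 720
    duration: int = 5
    seed: int = -1

    def to_payload(self) -> dict:
        """Build the JSON body, leaving out optional fields that were not set."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        payload = {
            "prompt": self.prompt,
            "size": f"{self.width}*{self.height}",  # "width*height", as in the examples
            "duration": self.duration,
            "seed": self.seed,
        }
        if self.negative_prompt is not None:
            payload["negative_prompt"] = self.negative_prompt
        if self.audio is not None:
            payload["audio"] = self.audio
        return payload
```

For example, `Wan25Input(prompt="A city at sunset").to_payload()` yields a body with `size` set to `"1280*720"` and no `audio` key. The allowed sizes and durations are not validated here because this page does not list them.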
Average end-to-end generation time on WaveSpeedAI is around 102 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Alibaba). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.