Wan 2.5 Text to Video | Powerful Text-to-Video API


WAN 2.5 generates 480p-1080p video from text or images with synced audio, and is positioned as faster and more affordable than Google Veo3. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

text-to-video


$0.25 per run · ~40 runs per $10

Examples

A vibrant young woman in her early 20s runs toward the camera in Times Square at night, ecstatic and wide-eyed, shouting passionately into a black microphone. She wears a neon green windbreaker and black headphones around her neck. She yells: “Yo, Wan2.5 just dropped on WaveSpeedAI — sound and texture are next level, try it right now!” Wet reflective streets, glowing blue-white-magenta billboards, blurred pedestrians, dynamic handheld follow-shot, sharp face focus, shallow depth of field. 4K UHD, saturated colors, viral UGC style.

Cinematic shot, close-up, a rainy night in a cyberpunk city, neon lights reflect dazzling spots on the wet streets. A detective in a trench coat leans against an alley wall, rain dripping from the brim of his hat as he exhales a weary breath of white vapor. The camera slowly pushes in, focusing on his determined gaze. Sound: Continuous heavy rain, the distinct sounds of raindrops hitting metal and pavement, distant sirens fading in and out, the faint electrical "buzz" from neon signs, the protagonist's heavy breathing, a suspenseful synthwave track as background music.

National Geographic style, ultra-wide-angle shot, at sunrise, golden light pierces through the morning mist, illuminating a tranquil, ancient forest. A sika deer cautiously approaches a crystal-clear stream to drink. The camera pans slowly from a low angle, showcasing the vastness and vitality of the forest. Sound: Crisp birdsong, the gentle babbling of the stream, the rustling of wind through leaves, the subtle sounds of the deer drinking and swallowing water, a few distant deer calls, with an ethereal and soothing instrumental score in the background.

Studio Ghibli anime style, a bustling ancient Chinese market, streets are crowded with people, vendors are shouting their wares, and children are chasing each other playfully. The background features traditional architecture and waving banners. The camera moves through the crowd in a first-person perspective. Sound: A cacophony of human voices, including vendor calls, customer haggling, and children's laughter. In the background, there are sounds of gongs, distant opera music, and the general din of footsteps and objects. The background music is a lively and festive traditional Chinese folk tune.

A realistic bar filled with cognac selection with the man image attached in a sophisticated bartending uniform holding a louis viii bottle and highlighting the looks of the beautiful bottle while the video looks like filming around the realistic scene in a 90 second video coverage

Dynamic full body shot, a stylish anime girl with neon pink hair and glowing cybernetic eyes, performing an energetic K-pop dance on a futuristic Tokyo stage, surrounded by holographic displays and dazzling lens flares, vibrant neon color palette, detailed anime art style by GUWEIZ, motion lines, perfect composition.

Heavy armored Gun dam in Black and Gold wielding a blue laser sword and assault rifle, photo realistic, in space with planets in the background, dynamic lighting, epic pose, unreal engine.

Photorealistic image of a Mercedes-Benz G-Class in a dense jungle, surrounded by lush green foliage. The vehicle is parked on a muddy trail, its sleek, rugged design contrasting with the natural surroundings. The jungle is rich with tropical plants, vines, and trees, with dappled sunlight filtering through the thick canopy above. The lighting is cinematic, with dramatic shadows and highlights emphasizing the car's details, such as the glossy paint and the textures of the tires and metal. The scene is hyper-realistic, with intricate details in the jungle environment from the misty air to the textures of the leaves and bark. The overall atmosphere feels adventurous and dynamic, showcasing the power and elegance of the Mercedes-Benz in this wild, untamed setting.

A beautiful woman in camouflage military attire with long flowing hair and a warm smile. She stands up gracefully, keeping her smile, and begins walking to the side. As she walks, she glances back over her shoulder with a confident, relaxed expression. The setting transitions into the wide concrete runway of a military base, with hangars and faint silhouettes of aircraft in the background. The camera tracks smoothly with her movement, cinematic style, natural lighting, professional film look, realistic motion.

A 3D animated, anthropomorphic badger wearing a brown leather vest is angrily sweeping yellow autumn leaves from the doorway of his rustic wooden cabin. The style is reminiscent of a Pixar film, with detailed fur and expressive animation. Sunny day, lush green meadow with a forest in the background.

A low-angle panning shot of a concrete wall under a highway overpass at night. Graffiti of a young man comes to life and starts rapping. The style is a dynamic blend of 2D street art animation on a realistic, dark, cinematic background. Cityscape is visible in the distance.

A middle-aged man sitting at a wooden desk in a cozy study room, surrounded by bookshelves and a warm lamp glow. He opens an old book and reads aloud with a calm, deep voice: 'History teaches us more than just facts… it shows us who we are.' The room has subtle background sounds: pages turning, the faint ticking of a clock, and distant rain against the window.

A young man in his early 30s sits in a modern studio, wearing a navy blazer and white shirt. Soft lighting illuminates his face. He speaks directly to the camera, his lips moving naturally as he says: “Welcome to today’s interview. We’re going to explore how AI is changing our daily lives.” His gestures are subtle, occasionally raising his hands for emphasis, creating a professional and engaging tone.

A cinematic opening sequence of a sci-fi movie: a spaceship travels across the galaxy, and the movie title “星河远征 · Galactic Odyssey” emerges in golden 3D letters, with flawless kerning and no distortion, floating stably in space as the camera rotates.

A handsome, muscular man with well-defined abs is catching his breath after an intense workout. Sweat drips down his torso. He is shirtless, wearing only black athletic shorts, and is leaning against gym equipment. The lighting comes from the upper side, highlighting the contours of his chest and arms. The scene is filled with a raw, masculine energy, hyper-realistic, high-contrast lighting.

A graceful ballerina with her hair in a messy bun, performing a powerful and emotional contemporary ballet routine. She is in a minimalist, dark art studio. Abstract patterns of light and shadow, projected from a hidden source, dance across her body and the surrounding walls, constantly shifting with her movements. The camera focuses on the tension in her muscles and the expressive gestures of her hands. A single, dramatic slow-motion shot captures her mid-air leap, with the light patterns swirling around her like a galaxy. Moody, artistic, high contrast.

A young couple sitting on a park bench during sunset. The woman leans her head on the man’s shoulder. He whispers softly: 'No matter where we go, I’ll always be here with you.' The sound includes the rustling of leaves, distant laughter of children playing, and the gentle hum of cicadas in the evening air.

Related models

wan-2.5/image-to-video: image-to-video
wan-2.5/video-extend: video-extend
wan-2.5/text-to-image: text-to-image
wan-2.5/image-edit: image-to-image
pixverse-c1/text-to-video: text-to-video
seedance-2.0/text-to-video: text-to-video

README

WAN 2.5 Text-to-Video Model

WAN 2.5 is an advanced text-to-video model provided through Alibaba Cloud's DashScope platform. It generates high-quality videos from text prompts at 480p, 720p, or 1080p.

What makes it stand out?

More affordable: Wan 2.5 is more streamlined and cost-effective - reducing creator expenses and offering more options.
One-pass A/V sync: Wan 2.5 creates a fully synchronized video (audio/voiceover + lip-sync) from a single, well-structured prompt - no separate recording or manual alignment required.
Multilingual friendly: Wan 2.5 reliably processes prompts in languages such as Chinese for A/V-synced videos.
Longer duration & more video size options: Wan 2.5 delivers up to 10 seconds and 6 aspect/size options, enabling more storytelling room and publishing flexibility.
Custom Voice: Add your own audio or let the model generate one for you. Plug-and-play, easy to swap!

Designed For

Marketing teams: Fast, polished demos and tutorials at low cost and in a consistent style.
Global enterprises: Multilingual, lip-synced videos with subtitles for efficient localization.
Storytellers & YouTubers: Immersive narratives that maintain cadence and quality, helping drive audience growth.
Corporate training teams: HD videos in place of documents, for clearer key points and better communication.

Pricing

Resolution	Price per second
480p	$0.05
720p	$0.10
1080p	$0.15
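
As a quick check on the table above, the sketch below multiplies the per-second price by the clip length. It assumes billing is simply price per second times duration, which is consistent with the $0.25 per run figure for a 5-second 480p clip; the function names are illustrative and not part of any official SDK.

```python
from decimal import Decimal

# Per-second prices in USD, copied from the pricing table.
PRICE_PER_SECOND = {
    "480p": Decimal("0.05"),
    "720p": Decimal("0.10"),
    "1080p": Decimal("0.15"),
}


def estimate_cost(resolution: str, duration_seconds: int) -> Decimal:
    """Estimate the USD price of one video (assumes price = rate * seconds)."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution!r}")
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return PRICE_PER_SECOND[resolution] * duration_seconds


def runs_per_budget(budget_usd: str, resolution: str, duration_seconds: int) -> int:
    """Number of whole runs that fit in a budget given in USD."""
    return int(Decimal(budget_usd) / estimate_cost(resolution, duration_seconds))


if __name__ == "__main__":
    print(estimate_cost("480p", 5))         # 5-second clip at 480p
    print(estimate_cost("1080p", 10))       # 10-second clip at 1080p
    print(runs_per_budget("10", "480p", 5))  # runs that fit in $10
```

Decimal is used instead of float so that small currency amounts do not pick up binary rounding error.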

How to Use

Write your prompt.
Upload an audio file (optional) for voice/music.
Choose the video size (resolution/aspect).
Select the video duration (e.g., 5s / 10s).
Submit and wait for processing.
Preview and download the result.
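
For API users, the steps above could map onto a JSON request body roughly as sketched below. This section does not document the endpoint or request schema, so every field name here (`prompt`, `size`, `duration`, `audio`) and the example size string are hypothetical placeholders; consult the API reference for the real names. The sketch only builds and validates the payload and makes no network call.

```python
import json
from typing import Optional

# Durations named in this README (5s or 10s).
ALLOWED_DURATIONS = (5, 10)


def build_request(prompt: str, size: str, duration: int,
                  audio_url: Optional[str] = None) -> dict:
    """Build a request body. All field names are hypothetical."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    payload = {"prompt": prompt.strip(), "size": size, "duration": duration}
    if audio_url is not None:
        # The audio file is optional; omit the field when none is supplied.
        payload["audio"] = audio_url
    return payload


if __name__ == "__main__":
    # "1280x720" is a placeholder size value, not a documented option.
    body = build_request("A sika deer drinks from a stream at sunrise.", "1280x720", 5)
    print(json.dumps(body, indent=2))
```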

Note

Audio limits

Formats: wav, mp3
Length: 3–30 seconds
File size: ≤ 15 MB
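
A client could check these limits before uploading. The sketch below is one possible pre-flight check, not the service's own validation: it assumes 1 MB means 1024 × 1024 bytes and that the caller already knows the file's size and duration. The function name is illustrative.

```python
from pathlib import Path
from typing import List

ALLOWED_EXTENSIONS = {".wav", ".mp3"}
MIN_SECONDS, MAX_SECONDS = 3.0, 30.0
MAX_BYTES = 15 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def audio_problems(filename: str, size_bytes: int, duration_seconds: float) -> List[str]:
    """Return a list of limit violations; an empty list means the file passes."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("format must be wav or mp3")
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        problems.append("length must be 3-30 seconds")
    if size_bytes > MAX_BYTES:
        problems.append("file size must be at most 15 MB")
    return problems


if __name__ == "__main__":
    print(audio_problems("voiceover.mp3", 1_000_000, 8.0))
    print(audio_problems("voiceover.ogg", 20 * 1024 * 1024, 1.0))
```

Returning all violations at once, rather than raising on the first, lets a caller show the user everything that needs fixing in one pass.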

Over-limit handling

If the audio exceeds the target duration (5s or 10s), the model keeps only the first 5s/10s; the rest is discarded.
If the audio is shorter than the video duration, the extra video part is silent.
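
The two rules above amount to simple arithmetic on the audio and video lengths. The following sketch is an illustration of that arithmetic only, with a hypothetical function name; it does not call the model.

```python
def audio_timeline(audio_seconds: float, video_seconds: float) -> dict:
    """Split the timeline into used audio, discarded audio, and silent video."""
    return {
        # Only the part of the audio that fits inside the video is kept.
        "audio_used": min(audio_seconds, video_seconds),
        # Audio past the end of the video is dropped.
        "audio_discarded": max(audio_seconds - video_seconds, 0.0),
        # Video past the end of the audio plays without sound.
        "silent_tail": max(video_seconds - audio_seconds, 0.0),
    }


if __name__ == "__main__":
    print(audio_timeline(12.0, 5.0))  # audio longer than the video
    print(audio_timeline(3.0, 10.0))  # audio shorter than the video
```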

Accessibility: This website uses AI models provided by third parties.
