Wan 2.6: 15-second AI video with cinematic coherence and perfect lip sync.

Alibaba's next-generation video model: smarter prompt handling, improved audio synchronization, and unmatched character consistency.

Key Features

Multi-Shot Narrative Generation

Most open-source video models produce a single continuous clip that often lacks structure or consistency. Wan 2.6 can generate a multi-shot narrative directly from a simple prompt.

Prompt

The scene unfolds in first-person POV inside a bright, refined modern kitchen. Natural daylight pours across walnut flooring and matte gray cabinetry, giving the space a calm and polished atmosphere. The viewer takes three to four slow, steady steps forward while holding an empty celadon-green porcelain bowl with both black-gloved hands. Ahead stands a built-in double-door refrigerator. The left door features a softly glowing dispenser slot, with faint vapor curling from its edges. When the viewer reaches the refrigerator and lifts the bowl beneath the outlet, a gentle mechanical hum begins. From the small dispenser opening, the plating sequence unfolds with precise, almost ritualistic elegance. First, a smooth stream of deep orange lobster bisque flows into the bowl, circling and rippling as it settles. Moments later, tender pieces of lobster claw and tail meat descend into the center, their pink-red surfaces glistening in the hot broth. A thin ribbon of cream follows, tracing a delicate spiral across the bisque. Finally, micro herbs and tiny gold flakes drift down, completing the dish with a soft visual flourish. The celadon glaze of the bowl reflects the bright natural light, while the warm tones of the bisque shimmer gently on the surface. Subtle sounds fill the space: soft footsteps on the wooden floor, the quiet friction of gloves against the bowl, the rising hum of the refrigerator, the thick pour of bisque hitting the ceramic, the gentle plop of lobster pieces, the light drizzle of cream, and the faint sprinkle of herbs and flakes. Altogether, the moment blends mechanical precision with the warmth and intimacy of fine dining, presented through the calm rhythm of first-person ASMR realism.

Reference-Based Video Generation

Wan 2.6 supports video-reference generation, which lets users guide the model with an input video.

Prompt

character1 is eating dinner with character2 in a restaurant

15-Second Long-Video Generation

Many open-source models are limited to very short videos, typically only 2–5 seconds, which restricts narrative depth. Wan 2.6 removes this barrier by supporting videos of up to 15 seconds.

Prompt

Generate an approximately 15-second cohesive narrative video. Story: A medieval knight awakens on a storm-swept meadow after a fierce battle. First 5 seconds: A slow circling shot reveals his mud-covered armor, scattered debris, and lingering flashes of lightning in the dark sky. Middle 5 seconds: The knight rises, grasping a sword embedded in the ground. The camera pulls upward from a low angle, emphasizing the determination in his eyes. Final 5 seconds: He begins running toward a distant ruined castle wall as the camera follows in a handheld-style tracking motion, tall grass brushing past the lens to create dynamic depth of field. Maintain scene continuity, natural body motion, and cinematic epic atmosphere throughout.
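
The prompt above divides the 15 seconds into three timed beats. As a minimal sketch of how such a time-segmented prompt could be assembled programmatically: the names `Shot` and `build_prompt` are hypothetical helpers invented for this illustration, the time-range labels are one possible wording, and nothing here calls or represents a Wan 2.6 API. It is plain string formatting with a duration check.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Shot:
    """One timed beat of the story (hypothetical helper, not a Wan 2.6 type)."""
    seconds: int
    description: str


def build_prompt(story: str, shots: list[Shot], max_seconds: int = 15) -> str:
    """Join a story line and timed shots into a single text prompt.

    max_seconds defaults to 15, the maximum length stated on this page.
    """
    total = sum(shot.seconds for shot in shots)
    if total > max_seconds:
        raise ValueError(f"shots total {total}s, above the {max_seconds}s limit")

    parts = [
        f"Generate an approximately {total}-second cohesive narrative video.",
        f"Story: {story}",
    ]
    start = 0
    for shot in shots:
        end = start + shot.seconds
        # Label each beat with its time range so the pacing is explicit.
        parts.append(f"{start}-{end}s: {shot.description}")
        start = end
    parts.append("Maintain scene continuity throughout.")
    return " ".join(parts)


prompt = build_prompt(
    "A medieval knight awakens on a storm-swept meadow after a fierce battle.",
    [
        Shot(5, "A slow circling shot reveals his mud-covered armor."),
        Shot(5, "The knight rises, grasping a sword embedded in the ground."),
        Shot(5, "He runs toward a distant ruined castle wall."),
    ],
)
print(prompt)
```

Keeping the beats as data makes it easy to reorder or retime shots while checking that the total stays within the length limit.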

Q & A

What input formats do you support?
Common video formats (for example, MP4/MOV) are supported. For best results, use a clear, front-facing subject with stable lighting.
Does it preserve identity and background?
It prioritizes identity consistency and scene coherence while applying the requested facial and lip movements.
Can I control emotion and speaking style?
Yes. You can guide intensity (calm/neutral/energetic), tempo, and expression strength through the prompt and/or reference audio.
Can it handle multiple people/faces in a single frame?
It works best when the speaking subject is clearly and consistently visible. Note: crowded scenes or frequent occlusion can cause drift, so consider cropping to or focusing on the target face.