Seedance 2.0 Models

Seedance 2.0 Models unify text-, image-, video-, and audio-driven generation with native audio sync, multi-shot storyboarding, and cinematic 2K quality

All Models

4 models
text-to-video

bytedance/seedance-2.0/text-to-video

Seedance 2.0 (Text-to-Video) generates Hollywood-grade cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on ByteDance Seed's unified multimodal architecture, it leads on instruction adherence, motion quality, and visual aesthetics.
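A text-to-video call typically just needs a model ID and a prompt, plus a few generation options. The sketch below assembles such a request body; the field names (`prompt`, `duration_seconds`, `resolution`, `generate_audio`) are illustrative assumptions, not a confirmed schema — only the endpoint ID comes from this listing, so check the provider's API reference before sending a real request.

```python
import json

# Endpoint ID copied verbatim from this catalog.
MODEL_ID = "bytedance/seedance-2.0/text-to-video"

def build_t2v_request(prompt: str, duration: int = 10, resolution: str = "2k") -> dict:
    """Assemble a hypothetical request body for a text-to-video call.

    Field names are assumptions for illustration; the real schema may differ.
    """
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "duration_seconds": duration,   # clips run up to 15 s per the feature list
        "resolution": resolution,       # cinematic 2K is the headline quality
        "generate_audio": True,         # audio is co-generated natively
    }

payload = build_t2v_request(
    "Slow push-in on a rain-soaked neon street at night, thunder in the distance"
)
print(json.dumps(payload, indent=2))
```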

image-to-video

bytedance/seedance-2.0/image-to-video

Seedance 2.0 (Image-to-Video) generates Hollywood-grade cinematic videos from reference images and text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on ByteDance Seed's unified multimodal architecture, it preserves the input image's subject and composition while adding expressive, physically accurate motion.
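Image-to-video differs from text-to-video only in that the request also carries a reference image; the model keeps that image's subject and composition and animates it per the prompt. The sketch below shows that pairing — again, the field names (`image_url`, `prompt`, `generate_audio`) and the example URL are illustrative assumptions, with only the endpoint ID taken from this listing.

```python
import json

# Endpoint ID copied verbatim from this catalog.
MODEL_ID = "bytedance/seedance-2.0/image-to-video"

def build_i2v_request(image_url: str, prompt: str) -> dict:
    """Pair a reference image with a motion prompt.

    Field names are assumptions for illustration; the real schema may differ.
    """
    return {
        "model": MODEL_ID,
        "image_url": image_url,   # still image whose subject/composition is preserved
        "prompt": prompt,         # describes the desired motion and audio
        "generate_audio": True,
    }

payload = build_i2v_request(
    "https://example.com/portrait.png",  # placeholder image, not a real asset
    "She turns toward the camera and smiles as wind moves through her hair",
)
print(json.dumps(payload, indent=2))
```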

text-to-video

bytedance/seedance-2.0-fast/text-to-video

Seedance 2.0 Fast (Text-to-Video) generates cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability — optimized for faster generation at lower cost. Built on ByteDance Seed's unified multimodal architecture.

image-to-video

bytedance/seedance-2.0-fast/image-to-video

Seedance 2.0 Fast (Image-to-Video) generates cinematic videos from reference images and text prompts with native audio-visual synchronization, director-level control, and exceptional motion stability — optimized for faster generation at lower cost. Built on ByteDance Seed's unified multimodal architecture.

Seedance 2.0 Models

ByteDance's Seedance 2.0 is a production-ready suite of AI video-generation endpoints, featuring native audio-video co-generation, multi-shot storyboarding, and cinematic 2K quality. Seedance 2.0 covers two core workflows: text-to-video generation and image-to-video animation, each available in standard and fast tiers, offering flexible quality-speed trade-offs.

Seedance 2.0 Series — Text-to-Video & Image-to-Video API

Seedance 2.0 offers four focused endpoints for generating videos from text prompts or animating still images—ideal for cinematic content production, social media automation, and repeatable video workflows.

  1. Seedance 2.0 Text-to-Video — Generate high-quality cinematic videos from text prompts with native audio sync, realistic physics, and multi-shot scene transitions.
  2. Seedance 2.0 Image-to-Video — Animate any still image into a fluid video clip with consistent character preservation, natural motion, and synchronized audio output.
  3. Seedance 2.0 Fast Text-to-Video — Fast-tier text-to-video generation optimized for speed and rapid iteration without sacrificing core motion quality.
  4. Seedance 2.0 Fast Image-to-Video — Fast-tier image animation for high-throughput pipelines requiring quick turnaround on visual content.
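The four endpoints above factor cleanly into a workflow axis (text-to-video vs. image-to-video) and a tier axis (standard vs. fast). A small lookup table makes that selection explicit in client code; the endpoint IDs below are taken verbatim from this catalog, while the helper function itself is just an illustrative convenience.

```python
# Endpoint IDs copied verbatim from the list above, keyed by (workflow, tier).
ENDPOINTS = {
    ("text-to-video", "standard"): "bytedance/seedance-2.0/text-to-video",
    ("image-to-video", "standard"): "bytedance/seedance-2.0/image-to-video",
    ("text-to-video", "fast"): "bytedance/seedance-2.0-fast/text-to-video",
    ("image-to-video", "fast"): "bytedance/seedance-2.0-fast/image-to-video",
}

def pick_endpoint(workflow: str, tier: str = "standard") -> str:
    """Return the endpoint ID for a workflow ('text-to-video' or
    'image-to-video') and tier ('standard' or 'fast')."""
    try:
        return ENDPOINTS[(workflow, tier)]
    except KeyError:
        raise ValueError(f"unknown combination: {workflow!r} / {tier!r}")

print(pick_endpoint("image-to-video", "fast"))
# -> bytedance/seedance-2.0-fast/image-to-video
```

Defaulting `tier` to `"standard"` means callers only name the fast tier when they deliberately trade quality for latency.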

Key Features

  1. Native Audio-Video Co-Generation — Video and audio are generated simultaneously in a single pass, delivering lip-synced dialogue, contextual sound effects, and adaptive music without post-production stitching.
  2. Multi-Shot Storyboarding — Generate up to 15-second clips composed of multiple natural shots with seamless cuts and transitions, producing edited-sequence output from a single prompt.
  3. Character Consistency — Facial features, clothing, and visual style are preserved frame-to-frame and across multiple generated clips using reference-based identity locking.
  4. Standard & Fast Tiers — Choose cinematic-quality standard or speed-optimized fast endpoints based on your latency and throughput requirements.
  5. Cinematic Camera Control — Director-level camera controls including push-in, pan, orbit, and tracking shots via natural language prompt keywords.
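Since camera moves are requested through natural-language keywords in the prompt, a thin helper can keep those keywords consistent across a pipeline. The sketch below uses the moves named in feature 5 (push-in, pan, orbit, tracking); the `"<move> shot:"` prefix convention is an assumption for illustration, not an official prompt grammar.

```python
# Camera-move keywords taken from the feature list above.
CAMERA_MOVES = {"push-in", "pan", "orbit", "tracking"}

def with_camera(prompt: str, move: str) -> str:
    """Prefix a scene description with a camera-move keyword.

    The '<move> shot:' phrasing is an illustrative convention, not a
    documented prompt format.
    """
    if move not in CAMERA_MOVES:
        raise ValueError(f"unsupported camera move: {move!r}")
    return f"{move} shot: {prompt}"

print(with_camera("a lighthouse at dawn, waves crashing below", "orbit"))
# -> orbit shot: a lighthouse at dawn, waves crashing below
```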