video-to-video

SteadyDancer

wavespeed-ai/steady-dancer

SteadyDancer is a 14B-parameter human image animation framework that transforms static images into coherent dance videos. It features first-frame preservation, robust identity consistency, and temporal coherence for realistic motion generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


Your request will cost $0.15 per run.

For $10 you can run this model approximately 66 times.



README

wavespeed-ai/steady-dancer — Image-to-Video Motion Transfer

SteadyDancer is WaveSpeedAI’s motion-transfer model: you upload a character image and a driving video, and it generates a new clip in which your character follows the motion from the video while keeping a stable face, outfit, and overall identity. Ideal for dance edits, cosplay previews, and social short-form content.

What is SteadyDancer?

SteadyDancer is a 14-billion parameter human image animation framework that converts static images into coherent dance motion videos. Built on diffusion models, it uses an Image-to-Video paradigm with key innovations for high-quality animation.

✨ Highlights

  • Image-driven identity – Uses your uploaded image as the main reference for face, outfit, and body shape.
  • Video-driven motion – Copies camera movement and body motion from the driving video.
  • Stability-focused – Designed to keep faces, limbs, and outfit details consistent across frames.
  • Resolution choices – Output at 480p for quick previews or 720p for higher-quality clips.
  • Prompt-guided style (optional) – Add a short text prompt to nudge colour, atmosphere, or style, or leave blank for neutral transfer.

🧩 Parameters

  • image* – Required. The character / subject image to insert into the motion.
  • video* – Required. Driving video whose motion and camera you want to reuse.
  • prompt – Optional text description for style / mood (e.g. “cinematic lighting, soft film grain, vivid colours”).
  • resolution – Output resolution: 480p or 720p.
  • seed – -1 for a random seed; any other integer for reproducible results.
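
The parameters above map directly onto a request payload. A minimal sketch in Python (field names mirror the list above; this only validates and assembles the body locally, since the exact endpoint URL and auth header should be taken from the WaveSpeedAI API docs):

```python
def build_payload(image_url, video_url, prompt="", resolution="480p", seed=-1):
    """Assemble a SteadyDancer request body.

    image and video are required; prompt may be empty; resolution is
    480p or 720p; seed -1 requests a random seed.
    """
    if not image_url or not video_url:
        raise ValueError("image and video are required")
    if resolution not in ("480p", "720p"):
        raise ValueError("resolution must be '480p' or '720p'")
    return {
        "image": image_url,
        "video": video_url,
        "prompt": prompt,
        "resolution": resolution,
        "seed": int(seed),
    }
```

The returned dict would then be sent as JSON to the model's inference endpoint.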

💰 Pricing

Pricing is based on video length and resolution, billed in 5-second blocks, with:

  • Minimum billable length: 5 seconds
  • Maximum billable length: 120 seconds (anything longer is charged as 120 s)
  • Base price: $0.15 per 5 seconds at 480p

Effective rates:

| Resolution | Effective price per second | 5 s clip | 10 s clip | 60 s clip | 120 s clip (cap) |
|------------|----------------------------|----------|-----------|-----------|------------------|
| 480p       | $0.03 / s                  | $0.15    | $0.30     | $1.80     | $3.60            |
| 720p       | $0.06 / s (×2)             | $0.30    | $0.60     | $3.60     | $7.20            |

Internally, the system:

  • Takes your video duration (capped at 120 s),
  • Rounds it into 5-second blocks,
  • Multiplies by the base price, and
  • Applies a ×2 multiplier for 720p.
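
The billing steps above can be sketched as a small estimator (a sketch of the stated rules, not official billing code):

```python
import math

BASE_PRICE = 0.15    # USD per 5-second block at 480p
BLOCK_SECONDS = 5    # billing granularity
MAX_SECONDS = 120    # anything longer is billed as 120 s

def run_cost(duration_s, resolution="480p"):
    """Estimate the cost of one run from the pricing rules above."""
    # Minimum billable length is 5 s, maximum is 120 s.
    billable = min(max(duration_s, BLOCK_SECONDS), MAX_SECONDS)
    # Round up into 5-second blocks.
    blocks = math.ceil(billable / BLOCK_SECONDS)
    # 720p output costs twice the 480p base price.
    multiplier = 2 if resolution == "720p" else 1
    return round(blocks * BASE_PRICE * multiplier, 2)
```

For example, `run_cost(60)` gives $1.80 and `run_cost(120, "720p")` gives $7.20, matching the table above.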

🚀 How to Use

  1. Upload image – choose the face / character you want to animate.
  2. Upload video – select the motion source clip.
  3. (Optional) Enter a prompt to guide overall look and mood.
  4. Choose resolution (start with 480p for fast tests; switch to 720p for final export).
  5. (Optional) Set a fixed seed if you want to reproduce or slightly tweak the same take later.
  6. Click Run and download the generated video once completed.
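
Step 6 typically follows a submit-then-poll pattern when driven through the REST API. A hedged control-flow sketch (the `poll` callable is injected so the loop is self-contained; in a real client it would be an HTTP GET against the task's status URL, whose exact response shape is an assumption to verify against the WaveSpeedAI API reference):

```python
import time

def wait_for_result(poll, interval_s=2.0, timeout_s=600.0):
    """Poll a status callable until the run completes or fails.

    `poll` returns a dict such as {"status": ..., "output": ...};
    the status names here are illustrative, not guaranteed.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        result = poll()
        if result["status"] == "completed":
            return result["output"]  # e.g. the generated video URL
        if result["status"] == "failed":
            raise RuntimeError(result.get("error", "run failed"))
        time.sleep(interval_s)
    raise TimeoutError("run did not finish in time")
```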

🎯 Recommended Use Cases

  • Dance and performance remixes using a static character or avatar.
  • Cosplay or outfit previews based on a single photo.
  • VTuber / virtual idol short clips for social platforms.
  • Quick pre-viz for ad concepts or character motion tests.

💡 Tips & Notes

  • For best results, keep framing similar between the image and driving video (e.g. both full-body or both mid-shot).
  • Avoid extremely fast motion, strong occlusions, or very busy backgrounds in the driving video for first tests.
  • If faces look unstable, try a clearer input image or reduce extreme camera shake in the driving clip.

Reference

Try other models and see the difference

  • fun-control — A playful motion-remix model built on Alibaba’s Wan 2.2, for controllable character and camera movement from simple prompts.
  • wan-animate — A general animation model powered by Alibaba’s Wan 2.2, turning text or images into smooth, high-quality short videos.