WaveSpeed.ai
digital-human

digital-human

SkyReels V3 Talking Avatar

wavespeed-ai/skyreels-v3-talking-avatar

SkyReels V3 Talking Avatar is a 19B-parameter multimodal model that generates talking-avatar videos from a portrait image and an audio clip with precise lip sync. The model supports up to 200 seconds at 720p resolution, while this endpoint accepts audio clips of up to 15 seconds. It is offered as a ready-to-use REST inference API with no cold starts and affordable pricing.


Requests start at $0.15 per run (a clip of up to 5 seconds at standard resolution).

At that price, $10 covers approximately 66 runs.
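The run count is simply the budget divided by the per-run price, rounded down. A minimal Python sketch, using `Decimal` to avoid floating-point rounding; `runs_for_budget` is a hypothetical helper, not part of any WaveSpeed SDK.

```python
from decimal import Decimal


def runs_for_budget(budget_usd: str, price_per_run_usd: str) -> int:
    """Return how many whole runs a budget covers at a fixed per-run price."""
    budget = Decimal(budget_usd)
    price = Decimal(price_per_run_usd)
    if price <= 0:
        raise ValueError("price must be positive")
    # Floor division on Decimal counts whole runs without float error.
    return int(budget // price)


print(runs_for_budget("10", "0.15"))  # 66 runs at the minimum price
print(runs_for_budget("10", "0.30"))  # 33 runs for <=5 s clips at 720p
```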


README

SkyReels V3 Talking Avatar

SkyReels V3 Talking Avatar generates realistic talking-head videos by combining a portrait image with audio. Upload a face image and an audio clip, and the model creates a lip-synced video with natural facial movements and expressions that match the speech.

Why Choose This?

  • Realistic lip-sync: Generates accurate lip movements synchronized to the audio input.

  • Natural expressions: Creates realistic facial movements and expressions during speech.

  • Flexible duration: Supports audio clips of up to 15 seconds.

  • Resolution options: Choose between standard and 720p output quality.

  • Simple workflow: Upload an image and an audio clip; no complex setup required.

  • Prompt Enhancer: An optional text prompt guides the generation style.

Parameters

Parameter  | Required | Description
---------- | -------- | -----------
image      | Yes      | Portrait image for the avatar (URL or upload)
audio      | Yes      | Audio clip for lip-sync (URL or upload, max 15 s)
prompt     | No       | Optional text to guide the generation style
resolution | No       | Output resolution: 720p (default) or standard (lower resolution, half the 720p price)
seed       | No       | Random seed for reproducibility (-1 for random)
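The parameters above map naturally onto a request body. A minimal Python sketch that assembles and validates one; it assumes the fields are sent as a flat JSON object, which you should confirm against the API reference. `build_request` and its `audio_seconds` argument (used only for local validation, not sent to the API) are hypothetical, and the example URLs are placeholders.

```python
import json
from typing import Optional

MAX_AUDIO_SECONDS = 15  # documented maximum audio duration


def build_request(image: str, audio: str, audio_seconds: float,
                  prompt: Optional[str] = None,
                  resolution: str = "720p", seed: int = -1) -> dict:
    """Assemble request fields using the documented parameter names.

    Assumption: the API accepts these fields as a flat JSON object.
    """
    if not image or not audio:
        raise ValueError("image and audio are both required")
    if audio_seconds > MAX_AUDIO_SECONDS:
        raise ValueError(f"audio exceeds the {MAX_AUDIO_SECONDS} s limit")
    body = {"image": image, "audio": audio,
            "resolution": resolution, "seed": seed}
    if prompt:  # prompt is optional; omit it when empty
        body["prompt"] = prompt
    return body


req = build_request("https://example.com/face.jpg",
                    "https://example.com/line.wav",
                    audio_seconds=8.2, prompt="calm, friendly delivery")
print(json.dumps(req, indent=2))
```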

How to Use

  1. Upload your image — provide a clear portrait image with visible face.
  2. Upload your audio — add the audio clip you want the avatar to speak (up to 15 seconds).
  3. Add prompt (optional) — describe any specific style or expression preferences.
  4. Select resolution — choose 720p for higher quality or lower for faster processing.
  5. Run — submit and download your talking avatar video.
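Before uploading in steps 1 and 2, you can check locally that a clip fits the 15-second limit. A minimal Python sketch using the standard-library `wave` module; it only reads uncompressed WAV files, so MP3 or other formats would need a different decoder. `check_audio` is a hypothetical helper.

```python
import os
import tempfile
import wave

MAX_AUDIO_SECONDS = 15.0  # documented maximum audio duration


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_audio(path: str) -> float:
    """Raise if the clip exceeds the 15 s limit; otherwise return its duration."""
    seconds = wav_duration_seconds(path)
    if seconds > MAX_AUDIO_SECONDS:
        raise ValueError(f"{path}: {seconds:.2f} s exceeds the limit")
    return seconds


# Usage: write a 3-second silent WAV to a temp dir and check it.
path = os.path.join(tempfile.mkdtemp(), "silence.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)      # mono
    wf.setsampwidth(2)      # 16-bit samples
    wf.setframerate(16000)  # 16 kHz
    wf.writeframes(b"\x00\x00" * 16000 * 3)  # 3 s of silence
print(check_audio(path))  # 3.0
```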

Pricing

Duration | 720p  | Standard
-------- | ----- | --------
≤5 s     | $0.30 | $0.15
10 s     | $0.60 | $0.30
15 s     | $0.90 | $0.45

Billing Rules

  • Minimum charge: 5 seconds
  • Maximum duration: 15 seconds
  • 720p rate: 2× standard rate
  • Standard rate: $0.15 per 5 seconds
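The billing rules combine into a single estimate. A minimal Python sketch; it assumes a duration that is not a multiple of 5 s is rounded up to the next 5-second block (the page only lists prices at 5, 10, and 15 s), and it uses the string "standard" as a hypothetical label for the lower resolution setting. `estimate_cost` is not part of any official SDK.

```python
import math
from decimal import Decimal

STANDARD_RATE_PER_BLOCK = Decimal("0.15")  # $0.15 per 5 s at standard
BLOCK_SECONDS = 5
MIN_SECONDS = 5    # minimum charge
MAX_SECONDS = 15   # maximum duration


def estimate_cost(duration_s: float, resolution: str = "720p") -> Decimal:
    """Estimate the charge for one run from the billing rules.

    Assumption: partial 5 s blocks are rounded up.
    """
    if duration_s <= 0 or duration_s > MAX_SECONDS:
        raise ValueError(f"duration must be in (0, {MAX_SECONDS}] seconds")
    billed = max(MIN_SECONDS, duration_s)           # short clips billed as 5 s
    blocks = math.ceil(billed / BLOCK_SECONDS)      # whole 5 s blocks
    multiplier = 2 if resolution == "720p" else 1   # 720p costs 2x standard
    return STANDARD_RATE_PER_BLOCK * blocks * multiplier


print(estimate_cost(3, "standard"))   # 0.15 (minimum 5 s charge)
print(estimate_cost(10))              # 0.60 (720p)
print(estimate_cost(15, "standard"))  # 0.45
```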

Best Use Cases

  • Digital Presenters — Create talking avatars for video presentations and tutorials.
  • Social Media Content — Generate engaging talking head videos for TikTok, Reels, and Shorts.
  • Marketing & Ads — Produce spokesperson videos without filming.
  • E-learning — Create instructor avatars for educational content.
  • Localization — Generate lip-synced videos in different languages from the same avatar.
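For localization, the same portrait can be paired with one audio track per language. A minimal Python sketch that builds one request body per language with a shared image, resolution, and fixed seed; the flat JSON body shape, the placeholder URLs, and `localized_requests` are assumptions for illustration, not the documented API. A fixed seed makes each individual run reproducible; it does not by itself guarantee identical motion across different audio tracks.

```python
from typing import Dict


def localized_requests(image_url: str, audio_by_lang: Dict[str, str],
                       seed: int = 42,
                       resolution: str = "720p") -> Dict[str, dict]:
    """Map each language code to a request body sharing one portrait."""
    return {
        lang: {
            "image": image_url,      # same avatar for every language
            "audio": audio_url,      # language-specific voice track
            "resolution": resolution,
            "seed": seed,            # fixed seed for reproducible runs
        }
        for lang, audio_url in audio_by_lang.items()
    }


jobs = localized_requests(
    "https://example.com/host.jpg",
    {"en": "https://example.com/en.wav", "es": "https://example.com/es.wav"},
)
for lang, body in jobs.items():
    print(lang, body["audio"])
```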

Pro Tips

  • Use clear, front-facing portrait images with good lighting for best results.
  • Ensure the face is clearly visible and not obscured by hair or accessories.
  • Use high-quality audio with clear speech for accurate lip-sync.
  • Keep audio clips at or under 15 seconds; longer clips exceed the maximum supported duration.
  • Set a specific seed for reproducible results across multiple generations.

Notes

  • Both image and audio are required fields.
  • Maximum audio duration: 15 seconds.
  • Minimum charge: 5 seconds (shorter clips billed as 5 seconds).
  • 720p resolution costs 2× the standard rate.
  • Ensure uploaded file URLs are publicly accessible.
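Because inputs are fetched by URL, a quick reachability check can catch local paths or mistyped links before submitting. A best-effort Python sketch using only the standard-library `urllib`; `is_publicly_reachable` is hypothetical, some servers reject `HEAD` requests, and a URL reachable from your machine (for example over a VPN) may still be unreachable from the service.

```python
import urllib.parse
import urllib.request


def is_publicly_reachable(url: str, timeout: float = 5.0) -> bool:
    """Best-effort check that a URL answers an anonymous HTTP HEAD request."""
    if urllib.parse.urlparse(url).scheme not in ("http", "https"):
        return False  # local files and other schemes cannot be fetched remotely
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):
        # URLError/HTTPError (4xx, 5xx), timeouts, and malformed URLs
        return False


print(is_publicly_reachable("file:///tmp/face.jpg"))  # False
```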
