WaveSpeed.ai
digital-human

SoulX FlashHead

wavespeed-ai/soulx-flashhead

SoulX FlashHead generates real-time, streaming talking-head video from a portrait image and an audio clip, with ultra-fast 96 FPS performance. It is offered as a ready-to-use REST inference API with no cold starts and affordable pricing.

The lowest price per run is $0.075 (a clip of 5 seconds or less at 480p).

For $1 you can run this model approximately 13 times at that rate.


README

SoulX FlashHead

SoulX FlashHead generates realistic talking head videos by combining a portrait image with audio. Upload a face image and audio clip — the model creates a lip-synced video with natural facial movements and expressions. With support for audio clips up to 30 minutes and budget-friendly pricing, it's ideal for long-form talking avatar content.

Why Choose This?

  • Long-form support: Generate talking avatar videos with audio up to 30 minutes in length.

  • Realistic lip-sync: Accurate lip movements synchronized to the audio input.

  • Natural expressions: Creates realistic facial movements and expressions during speech.

  • Budget-friendly: Lower cost per second compared to other talking avatar models.

  • Resolution options: Choose between 480p and 720p output quality.

Parameters

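| Parameter  | Required | Description                                          |
|------------|----------|------------------------------------------------------|
| image      | Yes      | Portrait image for the avatar (URL or upload)        |
| audio      | Yes      | Audio clip for lip-sync (URL or upload; max. 30 min) |
| resolution | No       | Output resolution: 480p or 720p (default: 720p)      |
| seed       | No       | Random seed for reproducibility (-1 for random)      |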
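The parameter table maps naturally onto a JSON request body. The Python sketch below assembles one and validates it locally before anything is sent. It assumes the body is a flat JSON object keyed by the parameter names above, and that any seed other than -1 must be a non-negative integer; `build_payload` is a hypothetical helper, not part of any WaveSpeed SDK.

```python
from typing import Optional

# Allowed values taken from the parameter table.
ALLOWED_RESOLUTIONS = {"480p", "720p"}
MAX_AUDIO_SECONDS = 1800  # 30-minute audio limit


def build_payload(image: str, audio: str, resolution: str = "720p",
                  seed: int = -1,
                  audio_seconds: Optional[float] = None) -> dict:
    """Build a request body for wavespeed-ai/soulx-flashhead (hypothetical helper).

    `image` and `audio` are URLs. `audio_seconds` is an optional duration the
    caller already knows; it is used only for the local 30-minute check.
    """
    if not image or not audio:
        raise ValueError("both image and audio are required")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    # ASSUMPTION: -1 means random; otherwise seeds are non-negative integers.
    if not isinstance(seed, int) or seed < -1:
        raise ValueError("seed must be an integer >= -1 (-1 means random)")
    if audio_seconds is not None and audio_seconds > MAX_AUDIO_SECONDS:
        raise ValueError("audio exceeds the 30-minute (1800 s) limit")
    return {"image": image, "audio": audio, "resolution": resolution, "seed": seed}
```

For example, `build_payload("https://example.com/face.jpg", "https://example.com/voice.mp3", resolution="480p", seed=42)` returns a dictionary ready to be serialized with `json.dumps`.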

How to Use

  1. Upload your image: provide a clear portrait image with the face fully visible.
  2. Upload your audio: add the audio clip you want the avatar to speak (up to 30 minutes).
  3. Select resolution: choose 720p for higher quality or 480p for faster, cheaper processing.
  4. Run: submit the request and download your talking avatar video.
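The "Run" step can also be done programmatically through the REST inference API. The sketch below uses only the Python standard library. The base URL `https://api.wavespeed.ai/api/v3`, the `/<model-id>` path pattern, and Bearer-token authentication are assumptions to verify against WaveSpeed's API reference; `build_submit_request` and `submit` are hypothetical helpers.

```python
import json
import urllib.request

# ASSUMPTION: endpoint layout and auth scheme; confirm against the official API docs.
API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_ID = "wavespeed-ai/soulx-flashhead"


def build_submit_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that submits one generation job."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{MODEL_ID}",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # ASSUMPTION: Bearer token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(payload: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response unchanged.

    The response schema (job id, status, output URLs) is not described on
    this page, so the caller must inspect it.
    """
    req = build_submit_request(payload, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request separately from sending it lets you inspect the URL, headers, and body without network access. How to poll for the finished video is not covered here, which is why `submit` returns the raw JSON rather than a video URL.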

Pricing

| Duration | 720p  | 480p   |
|----------|-------|--------|
| ≤ 5 s    | $0.15 | $0.075 |
| 10 s     | $0.30 | $0.15  |
| 30 s     | $0.90 | $0.45  |
| 60 s     | $1.80 | $0.90  |
| 5 min    | $9.00 | $4.50  |

Billing Rules

  • Minimum charge: 5 seconds
  • Maximum duration: 1800 seconds (30 minutes)
  • 480p rate: $0.075 per 5 seconds
  • 720p rate: $0.15 per 5 seconds (2× 480p rate)
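The billing rules reduce to a fixed rate per 5 seconds with a 5-second floor. The sketch below reproduces the pricing table, using `Decimal` so that dollar amounts do not pick up floating-point error. It assumes charges are prorated per second above the minimum; the page does not say whether partial 5-second blocks are rounded up, which would give higher figures for durations that are not multiples of 5 s. `estimate_cost` and `runs_per_budget` are hypothetical helpers.

```python
from decimal import Decimal, ROUND_FLOOR

# USD per 5 seconds of audio, from the billing rules.
RATE_PER_5S = {"480p": Decimal("0.075"), "720p": Decimal("0.15")}
MIN_SECONDS = 5
MAX_SECONDS = 1800


def estimate_cost(seconds: float, resolution: str = "720p") -> Decimal:
    """Estimated charge in USD for one run.

    ASSUMPTION: prorated per second above the 5-second minimum charge.
    """
    if resolution not in RATE_PER_5S:
        raise ValueError(f"unknown resolution: {resolution}")
    if seconds <= 0 or seconds > MAX_SECONDS:
        raise ValueError("duration must be in (0, 1800] seconds")
    billed = max(Decimal(str(seconds)), Decimal(MIN_SECONDS))  # apply the floor
    return billed * RATE_PER_5S[resolution] / Decimal(5)


def runs_per_budget(budget: str, seconds: float = 5, resolution: str = "480p") -> int:
    """Number of whole runs of a given length that fit into a budget."""
    per_run = estimate_cost(seconds, resolution)
    return int((Decimal(budget) / per_run).to_integral_value(rounding=ROUND_FLOOR))
```

With these rates, `estimate_cost(300, "480p")` gives $4.50 and `estimate_cost(300, "720p")` gives $9.00, matching the 5-minute row, and `runs_per_budget("1")` gives 13 runs of a 5-second 480p clip.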

Best Use Cases

  • Long-Form Content — Create extended talking avatar videos for podcasts, lectures, and presentations.
  • E-learning — Generate instructor avatars for educational courses and tutorials.
  • Digital Presenters — Produce talking avatars for video presentations and explainers.
  • Marketing & Ads — Create spokesperson videos without filming.
  • Localization — Generate lip-synced videos in different languages from the same avatar.

Pro Tips

  • Use clear, front-facing portrait images with good lighting for best results.
  • Ensure the face is clearly visible and not obscured by hair or accessories.
  • Use high-quality audio with clear speech for accurate lip-sync.
  • Choose 480p for longer content to reduce costs — it still provides good quality.
  • Set a specific seed for reproducible results across multiple generations.
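The tip about choosing 480p for longer content can be expressed as a small budget check. The hypothetical `pick_resolution` helper below tries 720p first and falls back to 480p, returning `None` if neither fits. The per-second rates are derived from the billing rules ($0.15 and $0.075 per 5 s); per-second proration above the 5-second minimum is an assumption.

```python
from decimal import Decimal
from typing import Optional

# USD per second, derived from $0.15 (720p) and $0.075 (480p) per 5 seconds.
RATE_PER_SECOND = {"720p": Decimal("0.03"), "480p": Decimal("0.015")}


def pick_resolution(seconds: float, budget: str) -> Optional[str]:
    """Return the highest resolution whose estimated cost fits the budget.

    HYPOTHETICAL helper; ASSUMES per-second proration with a 5-second minimum.
    """
    billed = max(Decimal(str(seconds)), Decimal(5))
    for res in ("720p", "480p"):  # prefer the higher quality when affordable
        if billed * RATE_PER_SECOND[res] <= Decimal(budget):
            return res
    return None
```

For example, a 10-minute lecture with a $10 budget would come back as `"480p"`: 720p would cost an estimated $18, while 480p costs about $9.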

Notes

  • Both image and audio are required fields.
  • Maximum audio duration: 30 minutes (1800 seconds).
  • Minimum charge: 5 seconds (shorter clips billed as 5 seconds).
  • 720p resolution costs 2× the 480p rate.
  • Ensure uploaded file URLs are publicly accessible.
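Two of these notes, the 30-minute audio limit and the requirement for publicly accessible URLs, can be checked before submitting. The sketch below reads the duration of a WAV file with the standard `wave` module (compressed formats such as MP3 would need a different reader) and applies a cheap heuristic to a URL: it rejects non-HTTP(S) schemes, `localhost`, and literal IP addresses that are not globally routable. All three helpers are hypothetical, and passing the URL check does not guarantee that the service can actually fetch the file.

```python
import ipaddress
import wave
from urllib.parse import urlparse

MAX_AUDIO_SECONDS = 1800  # 30-minute limit


def wav_duration_seconds(path: str) -> float:
    """Duration of an uncompressed WAV file, computed from its header."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def audio_within_limit(path: str) -> bool:
    """True if the WAV file is no longer than the 30-minute maximum."""
    return wav_duration_seconds(path) <= MAX_AUDIO_SECONDS


def looks_publicly_reachable(url: str) -> bool:
    """Heuristic only: reject URLs a remote server obviously cannot fetch."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; cannot judge without resolving it
    return ip.is_global  # False for loopback, private, and link-local ranges
```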
