image-to-video

AI Talking Photos

wavespeed-ai/ai-talking-photos

AI Talking Photos brings your photos to life: upload a portrait and a line of text, and watch the person speak. Supports clips of 5 to 15 seconds. Ready-to-use REST inference API, no cold starts, affordable pricing.

Your request will cost $0.30 per run at the default 5-second duration.

With $10 you can run this model about 33 times.

README

AI Talking Photos

AI Talking Photos makes any portrait speak. Upload a photo, type what you want the person to say, and AI generates a realistic talking video with accurate lip-sync — no filming, no voiceover recording required.

Why Choose This?

  • Realistic lip-sync generation: the model maps the text to natural lip movements and facial expressions for a believable talking video.

  • Any portrait, any text: works on photos of real people, illustrations, historical figures, or fictional characters. If there's a face, it can talk.

  • Adjustable duration: generate clips from 5 to 15 seconds to match your content length.

  • Reproducible results: use the seed parameter to lock in a specific output for consistent iterations.

Parameters

Parameter | Required | Description
image     | Yes      | Portrait photo to animate (URL or file upload).
text      | Yes      | The text you want the person to speak.
duration  | No       | Video length in seconds. Range: 5–15. Default: 5.
seed      | No       | Random seed for reproducible results. Use -1 for a random seed.
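
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not the official client: `build_payload` is a hypothetical helper, and it assumes the request body uses the parameter names from the table as JSON keys and that `duration` is given in whole seconds.

```python
from typing import Any, Dict

# Documented duration range, in seconds.
MIN_DURATION = 5
MAX_DURATION = 15


def build_payload(image: str, text: str, duration: int = 5, seed: int = -1) -> Dict[str, Any]:
    """Validate inputs against the documented rules and return a request body.

    Hypothetical helper: the key names mirror the parameter table, but the
    exact wire format is an assumption.
    """
    if not image:
        raise ValueError("image is required")
    if not text or not text.strip():
        raise ValueError("text is required")
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError(f"duration must be between {MIN_DURATION} and {MAX_DURATION} seconds")
    # seed = -1 asks for a random seed; any other value is passed through as-is.
    return {"image": image, "text": text, "duration": duration, "seed": seed}


payload = build_payload("https://example.com/portrait.jpg", "Hello there!", duration=10, seed=42)
```

Validating locally avoids paying for a round trip that the service would reject anyway, such as a missing `text` or a 20-second duration.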

How to Use

  1. Upload a portrait — a clear, front-facing photo with a visible mouth works best.
  2. Enter your text — type what you want the person to say.
  3. Set duration — choose between 5 and 15 seconds based on your text length.
  4. Set seed (optional) — fix the seed to reproduce a specific result in future runs.
  5. Submit — generate, preview, and download your talking video.
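
For API users, the same steps map onto a single HTTP request. The sketch below only constructs the request and does not send it. The endpoint URL is a placeholder, and both the bearer-token `Authorization` header and the JSON body shape are assumptions, so check the API reference for the real values before use.

```python
import json
import urllib.request


def make_request(endpoint: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request.

    Assumptions: the service accepts a JSON body and a bearer token;
    neither is confirmed by this page.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = make_request(
    "https://api.example.invalid/ai-talking-photos",  # placeholder, not the real endpoint
    "YOUR_API_KEY",
    {
        "image": "https://example.com/portrait.jpg",  # step 1: portrait
        "text": "Welcome to our channel!",            # step 2: text to speak
        "duration": 5,                                # step 3: 5 to 15 seconds
        "seed": -1,                                   # step 4: -1 means random
    },
)
# Step 5 would be urllib.request.urlopen(req); it is left out here because
# the endpoint above is only a placeholder.
```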

Pricing

Duration | Cost
5s       | $0.30
10s      | $0.60
15s      | $0.90

Billing Rules

  • Rate: $0.06 per second
  • Duration range: 5–15 seconds
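
The billing rule is a simple linear rate, so costs and budgets can be estimated up front. This is a minimal sketch using the $0.06 per second rate listed above; the function names are illustrative, and it assumes billing follows the requested duration exactly.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.06")  # USD, from the billing rules above


def cost_usd(duration: int) -> Decimal:
    """Cost of one run for a duration in the documented 5 to 15 second range."""
    if not 5 <= duration <= 15:
        raise ValueError("duration must be between 5 and 15 seconds")
    return RATE_PER_SECOND * duration


def runs_for_budget(budget_usd: str, duration: int) -> int:
    """Number of whole runs a budget covers at the given duration."""
    return int(Decimal(budget_usd) // cost_usd(duration))


print(cost_usd(5))                # 0.30
print(cost_usd(15))               # 0.90
print(runs_for_budget("10", 5))   # 33
```

`Decimal` is used instead of `float` so that currency arithmetic stays exact.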

Best Use Cases

  • Social media content — Create engaging talking-head videos from photos without any filming.
  • Marketing & advertising — Generate spokesperson or product explainer videos from still images.
  • Education — Bring historical figures, book characters, or concept illustrations to life.
  • Entertainment — Make friends' or celebrities' photos deliver a custom message for fun.

Pro Tips

  • Clear, well-lit front-facing portraits with a fully visible mouth produce the most accurate lip-sync.
  • Match your text length to your chosen duration — roughly 2–3 words per second for natural pacing.
  • Fix the seed when iterating on text variations to keep the facial performance consistent.
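
The pacing tip can be turned into a quick pre-flight check. The sketch below is a heuristic built on the 2–3 words per second guideline, not a rule enforced by the model; `suggest_duration` and `fits` are hypothetical helpers, and words are counted by splitting on whitespace.

```python
import math

MIN_DURATION = 5
MAX_DURATION = 15


def suggest_duration(text: str, words_per_second: float = 2.5) -> int:
    """Suggest a duration in seconds, clamped to the supported 5 to 15 range."""
    words = len(text.split())
    seconds = math.ceil(words / words_per_second)
    return max(MIN_DURATION, min(MAX_DURATION, seconds))


def fits(text: str, duration: int, max_words_per_second: float = 3.0) -> bool:
    """True if the text can be spoken within `duration` at the fastest natural pace."""
    return len(text.split()) <= max_words_per_second * duration


script = "Hi everyone, thanks for stopping by. Here is what we have planned for this week."
print(suggest_duration(script))
print(fits(script, 5))
```

At 3 words per second, a 15-second clip holds about 45 words, so longer scripts are better trimmed or split across several clips.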

Notes

  • Both image and text are required fields.
  • Duration range: 5–15 seconds.
  • Ensure image URLs are publicly accessible if using a link rather than a direct upload.
  • Please ensure your content complies with WaveSpeed AI's usage policies.