
AI Talking Photos


AI Talking Photos brings your photos to life: upload a portrait and some text, and watch the person speak it. Clips can be 5 to 15 seconds long. Ready-to-use REST inference API with no cold starts and affordable pricing.

Features


AI Talking Photos makes any portrait speak. Upload a photo, type what you want the person to say, and AI generates a realistic talking video with accurate lip-sync — no filming, no voiceover recording required.


Why Choose This?

  • Realistic lip-sync generation: AI maps the text to natural lip movements and facial expressions for believable, human-quality talking video.

  • Any portrait, any text: works on photos of real people, illustrations, historical figures, or fictional characters. If there’s a face, it can talk.

  • Adjustable duration: generate clips from 5 to 15 seconds to match your content length.

  • Reproducible results: fix the seed parameter to lock in a specific output for consistent iterations.


Parameters

| Parameter | Required | Description |
|---|---|---|
| image | Yes | Portrait photo to animate (URL or file upload). |
| text | Yes | The text you want the person to speak. |
| duration | No | Video length in seconds. Range: 5–15. Default: 5. |
| seed | No | Random seed for reproducible results. Use -1 for a random seed. |

How to Use

  1. Upload a portrait — a clear, front-facing photo with a visible mouth works best.
  2. Enter your text — type what you want the person to say.
  3. Set duration — choose between 5 and 15 seconds based on your text length.
  4. Set seed (optional) — fix the seed to reproduce a specific result in future runs.
  5. Submit — generate, preview, and download your talking video.
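
The steps above map onto a small JSON request body. As a minimal sketch for a POSIX-style shell (dash or bash), the hypothetical helpers `json_escape` and `build_payload` below, which are not part of any WaveSpeed tool, assemble that body from an image URL, the text, and the optional duration and seed, falling back to the documented defaults of 5 and -1. The escaping is deliberately basic: newlines and tabs become spaces, and only backslashes and double quotes are escaped.

```sh
#!/bin/sh
# Hypothetical helper: make a string safe to embed in a JSON string literal.
# Minimal by design: newlines, carriage returns and tabs become spaces, then
# backslashes and double quotes are escaped. Other control characters are not handled.
json_escape() {
  printf '%s' "$1" | tr '\n\r\t' '   ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print the request body for a talking-photo task.
# Usage: build_payload IMAGE_URL TEXT [DURATION] [SEED]
# DURATION defaults to 5 and SEED to -1, matching the documented defaults.
build_payload() {
  local image="$1" text="$2" duration="${3:-5}" seed="${4:--1}"
  printf '{"image":"%s","text":"%s","duration":%d,"seed":%d}\n' \
    "$(json_escape "$image")" "$(json_escape "$text")" "$duration" "$seed"
}
```

For example, `build_payload "https://example.com/portrait.jpg" "Welcome to the museum." 10` prints `{"image":"https://example.com/portrait.jpg","text":"Welcome to the museum.","duration":10,"seed":-1}`, which can be passed to curl's `--data-raw` option.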

Pricing

| Duration | Cost |
|---|---|
| 5 s | $0.30 |
| 10 s | $0.60 |
| 15 s | $0.90 |

Billing Rules

  • Rate: $0.06 per second
  • Duration range: 5–15 seconds
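
Because billing is linear at $0.06 per second, the cost of a clip is its duration multiplied by the rate, which reproduces the 5 s, 10 s, and 15 s rows above. The sketch below uses a hypothetical `estimate_cost` helper that does the arithmetic in whole cents to avoid floating point, and it assumes the duration is a whole number of seconds within the documented 5–15 range.

```sh
#!/bin/sh
# Hypothetical helper: estimate the price of one clip at $0.06 per second.
# Arithmetic is done in whole cents so no floating point is needed.
estimate_cost() {
  local seconds="$1" cents
  # Accept only plain whole numbers (no sign, no leading zero).
  case $seconds in
    ''|*[!0-9]*|0*)
      echo "duration must be a whole number from 5 to 15 seconds" >&2
      return 1 ;;
  esac
  if [ "$seconds" -lt 5 ] || [ "$seconds" -gt 15 ]; then
    echo "duration must be a whole number from 5 to 15 seconds" >&2
    return 1
  fi
  cents=$((seconds * 6))   # 6 cents per second
  printf '$%d.%02d\n' $((cents / 100)) $((cents % 100))
}
```

For instance, `estimate_cost 8` prints `$0.48`.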

Best Use Cases

  • Social media content — Create engaging talking-head videos from photos without any filming.
  • Marketing & advertising — Generate spokesperson or product explainer videos from still images.
  • Education — Bring historical figures, book characters, or concept illustrations to life.
  • Entertainment — Make friends’ or celebrities’ photos deliver a custom message for fun.

Pro Tips

  • Clear, well-lit front-facing portraits with a fully visible mouth produce the most accurate lip-sync.
  • Match your text length to your chosen duration — roughly 2–3 words per second for natural pacing.
  • Fix the seed when iterating on text variations to keep the facial performance consistent.
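
The words-per-second tip can be turned into a starting duration. The sketch below assumes 2.5 words per second (the midpoint of the 2–3 range above), rounds up, and clamps the result to the supported 5–15 seconds. `suggest_duration` is a hypothetical helper, and its answer is only a first guess to refine by previewing the output.

```sh
#!/bin/sh
# Hypothetical helper: suggest a duration for a script at ~2.5 words per second.
suggest_duration() {
  local words seconds
  # Count words; tr strips the padding some wc implementations print.
  words=$(printf '%s\n' "$1" | wc -w | tr -d '[:space:]')
  # ceil(words / 2.5) == ceil(2 * words / 5), using integer arithmetic only
  seconds=$(( (2 * words + 4) / 5 ))
  # Clamp to the supported 5-15 second range.
  if [ "$seconds" -lt 5 ]; then seconds=5; fi
  if [ "$seconds" -gt 15 ]; then seconds=15; fi
  echo "$seconds"
}
```

Under this assumption, a 30-word script comes out at 12 seconds, and anything longer than about 37 words is capped at 15.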

Notes

  • Both image and text are required fields.
  • Duration range: 5–15 seconds.
  • Ensure image URLs are publicly accessible if using a link rather than a direct upload.
  • Please ensure your content complies with WaveSpeed AI’s usage policies.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


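# Submit the task (image and text are required; replace the placeholder values with your own)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/ai-talking-photos" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "https://example.com/portrait.jpg",
    "text": "Hello, and thanks for stopping by.",
    "duration": 5,
    "seed": -1
}'

# Get the result (set requestId to the data.id value returned by the submit call)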
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
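
The two calls above can be combined into a small submit-and-poll script. The sketch below targets a POSIX-style shell with curl; the function names (`submit_task`, `wait_for_result`, and the sed-based extractors `json_field` and `first_output`), the 2-second polling interval, and the 150-attempt limit are illustrative assumptions rather than documented client behavior. It relies on the response fields described in the tables below (`data.id`, `data.status`, `data.outputs`, `data.error`), and its sed matching assumes compact JSON in which each key appears once; a real JSON parser is sturdier.

```sh
#!/bin/sh
# Sketch: submit a talking-photo task, then poll until it completes or fails.
# Assumes WAVESPEED_API_KEY is exported and responses match the documented fields.
API_BASE="https://api.wavespeed.ai/api/v3"

# Crude extraction of a string field "KEY": "value" from the JSON on stdin.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Print the first URL in the "outputs" array of the JSON on stdin.
first_output() {
  sed -n 's/.*"outputs"[[:space:]]*:[[:space:]]*\[[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Submit a JSON body ($1) and print data.id, the task ID used for polling.
submit_task() {
  curl -s -X POST "$API_BASE/wavespeed-ai/ai-talking-photos" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${WAVESPEED_API_KEY}" \
    --data-raw "$1" | json_field id
}

# Poll the result endpoint for task $1 every $2 seconds (default 2), at most
# $3 times (default 150). Prints the first output URL on success.
wait_for_result() {
  local id="$1" interval="${2:-2}" tries="${3:-150}" body status
  while [ "$tries" -gt 0 ]; do
    body=$(curl -s "$API_BASE/predictions/$id/result" \
      -H "Authorization: Bearer ${WAVESPEED_API_KEY}") || return 1
    status=$(printf '%s\n' "$body" | json_field status)
    case $status in
      completed) printf '%s\n' "$body" | first_output; return 0 ;;
      failed) echo "task $id failed: $(printf '%s\n' "$body" | json_field error)" >&2; return 1 ;;
    esac
    # created or processing: wait and try again
    tries=$((tries - 1))
    sleep "$interval"
  done
  echo "task $id did not finish in time" >&2
  return 1
}
```

A typical run might look like `id=$(submit_task '{"image":"https://example.com/portrait.jpg","text":"Hello there.","duration":5,"seed":-1}') && wait_for_result "$id"`, which prints the video URL once the task reports `completed`.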

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| image | string | Yes | - | - | The URL of the input image. |
| text | string | Yes | - | - | The text for the photo to speak. |
| duration | integer | No | 5 | 5 ~ 15 | The duration of the generated video in seconds. |
| seed | integer | No | -1 | -1 ~ 2147483647 | The random seed to use for the generation. -1 means a random seed will be used. |
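
The ranges in this table are easy to check before a request is sent. The sketch below is a hypothetical pre-flight helper (`validate_params` is not part of any WaveSpeed SDK) that applies the documented defaults when an argument is omitted and rejects a duration outside 5 ~ 15 or a seed outside -1 ~ 2147483647. This page does not say whether the API rejects or clamps out-of-range values, so treat the check as a client-side convenience only.

```sh
#!/bin/sh
# Hypothetical pre-flight check based on the documented ranges:
# duration 5 ~ 15 (default 5), seed -1 ~ 2147483647 (default -1).
# Usage: validate_params [DURATION] [SEED]
validate_params() {
  local duration="${1:-5}" seed="${2:--1}"
  # duration: plain whole number (no sign, no leading zero), then range check
  case $duration in
    ''|*[!0-9]*|0*)
      echo "duration must be an integer from 5 to 15 (got: $duration)" >&2
      return 1 ;;
  esac
  if [ "$duration" -lt 5 ] || [ "$duration" -gt 15 ]; then
    echo "duration must be an integer from 5 to 15 (got: $duration)" >&2
    return 1
  fi
  # seed: -1 (random) or a non-negative integer up to 2147483647
  case $seed in
    -1|0) ;;
    ''|*[!0-9]*|0*)
      echo "seed must be -1 or an integer from 0 to 2147483647 (got: $seed)" >&2
      return 1 ;;
    *)
      # Length check first so very long inputs never reach the numeric test.
      if [ "${#seed}" -gt 10 ] || [ "$seed" -gt 2147483647 ]; then
        echo "seed must be -1 or an integer from 0 to 2147483647 (got: $seed)" >&2
        return 1
      fi ;;
  esac
  return 0
}
```

For example, `validate_params 12 42` succeeds silently, while `validate_params 20` prints an error and returns a non-zero status.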

Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
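
Since the submit response already carries `data.urls.get`, a client can poll that URL instead of assembling the path itself. The sketch below is a hypothetical `result_url` helper for a POSIX-style shell: it reads a submit response on stdin, prints `data.urls.get` when present, and otherwise falls back to the `/predictions/{id}/result` path from the curl example, built from `data.id`. Its sed matching assumes compact JSON in which each key appears once; a proper JSON parser is the safer choice in production.

```sh
#!/bin/sh
# Sketch: choose the URL to poll from a submit response read on stdin.
# Prefers data.urls.get; falls back to the documented result path built from data.id.
result_url() {
  local body url id
  body=$(cat)
  url=$(printf '%s\n' "$body" |
    sed -n 's/.*"get"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
  if [ -n "$url" ]; then
    printf '%s\n' "$url"
    return 0
  fi
  id=$(printf '%s\n' "$body" |
    sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1)
  [ -n "$id" ] || return 1   # neither field found
  printf 'https://api.wavespeed.ai/api/v3/predictions/%s/result\n' "$id"
}
```

Polling the URL the API hands back keeps the client working even if the result path format ever differs from the one shown in the curl example.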

Result Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier of the prediction being retrieved |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
© 2025 WaveSpeedAI. All rights reserved.