AI Talking Photos

AI Talking Photos brings your photos to life: upload a portrait and some text, and watch the person speak. Clips can run from 5 to 15 seconds. Ready-to-use REST inference API, no cold starts, affordable pricing.

Features

AI Talking Photos

AI Talking Photos makes any portrait speak. Upload a photo, type what you want the person to say, and AI generates a realistic talking video with accurate lip-sync — no filming, no voiceover recording required.

Why Choose This?

Realistic lip-sync generation: AI maps the text to natural lip movements and facial expressions for believable, human-quality talking video.
Any portrait, any text: Works on photos of real people, illustrations, historical figures, or fictional characters. If there’s a face, it can talk.
Adjustable duration: Generate clips from 5 to 15 seconds to match your content length.
Reproducible results: Use the seed parameter to lock in a specific output for consistent iterations.

Parameters

Parameter	Required	Description
image	Yes	Portrait photo to animate (URL or file upload).
text	Yes	The text you want the person to speak.
duration	No	Video length in seconds. Range: 5–15. Default: 5.
seed	No	Random seed for reproducible results. Use -1 for a random seed.

How to Use

Upload a portrait — a clear, front-facing photo with a visible mouth works best.
Enter your text — type what you want the person to say.
Set duration — choose between 5 and 15 seconds based on your text length.
Set seed (optional) — fix the seed to reproduce a specific result in future runs.
Submit — generate, preview, and download your talking video.

Pricing

Duration	Cost
5s	$0.30
10s	$0.60
15s	$0.90

Billing Rules

Rate: $0.06 per second
Duration range: 5–15 seconds
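The per-second rate reproduces the pricing table directly: cost = $0.06 × duration. The following is a minimal sketch, assuming billing is linear for every whole second in the 5–15 range (the table only lists 5, 10, and 15 seconds). The function name `estimate_cost` is illustrative and not part of the API.

```python
from decimal import Decimal

RATE_PER_SECOND = Decimal("0.06")   # USD per second, from the Billing Rules above
MIN_DURATION, MAX_DURATION = 5, 15  # seconds, documented duration range


def estimate_cost(duration: int) -> Decimal:
    """Return the estimated price in USD for a clip of `duration` seconds.

    Assumes linear per-second billing; Decimal avoids float rounding drift.
    """
    if not MIN_DURATION <= duration <= MAX_DURATION:
        raise ValueError("duration must be between 5 and 15 seconds")
    return RATE_PER_SECOND * duration


if __name__ == "__main__":
    # Print the estimate for the three durations listed in the pricing table.
    for seconds in (5, 10, 15):
        print(f"{seconds}s: ${estimate_cost(seconds):.2f}")
```

Using `Decimal` rather than `float` keeps the result exact to the cent, which matters if you sum many estimates.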

Best Use Cases

Social media content — Create engaging talking-head videos from photos without any filming.
Marketing & advertising — Generate spokesperson or product explainer videos from still images.
Education — Bring historical figures, book characters, or concept illustrations to life.
Entertainment — Make friends’ or celebrities’ photos deliver a custom message for fun.

Pro Tips

Clear, well-lit front-facing portraits with a fully visible mouth produce the most accurate lip-sync.
Match your text length to your chosen duration — roughly 2–3 words per second for natural pacing.
Fix the seed when iterating on text variations to keep the facial performance consistent.
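The pacing guideline above can be turned into a quick duration estimate. This is a rough sketch, assuming a midpoint of 2.5 words per second and simple whitespace word splitting; the helper names `suggest_duration` and `fits_in_max_duration` are hypothetical, and the service itself may pace speech differently.

```python
import math

MIN_DURATION, MAX_DURATION = 5, 15  # seconds, documented duration range
WORDS_PER_SECOND = 2.5              # assumed midpoint of the 2-3 words/second tip


def suggest_duration(text: str) -> int:
    """Suggest a duration in seconds for `text`, clamped to the allowed range."""
    words = len(text.split())
    seconds = math.ceil(words / WORDS_PER_SECOND)
    return max(MIN_DURATION, min(MAX_DURATION, seconds))


def fits_in_max_duration(text: str, max_words_per_second: float = 3.0) -> bool:
    """Return True if `text` can be spoken within 15 seconds at the fastest pace."""
    return len(text.split()) <= MAX_DURATION * max_words_per_second
```

At 3 words per second, a 15-second clip holds about 45 words, so longer scripts are probably better split across several clips.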

Notes

Both image and text are required fields.
Duration range: 5–15 seconds.
Ensure image URLs are publicly accessible if using a link rather than a direct upload.
Please ensure your content complies with WaveSpeed AI’s usage policies.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


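# Submit the task
# "image" and "text" are required; the values below are placeholders to replace with your own.
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/ai-talking-photos" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "https://example.com/portrait.jpg",
    "text": "Hello, this is a talking photo.",
    "duration": 5,
    "seed": -1
}'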

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
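The two calls above can be combined into a submit-then-poll loop. The sketch below uses only the Python standard library and assumes the responses carry `data.id`, `data.status`, `data.outputs`, and `data.error` as described in the response tables on this page. The helper names, the 2-second polling interval, and the 60-attempt limit are illustrative choices, not values from the API.

```python
import json
import os
import time
import urllib.request
from typing import Any, Callable, Dict, List, Optional

API_BASE = "https://api.wavespeed.ai/api/v3"
SUBMIT_URL = f"{API_BASE}/wavespeed-ai/ai-talking-photos"


def call_api(url: str, api_key: str,
             payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """POST `payload` as JSON when given, otherwise GET; return the parsed body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(
        url, data=data, headers=headers,
        method="POST" if data is not None else "GET",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(fetch: Callable[[], Dict[str, Any]],
                    interval: float = 2.0,
                    max_attempts: int = 60) -> Dict[str, Any]:
    """Call `fetch` until data.status is `completed` or `failed`; return `data`."""
    for _ in range(max_attempts):
        data = fetch()["data"]
        if data["status"] == "completed":
            return data
        if data["status"] == "failed":
            raise RuntimeError(f"task failed: {data.get('error')}")
        time.sleep(interval)  # status is still `created` or `processing`
    raise TimeoutError("task did not finish within the polling budget")


def generate(image_url: str, text: str, duration: int = 5, seed: int = -1) -> List[str]:
    """Submit a task and block until the output URLs are available."""
    api_key = os.environ["WAVESPEED_API_KEY"]
    payload = {"image": image_url, "text": text, "duration": duration, "seed": seed}
    request_id = call_api(SUBMIT_URL, api_key, payload)["data"]["id"]
    result_url = f"{API_BASE}/predictions/{request_id}/result"
    data = wait_for_result(lambda: call_api(result_url, api_key))
    return data["outputs"]
```

Passing the fetch step in as a callable keeps the polling logic independent of the HTTP client, so it can be exercised with canned responses before spending credits on real requests.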

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
image	string	Yes	-	-	The URL of the input image.
text	string	Yes	-	-	The text for the photo to speak.
duration	integer	No	5	5 ~ 15	The duration of the generated video in seconds.
seed	integer	No	-1	-1 ~ 2147483647	The random seed to use for the generation. -1 means a random seed will be used.
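Checking these constraints on the client side avoids spending a round trip on a request that is likely to be rejected. This is a minimal sketch built only from the table above; `build_request` is a hypothetical helper, and the server's own validation may be stricter or differ in detail.

```python
from typing import Any, Dict

DURATION_RANGE = (5, 15)         # seconds, from the Range column
SEED_RANGE = (-1, 2147483647)    # -1 requests a random seed


def build_request(image: str, text: str, duration: int = 5, seed: int = -1) -> Dict[str, Any]:
    """Return a JSON-serializable request body, or raise ValueError if invalid."""
    if not image.strip():
        raise ValueError("image is required")
    if not text.strip():
        raise ValueError("text is required")
    if not DURATION_RANGE[0] <= duration <= DURATION_RANGE[1]:
        raise ValueError("duration must be between 5 and 15")
    if not SEED_RANGE[0] <= seed <= SEED_RANGE[1]:
        raise ValueError("seed must be between -1 and 2147483647")
    return {"image": image, "text": text, "duration": duration, "seed": seed}
```

Omitting `duration` and `seed` yields the documented defaults of 5 and -1.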

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction (the task ID that was requested)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
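The response fields above map naturally onto a small typed record. The sketch below assumes the JSON body is nested exactly as the dotted names suggest (`data.timings.inference` living under `data` and then `timings`). The `Prediction` class and `parse_prediction` function are illustrative names, not part of any official SDK.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Prediction:
    id: str
    model: str
    status: str                  # created, processing, completed, or failed
    outputs: List[str]           # empty until status is completed
    error: str                   # empty if no error occurred
    created_at: str              # ISO timestamp string, kept unparsed
    inference_ms: Optional[int]  # None when timings are not reported

    @property
    def is_finished(self) -> bool:
        """True once the task has reached a terminal status."""
        return self.status in ("completed", "failed")


def parse_prediction(body: Dict[str, Any]) -> Prediction:
    """Convert a decoded JSON response body into a Prediction record."""
    data = body["data"]
    timings = data.get("timings") or {}
    return Prediction(
        id=data["id"],
        model=data.get("model", ""),
        status=data["status"],
        outputs=list(data.get("outputs") or []),
        error=data.get("error") or "",
        created_at=data.get("created_at", ""),
        inference_ms=timings.get("inference"),
    )
```

Missing optional fields fall back to empty values so that an in-progress response, which has no outputs or timings yet, parses without errors.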
