
AI Talking Photos


AI Talking Photos brings your photos to life: upload a portrait and text, and watch the person speak. Supports durations of 5–15 seconds. Ready-to-use REST inference API with no cold starts and affordable pricing.



AI Talking Photos makes any portrait speak. Upload a photo, type what you want the person to say, and AI generates a realistic talking video with accurate lip-sync — no filming, no voiceover recording required.

Why Choose This?

  • Realistic lip-sync generation: AI maps the text to natural lip movements and facial expressions for believable, human-quality talking video.

  • Any portrait, any text: Works on photos of real people, illustrations, historical figures, or fictional characters. If there's a face, it can talk.

  • Adjustable duration: Generate clips from 5 to 15 seconds to match your content length.

  • Reproducible results: Use the seed parameter to lock in a specific output for consistent iterations.

Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| image | Yes | Portrait photo to animate (URL or file upload). |
| text | Yes | The text you want the person to speak. |
| duration | No | Video length in seconds. Range: 5–15. Default: 5. |
| seed | No | Random seed for reproducible results. Use -1 for a random seed. |
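
As a rough illustration of the table above, the following Python sketch assembles and validates a request body before it is sent. The helper name `build_payload` is hypothetical and not part of any WaveSpeedAI SDK; only the field names, the 5–15 range, the default of 5, and the -1 seed convention come from the table.

```python
def build_payload(image, text, duration=5, seed=-1):
    """Assemble a request body for ai-talking-photos (hypothetical helper)."""
    # image and text are the two required fields
    if not image:
        raise ValueError("image is required")
    if not text or not text.strip():
        raise ValueError("text is required")
    # duration is optional, but must stay inside the documented 5-15 s range
    if not 5 <= duration <= 15:
        raise ValueError("duration must be between 5 and 15 seconds")
    # seed=-1 asks for a random seed; any other value pins the result
    return {"image": image, "text": text, "duration": duration, "seed": seed}


payload = build_payload(
    "https://example.com/your-input.jpg",
    "Hello, welcome to our channel.",
)
print(payload)
```

Validating locally is only a convenience: it catches a missing `text` or an out-of-range `duration` before a request is made.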

How to Use

  1. Upload a portrait — a clear, front-facing photo with a visible mouth works best.
  2. Enter your text — type what you want the person to say.
  3. Set duration — choose between 5 and 15 seconds based on your text length.
  4. Set seed (optional) — fix the seed to reproduce a specific result in future runs.
  5. Submit — generate, preview, and download your talking video.

Pricing

| Duration | Cost |
|----------|------|
| 5s | $0.30 |
| 10s | $0.60 |
| 15s | $0.90 |

Billing Rules

  • Rate: $0.06 per second
  • Duration range: 5–15 seconds
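
The billing rule is simple enough to check by hand, but a small sketch makes the arithmetic explicit. The function name `estimate_cost_usd` is hypothetical, and the sketch assumes the $0.06 per-second rate applies linearly to every duration in the range, not only to the three values listed in the pricing table.

```python
RATE_CENTS_PER_SECOND = 6  # $0.06 per second, from the billing rules


def estimate_cost_usd(duration_seconds):
    """Estimate the charge for one run, assuming linear per-second billing."""
    if not 5 <= duration_seconds <= 15:
        raise ValueError("duration must be between 5 and 15 seconds")
    # Multiply in cents first to keep the arithmetic simple, then convert
    return RATE_CENTS_PER_SECOND * duration_seconds / 100


for seconds in (5, 10, 15):
    print(f"{seconds}s -> ${estimate_cost_usd(seconds):.2f}")
```

For 5, 10, and 15 seconds this reproduces the $0.30, $0.60, and $0.90 figures from the pricing table.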

Best Use Cases

  • Social media content — Create engaging talking-head videos from photos without any filming.
  • Marketing & advertising — Generate spokesperson or product explainer videos from still images.
  • Education — Bring historical figures, book characters, or concept illustrations to life.
  • Entertainment — Make friends' or celebrities' photos deliver a custom message for fun.

Pro Tips

  • Clear, well-lit front-facing portraits with a fully visible mouth produce the most accurate lip-sync.
  • Match your text length to your chosen duration — roughly 2–3 words per second for natural pacing.
  • Fix the seed when iterating on text variations to keep the facial performance consistent.
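
The pacing tip can be turned into a quick pre-flight check. The sketch below is an illustration only: the helper names are hypothetical, a "word" is approximated by splitting on whitespace, and 2.5 words per second is simply the midpoint of the 2–3 range given above.

```python
import math


def suggest_duration(text, words_per_second=2.5):
    """Suggest a duration in seconds, clamped to the supported 5-15 range."""
    words = len(text.split())
    needed = math.ceil(words / words_per_second)
    return max(5, min(15, needed))


def fits(text, duration):
    """Return True if the text stays at or below 3 words per second."""
    return len(text.split()) <= 3 * duration


script = "Welcome to our store. Today we are launching a brand new summer collection."
print(suggest_duration(script), fits(script, 5))
```

If `fits` returns False even at 15 seconds, the text is probably too long for a single clip and may need to be shortened or split.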

Notes

  • Both image and text are required fields.
  • Duration range: 5–15 seconds.
  • Ensure image URLs are publicly accessible if using a link rather than a direct upload.
  • Please ensure your content complies with WaveSpeed AI's usage policies.

AI Talking Photos API: Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/wavespeed-ai/ai-talking-photos with your input as JSON. The endpoint returns a prediction ID; poll the prediction endpoint until status changes to completed, then read the output URL from data.outputs[0]. Examples for AI Talking Photos follow.

HTTP example
# Submit the prediction (image and text are both required)
curl -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/ai-talking-photos" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "image": "https://example.com/your-input.jpg",
    "text": "Hello, welcome to our channel.",
    "duration": 5,
    "seed": -1
  }'

# The response includes a prediction id. Substitute it for {request_id} and poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
Node.js example
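// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules have no top-level await, so run inside an async function
async function main() {
  const result = await client.run("wavespeed-ai/ai-talking-photos", {
    "image": "https://example.com/your-input.jpg",
    "text": "Hello, welcome to our channel.", // required: what the person says
    "duration": 5,
    "seed": -1
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);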
Python example
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "wavespeed-ai/ai-talking-photos",
    {
        "image": "https://example.com/your-input.jpg",
        "text": "Hello, welcome to our channel.",  # required: what the person says
        "duration": 5,
        "seed": -1,
    },
)

print(output["outputs"][0])  # → URL of the generated output
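
For readers who prefer not to use the SDK, the submit-then-poll flow from the quick start can be sketched with the Python standard library alone. This is a sketch under stated assumptions, not the official client: the source only confirms the two URLs, the bearer-token header, the "completed" status, and data.outputs[0]. That the prediction id is returned at data.id, that the status is at data.status, and that a failed run reports the status "failed" are all assumptions, so check them against the API reference before relying on this.

```python
import json
import os
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def _request(method, url, body=None):
    """Send a JSON request with the bearer token and return the parsed reply."""
    headers = {"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"}
    data = None
    if body is not None:
        headers["Content-Type"] = "application/json"
        data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_output(request_id, fetch=_request, interval=2.0, timeout=300.0):
    """Poll the result endpoint until the prediction completes or fails."""
    url = f"{API_BASE}/predictions/{request_id}/result"
    deadline = time.monotonic() + timeout
    while True:
        data = fetch("GET", url)["data"]  # assumed: fields are nested under "data"
        if data["status"] == "completed":
            return data["outputs"][0]
        if data["status"] == "failed":  # assumed name of the failure status
            raise RuntimeError(f"prediction {request_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} not ready after {timeout}s")
        time.sleep(interval)


def generate(payload, fetch=_request, **poll_options):
    """Submit a prediction, then block until its output URL is available."""
    reply = fetch("POST", f"{API_BASE}/wavespeed-ai/ai-talking-photos", payload)
    request_id = reply["data"]["id"]  # assumed location of the prediction id
    return wait_for_output(request_id, fetch=fetch, **poll_options)
```

Calling `generate({"image": "https://example.com/your-input.jpg", "text": "Hello, welcome to our channel.", "duration": 5, "seed": -1})` with WAVESPEED_API_KEY set would submit a run and return the output URL. The `fetch` parameter exists so the polling logic can be exercised with a stub instead of the network.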

AI Talking Photos API: Frequently asked questions

What is the AI Talking Photos API?

AI Talking Photos is a WaveSpeedAI image-to-video model, exposed as a REST API on WaveSpeedAI. Upload a portrait and text, and the model generates a 5–15 second video of the person speaking, with no cold starts. You can call it programmatically or try it from the playground above.

How do I call the AI Talking Photos API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/wavespeed-ai/ai-talking-photos.

How much does AI Talking Photos cost per run?

AI Talking Photos starts at $0.30 per run, which is the price of a 5-second clip. The charge scales with duration at $0.06 per second, so a 15-second clip costs $0.90. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does AI Talking Photos accept?

Key inputs: `image` and `text` (both required), plus `duration` and `seed` (both optional). The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/wavespeed-ai/ai-talking-photos.

How do I get started with the AI Talking Photos API?

Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.

Can I use AI Talking Photos outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (WaveSpeedAI). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.