Kwaivgi Kling Lipsync Text To Video

Kling Lipsync Text-to-Video by Kwaivgi generates videos with lifelike lip movements synced to speech synthesized from input text. It is available through a ready-to-use REST inference API with no cold starts and flat per-video pricing.

Features

Kling Lipsync Text-to-Video

Make any face speak your words with AI-powered lip synchronization. Upload a video, enter your text, choose a voice, and Kling Lipsync will generate realistic lip movements perfectly matched to the synthesized speech — ideal for dubbing, content localization, and creative projects.

Why It Looks Great

  • Realistic lip sync: AI-generated mouth movements accurately match the spoken audio for natural-looking results.
  • Multiple voice options: Choose from a variety of voice characters to match your content style.
  • Bilingual support: Generate speech in English (en) or Chinese (zh).
  • Adjustable speed: Control the speaking pace with the voice speed parameter.
  • Text-driven workflow: Simply type what you want the character to say — no audio recording needed.

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| video | Yes | Source video with a visible face (upload or public URL). |
| text | Yes | The text you want the character to speak. |
| voice_id | Yes | Voice character selection (e.g., genshin_klee2). |
| voice_language | No | Language for speech synthesis: en (English) or zh (Chinese). Default: en. |
| voice_speed | No | Speaking speed multiplier. Default: 1. |
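
As a rough illustration of how these parameters fit together, the sketch below assembles a request body and fills in the two documented defaults. The function name `build_payload` is hypothetical and not part of any WaveSpeedAI SDK; it only checks that the three required fields are present.

```python
from typing import Any, Dict


def build_payload(
    video: str,
    text: str,
    voice_id: str,
    voice_language: str = "en",  # documented default
    voice_speed: float = 1,      # documented default
) -> Dict[str, Any]:
    """Assemble a request body from the parameters listed above (hypothetical helper)."""
    # video, text, and voice_id are marked as required, so reject empty values.
    for name, value in (("video", video), ("text", text), ("voice_id", voice_id)):
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} is required")
    return {
        "video": video,
        "text": text,
        "voice_id": voice_id,
        "voice_language": voice_language,
        "voice_speed": voice_speed,
    }
```

Calling `build_payload(video_url, "Hello", "genshin_klee2")` would yield a body with `voice_language` set to `en` and `voice_speed` set to `1`.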

How to Use

  1. Upload your video — drag and drop or paste a public URL. Ensure the face is clearly visible.
  2. Enter your text — type the words you want the character to speak.
  3. Select voice_id — choose a voice character that fits your content.
  4. Choose language — select en for English or zh for Chinese.
  5. Adjust speed (optional) — modify voice_speed to speak faster or slower.
  6. Run — click the button to generate.
  7. Download — preview and save your lip-synced video.

Pricing

Flat rate per generation.

| Output | Cost |
| --- | --- |
| Per video | $0.14 |
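
For budgeting, the flat rate makes the arithmetic simple. The sketch below multiplies the listed price by a number of generations; `estimate_cost` is a hypothetical helper, and it assumes every generation is billed at the listed rate (how failed runs are billed is not stated here).

```python
from decimal import Decimal

# Flat rate per generated video, taken from the pricing table above.
PRICE_PER_VIDEO = Decimal("0.14")


def estimate_cost(generations: int) -> Decimal:
    """Return the estimated cost in USD for a number of generations."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # Decimal avoids binary floating-point rounding in currency math.
    return PRICE_PER_VIDEO * generations


print(estimate_cost(25))  # 25 videos at $0.14 each -> 3.50
```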

Best Use Cases

  • Content Localization — Dub videos into different languages while maintaining natural lip movements.
  • Social Media & Entertainment — Create fun talking videos, memes, and viral content.
  • E-learning & Training — Generate instructional videos with consistent narration.
  • Marketing & Advertising — Produce multilingual ad variants from a single video shoot.
  • Character Animation — Bring static or animated characters to life with synchronized speech.

Pro Tips for Best Results

  • Use videos with clear, front-facing shots of the face for the most accurate lip sync.
  • Keep text length appropriate for the video duration — shorter clips work best with concise messages.
  • Match the voice character to the visual appearance for more believable results.
  • Test different voice_speed values to find the natural pacing for your content.
  • For multilingual projects, generate separate versions with appropriate voice_language settings.
  • Ensure good lighting on the face in the source video for cleaner lip tracking.
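
The tips about trying several voice_speed values and generating one version per language can be sketched as a small payload generator. `make_variants` and its arguments are hypothetical names; the translated texts are supplied by the caller, and the sketch reuses one voice_id for every variant, which may not suit every language.

```python
from itertools import product
from typing import Any, Dict, List, Mapping, Sequence


def make_variants(
    base: Mapping[str, Any],
    texts_by_language: Mapping[str, str],
    speeds: Sequence[float] = (1.0,),
) -> List[Dict[str, Any]]:
    """Return one request body per (language, speed) pair without mutating base."""
    variants = []
    for (language, text), speed in product(texts_by_language.items(), speeds):
        payload = dict(base)  # shallow copy so the shared fields stay untouched
        payload.update(text=text, voice_language=language, voice_speed=speed)
        variants.append(payload)
    return variants
```

Two languages and two speeds produce four request bodies, each of which is a separate generation.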

Notes

  • If using a URL for the video, ensure it is publicly accessible. A preview thumbnail confirms successful loading.
  • The face must be clearly visible throughout the video for accurate lip synchronization.
  • Processing time may vary based on video length and current queue load.
  • Best results are achieved with videos where the subject is speaking or has a neutral expression.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
# The "video" and "text" values below are placeholders; replace them with your own.
curl --location --request POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-lipsync/text-to-video" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "video": "https://example.com/input.mp4",
    "text": "Hello, and welcome to the demo.",
    "voice_id": "genshin_klee2",
    "voice_language": "en",
    "voice_speed": 1
}'

# Get the result
# Set requestId to the data.id value returned by the submit call.
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
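
The same submit-then-query flow can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: `http_json` and `run_lipsync` are hypothetical names, and the polling interval and retry cap are assumptions rather than documented values. The endpoints and status names come from this page.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, Optional

API_BASE = "https://api.wavespeed.ai/api/v3"
SUBMIT_URL = f"{API_BASE}/kwaivgi/kling-lipsync/text-to-video"


def http_json(method: str, url: str, api_key: str,
              body: Optional[dict] = None) -> Dict[str, Any]:
    """Send one request with a Bearer token and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    request = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))


def run_lipsync(
    payload: Dict[str, Any],
    api_key: str,
    send: Callable[..., Dict[str, Any]] = http_json,
    poll_seconds: float = 2.0,  # assumed interval, not a documented value
    max_polls: int = 150,       # assumed cap, not a documented value
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a task, then poll the result endpoint until it finishes."""
    task = send("POST", SUBMIT_URL, api_key, payload)["data"]
    result_url = f"{API_BASE}/predictions/{task['id']}/result"
    for _ in range(max_polls):
        data = send("GET", result_url, api_key)["data"]
        if data["status"] == "completed":
            return data
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        sleep(poll_seconds)  # status is still created or processing
    raise TimeoutError(f"task {task['id']} did not finish after {max_polls} polls")
```

A typical call would pass the request body and the key read from the WAVESPEED_API_KEY environment variable, then read the video URLs from the `outputs` field of the returned object. The `send` and `sleep` arguments exist so the flow can be exercised without network access.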

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
| --- | --- | --- | --- | --- | --- |
| video | string | Yes | - | - | The URL of the video file for generating synchronized lip movements. Supported formats: .mp4 and .mov. Maximum file size: 100 MB. Duration: 2 s to 10 s. Only 720p and 1080p are supported; width and height must both be between 720 px and 1920 px. |
| text | string | Yes | - | - | Text content for lip-sync video generation. Maximum 120 characters. |
| voice_id | string | Yes | genshin_klee2 | genshin_vindi2, zhinen_xuesheng, AOT, ai_shatang, genshin_klee2, genshin_kirara, ai_kaiya, oversea_male1, ai_chenjiahao_712, girlfriend_4_speech02, chat1_female_new-3, chat_0407_5-1, cartoon-boy-07, uk_boy1, cartoon-girl-01, PeppaPig_platform, ai_huangzhong_712, ai_huangyaoshi_712, ai_laoguowang_712, chengshu_jiejie, you_pingjing, calm_story1, uk_man2, laopopo_speech02, heainainai_speech02 | Voice ID to use for speech synthesis. |
| voice_language | string | No | en | zh, en | The voice language corresponding to the voice ID. |
| voice_speed | number | No | 1 | 0.8 ~ 2.0 | Speech rate for text-to-video generation. |
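
Checking these limits locally before submitting can save a failed request. The following is a minimal sketch under stated assumptions: `validate_request` and `validate_video_meta` are hypothetical helpers, text length is counted in Python characters (the API may count differently), 100MB is taken as 100 * 1024 * 1024 bytes, and the video metadata is assumed to be measured by the caller with a tool of their choice.

```python
from typing import Any, Dict, List

VOICE_IDS = {
    "genshin_vindi2", "zhinen_xuesheng", "AOT", "ai_shatang", "genshin_klee2",
    "genshin_kirara", "ai_kaiya", "oversea_male1", "ai_chenjiahao_712",
    "girlfriend_4_speech02", "chat1_female_new-3", "chat_0407_5-1",
    "cartoon-boy-07", "uk_boy1", "cartoon-girl-01", "PeppaPig_platform",
    "ai_huangzhong_712", "ai_huangyaoshi_712", "ai_laoguowang_712",
    "chengshu_jiejie", "you_pingjing", "calm_story1", "uk_man2",
    "laopopo_speech02", "heainainai_speech02",
}
LANGUAGES = {"zh", "en"}
MAX_TEXT_CHARS = 120
MIN_SPEED, MAX_SPEED = 0.8, 2.0
MAX_VIDEO_BYTES = 100 * 1024 * 1024  # assumption: "100MB" read as 100 MiB


def validate_request(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    for name in ("video", "text", "voice_id"):
        if not payload.get(name):
            problems.append(f"{name} is required")
    if len(payload.get("text") or "") > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    voice_id = payload.get("voice_id")
    if voice_id and voice_id not in VOICE_IDS:
        problems.append(f"unknown voice_id: {voice_id}")
    if payload.get("voice_language", "en") not in LANGUAGES:
        problems.append("voice_language must be zh or en")
    if not MIN_SPEED <= payload.get("voice_speed", 1) <= MAX_SPEED:
        problems.append(f"voice_speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return problems


def validate_video_meta(extension: str, size_bytes: int, duration_s: float,
                        width: int, height: int) -> List[str]:
    """Check caller-supplied video metadata against the documented limits."""
    problems = []
    if extension.lower().lstrip(".") not in ("mp4", "mov"):
        problems.append("video must be .mp4 or .mov")
    if size_bytes > MAX_VIDEO_BYTES:
        problems.append("video exceeds 100MB")
    if not 2 <= duration_s <= 10:
        problems.append("video must be between 2s and 10s long")
    if not (720 <= width <= 1920 and 720 <= height <= 1920):
        problems.append("width and height must both be between 720px and 1920px")
    return problems
```

Both functions return a list rather than raising, so a caller can report every problem at once.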

Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction, Task Id |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |

Result Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was queried) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
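
To show how these fields might be read once a result response has been decoded from JSON, here is a small sketch. `summarize_result` is a hypothetical helper; it relies only on the field names and status values listed above and tolerates missing fields.

```python
from typing import Any, Dict


def summarize_result(response: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a decoded result response to the fields a caller usually needs."""
    data = response.get("data") or {}
    status = data.get("status")
    inference_ms = (data.get("timings") or {}).get("inference")
    return {
        "id": data.get("id"),
        "status": status,
        # created and processing mean the task is still running.
        "finished": status in ("completed", "failed"),
        # outputs is documented as empty until the status is completed.
        "outputs": list(data.get("outputs") or []) if status == "completed" else [],
        "error": (data.get("error") or None) if status == "failed" else None,
        # timings.inference is documented in milliseconds.
        "inference_seconds": inference_ms / 1000 if inference_ms is not None else None,
    }
```

A caller polling the result endpoint could stop as soon as `finished` is true, then either download the URLs in `outputs` or report `error`.
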
© 2025 WaveSpeedAI. All rights reserved.