Kwaivgi Kling Lipsync Text To Video

Kling Lipsync Text-to-Video by Kwaivgi generates videos in which a face's lip movements are synced to speech synthesized from your input text. It is available as a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

Kling Lipsync Text-to-Video

Make any face speak your words with AI-powered lip synchronization. Upload a video, enter your text, choose a voice, and Kling Lipsync will generate realistic lip movements perfectly matched to the synthesized speech — ideal for dubbing, content localization, and creative projects.

Why It Looks Great

Realistic lip sync: AI-generated mouth movements accurately match the spoken audio for natural-looking results.
Multiple voice options: Choose from a variety of voice characters to match your content style.
Bilingual support: Generate speech in English (en) or Chinese (zh).
Adjustable speed: Control the speaking pace with the voice speed parameter.
Text-driven workflow: Simply type what you want the character to say — no audio recording needed.

Parameters

Parameter	Required	Description
video	Yes	Source video with a visible face (upload or public URL). Must be .mp4 or .mov, 2 to 10 seconds long, and no larger than 100MB.
text	Yes	The text you want the character to speak. Max 120 characters.
voice_id	Yes	Voice character selection (e.g., genshin_klee2).
voice_language	No	Language for speech synthesis: en (English) or zh (Chinese). Default: en.
voice_speed	No	Speaking speed multiplier, from 0.8 to 2.0. Default: 1.

How to Use

Upload your video — drag and drop or paste a public URL. Ensure the face is clearly visible.
Enter your text — type the words you want the character to speak.
Select voice_id — choose a voice character that fits your content.
Choose language — select en for English or zh for Chinese.
Adjust speed (optional) — modify voice_speed to speak faster or slower.
Run — click the button to generate.
Download — preview and save your lip-synced video.

Pricing

Flat rate per generation.

Output	Cost
Per video	$0.14
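
Because the rate is flat, the cost of a batch is the number of generations multiplied by $0.14. A minimal shell sketch of that arithmetic follows; the function name `estimate_cost` is illustrative and not part of any API.

```shell
# Flat rate per generated video, taken from the pricing table above.
PRICE_PER_VIDEO="0.14"

# estimate_cost N: print the total cost in USD for N generations.
# Shell arithmetic is integer-only, so awk does the multiplication.
estimate_cost() {
    awk -v n="$1" -v p="$PRICE_PER_VIDEO" 'BEGIN { printf "%.2f\n", n * p }'
}
```

For example, `estimate_cost 25` prints the cost of 25 generations.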

Best Use Cases

Content Localization — Dub videos into different languages while maintaining natural lip movements.
Social Media & Entertainment — Create fun talking videos, memes, and viral content.
E-learning & Training — Generate instructional videos with consistent narration.
Marketing & Advertising — Produce multilingual ad variants from a single video shoot.
Character Animation — Bring static or animated characters to life with synchronized speech.

Pro Tips for Best Results

Use videos with clear, front-facing shots of the face for the most accurate lip sync.
Keep text length appropriate for the video duration — shorter clips work best with concise messages.
Match the voice character to the visual appearance for more believable results.
Test different voice_speed values to find the natural pacing for your content.
For multilingual projects, generate separate versions with appropriate voice_language settings.
Ensure good lighting on the face in the source video for cleaner lip tracking.
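
For the multilingual tip above, one way to keep per-language versions consistent is to build each request body from the same helper. The sketch below is an assumption-laden illustration: `build_payload` is a hypothetical helper, the video URL is a placeholder, and the pairing of each voice ID with a language is a guess that should be checked against the voices you actually intend to use.

```shell
# build_payload VIDEO_URL TEXT VOICE_ID LANGUAGE [SPEED]
# Prints a JSON request body for the submit endpoint.
# Assumes the arguments contain no double quotes or backslashes,
# because no JSON escaping is performed here.
build_payload() {
    local video="$1" text="$2" voice_id="$3" language="$4" speed="${5:-1}"
    printf '{"video": "%s", "text": "%s", "voice_id": "%s", "voice_language": "%s", "voice_speed": %s}\n' \
        "$video" "$text" "$voice_id" "$language" "$speed"
}

# One request body per language (placeholder URL; voice/language pairing is a guess).
build_payload "https://example.com/input.mp4" "Welcome to the course." "uk_man2" "en"
build_payload "https://example.com/input.mp4" "欢迎来到本课程。" "chengshu_jiejie" "zh"
```

Each printed line can be passed to curl as the `--data-raw` body of a separate submission.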

Notes

If using a URL for the video, ensure it is publicly accessible. A preview thumbnail confirms successful loading.
The face must be clearly visible throughout the video for accurate lip synchronization.
Processing time may vary based on video length and current queue load.
Best results are achieved with videos where the subject is speaking or has a neutral expression.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
# video and text are required; the values below are placeholders
curl --location --request POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-lipsync/text-to-video" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "video": "https://example.com/input.mp4",
    "text": "Hello, welcome to the demo.",
    "voice_id": "genshin_klee2",
    "voice_language": "en",
    "voice_speed": 1
}'

# Get the result (requestId is the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
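
Since the task runs asynchronously, the result endpoint usually has to be queried more than once. The following is a minimal polling sketch, not an official client: the helper names, the 2-second interval, and the 60-attempt limit are assumptions, and the sed-based status extraction is deliberately crude (a JSON tool such as jq is more robust if it is available).

```shell
# extract_status JSON: print the value of the "status" field.
# Assumes the body contains a single "status" key.
extract_status() {
    printf '%s\n' "$1" \
        | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        | head -n 1
}

# poll_result REQUEST_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Queries the result endpoint until the task is completed or failed.
# Returns 0 on completed, 1 on failed, 2 on timeout.
poll_result() {
    local request_id="$1" interval="${2:-2}" max_attempts="${3:-60}"
    local attempt=1 response status
    while [ "$attempt" -le "$max_attempts" ]; do
        response=$(curl --silent --location \
            "https://api.wavespeed.ai/api/v3/predictions/${request_id}/result" \
            --header "Authorization: Bearer ${WAVESPEED_API_KEY}")
        status=$(extract_status "$response")
        case "$status" in
            completed) printf '%s\n' "$response"; return 0 ;;
            failed)    printf '%s\n' "$response" >&2; return 1 ;;
        esac
        sleep "$interval"
        attempt=$((attempt + 1))
    done
    echo "Timed out waiting for task ${request_id}" >&2
    return 2
}
```

Calling `poll_result "$requestId"` prints the final response body once the status is `completed`.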

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
video	string	Yes	-	-	URL of the source video to lip-sync. Supported formats are .mp4 and .mov. File size must not exceed 100MB, duration must be between 2s and 10s, only 720p and 1080p are supported, and both width and height must be between 720px and 1920px.
text	string	Yes	-	-	Text for the character to speak in the lip-synced video. Max 120 characters.
voice_id	string	Yes	genshin_klee2	genshin_vindi2, zhinen_xuesheng, AOT, ai_shatang, genshin_klee2, genshin_kirara, ai_kaiya, oversea_male1, ai_chenjiahao_712, girlfriend_4_speech02, chat1_female_new-3, chat_0407_5-1, cartoon-boy-07, uk_boy1, cartoon-girl-01, PeppaPig_platform, ai_huangzhong_712, ai_huangyaoshi_712, ai_laoguowang_712, chengshu_jiejie, you_pingjing, calm_story1, uk_man2, laopopo_speech02, heainainai_speech02	Voice ID to use for speech synthesis
voice_language	string	No	en	zh, en	The voice language corresponding to the Voice ID
voice_speed	number	No	1	0.8 ~ 2.0	Speech rate multiplier for the synthesized voice
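
The limits in this table can be checked locally before a request is sent. The sketch below is a hypothetical helper, `validate_request`, that covers only the text length, language, and speed constraints; it does not inspect the video file, and the error messages are illustrative.

```shell
# validate_request TEXT VOICE_LANGUAGE VOICE_SPEED
# Returns 0 if the values satisfy the documented limits;
# otherwise prints the reason to stderr and returns 1.
validate_request() {
    local text="$1" language="$2" speed="$3"

    # ${#text} counts characters in bash under a UTF-8 locale;
    # some shells count bytes instead.
    if [ -z "$text" ] || [ "${#text}" -gt 120 ]; then
        echo "text must be 1 to 120 characters" >&2
        return 1
    fi

    case "$language" in
        en|zh) ;;
        *) echo "voice_language must be en or zh" >&2; return 1 ;;
    esac

    # Shell arithmetic is integer-only, so awk compares the decimal value.
    if ! awk -v s="$speed" 'BEGIN { exit !(s >= 0.8 && s <= 2.0) }'; then
        echo "voice_speed must be between 0.8 and 2.0" >&2
        return 1
    fi
    return 0
}
```

A typical use is `validate_request "$text" "en" "1.2" && curl ...`, so that invalid input never reaches the API.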

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
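
The `data.id` field of the submission response is what the result endpoint expects as the task ID. The sketch below shows one way to pull it out and build the result URL; `extract_task_id` and `result_url` are hypothetical helpers, and the sed-based parsing assumes the response contains a single `"id"` key (a JSON tool such as jq is more robust if it is available).

```shell
# extract_task_id JSON: print the value of the "id" field.
# Assumes the body contains a single "id" key.
extract_task_id() {
    printf '%s\n' "$1" \
        | sed -n 's/.*"id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' \
        | head -n 1
}

# result_url TASK_ID: print the result endpoint URL for that task.
result_url() {
    printf 'https://api.wavespeed.ai/api/v3/predictions/%s/result\n' "$1"
}
```

Given a saved response body in `$response`, `result_url "$(extract_task_id "$response")"` prints the URL to query for the result.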

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction (the task ID that was queried)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
