Kwaivgi Kling V2.6 Pro Text To Video

Kling 2.6 Pro delivers top-tier text-to-video generation with smooth motion, cinematic visuals, strong prompt adherence, and native audio for ready-to-share clips. It is available through a ready-to-use REST inference API with no cold starts, high performance, and affordable pricing.

Features

Kling 2.6 Audio — Text-to-Video

Kling 2.6 Audio Text-to-Video turns a text prompt directly into a fully scored clip: camera motion, character action, and soundtrack (voice, ambience, SFX) are generated in one pass, so the scene looks and sounds like it belongs together.

🌟 Model Highlights

Joint audio–video generation – Visuals and sound are created together, not bolted on after the fact.
Character-aware voices – Speech that matches who’s on screen, with timing aligned to the action you describe.
Scene-driven sound design – Ambient noise and effects that follow the camera and events in the shot.
Script-to-scene pipeline – Start from a natural-language prompt; Kling handles shots, motion, and soundscape.

🧩 Parameters

prompt* – Describe what happens in the scene: characters, camera moves, environment, and audio mood (e.g. “Close-up of a robot repairing a neon sign, soft synthwave music, quiet city ambience, no dialogue.”)
negative_prompt – Things to avoid in both visuals and audio (logo, watermark, heavy text, glitch, noise).
cfg_scale – Guidance strength (default 0.5):
- Lower → looser, more organic; model improvises more.
- Higher → closer to prompt wording; can look or sound more “forced”.
sound –
- On → generate video with audio (voice / ambience / SFX where appropriate).
- Off → silent video only (cheaper, same visuals).
duration – 5 s or 10 s clips.
aspect_ratio – 1:1 (default), 9:16, or 16:9.
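
To make the mapping from these parameters to a request body concrete, here is a minimal sketch that assembles the JSON in plain shell. The helper names `json_escape` and `build_payload` are hypothetical (they are not part of any WaveSpeed tooling), and the escaping only covers backslashes and double quotes, so prompts containing newlines or other control characters would need a proper JSON tool.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of an official SDK.

# Escape backslashes and double quotes so text is safe inside a JSON string.
# Limitation: newlines and other control characters are not handled.
json_escape() {
    printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Usage: build_payload PROMPT [NEGATIVE_PROMPT] [CFG_SCALE] [SOUND] [DURATION]
# Omitted arguments fall back to the documented defaults (0.5, true, 5).
build_payload() {
    body=$(printf '"prompt":"%s"' "$(json_escape "$1")")
    # Only include negative_prompt when one was actually supplied.
    if [ -n "${2:-}" ]; then
        body=$(printf '%s,"negative_prompt":"%s"' "$body" "$(json_escape "$2")")
    fi
    printf '{%s,"cfg_scale":%s,"sound":%s,"duration":%s}\n' \
        "$body" "${3:-0.5}" "${4:-true}" "${5:-5}"
}
```

For example, `build_payload 'A robot repairing a neon sign' 'logo, watermark' 0.5 true 10` prints a single-line JSON object that can be passed to curl with `--data-raw`.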

🎯 Typical Use Cases

Social ads or launch teasers with built-in narration and sound design.
Short story beats, animatics, or previz where visual + audio timing must line up.
Product explainers with spoken description + on-screen action.
Cinematic posts and shorts where you want music, ambience, and motion from a single prompt.

💰 Pricing

Mode	Length	Price
No Audio	5 s	$0.35
No Audio	10 s	$0.70
With Audio	5 s	$0.70
With Audio	10 s	$1.40
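
The table works out to $0.07 per second without audio and double that with audio. As a rough budgeting aid, the sketch below reproduces the table in integer cents; `estimate_cost_cents` is a hypothetical helper name, and the prices are simply the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: price lookup based on the pricing table, in cents.
# Usage: estimate_cost_cents SOUND DURATION [CLIPS]
estimate_cost_cents() {
    sound=$1
    duration=$2
    clips=${3:-1}
    # Base price is the silent-video price for the chosen length.
    case "$duration" in
        5)  base=35 ;;
        10) base=70 ;;
        *)  echo "duration must be 5 or 10" >&2; return 1 ;;
    esac
    # Audio doubles the price of a clip.
    case "$sound" in
        true)  per_clip=$((base * 2)) ;;
        false) per_clip=$base ;;
        *)     echo "sound must be true or false" >&2; return 1 ;;
    esac
    echo $((per_clip * clips))
}
```

`estimate_cost_cents true 10 3` prints `420`, that is, $4.20 for three 10-second clips with audio.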

🚀 How to Use

Write a prompt describing:
- what the camera sees (shots, motion, setting),
- what characters do,
- and, if sound is on, the voice tone, music style, and ambience/SFX you want.
(Optional) Add a negative_prompt for things you don’t want in either image or audio.
Tune cfg_scale (start from 0.5; increase only if it’s not following your prompt enough).
Toggle sound on/off depending on whether you need audio.
Run the model.

🔎 Tips

Write prompts like a mini shot list + audio brief: who, where, camera, mood, and sound.
For clearer narration, explicitly specify “single narrator”, voice gender/age, and language/accents.
Use negative_prompt for “watermark, text, logo, glitch, noisy audio” to keep outputs clean.
For vertical platforms (Reels/Shorts/TikTok), set aspect_ratio to 9:16; for YouTube/web, use 16:9; for feeds/ads, try 1:1.
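
If clips are generated from a script, the platform guidance in the last tip can be encoded as a small lookup. This is only a sketch: the function name `aspect_ratio_for` and the platform keywords are made up for illustration, and the ratios are the three values the API accepts.

```shell
#!/bin/sh
# Hypothetical helper: map a target platform keyword to an aspect_ratio value.
aspect_ratio_for() {
    case "$1" in
        reels|shorts|tiktok) echo "9:16" ;;  # vertical short-form video
        youtube|web)         echo "16:9" ;;  # landscape players
        feed|ads)            echo "1:1"  ;;  # square feed placements
        *) echo "unknown platform: $1" >&2; return 1 ;;
    esac
}
```

`aspect_ratio_for tiktok` prints `9:16`, which can be used as the aspect_ratio value in the request body.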

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-v2.6-pro/text-to-video" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
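--data-raw '{
    "prompt": "Close-up of a robot repairing a neon sign, soft synthwave music, quiet city ambience, no dialogue.",
    "cfg_scale": 0.5,
    "sound": true,
    "aspect_ratio": "1:1",
    "duration": 5
}'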

# Get the result (requestId is the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
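
Because generation is asynchronous, the result endpoint usually has to be queried more than once. The following is a minimal polling sketch, not an official client: `wait_for_result`, the 2-second interval, and the attempt limit are assumptions, and the status is extracted with `sed` on the assumption that the response contains a single `"status"` string field (a JSON tool such as jq would be more robust).

```shell
#!/bin/sh
# Hypothetical helper: poll the result endpoint until the task finishes.
# Usage: wait_for_result REQUEST_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Returns 0 on completed, 1 on failed or request error, 2 on timeout.
wait_for_result() {
    request_id=$1
    interval=${2:-2}
    attempts=${3:-150}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        response=$(curl --silent --show-error --location \
            "https://api.wavespeed.ai/api/v3/predictions/${request_id}/result" \
            --header "Authorization: Bearer ${WAVESPEED_API_KEY}") || return 1
        # Pull the value of the "status" field out of the JSON text.
        status=$(printf '%s' "$response" |
            sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
        case "$status" in
            completed) printf '%s\n' "$response"; return 0 ;;
            failed)    printf '%s\n' "$response" >&2; return 1 ;;
        esac
        # created / processing: wait and try again.
        i=$((i + 1))
        sleep "$interval"
    done
    echo "timed out waiting for ${request_id}" >&2
    return 2
}
```

On success the function prints the final response, so the caller can read data.outputs from it.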

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
prompt	string	Yes		-	The positive prompt for the generation.
negative_prompt	string	No		-	The negative prompt for the generation.
cfg_scale	number	No	0.5	0.00 ~ 1.00	Guidance strength. The higher the value, the less freedom the model has and the more closely the output follows the prompt.
sound	boolean	No	true	-	Whether audio is generated together with the video.
aspect_ratio	string	No	1:1	1:1, 9:16, 16:9	The aspect ratio of the generated media.
duration	integer	No	5	5, 10	The duration of the generated media in seconds.
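
Checking values against the ranges in this table before submitting avoids spending a request on input the API would reject. The sketch below is one way to do that in shell; `validate_request` is a hypothetical helper, and `awk` is used only because shell arithmetic cannot compare decimals.

```shell
#!/bin/sh
# Hypothetical helper: check optional parameters against the documented ranges.
# Usage: validate_request CFG_SCALE SOUND ASPECT_RATIO DURATION
validate_request() {
    # cfg_scale: a non-negative decimal between 0 and 1 inclusive.
    awk -v v="$1" 'BEGIN { exit !(v ~ /^[0-9]*\.?[0-9]+$/ && v >= 0 && v <= 1) }' ||
        { echo "cfg_scale must be a number from 0 to 1" >&2; return 1; }
    case "$2" in
        true|false) ;;
        *) echo "sound must be true or false" >&2; return 1 ;;
    esac
    case "$3" in
        1:1|9:16|16:9) ;;
        *) echo "aspect_ratio must be 1:1, 9:16, or 16:9" >&2; return 1 ;;
    esac
    case "$4" in
        5|10) ;;
        *) echo "duration must be 5 or 10" >&2; return 1 ;;
    esac
}
```

`validate_request 0.5 true 16:9 10` succeeds silently, while `validate_request 1.5 true 16:9 10` prints an error and returns a non-zero status.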

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
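
The field that matters most in the submission response is data.id, since it is the task ID used to query the result. Here is a hedged sketch of submitting a payload and capturing that ID; `json_string_field` and `submit_task` are hypothetical helpers, and the `sed` extraction assumes the response is compact JSON with a single `"id"` key (jq would be a sturdier choice if it is installed).

```shell
#!/bin/sh
# Hypothetical helper: print the value of a string field from JSON on stdin.
# Naive text matching; assumes the key appears once per line of input.
json_string_field() {
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: submit a JSON payload and print the task ID (data.id).
# Usage: submit_task PAYLOAD_JSON
submit_task() {
    response=$(curl --silent --show-error --location --request POST \
        "https://api.wavespeed.ai/api/v3/kwaivgi/kling-v2.6-pro/text-to-video" \
        --header "Content-Type: application/json" \
        --header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
        --data-raw "$1") || return 1
    task_id=$(printf '%s' "$response" | json_string_field id)
    # Fail loudly if the response did not contain an ID (for example on an error).
    [ -n "$task_id" ] || { printf 'no task id in response: %s\n' "$response" >&2; return 1; }
    printf '%s\n' "$task_id"
}
```

A typical call would be `requestId=$(submit_task "$payload")`, after which `requestId` can be used with the result endpoint.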

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID (the data.id value returned by the submission response)

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction (the task ID that was queried)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
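
Once data.status is `completed`, the generated video URLs are in data.outputs. The sketch below lists them using only `grep`; `output_urls` is a hypothetical helper, and it assumes the outputs array sits on a single line of the response. With pretty-printed JSON, a real JSON parser would be needed.

```shell
#!/bin/sh
# Hypothetical helper: print each URL in data.outputs, one per line.
# Reads the result response JSON on stdin.
output_urls() {
    # First isolate the "outputs": [...] array, then pick the URLs inside it,
    # so that other URL fields such as data.urls.get are not included.
    grep -o '"outputs"[[:space:]]*:[[:space:]]*\[[^]]*]' |
        grep -oE 'https?://[^"]+'
}
```

The printed URLs can then be downloaded with, for example, `curl --location --remote-name "$url"`. An empty outputs array produces no output and a non-zero exit status from `grep`.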
