Kwaivgi Kling V2.6 Create Voice
Kling 2.6 Create Voice generates a custom voice from an audio sample. Upload an audio file to create a custom voice for use with the voice control feature in Kling V2.6 video generation. The audio should be clean and noise-free, contain a single voice, and run between 5 and 30 seconds. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.
Features
Kwaivgi Kling v2.6 Create Voice
Kling v2.6 Create Voice is a lightweight helper endpoint for creating a reusable voice profile from an audio sample. The output is typically a voice identifier you can plug into Kling v2.6 “voice control” workflows (for example, generating dialogue in a video using your custom voice).
Use this when you want consistent narration or character speech across multiple Kling v2.6 generations, without re-uploading the same reference audio every time.
Key capabilities
- Create a reusable voice profile from audio: Upload or link to a voice sample and get back a voice reference you can re-use across runs.
- Designed for Kling v2.6 voice control workflows: The resulting voice can be used to drive speech generation in Kling v2.6 video endpoints that support custom voice IDs.
- Simple, single-input interface: Minimal setup. Provide a clean reference clip and you're ready to create a voice.
- Supports common audio upload patterns: Typically works with either a public URL or an uploaded audio file, depending on your integration.
- Better consistency across scenes: Re-using the same created voice helps keep a stable vocal identity across multiple generations.
Parameters and how to use
- audio: (required) A URL (or uploaded file reference) pointing to the audio sample used to create the voice.
Media (Audio)
Provide a single voice sample that’s easy to learn from:
- Use a clean, single-speaker clip (no background music, no overlapping voices).
- Aim for consistent volume and minimal reverb/echo.
- If you want a specific style (e.g., calm narrator, energetic host), choose a sample that clearly matches that delivery.
Commonly supported formats include: mp3, wav, m4a, ogg, aac.
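The clip requirements above (a listed format, 5 to 30 seconds long) can be checked locally before uploading. The following is a minimal sketch, not part of the API: the helper names are hypothetical, and the duration check only handles uncompressed WAV files because it relies on Python's standard `wave` module. It does not check for noise or multiple speakers.

```python
import io
import wave
from pathlib import PurePosixPath

# Formats this page lists as commonly supported.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".aac"}
# Duration limits stated on this page (5 to 30 seconds).
MIN_SECONDS, MAX_SECONDS = 5.0, 30.0


def has_supported_extension(filename: str) -> bool:
    """Return True if the file name ends with one of the listed extensions."""
    return PurePosixPath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Compute the duration of an uncompressed WAV clip from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def duration_in_range(seconds: float) -> bool:
    """Return True if the duration falls inside the 5-30 second window."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For other formats such as mp3 or m4a, the duration would have to come from a dedicated audio library, which this sketch deliberately leaves out.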
After you finish configuring the parameters, click Run, preview the result, and iterate if needed.
Pricing
$0.035 per run
Notes
- Consent matters: only create voices from audio you own or have explicit permission to use.
- If the created voice sounds “off,” the fastest fix is usually a cleaner reference clip (single speaker, less noise, fewer artifacts).
- Keep voice creation and voice usage consistent: once you have a voice ID, re-use it rather than re-creating new voices for the same speaker.
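One way to follow the re-use advice above is to remember which voice was created for which audio sample. The sketch below is an illustration under stated assumptions: `get_or_create_voice` and the JSON cache file are hypothetical, and `create_voice` stands in for whatever code of yours calls the endpoint and returns the resulting voice reference.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def get_or_create_voice(
    audio_bytes: bytes,
    cache_path: Path,
    create_voice: Callable[[bytes], str],
) -> str:
    """Return a cached voice reference for this clip, creating one only if needed."""
    # Key the cache on the audio content so the same clip maps to the same voice.
    key = hashlib.sha256(audio_bytes).hexdigest()
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    if key not in cache:
        # Hypothetical callable: submits the clip and returns the voice reference.
        cache[key] = create_voice(audio_bytes)
        cache_path.write_text(json.dumps(cache, indent=2))
    return cache[key]
```

Keying on a content hash means an identical clip is billed once, while any change to the audio (even a re-encode) produces a new key and a new voice.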
Related Models
- Kling v2.6 Pro (Text-to-Video) – Use created voice IDs to generate videos with dialogue, ambience, and SFX.
- Kling v2.6 Pro (Image-to-Video) – Animate a still image into a video, optionally with voice-controlled speech.
- Kling Text-to-Audio – Generate sound effects and audio from text prompts.
- Kling Video-to-Audio – Generate or extract matching audio for an input video.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/kwaivgi/kling-v2.6/create-voice" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{"audio": "https://example.com/voice-sample.mp3"}'
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
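The same two calls can be made from Python with only the standard library. This is a rough sketch rather than an official client: the function names are hypothetical, the endpoint URLs and headers are copied from the curl commands above, and the request body assumes the sample is passed in the `audio` field listed under Request Parameters.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
SUBMIT_URL = f"{API_BASE}/kwaivgi/kling-v2.6/create-voice"


def build_submit_request(api_key: str, audio_url: str) -> urllib.request.Request:
    """Prepare the POST request that submits a create-voice task."""
    # Assumption: the audio sample goes in the `audio` field of the JSON body.
    body = json.dumps({"audio": audio_url}).encode("utf-8")
    return urllib.request.Request(
        SUBMIT_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def build_result_request(api_key: str, request_id: str) -> urllib.request.Request:
    """Prepare the GET request that fetches the result of a submitted task."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A typical call would be `send(build_submit_request(api_key, audio_url))`, with `api_key` read from the `WAVESPEED_API_KEY` environment variable as in the curl example. Building and sending are kept separate so the request can be inspected without touching the network.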
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| audio | string | Yes | - | - | Audio sample used to create the voice. The recording must be clean and noise-free, contain only one human voice, and last between 5 and 30 seconds. |
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
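To show how the fields in the table above fit together, here is a small sketch that pulls the task ID, status, and result URL out of a decoded submission response. `SubmittedTask` and `parse_submit_response` are hypothetical names, and treating any `code` other than 200 as a failure is an assumption based on the "200 for success" example in the table.

```python
from dataclasses import dataclass


@dataclass
class SubmittedTask:
    task_id: str      # data.id
    status: str       # data.status
    result_url: str   # data.urls.get


def parse_submit_response(payload: dict) -> SubmittedTask:
    """Extract the fields needed to poll for the result of a submitted task."""
    # Assumption: a code other than 200 means the submission was not accepted.
    if payload.get("code") != 200:
        raise RuntimeError(f"submission failed: {payload.get('message')}")
    data = payload["data"]
    return SubmittedTask(
        task_id=data["id"],
        status=data["status"],
        result_url=data["urls"]["get"],
    )
```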
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID passed in the request) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed). |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
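Because `data.outputs` stays empty until `data.status` becomes `completed`, a client usually polls the result endpoint. The loop below is a minimal sketch under stated assumptions: `wait_for_outputs` is a hypothetical helper, `fetch_result` is any callable of yours that returns the decoded result response, and the polling interval and attempt limit are arbitrary choices, not documented values.

```python
import time
from typing import Callable


def wait_for_outputs(
    fetch_result: Callable[[], dict],
    interval_seconds: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Poll until the task completes, then return data.outputs."""
    for _ in range(max_attempts):
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"]
        if status == "failed":
            # data.error carries the failure message when one is available.
            raise RuntimeError(data.get("error") or "task failed")
        # Status is still created or processing: wait before asking again.
        sleep(interval_seconds)
    raise TimeoutError(f"task did not complete after {max_attempts} attempts")
```

What the returned entries contain for this endpoint (for example, the reusable voice reference) should be confirmed against a real response before relying on it.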