Kwaivgi Kling v2.6 Create Voice
Kling v2.6 Create Voice is a lightweight helper endpoint for creating a reusable voice profile from an audio sample. The output is typically a voice identifier you can plug into Kling v2.6 “voice control” workflows (for example, generating dialogue in a video using your custom voice).
Use this when you want consistent narration or character speech across multiple Kling v2.6 generations, without re-uploading the same reference audio every time.
Key capabilities
- Create a reusable voice profile from audio: upload or link to a voice sample and get back a voice reference you can re-use across runs.
- Designed for Kling v2.6 voice control workflows: the resulting voice can drive speech generation in Kling v2.6 video endpoints that support custom voice IDs.
- Simple, single-input interface: minimal setup; provide a clean reference clip and you’re ready to create a voice.
- Supports common audio upload patterns: typically works with either a public URL or an uploaded audio file, depending on your integration.
- Better consistency across scenes: re-using the same created voice helps keep a stable vocal identity across multiple generations.
Parameters and how to use
- voice_url: (required) A URL (or uploaded file reference) pointing to the audio sample used to create the voice.
Media (Audio)
Provide a single voice sample that’s easy to learn from:
- Use a clean, single-speaker clip (no background music, no overlapping voices).
- Aim for consistent volume and minimal reverb/echo.
- If you want a specific style (e.g., calm narrator, energetic host), choose a sample that clearly matches that delivery.
After you finish configuring the parameters, click Run, preview the result, and iterate if needed.
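If you are integrating via the API rather than the UI, creating a voice is typically a single request that passes the voice_url parameter. The sketch below is illustrative only: the endpoint URL, authentication header, and the voice_id field in the response are assumptions (only voice_url is documented above), so check your provider’s API reference for the exact names.

```python
import requests

# Hypothetical endpoint and auth header -- substitute the values from your
# provider's API reference. Only `voice_url` is documented on this page.
API_URL = "https://api.example.com/kwaivgi/kling-v2.6/create-voice"
API_KEY = "YOUR_API_KEY"

payload = {
    # Public URL of a clean, single-speaker reference clip.
    "voice_url": "https://example.com/samples/narrator_reference.wav",
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()

result = response.json()
# The exact response shape varies by provider; the voice identifier
# (shown here as `voice_id`) is the value you re-use in later generations.
voice_id = result.get("voice_id")
print("Created voice:", voice_id)
```

Store the returned identifier alongside your project assets so the same speaker can be referenced in every later generation instead of re-uploading the reference clip.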
Notes
How to write prompts that use a Voice ID
When you use Kling v2.6 video endpoints that support voice-controlled generation, you can reference created voices directly inside the text prompt.
- Prompt length limit: your positive prompt cannot exceed 2500 characters.
- Voice tag syntax: use <<<voice_1>>> (or <<<voice_2>>>) to specify which voice should speak.
- Voice order must match voice_list: <<<voice_1>>> refers to the first voice in the voice_list parameter; <<<voice_2>>> refers to the second voice.
- Up to 2 tones per task: a single video generation task can reference at most 2 tones (voices).
- Tone requires sound=on: when specifying a tone, the sound parameter must be on.
- Keep grammar simple: simpler sentence structure improves reliability. Example: The man <<<voice_1>>> said, “Hello.”
- Billing behavior: if voice_list is not empty and the prompt references a voice tag (e.g., <<<voice_1>>>), the task is billed using the “with voice generation” metric.
- Capability varies by mode/version: voice support differs across Kling model versions and video modes; check the current Capability Map for the endpoint you’re using.
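To make these rules concrete, here is a hedged sketch of what a voice-controlled text-to-video request might look like. The prompt, voice_list, and sound fields are the ones named above; the endpoint URL, authentication, and the exact value expected for sound are placeholders, so adapt them to your actual integration.

```python
import requests

# Hypothetical text-to-video endpoint; only `prompt`, `voice_list`, and
# `sound` are named on this page -- everything else is illustrative.
API_URL = "https://api.example.com/kwaivgi/kling-v2.6-pro/text-to-video"
API_KEY = "YOUR_API_KEY"

# <<<voice_1>>> maps to voice_list[0]; <<<voice_2>>> maps to voice_list[1].
payload = {
    "prompt": (
        "A man and a woman sit at a kitchen table. "
        'The man <<<voice_1>>> said, "Good morning." '
        'The woman <<<voice_2>>> said, "Coffee is ready."'
    ),
    "voice_list": [
        "VOICE_ID_FROM_CREATE_VOICE",  # referenced as <<<voice_1>>>
        "SECOND_VOICE_ID",             # referenced as <<<voice_2>>>
    ],
    # Voice tags only take effect when sound is on
    # (the exact value/type may differ by provider).
    "sound": "on",
}

# Sanity checks that mirror the documented limits.
assert len(payload["prompt"]) <= 2500, "positive prompt must stay within 2500 characters"
assert len(payload["voice_list"]) <= 2, "at most 2 tones (voices) per task"

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=600,
)
response.raise_for_status()
print(response.json())
```

Because voice_list is non-empty and the prompt contains voice tags, a request like this would fall under the “with voice generation” billing metric described above.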
Safety and permissions
- Consent matters: only create voices from audio you own or have explicit permission to use.
- If the created voice sounds “off,” the fastest fix is usually a cleaner reference clip (single speaker, less noise, fewer artifacts).
- Keep voice creation and voice usage consistent: once you have a voice ID, re-use it rather than re-creating new voices for the same speaker.
Related Models
- Kling v2.6 Pro (Text-to-Video) – Use created voice IDs to generate videos with dialogue, ambience, and SFX.
- Kling v2.6 Pro (Image-to-Video) – Animate a still image into a video, optionally with voice-controlled speech.
- Kling Text-to-Audio – Generate sound effects and audio from text prompts.
- Kling Video-to-Audio – Generate or extract matching audio for an input video.