audio-to-audio

Kling 2.6 Create Voice

kwaivgi/kling-v2.6/create-voice

Kling 2.6 Create Voice is a model that generates custom voices. Upload an audio file to create a custom voice that can be used with the voice control feature in V2.6 video generation. The audio should be clean and noise-free, contain a single voice, and be 5-30 seconds long. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

Example output: { "voice_id": "831279595407818828" }

Your request will cost $0.035 per run.

For $1 you can run this model approximately 28 times.

README

Kwaivgi Kling v2.6 Create Voice

Kling v2.6 Create Voice is a lightweight helper endpoint for creating a reusable voice profile from an audio sample. The output is typically a voice identifier you can plug into Kling v2.6 “voice control” workflows (for example, generating dialogue in a video using your custom voice).

Use this when you want consistent narration or character speech across multiple Kling v2.6 generations, without re-uploading the same reference audio every time.

Key capabilities

  • Create a reusable voice profile from audio: upload or link to a voice sample and get back a voice reference you can reuse across runs.

  • Designed for Kling v2.6 voice control workflows: the resulting voice can be used to drive speech generation in Kling v2.6 video endpoints that support custom voice IDs.

  • Simple, single-input interface: minimal setup; provide a clean reference clip and you’re ready to create a voice.

  • Supports common audio upload patterns: typically works with either a public URL or an uploaded audio file, depending on your integration.

  • Better consistency across scenes: reusing the same created voice helps keep a stable vocal identity across multiple generations.

Parameters and how to use

  • voice_url: (required) A URL (or uploaded file reference) pointing to the audio sample used to create the voice.
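
As an illustration, a create-voice request over the REST API might look like the sketch below. It assumes the common WaveSpeedAI submit-style endpoint pattern (https://api.wavespeed.ai/api/v3/<model>) with a bearer API key; treat the exact path, headers, and response shape as assumptions and confirm them against the API reference.

    import os

    import requests  # assumes the requests library is installed

    # Assumed WaveSpeedAI-style submission endpoint for this model; verify in the API docs.
    API_BASE = "https://api.wavespeed.ai/api/v3"
    MODEL = "kwaivgi/kling-v2.6/create-voice"

    payload = {
        # Publicly reachable URL of a clean, single-speaker clip (5-30 seconds).
        "voice_url": "https://example.com/samples/narrator.wav",
    }

    resp = requests.post(
        f"{API_BASE}/{MODEL}",
        headers={"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())  # expected to include an id you can poll for the final voice_id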

Media (Audio)

Provide a single voice sample that’s easy to learn from:

  • Use a clean, single-speaker clip (no background music, no overlapping voices).
  • Aim for consistent volume and minimal reverb/echo.
  • If you want a specific style (e.g., calm narrator, energetic host), choose a sample that clearly matches that delivery.
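
If your source clip needs cleanup before upload, a minimal preprocessing sketch is shown below. It assumes pydub (with ffmpeg available locally) and simply converts the clip to a mono WAV and checks the 5-30 second duration requirement; it will not remove background music or overlapping speakers.

    from pydub import AudioSegment  # assumes pydub + ffmpeg are installed locally

    def prepare_voice_sample(src_path: str, dst_path: str) -> None:
        """Convert a clip to mono WAV and verify it is 5-30 seconds long."""
        clip = AudioSegment.from_file(src_path)
        seconds = len(clip) / 1000.0  # pydub lengths are in milliseconds
        if not 5.0 <= seconds <= 30.0:
            raise ValueError(f"Clip is {seconds:.1f}s; Create Voice expects 5-30 seconds.")
        clip.set_channels(1).export(dst_path, format="wav")

    prepare_voice_sample("raw_narrator.m4a", "narrator.wav")  # hypothetical file names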

After you finish configuring the parameters, click Run, preview the result, and iterate if needed.
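
When running through the API rather than the web playground, the equivalent of previewing the result is polling until the task completes. A minimal sketch, assuming a generic submit-then-poll result endpoint (the path and field names below are assumptions):

    import os
    import time

    import requests

    API_BASE = "https://api.wavespeed.ai/api/v3"
    API_KEY = os.environ["WAVESPEED_API_KEY"]

    def wait_for_result(request_id: str, poll_seconds: float = 1.0) -> dict:
        """Poll an assumed result endpoint until the create-voice task finishes."""
        url = f"{API_BASE}/predictions/{request_id}/result"  # assumed path
        while True:
            resp = requests.get(
                url,
                headers={"Authorization": f"Bearer {API_KEY}"},
                timeout=30,
            )
            resp.raise_for_status()
            result = resp.json()
            status = result.get("data", {}).get("status")
            if status in ("completed", "failed"):
                # A completed result is expected to carry the voice_id, as in the
                # example output shown earlier on this page.
                return result
            time.sleep(poll_seconds)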

Pricing

  • $0.035 per run

Notes

How to write prompts that use a Voice ID

When you use Kling v2.6 video endpoints that support voice-controlled generation, you can reference created voices directly inside the text prompt. The rules below summarize the syntax, and a payload sketch follows the list.

  • Prompt length limit: your positive prompt cannot exceed 2500 characters.
  • Voice tag syntax: use <<<voice_1>>> (or <<<voice_2>>>) to specify which voice should speak.
  • Voice order must match voice_list: <<<voice_1>>> refers to the first voice in the voice_list parameter; <<<voice_2>>> refers to the second voice.
  • Up to 2 tones per task: a video generation task can reference at most 2 tones.
  • Tone requires sound=on: when specifying a tone, the sound parameter must be on.
  • Keep grammar simple: simpler sentence structure improves reliability. Example: The man <<<voice_1>>> said, “Hello.”
  • Billing behavior: if voice_list is not empty and the prompt references a voice tag (e.g., <<<voice_1>>>), the task is billed using the “with voice generation” metric.
  • Capability varies by mode/version: voice support differs across Kling model versions and video modes; check the current Capability Map for the endpoint you’re using.
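
Putting these rules together, the voice-related fields of a request might look like the sketch below. The prompt, voice_list, and sound fields follow the notes above; the exact value encoding and any other fields of the video endpoint are assumptions for illustration.

    # Sketch of the voice-related fields in a Kling v2.6 video request.
    # "prompt", "voice_list", and "sound" follow the notes above; the exact
    # value encoding (e.g. "on" vs. a boolean) may differ per endpoint.
    payload = {
        # <<<voice_1>>> refers to the first entry in voice_list.
        "prompt": 'The man <<<voice_1>>> said, "Hello."',
        "voice_list": ["831279595407818828"],  # voice_id returned by create-voice
        "sound": "on",  # required whenever a tone/voice is specified
    }

    assert len(payload["prompt"]) <= 2500   # positive prompt length limit
    assert len(payload["voice_list"]) <= 2  # at most 2 tones per task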

Safety and permissions

  • Consent matters: only create voices from audio you own or have explicit permission to use.
  • If the created voice sounds “off,” the fastest fix is usually a cleaner reference clip (single speaker, less noise, fewer artifacts).
  • Keep voice creation and voice usage consistent: once you have a voice ID, re-use it rather than re-creating new voices for the same speaker.
