
Chatterbox Speech to Speech API


Chatterbox Speech to Speech is a fast AI voice conversion model that converts source audio into a target voice style, optionally guided by reference audio. It is offered as a ready-to-use REST inference API for voice conversion, speech style transfer, dubbing, character voices, creator content, audio localization, and professional speech-to-speech workflows, with simple integration, no cold starts, and affordable pricing.

Chatterbox Speech-to-Speech

Chatterbox Speech-to-Speech transforms a source audio clip into a target voice style using optional reference audio. It is suitable for voice conversion, style transfer, creator dubbing, character voice prototyping, and other speech-to-speech workflows where you want to preserve spoken content while changing vocal identity or delivery style.

Why Choose This?

  • Speech-to-speech conversion
    Transform an existing speech recording into a different voice style.

  • Optional reference voice guidance
    Add reference_audio when you want the output to follow a particular vocal tone or character.

  • Simple workflow
    Upload source audio, optionally upload a reference voice sample, and generate the converted result.

  • Useful for creator and dubbing workflows
    Suitable for voice restyling, character voice tests, demo production, and spoken-content transformation.

  • Production-ready API
    Useful for narration replacement, voice experiments, content localization, and creative audio workflows.

Parameters

Parameter        Required  Description
audio            Yes       Source audio to convert.
reference_audio  No        Optional reference audio used to guide the target voice style.
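
As a minimal sketch of how these two parameters map onto a JSON request body, the helper below builds the payload and leaves out reference_audio when none is supplied. The function name build_payload is hypothetical, and passing reference_audio as a URL (like the audio URL used in the quick-start examples) is an assumption, not documented behavior.

```python
import json
from typing import Optional


def build_payload(audio: str, reference_audio: Optional[str] = None) -> dict:
    """Build a request body for the speech-to-speech endpoint (illustrative helper)."""
    if not audio:
        # audio is the only required parameter.
        raise ValueError("audio is required")
    payload = {"audio": audio}
    # reference_audio is optional; include it only when provided.
    if reference_audio:
        payload["reference_audio"] = reference_audio
    return payload


print(json.dumps(build_payload("https://example.com/your-audio.mp3")))
```

Omitting the key entirely, rather than sending an empty value, keeps the request identical to the minimal example when no reference voice is wanted.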

How to Use

  1. Upload your source audio — provide the speech recording you want to transform.
  2. Upload reference audio (optional) — add a target voice sample if you want stronger style guidance.
  3. Submit — run the model and download the converted speech audio.

Example Use Case

Convert a spoken voice clip into a different vocal style for creator content, dubbing, or character voice testing.

Pricing

Just $0.02 per started minute.

Billing Rules

  • Pricing is $0.02 per started minute
  • Audio duration is billed in started 60-second units
  • Audio shorter than 60 seconds is billed as 1 minute
  • reference_audio does not affect pricing

Example Costs

Audio Duration  Cost
1s–60s          $0.02
61s–120s        $0.04
121s–180s       $0.06
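
The billing rules above amount to rounding the source duration up to whole minutes and multiplying by the per-minute price. The following is a small sketch of that arithmetic; the function name estimate_cost_usd is hypothetical and the result is only an estimate, since the actual charge is determined by the platform.

```python
import math

PRICE_CENTS_PER_STARTED_MINUTE = 2  # $0.02, taken from the billing rules


def estimate_cost_usd(duration_seconds: float) -> float:
    """Estimate the charge for a source clip, billed per started 60-second unit."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    # Any started minute counts as a full minute.
    started_minutes = math.ceil(duration_seconds / 60)
    # Work in whole cents to avoid accumulating floating-point error.
    return started_minutes * PRICE_CENTS_PER_STARTED_MINUTE / 100


for seconds in (1, 60, 61, 180):
    print(f"{seconds}s -> ${estimate_cost_usd(seconds):.2f}")
```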

Best Use Cases

  • Voice style transfer — Convert speech into a different vocal tone or identity.
  • Character voice prototyping — Test alternative voice styles for characters or avatars.
  • Creator dubbing — Rework spoken audio for short-form content or promos.
  • Narration restyling — Preserve content while changing delivery feel.
  • Speech workflow experiments — Compare different voice directions from the same recording.

Pro Tips

  • Use clean source audio for better intelligibility.
  • Add reference_audio only when you want stronger target voice guidance.
  • Use a clear reference sample with stable tone for more consistent conversion.
  • Short clips are useful for testing before processing longer audio.
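
One way to prepare a short test clip is to cut the first few seconds from a longer recording before uploading it. The sketch below does this with Python's standard wave module. It assumes the source is an uncompressed PCM WAV file; the helper name trim_wav is hypothetical, and this page does not state which audio formats the API accepts.

```python
import wave


def trim_wav(src_path: str, dst_path: str, max_seconds: float) -> None:
    """Copy at most the first max_seconds of a PCM WAV file to dst_path."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Never request more frames than the file actually contains.
        frames_to_keep = min(src.getnframes(), int(max_seconds * src.getframerate()))
        frames = src.readframes(frames_to_keep)
    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width, and frame rate from the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)
```

For example, trim_wav("full.wav", "test.wav", 10) would write a clip of at most 10 seconds, which falls within a single billed minute.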

Notes

  • audio is required.
  • reference_audio is optional.
  • Pricing is based on source audio duration and billed per started minute.
  • Better source audio and cleaner reference audio generally improve output quality.

Related Models

  • Chatterbox Text-to-Speech — Generate speech directly from text.
  • Voice cloning workflows — Useful when you need a reusable custom voice identity instead of per-request voice guidance.
  • Audio generation workflows — Useful when you need music or sound generation instead of speech conversion.

Speech To Speech API — Quick start

Get a WaveSpeedAI API key, then send a POST request to https://api.wavespeed.ai/api/v3/chatterbox/speech-to-speech with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status changes to completed, then read the output URL from data.outputs[0]. Examples follow.

HTTP example

# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/chatterbox/speech-to-speech" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "audio": "https://example.com/your-audio.mp3"
}'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].

Node.js example

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// Top-level await is not available in CommonJS, so wrap the call in an async function.
async function main() {
  const result = await client.run("chatterbox/speech-to-speech", {
    audio: "https://example.com/your-audio.mp3"
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

Python example

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "chatterbox/speech-to-speech",
    {"audio": "https://example.com/your-audio.mp3"},
)

print(output["outputs"][0])  # → URL of the generated output
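
If you call the REST endpoint directly instead of using an SDK, the submit-then-poll flow described in the quick start needs a small polling loop. The sketch below shows one possible shape. It takes a caller-supplied function that performs the GET on the prediction result endpoint and returns the parsed JSON, so the loop itself makes no network calls. The names wait_for_output and fetch_result are hypothetical, and two details are assumptions: that status sits under data next to outputs, and that a failed prediction reports the status "failed".

```python
import time
from typing import Callable


def wait_for_output(
    fetch_result: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 300.0,
) -> str:
    """Poll fetch_result() until the prediction completes, then return data.outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_result().get("data", {})
        status = data.get("status")  # assumed location of the status field
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed name of the failure status
            raise RuntimeError("prediction failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not complete in time")
        time.sleep(interval)


# Demo with canned responses standing in for real HTTP calls.
fake_responses = iter([
    {"data": {"status": "processing", "outputs": []}},
    {"data": {"status": "completed", "outputs": ["https://example.com/converted.wav"]}},
])
print(wait_for_output(lambda: next(fake_responses), interval=0))
```

Passing the fetch step in as a function keeps the loop independent of any particular HTTP client and makes the timeout behavior easy to test.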

Speech To Speech API — Frequently asked questions

What is the Speech To Speech API?

Speech To Speech is a Chatterbox voice conversion model, exposed as a REST API on WaveSpeedAI. It converts source audio into a target voice style, optionally guided by reference audio, and supports voice conversion, speech style transfer, dubbing, character voices, creator content, and audio localization, with no cold starts. You can call it programmatically or try it from the playground.

How do I call the Speech To Speech API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/chatterbox/chatterbox-speech-to-speech.

How much does Speech To Speech cost per run?

Speech To Speech starts at $0.02 per run, which covers up to 60 seconds of source audio. The final charge scales with the duration of the source audio, billed per started minute, so a 61-second clip costs $0.04; reference_audio does not affect pricing. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Speech To Speech accept?

Key inputs: `audio`, `reference_audio`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/chatterbox/chatterbox-speech-to-speech.

How do I get started with the Speech To Speech API?

Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.

Can I use Speech To Speech outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (Chatterbox). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.