Chatterbox Text To Speech
Chatterbox Text to Speech is a fast AI TTS model that converts text into expressive speech with optional reference audio, emotive tags, and delivery controls. It is available through a ready-to-use REST inference API for voice generation, narration, character dialogue, dubbing, virtual assistants, creator content, and professional text-to-speech workflows, with simple integration, no cold starts, and affordable pricing.
Features
Chatterbox Text-to-Speech
Chatterbox Text-to-Speech converts text into expressive speech with optional reference-audio guidance and adjustable generation controls. It is suitable for narration, character voice prototyping, creator voiceovers, and other prompt-free speech generation workflows where you want simple text input plus style tuning.
Why Choose This?
- Text-to-speech generation: Turn written text into spoken audio with a simple workflow.
- Optional reference voice guidance: Add reference_audio when you want the generated voice to follow a particular tone or vocal style.
- Expressiveness control: Use exaggeration to make the delivery more restrained or more expressive.
- Generation tuning: Adjust temperature and cfg for different levels of variation and prompt adherence.
- Production-ready API: Useful for voiceovers, demos, creator content, character voice testing, and narration workflows.
Parameters
| Parameter | Required | Description |
|---|---|---|
| text | Yes | Text to synthesize into speech. |
| reference_audio | No | Optional reference audio used to guide the generated voice style. |
| exaggeration | No | Controls how expressive or exaggerated the generated delivery sounds. |
| temperature | No | Controls randomness and variation in the generated speech. |
| cfg | No | Guidance setting used to influence generation behavior. |
How to Use
- Enter your text — provide the script you want the model to speak.
- Add reference audio (optional) — upload a voice sample if you want style guidance.
- Adjust exaggeration (optional) — increase it for more expressive delivery or keep it lower for a calmer tone.
- Adjust temperature and cfg (optional) — tune generation behavior if needed.
- Submit — run the model and download the generated speech audio.
Example Text
Welcome to wavespeed! It’s nice to meet you!
Pricing
Pricing is based on the length of the input text.
| Text Length | Cost |
|---|---|
| 1–1000 characters | $0.03 |
| 1001–2000 characters | $0.06 |
| 2001–3000 characters | $0.09 |
| 3001–4000 characters | $0.12 |
| 4001–5000 characters | $0.15 |
Billing Rules
- Pricing is $0.03 per started 1000 characters, matching the table above
- Character count is rounded up in blocks of 1000
- Minimum billed length is 1 block
- reference_audio, exaggeration, temperature, and cfg do not affect pricing
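The billing rules can be sketched as a small estimator. This is an illustrative sketch, not the service's billing code: the function name `estimate_cost_cents` is hypothetical, the per-block rate is read off the pricing table above ($0.03 per block), and it assumes characters are counted the way Python's `len()` counts them, which may differ from how the service counts.

```python
import math

BLOCK_SIZE = 1000       # characters per billing block
CENTS_PER_BLOCK = 3     # $0.03 per block, taken from the pricing table (assumption)


def estimate_cost_cents(text: str) -> int:
    """Estimate the cost of synthesizing `text`, in US cents."""
    # Every started block of 1000 characters is billed, with a minimum of one block.
    blocks = max(1, math.ceil(len(text) / BLOCK_SIZE))
    return blocks * CENTS_PER_BLOCK


# Example: a 1500-character script starts two blocks.
script = "a" * 1500
print(f"Estimated cost: ${estimate_cost_cents(script) / 100:.2f}")
```

Working in integer cents avoids floating-point rounding when block prices are summed.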
Best Use Cases
- Narration — Generate spoken audio for explainers, demos, and presentations.
- Creator voiceovers — Produce quick voice tracks for short-form content.
- Character voice prototyping — Explore delivery styles with optional reference voice guidance.
- Product and onboarding content — Create friendly spoken intros or guidance clips.
- Speech testing workflows — Compare different expressive settings from the same script.
Pro Tips
- Keep the input text clean and naturally punctuated for better rhythm.
- Add reference_audio only when you want stronger voice-style guidance.
- Lower exaggeration for calm narration and raise it for more dramatic delivery.
- Increase temperature when you want more variation, or keep it lower for steadier results.
- Test short lines first before generating longer scripts.
Notes
- text is required.
- Pricing depends only on input text length.
- Character count is billed in started 1000-character blocks.
- Optional tuning controls affect delivery style, but not price.
Related Models
- Other text-to-speech workflows — Useful when you need different voice style, pricing, or synthesis behavior.
- Voice cloning workflows — Useful when you need a reusable voice identity instead of prompt-guided speech generation.
- Audio generation workflows — Useful when you need music or sound generation instead of spoken voice.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task ("text" is the only required field)
curl --location --request POST "https://api.wavespeed.ai/api/v3/chatterbox/text-to-speech" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"text": "Welcome to wavespeed!",
"exaggeration": 0.25,
"temperature": 0.7,
"cfg": 0.5
}'
# Get the result (requestId is the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
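The same two calls can be issued from Python using only the standard library. The sketch below copies the URLs and headers from the curl commands above. The helper names (`build_submit_request`, `build_result_request`, `send`) and the 30-second timeout are illustrative assumptions, not part of any official client.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request that submits a text-to-speech task."""
    return urllib.request.Request(
        f"{API_BASE}/chatterbox/text-to-speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(request_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request that fetches the result for a submitted task."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and return the parsed JSON body (timeout is an assumption)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A typical flow would call `send(build_submit_request({"text": "..."}, api_key))`, read `data.id` from the response, and pass that ID to `build_result_request`. Building the request separately from sending it keeps the URL and header logic easy to inspect without touching the network.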
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| text | string | Yes | - | - | The text to convert to speech. You can add emotive tags such as <laugh>, <chuckle>, <sigh>, <cough>, <sniffle>, <groan>, <yawn>, and <gasp>. |
| reference_audio | string | No | - | - | Optional reference audio to guide the generated voice style and tone. |
| exaggeration | number | No | 0.25 | 0.00 ~ 1.00 | Expressiveness strength for generated speech. |
| temperature | number | No | 0.7 | 0.05 ~ 2.00 | Generation temperature. Higher values create more variation. |
| cfg | number | No | 0.5 | 0.10 ~ 1.00 | Classifier-free guidance weight for controlling generation. |
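The ranges in this table can be checked on the client before a request is sent. The following is a minimal sketch: `build_payload` is a hypothetical helper, the limits are copied from the table above, and whether the server rejects or clamps out-of-range values is not stated here, so the sketch simply refuses them.

```python
from typing import Any, Dict, Optional

# (minimum, maximum) for each numeric control, copied from the request table.
NUMERIC_LIMITS = {
    "exaggeration": (0.0, 1.0),
    "temperature": (0.05, 2.0),
    "cfg": (0.1, 1.0),
}


def build_payload(text: str, reference_audio: Optional[str] = None,
                  **controls: float) -> Dict[str, Any]:
    """Return a request body, raising ValueError on missing text or bad ranges."""
    if not text or not text.strip():
        raise ValueError("text is required")
    payload: Dict[str, Any] = {"text": text}
    if reference_audio is not None:
        # Passed through unchanged; the table only says this is a string.
        payload["reference_audio"] = reference_audio
    for name, value in controls.items():
        if name not in NUMERIC_LIMITS:
            raise ValueError(f"unknown parameter: {name}")
        low, high = NUMERIC_LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
        payload[name] = value
    return payload


# Emotive tags go directly inside the text string.
print(build_payload("That was unexpected <laugh>", exaggeration=0.6))
```

Controls that are not passed are left out of the body, so the documented defaults apply on the server side.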
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was queried) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
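Because `data.status` moves through created and processing before reaching completed or failed, a client usually polls the result endpoint until a final status appears. The sketch below shows one way to do that. `wait_for_outputs` is a hypothetical helper, `fetch_result` stands for any callable that returns the parsed JSON body of the result endpoint, and the polling interval and attempt limit are assumptions rather than documented values.

```python
import time
from typing import Callable, List


def wait_for_outputs(fetch_result: Callable[[], dict],
                     interval_s: float = 1.0,
                     max_attempts: int = 60,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Poll until the task completes and return the output URLs."""
    for _ in range(max_attempts):
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return list(data["outputs"])
        if status == "failed":
            # data.error carries the failure message when one is available.
            raise RuntimeError(data.get("error") or "task failed")
        # "created" and "processing" mean the task is still in flight.
        sleep(interval_s)
    raise TimeoutError(f"task did not finish after {max_attempts} attempts")
```

Passing the fetch function and the sleep function in as arguments keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.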