ByteDance Seed Speech TTS 2.0
ByteDance Seed Speech TTS 2.0 is a fast AI text-to-speech model that converts text into natural speech with multilingual voices, delivery controls, and MP3 or Opus output. It is available as a ready-to-use REST inference API for voice generation, narration, dubbing, virtual assistants, product demos, creator content, and professional TTS workflows, with simple integration, no cold starts, and affordable pricing.
Features
ByteDance Seed Speech TTS 2.0
ByteDance Seed Speech TTS 2.0 converts text into speech with a wide selection of multilingual voice presets and controls for language, speed, pitch, volume, sample rate, and output format. It is suitable for narration, voiceovers, character voices, multilingual content, and production-ready speech synthesis workflows.
Why Choose This?
- High-quality text-to-speech: Generate natural-sounding speech from plain text.
- Large voice preset library: Choose from many built-in voices across English, Chinese, Japanese, Spanish, Indonesian, Portuguese, Korean, Italian, German, and French.
- Multilingual support: Use a language override or leave it empty for automatic language detection.
- Fine-grained voice controls: Adjust speed, volume, pitch, sample rate, and optional voice instructions.
- Voice instruction support: Add natural-language instructions for tone, emotion, pace, or volume without having that instruction spoken aloud.
- Production-ready API: Suitable for narration, audiobooks, short-form content, virtual assistants, localization, and voice-based creative workflows.
Parameters
| Parameter | Required | Description |
|---|---|---|
| text | Yes | The text to synthesize into speech. |
| voice | No | Voice preset to use for speech synthesis. Default: stokie_en. |
| voice_instruction | No | Optional natural-language instruction for tone, emotion, pace, or volume. It is not spoken aloud. |
| output_format | No | Output audio format. Supported values: mp3, opus. Default: mp3. |
| sample_rate | No | Sample rate of the output audio in Hz. Supported values: 8000, 16000, 22050, 24000, 32000, 44100, 48000. Default: 24000. |
| speed | No | Speech speed. Range: 0.5 to 2. Default: 1. |
| volume | No | Speech volume. Range: 0.5 to 2. Default: 1. |
| pitch | No | Voice pitch shift in semitones. Range: -12 to 12. Default: 0. |
| language | No | Optional language override. Leave unset for automatic language detection. |
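The allowed values and ranges in the table can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_tts_payload` and `_check_range` are hypothetical helper names, and the assumption that the request body is a flat JSON object keyed by the parameter names above is not confirmed by this page.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the parameter table above.
OUTPUT_FORMATS = ("mp3", "opus")
SAMPLE_RATES = (8000, 16000, 22050, 24000, 32000, 44100, 48000)


def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError when value falls outside the inclusive range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def build_tts_payload(
    text: str,
    voice: Optional[str] = None,
    voice_instruction: Optional[str] = None,
    output_format: str = "mp3",
    sample_rate: int = 24000,
    speed: float = 1.0,
    volume: float = 1.0,
    pitch: float = 0.0,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and return a request body dict."""
    if not text:
        raise ValueError("text is required")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {SAMPLE_RATES}")
    _check_range("speed", speed, 0.5, 2)
    _check_range("volume", volume, 0.5, 2)
    _check_range("pitch", pitch, -12, 12)

    payload: Dict[str, Any] = {
        "text": text,
        "output_format": output_format,
        "sample_rate": sample_rate,
        "speed": speed,
        "volume": volume,
        "pitch": pitch,
    }
    # Optional fields are sent only when set, so the service defaults
    # (default voice, automatic language detection) still apply.
    optional = {
        "voice": voice,
        "voice_instruction": voice_instruction,
        "language": language,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

Leaving `voice` and `language` out of the payload, rather than sending empty strings, is a design choice in this sketch; whether the service treats an empty string the same as an absent field is not stated here.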
How to Use
- Enter your text — provide the text you want to synthesize.
- Choose a voice — select the preset that best fits your use case.
- Add a voice instruction (optional) — guide emotion, pacing, tone, or delivery style.
- Set audio controls (optional) — adjust speed, volume, pitch, and sample rate.
- Choose output format — select `mp3` or `opus`.
- Set language (optional) — leave it empty for auto-detection, or choose a specific language.
- Submit — run the model and download the generated speech audio.
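For API users, the final submit step amounts to sending the chosen parameters as an HTTP request. The sketch below only constructs that request with Python's standard library and does not send it. Several details are assumptions, because this page does not state them: the submit URL is passed in by the caller, the body is assumed to be JSON, and the `Authorization: Bearer` header scheme should be checked against the Authentication Guide. `build_submit_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Any, Dict


def build_submit_request(
    submit_url: str, api_key: str, payload: Dict[str, Any]
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST request.

    submit_url is supplied by the caller; no endpoint is assumed here.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        submit_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; confirm it in the Authentication Guide.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Sending is left to the caller, for example:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the headers and body in tests without touching the network.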
Example Voice Instruction
Warm, calm, confident narration with a slightly slower pace and soft expressive tone.
Pricing
Pricing is based on the length of the input text.
| Text Length | Cost |
|---|---|
| 1–1000 chars | $0.03 |
| 1001–2000 chars | $0.06 |
| 2001–3000 chars | $0.09 |
| 3001–4000 chars | $0.12 |
| 4001–5000 chars | $0.15 |
Billing Rules
- Pricing is $0.03 per started 1000 characters
- Character count is rounded up in blocks of 1000
- Minimum billed length is 1 block
- `voice`, `voice_instruction`, `output_format`, `sample_rate`, `speed`, `volume`, `pitch`, and `language` do not affect pricing
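The billing rules reduce to a ceiling division. A minimal sketch for estimating cost before submitting follows; `billed_blocks` and `estimate_cost_usd` are hypothetical helper names, and counting characters with Python's `len` (Unicode code points) is an assumption, since this page does not say how the service counts characters.

```python
import math

PRICE_PER_BLOCK_USD = 0.03  # price per started 1000 characters
BLOCK_SIZE_CHARS = 1000


def billed_blocks(text: str) -> int:
    """Number of started 1000-character blocks, with a minimum of 1."""
    return max(1, math.ceil(len(text) / BLOCK_SIZE_CHARS))


def estimate_cost_usd(text: str) -> float:
    """Estimated cost in USD, rounded to cents."""
    return round(billed_blocks(text) * PRICE_PER_BLOCK_USD, 2)
```

For example, a 1000-character text falls in one block, while a 1001-character text starts a second block and is billed as two.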
Best Use Cases
- Narration — Generate voiceovers for videos, explainers, and presentations.
- Multilingual content — Produce speech in multiple languages with preset voices.
- Character voices — Create stylized spoken performances with different voice presets.
- Localized media — Adapt content for different languages and markets.
- Audio production — Build speech assets for apps, games, assistants, and creator workflows.
Pro Tips
- Use `voice_instruction` when you want more expressive control without changing the text itself.
- Keep `speed` near `1` for natural speech, then adjust only if needed.
- Use a fixed `language` when auto-detection may be ambiguous.
- Try several voice presets before settling on one for a recurring project.
- Choose a higher sample rate when output quality matters more than file size.
Notes
- `text` is required.
- `voice_instruction` affects delivery style but is not spoken aloud.
- `language` can be left empty for automatic language detection.
- Pricing depends only on input text length.
- Character count is billed in started 1000-character blocks.
Related Models
- ByteDance Seed speech workflows — Useful when you need other speech generation or voice-related capabilities.
- Voice cloning workflows — Useful when you need a reusable custom voice identity instead of preset voices.
- Audio generation workflows — Useful when you need music or sound generation instead of speech synthesis.
<ApiPage model={model}>
## Authentication
For authentication details, please refer to the [Authentication Guide](/docs-authentication).
## API Endpoints
### Submit Task & Query Result
## Parameters
### Task Submission Parameters
#### Request Parameters
#### Response Parameters
<SubmitResponse />
#### Result Request Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| id | string | Yes | - | Task ID |
#### Result Response Parameters
| Parameter | Type | Description |
|-----------|------|-------------|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., "success") |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier of the prediction |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content. |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: `created`, `processing`, `completed`, or `failed` |
| data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
</ApiPage>
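The result response fields above suggest a simple polling loop: query the result until `data.status` is `completed` or `failed`, then read `data.outputs` or `data.error`. The sketch below is one way to do this under stated assumptions. `wait_for_outputs` is a hypothetical helper, and the HTTP call is left to a caller-supplied `fetch_result` function that returns the parsed JSON body, because the result URL and authentication details are not given on this page. The poll interval and poll limit are arbitrary illustration values.

```python
import time
from typing import Any, Callable, Dict, List


def wait_for_outputs(
    fetch_result: Callable[[], Dict[str, Any]],
    poll_interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until the task completes and return the output URLs.

    fetch_result is supplied by the caller and must return the parsed
    JSON body of one result query.
    """
    for _ in range(max_polls):
        body = fetch_result()
        data = body.get("data") or {}
        status = data.get("status")
        if status == "completed":
            return list(data.get("outputs") or [])
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # "created" and "processing" mean the task is still pending.
        sleep(poll_interval)
    raise TimeoutError(f"task did not finish after {max_polls} polls")
```

Injecting `fetch_result` and `sleep` keeps the loop independent of any particular HTTP library and lets it be exercised with canned responses.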