ByteDance Seed Speech TTS 2.0 is a fast AI text-to-speech model that converts text into natural speech with multilingual voices, delivery controls, and MP3 or Opus output. It is a ready-to-use REST inference API for voice generation, narration, dubbing, virtual assistants, product demos, creator content, and professional TTS workflows, with simple integration, no cold starts, and affordable pricing.
$0.03 per run · ~33 runs per $1
ByteDance Seed Speech TTS 2.0 converts text into speech with a wide selection of multilingual voice presets and controls for language, speed, pitch, volume, sample rate, and output format. It is suitable for narration, voiceovers, character voices, multilingual content, and production-ready speech synthesis workflows.
- **High-quality text-to-speech**: Generate natural-sounding speech from plain text.
- **Large voice preset library**: Choose from many built-in voices across English, Chinese, Japanese, Spanish, Indonesian, Portuguese, Korean, Italian, German, and French.
- **Multilingual support**: Use a language override, or leave it empty for automatic language detection.
- **Fine-grained voice controls**: Adjust speed, volume, pitch, sample rate, and optional voice instructions.
- **Voice instruction support**: Add natural-language instructions for tone, emotion, pace, or volume without having the instruction spoken aloud.
- **Production-ready API**: Suitable for narration, audiobooks, short-form content, virtual assistants, localization, and voice-based creative workflows.
| Parameter | Required | Description |
|---|---|---|
| text | Yes | The text to synthesize into speech. |
| voice | No | Voice preset to use for speech synthesis. Default: stokie_en. |
| voice_instruction | No | Optional natural-language instruction for tone, emotion, pace, or volume. It is not spoken aloud. |
| output_format | No | Output audio format. Supported values: mp3, opus. Default: mp3. |
| sample_rate | No | Sample rate of the output audio in Hz. Supported values: 8000, 16000, 22050, 24000, 32000, 44100, 48000. Default: 24000. |
| speed | No | Speech speed. Range: 0.5–2. Default: 1. |
| volume | No | Speech volume. Range: 0.5–2. Default: 1. |
| pitch | No | Voice pitch shift in semitones. Range: -12–12. Default: 0. |
| language | No | Optional language override. Leave unset for automatic language detection. |
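The table above is enough to catch most bad requests before they reach the API. The following is a minimal client-side sketch, not part of any WaveSpeedAI SDK: `build_tts_payload` is a hypothetical helper that checks inputs against the documented ranges and allowed values, and drops options set to `None` so the service applies its own defaults (including automatic language detection when `language` is omitted). The server remains the authority on what it accepts.

```python
# Hypothetical helper: validates a request body against the parameter table.
ALLOWED_KEYS = {"text", "voice", "voice_instruction", "output_format",
                "sample_rate", "speed", "volume", "pitch", "language"}
ALLOWED_FORMATS = {"mp3", "opus"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}
NUMERIC_RANGES = {"speed": (0.5, 2), "volume": (0.5, 2), "pitch": (-12, 12)}


def build_tts_payload(text: str, **options) -> dict:
    """Return a request dict, raising ValueError for out-of-range inputs.

    Options passed as None are dropped so the server falls back to its
    documented defaults (and to auto-detection when language is unset).
    """
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required")
    unknown = set(options) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload = {"text": text}
    payload.update({k: v for k, v in options.items() if v is not None})

    # Check enumerated values, using the documented defaults when absent.
    fmt = payload.get("output_format", "mp3")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")
    rate = payload.get("sample_rate", 24000)
    if rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {rate}")

    # Check inclusive numeric ranges for the delivery controls.
    for name, (low, high) in NUMERIC_RANGES.items():
        value = payload.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return payload
```

For example, `build_tts_payload("Welcome aboard.", speed=1.1, language=None)` returns `{"text": "Welcome aboard.", "speed": 1.1}`, while `pitch=13` raises `ValueError`.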
Example `voice_instruction`: "Warm, calm, confident narration with a slightly slower pace and soft expressive tone."
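`voice_instruction` travels as its own JSON field, separate from `text`. An instruction like the one above often contains commas or apostrophes, and apostrophes break the single-quoted `-d '...'` string used in shell examples. A minimal sketch, assuming you would rather keep the request body in a file: it writes the body with Python's `json` module, which handles all escaping. The file name `body.json` and the sample `text` are hypothetical.

```python
import json
from pathlib import Path

body = {
    "text": "Welcome to the product tour. Let's start with the dashboard.",
    "voice": "stokie_en",
    # Shapes tone and pace; per the parameter table, it is not spoken aloud.
    "voice_instruction": (
        "Warm, calm, confident narration with a slightly slower pace "
        "and soft expressive tone."
    ),
    "output_format": "mp3",
}

# json.dumps escapes quotes and newlines; ensure_ascii=False keeps
# non-Latin scripts (for example Chinese text) readable in the file.
Path("body.json").write_text(
    json.dumps(body, ensure_ascii=False, indent=2), encoding="utf-8"
)
```

The file can then be sent as-is with curl's `--data-binary @body.json` in place of the inline `-d '{...}'` argument.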
Pricing is based on the length of the input text.
| Text Length | Cost |
|---|---|
| 1–1000 chars | $0.03 |
| 1001–2000 chars | $0.06 |
| 2001–3000 chars | $0.09 |
| 3001–4000 chars | $0.12 |
| 4001–5000 chars | $0.15 |
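The tiers above are consistent with $0.03 per started block of 1,000 characters. A minimal sketch for estimating cost before submitting, assuming Python's `len()` (Unicode code points) matches how the service counts characters; that is an assumption, and the per-call charge recorded on the prediction is authoritative. `estimate_cost_cents` is a hypothetical helper, and it works in integer cents to avoid floating-point rounding.

```python
PRICE_PER_BLOCK_CENTS = 3   # $0.03 per started 1,000-character block
BLOCK_CHARS = 1000
MAX_TABLE_CHARS = 5000      # largest tier listed in the pricing table


def estimate_cost_cents(text: str) -> int:
    """Estimate the charge for one run, in US cents, from text length."""
    n = len(text)
    if n == 0:
        raise ValueError("text is required")
    if n > MAX_TABLE_CHARS:
        raise ValueError(f"{n} chars is beyond the published tiers ({MAX_TABLE_CHARS})")
    # Integer ceiling division: 1000 chars -> 1 block, 1001 chars -> 2 blocks.
    blocks = (n + BLOCK_CHARS - 1) // BLOCK_CHARS
    return blocks * PRICE_PER_BLOCK_CENTS


cents = estimate_cost_cents("a" * 1001)
print(f"${cents / 100:.2f}")  # prints $0.06
```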
`voice`, `voice_instruction`, `output_format`, `sample_rate`, `speed`, `volume`, `pitch`, and `language` do not affect pricing.

- Use `voice_instruction` when you want more expressive control without changing the text itself.
- Keep `speed` near 1 for natural speech, then adjust only if needed.
- Set `language` when auto-detection may be ambiguous.
- `text` is required.
- `voice_instruction` affects delivery style but is not spoken aloud.
- `language` can be left empty for automatic language detection.

Grab a WaveSpeedAI API key, then call `POST https://api.wavespeed.ai/api/v3/bytedance/seed-speech-tts-2.0` with your input as JSON. The endpoint returns a prediction ID; poll the prediction endpoint until `status` flips to `completed`, then read the output URL from `data.outputs[0]`. Examples for Seed Speech TTS 2.0 follow.
```bash
# Submit the prediction ("text" is required)
curl -X POST "https://api.wavespeed.ai/api/v3/bytedance/seed-speech-tts-2.0" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "text": "Hello from Seed Speech TTS 2.0.",
    "voice": "stokie_en",
    "output_format": "mp3",
    "sample_rate": 24000,
    "speed": 1,
    "volume": 1,
    "pitch": 0
  }'

# The response includes a prediction ID. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output URL from data.outputs[0].
```

```javascript
// npm install wavespeed
const WaveSpeed = require('wavespeed');

// CommonJS modules do not allow top-level await, so wrap the call.
async function main() {
  const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env
  const result = await client.run("bytedance/seed-speech-tts-2.0", {
    text: "Hello from Seed Speech TTS 2.0.",
    voice: "stokie_en",
    output_format: "mp3",
    sample_rate: 24000,
    speed: 1,
    volume: 1,
    pitch: 0,
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);
```

```python
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "bytedance/seed-speech-tts-2.0",
    {
        "text": "Hello from Seed Speech TTS 2.0.",
        "voice": "stokie_en",
        "output_format": "mp3",
        "sample_rate": 24000,
        "speed": 1,
        "volume": 1,
        "pitch": 0,
    },
)
print(output["outputs"][0])  # → URL of the generated output
```

The examples omit `language`, so the language is detected automatically; add it only when you need to override detection.

Seed Speech TTS 2.0 is a ByteDance model for audio generation, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/bytedance/bytedance-seed-speech-tts-2.0.
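The submit-then-poll flow can be sketched without any SDK, using only Python's standard library. This is a minimal sketch under stated assumptions, not the official client: the endpoint URLs come from the cURL example, but the location of the prediction ID (`data.id`), the location of the status field (`data.status`), and the use of `"failed"` as a terminal status are assumptions not spelled out on this page, so check them against the API reference. `wait_for_output` takes an injected `fetch` function so the polling logic can be exercised without network access.

```python
import json
import os
import time
import urllib.request

SUBMIT_URL = "https://api.wavespeed.ai/api/v3/bytedance/seed-speech-tts-2.0"
RESULT_URL = "https://api.wavespeed.ai/api/v3/predictions/{id}/result"


def _request(url, api_key, body=None):
    """POST body as JSON (or GET when body is None) and decode the reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method="POST" if data else "GET")
    req.add_header("Authorization", f"Bearer {api_key}")
    if data:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_output(fetch, prediction_id, interval=1.0, timeout=120.0):
    """Poll fetch(prediction_id) until status is "completed".

    Assumes status lives at data.status and that "failed" is terminal.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch(prediction_id)["data"]
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":
            raise RuntimeError(f"prediction {prediction_id} failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {prediction_id} still {status!r}")
        time.sleep(interval)


def synthesize(payload, api_key=None):
    """Submit payload and block until the output URL is available."""
    api_key = api_key or os.environ["WAVESPEED_API_KEY"]
    submitted = _request(SUBMIT_URL, api_key, payload)
    prediction_id = submitted["data"]["id"]  # assumed location of the ID
    return wait_for_output(
        lambda pid: _request(RESULT_URL.format(id=pid), api_key), prediction_id
    )
```

Usage, assuming `WAVESPEED_API_KEY` is set: `synthesize({"text": "Hello from Seed Speech TTS 2.0."})` returns the output URL. Injecting `fetch` keeps the loop testable and makes it easy to swap in another HTTP client.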
Seed Speech TTS 2.0 starts at $0.03 per run. That is the base price for input text of up to 1,000 characters; the final charge scales with the length of the input text according to the pricing table above, while `voice`, `voice_instruction`, `output_format`, `sample_rate`, `speed`, `volume`, `pitch`, and `language` do not affect the price. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `text` (required), `voice`, `voice_instruction`, `output_format`, `sample_rate`, `speed`, `volume`, `pitch`, and `language`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/bytedance/bytedance-seed-speech-tts-2.0.
Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.
Commercial usage rights depend on the model's license, set by its provider (ByteDance). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.