ByteDance Seed Speech TTS 2.0

ByteDance Seed Speech TTS 2.0 is a fast AI text-to-speech model that converts text into natural speech with multilingual voices, delivery controls, and MP3 or Opus output. It is available as a ready-to-use REST inference API for voice generation, narration, dubbing, virtual assistants, product demos, creator content, and professional TTS workflows, with simple integration, no cold starts, and affordable pricing.

Features

ByteDance Seed Speech TTS 2.0 converts text into speech with a wide selection of multilingual voice presets and controls for language, speed, pitch, volume, sample rate, and output format. It is suitable for narration, voiceovers, character voices, multilingual content, and production-ready speech synthesis workflows.

Why Choose This?

High-quality text-to-speech: Generate natural-sounding speech from plain text.
Large voice preset library: Choose from many built-in voices across English, Chinese, Japanese, Spanish, Indonesian, Portuguese, Korean, Italian, German, and French.
Multilingual support: Set a language override, or leave it empty for automatic language detection.
Fine-grained voice controls: Adjust speed, volume, pitch, and sample rate, and add optional voice instructions.
Voice instruction support: Add natural-language instructions for tone, emotion, pace, or volume; the instruction itself is not spoken aloud.
Production-ready API: Suitable for narration, audiobooks, short-form content, virtual assistants, localization, and voice-based creative workflows.

Parameters

| Parameter | Required | Description |
|-----------|----------|-------------|
| text | Yes | The text to synthesize into speech. |
| voice | No | Voice preset to use for speech synthesis. Default: `stokie_en`. |
| voice_instruction | No | Optional natural-language instruction for tone, emotion, pace, or volume. It is not spoken aloud. |
| output_format | No | Output audio format. Supported values: `mp3`, `opus`. Default: `mp3`. |
| sample_rate | No | Sample rate of the output audio, in Hz. Supported values: `8000`, `16000`, `22050`, `24000`, `32000`, `44100`, `48000`. Default: `24000`. |
| speed | No | Speech speed. Range: `0.5` to `2`. Default: `1`. |
| volume | No | Speech volume. Range: `0.5` to `2`. Default: `1`. |
| pitch | No | Pitch shift in semitones. Range: `-12` to `12`. Default: `0`. |
| language | No | Optional language override. Leave unset for automatic language detection. |
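The constraints in the table above can be checked client-side before a request is sent. The sketch below is a minimal illustration, assuming the request body is a flat JSON object whose keys match the parameter names and that omitting `language` triggers automatic detection. `build_tts_payload` is a hypothetical helper, not part of any official SDK, and the exact format of `language` values is not specified on this page.

```python
from typing import Any, Optional

# Allowed values and ranges as listed in the Parameters table.
OUTPUT_FORMATS = {"mp3", "opus"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}


def _check_range(name: str, value: float, low: float, high: float) -> None:
    """Raise ValueError if value lies outside the inclusive range [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside the range {low} to {high}")


def build_tts_payload(
    text: str,
    voice: str = "stokie_en",
    voice_instruction: Optional[str] = None,
    output_format: str = "mp3",
    sample_rate: int = 24000,
    speed: float = 1.0,
    volume: float = 1.0,
    pitch: int = 0,
    language: Optional[str] = None,
) -> dict[str, Any]:
    """Validate inputs against the documented limits and return a JSON-ready dict."""
    if not text or not text.strip():
        raise ValueError("text is required")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    _check_range("speed", speed, 0.5, 2)
    _check_range("volume", volume, 0.5, 2)
    _check_range("pitch", pitch, -12, 12)

    payload: dict[str, Any] = {
        "text": text,
        "voice": voice,
        "output_format": output_format,
        "sample_rate": sample_rate,
        "speed": speed,
        "volume": volume,
        "pitch": pitch,
    }
    # Optional fields are left out when unset (assumption: an absent
    # language means automatic language detection).
    if voice_instruction:
        payload["voice_instruction"] = voice_instruction
    if language:
        payload["language"] = language
    return payload
```

Failing fast on an out-of-range `speed` or an unsupported `sample_rate` avoids a round trip that would presumably be rejected server-side anyway.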

How to Use

Enter your text — provide the text you want to synthesize.
Choose a voice — select the preset that best fits your use case.
Add a voice instruction (optional) — guide emotion, pacing, tone, or delivery style.
Set audio controls (optional) — adjust speed, volume, pitch, and sample rate.
Choose output format — select mp3 or opus.
Set language (optional) — leave it empty for auto-detection, or choose a specific language.
Submit — run the model and download the generated speech audio.
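The How to Use steps above come down to a single JSON request. A minimal sketch using only Python's standard library follows. The endpoint URL and the bearer-token header are placeholders (assumptions), because this page defers endpoint and authentication details to the API reference and the Authentication Guide. `make_submit_request` is a hypothetical helper, and it builds the request without sending it.

```python
import json
import urllib.request
from typing import Any

# HYPOTHETICAL placeholder: use the real submit endpoint from the API reference.
SUBMIT_URL = "https://api.example.com/v1/bytedance/seed-speech-tts-2.0"


def make_submit_request(payload: dict[str, Any], api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits a TTS task."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        SUBMIT_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed bearer-token scheme; check the Authentication Guide.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Steps 1-6: text, voice, optional instruction, audio controls, output format.
# language is omitted so the model auto-detects it.
payload = {
    "text": "Welcome to the product tour.",
    "voice": "stokie_en",
    "voice_instruction": "Warm, calm, confident narration with a slightly slower pace.",
    "speed": 1,
    "output_format": "mp3",
}
req = make_submit_request(payload, api_key="YOUR_API_KEY")
# Step 7: send it, for example with urllib.request.urlopen(req), then fetch the result.
```

Keeping request construction separate from sending makes the request easy to inspect or log before any network call is made.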

Example Voice Instruction

Warm, calm, confident narration with a slightly slower pace and soft expressive tone.

Pricing

Pricing is based on the length of the input text.

| Text length (characters) | Cost (USD) |
|--------------------------|------------|
| 1–1000 | $0.03 |
| 1001–2000 | $0.06 |
| 2001–3000 | $0.09 |
| 3001–4000 | $0.12 |
| 4001–5000 | $0.15 |

Billing Rules

Pricing is $0.03 per started block of 1000 characters.
Character count is rounded up to the next multiple of 1000.
The minimum charge is one block ($0.03).
The voice, voice_instruction, output_format, sample_rate, speed, volume, pitch, and language parameters do not affect pricing.
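The billing rules reduce to a ceiling division on the text length. A minimal sketch, assuming "characters" means Unicode code points as counted by Python's `len()`; the page does not say how multi-byte or combining characters are counted, so treat the result as an estimate. `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.03")  # USD per started block, from the Billing Rules
BLOCK_SIZE = 1000                  # characters per billing block


def estimate_cost(text: str) -> Decimal:
    """Estimate the charge for one request from the length of its input text."""
    # Integer ceiling division: any partial block is billed as a full block.
    blocks = (len(text) + BLOCK_SIZE - 1) // BLOCK_SIZE
    # Minimum billed length is one block.
    return max(1, blocks) * PRICE_PER_BLOCK
```

For example, `estimate_cost("x" * 2500)` returns `Decimal('0.09')`, matching the 2001–3000 row. `Decimal` keeps the per-block arithmetic exact, which avoids float rounding when summing many requests.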

Best Use Cases

Narration — Generate voiceovers for videos, explainers, and presentations.
Multilingual content — Produce speech in multiple languages with preset voices.
Character voices — Create stylized spoken performances with different voice presets.
Localized media — Adapt content for different languages and markets.
Audio production — Build speech assets for apps, games, assistants, and creator workflows.

Pro Tips

Use voice_instruction when you want more expressive control without changing the text itself.
Keep speed near 1 for natural speech, then adjust only if needed.
Use a fixed language when auto-detection may be ambiguous.
Try several voice presets before settling on one for a recurring project.
Choose a higher sample rate when output quality matters more than file size.

Notes

text is required.
voice_instruction affects delivery style but is not spoken aloud.
language can be left empty for automatic language detection.
Pricing depends only on input text length.
Character count is billed in started 1000-character blocks.

Related Workflows

ByteDance Seed speech workflows — Useful when you need other speech generation or voice-related capabilities.
Voice cloning workflows — Useful when you need a reusable custom voice identity instead of preset voices.
Audio generation workflows — Useful when you need music or sound generation instead of speech synthesis.



<ApiPage model={model}>
  ## Authentication

  For authentication details, please refer to the [Authentication Guide](/docs-authentication).

  ## API Endpoints

  ### Submit Task & Query Result

  ## Parameters

  ### Task Submission Parameters

  #### Request Parameters

  #### Response Parameters

  <SubmitResponse />

  #### Result Request Parameters

  | Parameter | Type | Required | Default | Description |
  |-----------|------|----------|---------|-------------|
  | id | string | Yes | - | Task ID |

  #### Result Response Parameters

  | Parameter | Type | Description |
  |-----------|------|-------------|
  | code | integer | HTTP status code (e.g., 200 for success) |
  | message | string | Status message (e.g., "success") |
  | data | object | The prediction data object containing all details |
  | data.id | string | Unique identifier of the prediction (the task ID used to query the result) |
  | data.model | string | Model ID used for the prediction |
  | data.outputs | array | Array of URLs to the generated content |
  | data.urls | object | Object containing related API endpoints |
  | data.urls.get | string | URL to retrieve the prediction result |
  | data.status | string | Status of the task: `created`, `processing`, `completed`, or `failed` |
  | data.created_at | string | ISO timestamp of when the request was created (e.g., "2023-04-01T12:34:56.789Z") |
  | data.error | string | Error message (empty if no error occurred) |
  | data.timings | object | Object containing timing details |
  | data.timings.inference | integer | Inference time in milliseconds |

</ApiPage>
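The status values and fields in the Result Response Parameters table suggest a simple polling loop. A minimal sketch, assuming each poll returns the documented envelope (`code`, `message`, `data`) and that a non-200 `code` means the query itself failed. `wait_for_result` and the injected `fetch_json` callable are hypothetical; keeping HTTP and authentication outside the function lets it be exercised with a stub.

```python
import time
from typing import Any, Callable

# Hypothetical: any callable that GETs a URL and returns the decoded JSON body.
Fetch = Callable[[str], dict[str, Any]]


def wait_for_result(get_url: str, fetch_json: Fetch,
                    interval_s: float = 1.0, timeout_s: float = 120.0) -> list[str]:
    """Poll the result URL until the task completes, then return data.outputs."""
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch_json(get_url)
        # Assumption: a non-200 code means the query itself failed.
        if body.get("code") != 200:
            raise RuntimeError(f"query failed: {body.get('code')} {body.get('message')}")
        data = body["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"]  # URLs of the generated audio
        if status == "failed":
            raise RuntimeError(f"task {data['id']} failed: {data.get('error') or 'unknown error'}")
        # Still "created" or "processing": wait, unless the deadline has passed.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {data['id']} still {status!r} after {timeout_s}s")
        time.sleep(interval_s)
```

In practice `get_url` would presumably be `data.urls.get` from the submit response, and `fetch_json` could wrap `urllib.request.urlopen` and `json.load` with whatever authentication header the Authentication Guide specifies.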
