
ByteDance Seed Speech TTS 2.0 API


ByteDance Seed Speech TTS 2.0 is a fast AI text-to-speech model that converts text into natural speech with multilingual voices, delivery controls, and MP3 or Opus output. It is available as a ready-to-use REST inference API for voice generation, narration, dubbing, virtual assistants, product demos, creator content, and professional TTS workflows, with simple integration, no cold starts, and affordable pricing.

text-to-audio

$0.03 per run (base price for up to 1,000 characters; about 33 runs per $1)


ByteDance Seed Speech TTS 2.0

ByteDance Seed Speech TTS 2.0 converts text into speech with a wide selection of multilingual voice presets and controls for language, speed, pitch, volume, sample rate, and output format. It is suitable for narration, voiceovers, character voices, multilingual content, and production-ready speech synthesis workflows.

Why Choose This?

  • High-quality text-to-speech: Generate natural-sounding speech from plain text.

  • Large voice preset library: Choose from many built-in voices across English, Chinese, Japanese, Spanish, Indonesian, Portuguese, Korean, Italian, German, and French.

  • Multilingual support: Set a language override, or leave it empty for automatic language detection.

  • Fine-grained voice controls: Adjust speed, volume, pitch, sample rate, and optional voice instructions.

  • Voice instruction support: Add natural-language instructions for tone, emotion, pace, or volume without the instruction itself being spoken aloud.

  • Production-ready API: Suitable for narration, audiobooks, short-form content, virtual assistants, localization, and voice-based creative workflows.

Parameters

Parameter | Required | Description
text | Yes | The text to synthesize into speech.
voice | No | Voice preset to use for speech synthesis. Default: stokie_en.
voice_instruction | No | Optional natural-language instruction for tone, emotion, pace, or volume. It is not spoken aloud.
output_format | No | Output audio format. Supported values: mp3, opus. Default: mp3.
sample_rate | No | Sample rate of the output audio in Hz. Supported values: 8000, 16000, 22050, 24000, 32000, 44100, 48000. Default: 24000.
speed | No | Speech speed. Range: 0.5 to 2. Default: 1.
volume | No | Speech volume. Range: 0.5 to 2. Default: 1.
pitch | No | Voice pitch shift in semitones. Range: -12 to 12. Default: 0.
language | No | Optional language override. Leave unset for automatic language detection.
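It can be useful to catch invalid values before spending a request. The sketch below is a hypothetical client-side helper, `validate_params`, that encodes only the allowed values and ranges listed in the parameter table; it assumes the ranges are inclusive, it is not part of any WaveSpeedAI SDK, and the service still performs its own validation.

```python
# Hypothetical client-side check mirroring the documented parameter table.
ALLOWED_OUTPUT_FORMATS = {"mp3", "opus"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}
# Inclusive (low, high) bounds; inclusiveness is an assumption.
NUMERIC_RANGES = {"speed": (0.5, 2), "volume": (0.5, 2), "pitch": (-12, 12)}


def validate_params(params):
    """Return a list of problems found in a request payload; empty means OK."""
    problems = []
    text = params.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("text is required and must be a non-empty string")
    # Unset optional fields fall back to the documented defaults.
    if params.get("output_format", "mp3") not in ALLOWED_OUTPUT_FORMATS:
        problems.append("output_format must be mp3 or opus")
    if params.get("sample_rate", 24000) not in ALLOWED_SAMPLE_RATES:
        rates = ", ".join(str(r) for r in sorted(ALLOWED_SAMPLE_RATES))
        problems.append("sample_rate must be one of " + rates)
    for name, (low, high) in NUMERIC_RANGES.items():
        value = params.get(name)
        if value is None:
            continue  # unset: the documented default applies
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")
    return problems


print(validate_params({"text": "Hello", "speed": 3, "output_format": "wav"}))
# → ['output_format must be mp3 or opus', 'speed must be between 0.5 and 2']
```

Returning a list instead of raising on the first problem lets a form or CLI report every bad field at once.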

How to Use

  1. Enter your text — provide the text you want to synthesize.
  2. Choose a voice — select the preset that best fits your use case.
  3. Add a voice instruction (optional) — guide emotion, pacing, tone, or delivery style.
  4. Set audio controls (optional) — adjust speed, volume, pitch, and sample rate.
  5. Choose output format — select mp3 or opus.
  6. Set language (optional) — leave it empty for auto-detection, or choose a specific language.
  7. Submit — run the model and download the generated speech audio.
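The steps above come down to assembling one JSON request body. The following sketch uses a hypothetical helper, `build_payload`, that always includes `text` and adds only the optional fields you actually set, so leaving `language` out keeps automatic language detection and the service applies its documented defaults for everything else. The field names come from the parameter table; the helper itself is illustrative.

```python
import json


def build_payload(text, voice=None, voice_instruction=None, output_format=None,
                  sample_rate=None, speed=None, volume=None, pitch=None,
                  language=None):
    """Build a request body containing text plus only the options that were set."""
    options = {
        "voice": voice,
        "voice_instruction": voice_instruction,
        "output_format": output_format,
        "sample_rate": sample_rate,
        "speed": speed,
        "volume": volume,
        "pitch": pitch,
        "language": language,  # None -> omitted -> automatic language detection
    }
    # Skip unset options (None) but keep real values such as pitch=0.
    return {"text": text, **{k: v for k, v in options.items() if v is not None}}


body = build_payload(
    "Welcome to our product tour.",
    voice="stokie_en",
    voice_instruction="Warm, calm, confident narration with a slightly slower pace.",
    output_format="opus",
)
print(json.dumps(body, indent=2))
```

Filtering on `None` rather than on truthiness matters here, because `0` is a meaningful `pitch` value.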

Example Voice Instruction

Warm, calm, confident narration with a slightly slower pace and soft expressive tone.

Pricing

Pricing is based on the length of the input text.

Text length | Cost
1–1000 chars | $0.03
1001–2000 chars | $0.06
2001–3000 chars | $0.09
3001–4000 chars | $0.12
4001–5000 chars | $0.15

Billing Rules

  • Pricing is $0.03 per started 1000 characters
  • Character count is rounded up in blocks of 1000
  • Minimum billed length is 1 block
  • voice, voice_instruction, output_format, sample_rate, speed, volume, pitch, and language do not affect pricing
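The billing rules amount to a ceiling division with a one-block minimum. This minimal sketch estimates the charge in cents for a given input. It assumes the service counts characters the same way Python's `len()` does (Unicode code points), which this page does not confirm, so treat the result as an estimate and rely on the live price shown before submitting.

```python
BLOCK_CHARS = 1000    # billing block size in characters
CENTS_PER_BLOCK = 3   # $0.03 per started block


def estimate_cost_cents(text):
    """Estimated charge in US cents: started 1000-char blocks, minimum one block."""
    # Integer ceiling division avoids floating-point rounding surprises.
    blocks = max(1, (len(text) + BLOCK_CHARS - 1) // BLOCK_CHARS)
    return blocks * CENTS_PER_BLOCK


for n in (1, 1000, 1001, 4500):
    cents = estimate_cost_cents("a" * n)
    print(f"{n} chars -> ${cents / 100:.2f}")
```

Working in integer cents keeps totals exact when you sum many requests; convert to dollars only for display.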

Best Use Cases

  • Narration — Generate voiceovers for videos, explainers, and presentations.
  • Multilingual content — Produce speech in multiple languages with preset voices.
  • Character voices — Create stylized spoken performances with different voice presets.
  • Localized media — Adapt content for different languages and markets.
  • Audio production — Build speech assets for apps, games, assistants, and creator workflows.

Pro Tips

  • Use voice_instruction when you want more expressive control without changing the text itself.
  • Keep speed near 1 for natural speech, then adjust only if needed.
  • Use a fixed language when auto-detection may be ambiguous.
  • Try several voice presets before settling on one for a recurring project.
  • Choose a higher sample rate when output quality matters more than file size.

Notes

  • text is required.
  • voice_instruction affects delivery style but is not spoken aloud.
  • language can be left empty for automatic language detection.
  • Pricing depends only on input text length.
  • Character count is billed in started 1000-character blocks.

Related Models

  • ByteDance Seed speech workflows — Useful when you need other speech generation or voice-related capabilities.
  • Voice cloning workflows — Useful when you need a reusable custom voice identity instead of preset voices.
  • Audio generation workflows — Useful when you need music or sound generation instead of speech synthesis.
Accessibility: This website uses AI models provided by third parties.

Seed Speech TTS 2.0 API: Quick start

Get a WaveSpeedAI API key, then send a POST request to https://api.wavespeed.ai/api/v3/bytedance/seed-speech-tts-2.0 with your input as a JSON body. The endpoint returns a prediction ID; poll the prediction endpoint until the status changes to "completed", then read the output URL from data.outputs[0]. Examples in cURL, Node.js, and Python follow.

HTTP example
# Submit the prediction ("text" is required; the other fields are optional)
curl -X POST "https://api.wavespeed.ai/api/v3/bytedance/seed-speech-tts-2.0" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "text": "Hello! This is a sample narration.",
    "voice": "stokie_en",
    "output_format": "mp3",
    "sample_rate": 24000,
    "speed": 1,
    "volume": 1,
    "pitch": 0
  }'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
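The same submit-and-poll flow can be written with only the Python standard library. This is a sketch under assumptions: it expects the submit response to carry the prediction ID at `data.id`, the poll response to carry `data.status` and `data.outputs`, and a failed prediction to report the status `"failed"`. Only `data.outputs[0]` and the `"completed"` status are stated on this page, so check the API reference for the exact response shape. `wait_for_result` takes the fetch function as a parameter so the polling logic can be tested without network access.

```python
import json
import os
import time
import urllib.request

BASE_URL = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "bytedance/seed-speech-tts-2.0"


def _call(method, url, api_key, body=None):
    """Send one authenticated request and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(fetch, request_id, interval=1.0, timeout=120.0):
    """Call fetch(request_id) until the status is "completed"; return outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch(request_id)["data"]  # assumed response envelope
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed name of the failure status
            raise RuntimeError(f"prediction {request_id} failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} still {status!r} after {timeout}s")
        time.sleep(interval)


def synthesize(text, api_key=None, **options):
    """Submit a TTS request and block until the output URL is available."""
    api_key = api_key or os.environ["WAVESPEED_API_KEY"]
    body = {"text": text, **options}
    submitted = _call("POST", f"{BASE_URL}/{MODEL_PATH}", api_key, body)
    request_id = submitted["data"]["id"]  # assumed location of the prediction ID

    def fetch(rid):
        return _call("GET", f"{BASE_URL}/predictions/{rid}/result", api_key)

    return wait_for_result(fetch, request_id)


# Example (performs real network calls):
# url = synthesize("Hello! This is a sample narration.", voice="stokie_en")
```

A fixed polling interval keeps the sketch simple; for long inputs you may prefer a growing interval to reduce request volume.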
Node.js example
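// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not support top-level await, so wrap the call in an async function.
async function main() {
  // "text" is required; the other fields are optional and shown with their defaults.
  const result = await client.run("bytedance/seed-speech-tts-2.0", {
    "text": "Hello! This is a sample narration.",
    "voice": "stokie_en",
    "output_format": "mp3",
    "sample_rate": 24000,
    "speed": 1,
    "volume": 1,
    "pitch": 0,
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});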
Python example
# pip install wavespeed
import wavespeed

# "text" is required; the other fields are optional and shown with their defaults.
output = wavespeed.run(
    "bytedance/seed-speech-tts-2.0",
    {
        "text": "Hello! This is a sample narration.",
        "voice": "stokie_en",
        "output_format": "mp3",
        "sample_rate": 24000,
        "speed": 1,
        "volume": 1,
        "pitch": 0,
    },
)

print(output["outputs"][0])  # → URL of the generated output
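The API returns a URL rather than the audio bytes. A minimal sketch for saving that file locally with the standard library follows; `download_audio` is a hypothetical helper, and you should pick a destination extension that matches the `output_format` you requested (mp3 or opus).

```python
import shutil
import urllib.request
from pathlib import Path


def download_audio(url, dest):
    """Stream the file at `url` to `dest` (creating parent folders); return the Path."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks rather than reading it all
    return dest
```

For example, `download_audio(output["outputs"][0], "speech.mp3")` after the Python SDK call saves the generated MP3 in the current directory.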

Seed Speech TTS 2.0 API: Frequently asked questions

What is the Seed Speech TTS 2.0 API?

Seed Speech TTS 2.0 is a ByteDance text-to-speech model exposed as a REST API on WaveSpeedAI. It converts text into natural speech using multilingual voice presets, with controls for delivery and MP3 or Opus output. You can call it programmatically or try it from the playground above.

How do I call the Seed Speech TTS 2.0 API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/bytedance/bytedance-seed-speech-tts-2.0.

How much does Seed Speech TTS 2.0 cost per run?

Seed Speech TTS 2.0 costs $0.03 per run for inputs of up to 1,000 characters. The charge scales only with the length of the input text: each started block of 1,000 characters costs $0.03, so a 2,500-character input is billed as three blocks ($0.09). Voice, voice instruction, output format, sample rate, speed, volume, pitch, and language do not affect the price. The exact cost for your current input is shown next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Seed Speech TTS 2.0 accept?

Inputs: `text` (required), `voice`, `voice_instruction`, `output_format`, `sample_rate`, `speed`, `volume`, `pitch`, and `language`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/bytedance/bytedance-seed-speech-tts-2.0.

How do I get started with the Seed Speech TTS 2.0 API?

Sign up for a free WaveSpeedAI account to claim starter credits, copy your API key from /accesskey, then call the endpoint shown in the API tab of the playground. The playground also auto-generates a code sample in Python, JavaScript, or cURL for the parameters you've set.

Can I use Seed Speech TTS 2.0 outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (ByteDance). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.