Streaming API

Stream output chunks in real time as the model generates them. Ideal for text-to-speech, music generation, and LLM responses.

Note: Streaming requests do not create a prediction record. Capture any data you need from the live stream.

Endpoint

POST https://api.wavespeed.ai/api/v3/{model_id}/stream

Add /stream to any supported model endpoint. The request payload is identical to the standard submission.

Request


curl -X POST "https://api.wavespeed.ai/api/v3/minimax/speech-02-turbo/stream" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "text": "Hello world! This is a test of the text-to-speech system.",
  "voice_id": "Energetic_Girl",
  "emotion": "happy",
  "speed": 1,
  "volume": 1
}'
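The same request can be issued from Python with the `requests` library. A minimal sketch, assuming the model and payload from the curl example above; the helper names, timeout, and `YOUR_API_KEY` placeholder are illustrative, not part of the API:

```python
import requests  # third-party: pip install requests

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key


def build_stream_url(model_id):
    # /stream is appended to the standard model endpoint
    return f"https://api.wavespeed.ai/api/v3/{model_id}/stream"


def stream_tts(model_id, payload, api_key=API_KEY):
    """POST the payload and yield raw response chunks as they arrive."""
    resp = requests.post(
        build_stream_url(model_id),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json=payload,
        stream=True,   # tell requests not to buffer the whole body
        timeout=60,
    )
    resp.raise_for_status()
    # chunk_size=None yields chunks as the server flushes them
    yield from resp.iter_content(chunk_size=None)
```

With a real key, `for chunk in stream_tts("minimax/speech-02-turbo", {...})` iterates over the stream using the payload shown in the curl example.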

Response

The response is an event stream (text/event-stream). Chunks are relayed directly from the upstream model without buffering.

  • Stream closes when the provider finishes
  • Terminal status payload (if any) is included in the stream
  • Format mirrors the provider’s documentation (e.g., MiniMax uses JSON payloads per event)
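For providers that send JSON payloads per event (such as MiniMax), each data line of the stream follows the SSE `data: {...}` convention. A hedged sketch of decoding such lines; the event shapes shown in the usage example are illustrative, not a documented schema:

```python
import json


def parse_sse_line(line):
    """Parse one SSE line; return the decoded JSON payload or None.

    Data lines are prefixed with "data:"; anything else (comments,
    blank keep-alive lines, non-JSON bodies) is ignored.
    """
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    line = line.strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None
```

For example, `parse_sse_line('data: {"status": "completed"}')` returns `{"status": "completed"}`, while comment and keep-alive lines return `None`.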

Supported Models

  • MiniMax Speech: speech-02-hd, speech-02-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-2.6-hd, speech-2.6-turbo
  • MiniMax Music: music-02
  • WaveSpeed LLM: any-llm, any-llm/vision

Requests to /stream for unsupported models will return an error.

Notes

  • Your client must support Server-Sent Events (SSE) or chunked responses
  • Authentication, rate limits, and billing are the same as non-streaming endpoints
  • Use stream=True with the Python requests library to handle chunked responses
© 2025 WaveSpeedAI. All rights reserved.