Streaming API

Stream output chunks in real time as the model generates them. Ideal for text-to-speech, music generation, and LLM responses.

Note: Streaming requests do not create a prediction record. Capture any data you need from the live stream.

Endpoint

POST https://api.wavespeed.ai/api/v3/{model_id}/stream

Add /stream to any supported model endpoint. The request payload is identical to the standard submission.

Request


curl -X POST "https://api.wavespeed.ai/api/v3/minimax/speech-02-turbo/stream" \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "text": "Hello world! This is a test of the text-to-speech system.",
  "voice_id": "Energetic_Girl",
  "emotion": "happy",
  "speed": 1,
  "volume": 1
}'
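The same request can be issued from Python with the `requests` library. A minimal sketch, assuming the model and payload from the curl example above; the helper names, timeout, and `YOUR_API_KEY` placeholder are illustrative, not part of the API:

```python
import requests  # third-party: pip install requests

API_KEY = "YOUR_API_KEY"  # placeholder: substitute your real key


def build_stream_url(model_id):
    # /stream is appended to the standard model endpoint
    return f"https://api.wavespeed.ai/api/v3/{model_id}/stream"


def stream_tts(model_id, payload, api_key=API_KEY):
    """POST the payload and yield raw response chunks as they arrive."""
    resp = requests.post(
        build_stream_url(model_id),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        json=payload,
        stream=True,   # tell requests not to buffer the whole body
        timeout=60,
    )
    resp.raise_for_status()
    # chunk_size=None yields chunks as the server flushes them
    yield from resp.iter_content(chunk_size=None)
```

With a real key, `for chunk in stream_tts("minimax/speech-02-turbo", {...})` iterates over the stream using the payload shown in the curl example.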

Response

The response is an event stream (text/event-stream). Chunks are relayed directly from the upstream model without buffering.

  • Stream closes when the provider finishes
  • Terminal status payload (if any) is included in the stream
  • Format mirrors the provider’s documentation (e.g., MiniMax uses JSON payloads per event)
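For providers that send JSON payloads per event (such as MiniMax), each data line of the stream follows the SSE `data: {...}` convention. A hedged sketch of decoding such lines; the event shapes shown in the usage example are illustrative, not a documented schema:

```python
import json


def parse_sse_line(line):
    """Parse one SSE line; return the decoded JSON payload or None.

    Data lines are prefixed with "data:"; anything else (comments,
    blank keep-alive lines, non-JSON bodies) is ignored.
    """
    if isinstance(line, bytes):
        line = line.decode("utf-8")
    line = line.strip()
    if not line.startswith("data:"):
        return None
    body = line[len("data:"):].strip()
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return None
```

For example, `parse_sse_line('data: {"status": "completed"}')` returns `{"status": "completed"}`, while comment and keep-alive lines return `None`.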

Supported Models

  • MiniMax Speech: speech-02-hd, speech-02-turbo, speech-2.5-hd-preview, speech-2.5-turbo-preview, speech-2.6-hd, speech-2.6-turbo
  • MiniMax Music: music-02
  • WaveSpeed LLM: any-llm, any-llm/vision

Requests to /stream for unsupported models will return an error.

Notes

  • Your client must support Server-Sent Events (SSE) or chunked responses
  • Authentication, rate limits, and billing are the same as non-streaming endpoints
  • Use stream=True with the Python requests library to handle chunked responses
© 2025 WaveSpeedAI. All rights reserved.