
inworld-1.5-mini/text-to-speech
Inworld 1.5 Mini delivers high-quality text-to-speech synthesis with 56+ multilingual voices, adjustable speaking rate, and natural-sounding audio output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
inworld-1.5-max/text-to-speech
Inworld 1.5 Max delivers premium text-to-speech synthesis with 56+ multilingual voices, adjustable speaking rate, and high-fidelity, natural-sounding audio output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
qwen3-tts/voice-design
Qwen3 TTS Voice Design: Generate speech with custom voice characteristics described in natural language. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
vibevoice
Microsoft VibeVoice is a text-to-speech model that generates long-form speech from text, with multi-speaker dialogue support. Choose from 9 voice presets across English, Chinese, and Hindi. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
qwen3-tts/voice-clone
Qwen3 TTS Voice Clone: Clone any voice from a reference audio clip and generate speech in that voice. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
qwen3-tts/text-to-speech
Qwen3 TTS: Multi-language, multi-voice text-to-speech synthesis with style control. Supports 11 languages and 9 voice characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
speech-02-hd
MiniMax Speech 02 HD is MiniMax's high-definition text-to-speech model, delivering clear HD voices. Pricing: $0.05 per 1,000 characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
speech-02-turbo
MiniMax Speech-02 Turbo is a high-definition text-to-speech model delivering natural voice output. Pricing: $0.03 per 1,000 characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
music-02
MiniMax Music-02 is a compact, fast, cost-effective MoE music generator (230B params, 10B active) for high-quality music production. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
ace-step/audio-outpaint
ACE-Step Audio Outpaint extends a track at its start or end with seamless audio that matches the original, ideal for intros, outros, and longer tracks. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
ace-step/audio-inpaint
ACE-Step Audio Inpaint edits a specific audio segment to change lyrics or style while preserving the surrounding audio. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
ace-step/audio-to-audio
ACE-Step Audio-to-Audio turns existing tracks into remixes or vocal edits using remix and lyrics modes while preserving audio character. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
ace-step
ACE-Step generates music with lyrics from text, up to 4 minutes long and with high acoustic fidelity. It supports voice cloning, lyric edits, and remixing. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
mmaudio-v2
MMAudio v2 produces synchronized audio from video or text inputs, ideal for adding soundtracks to videos when paired with video models. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
voice-clone
MiniMax Voice Clone creates high-quality voice clones from short reference clips, closely matching tone, accent, and speaking style. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
vibevoice
wavespeed-ai/vibevoice is a voice generation model that produces high-fidelity, natural, and expressive speech from text. Optional speaker and region-style controls give more precise results, and the model is easy to integrate into real-world applications. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
music-01
MiniMax Music-01 synthesizes accompaniment and vocals simultaneously to produce complete songs across diverse styles. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
voice-design
MiniMax Voice Design generates natural voices from textual descriptions, with no cloning required, and lets you set tone, accent, and personality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
kling-text-to-audio
Kling Text-to-Audio turns text prompts into custom sound effects for videos, games, and multimedia using KlingAI's audio model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
flash-v2
ElevenLabs Flash V2 is a text-to-speech model that converts text into spoken audio. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
flash-v2.5
ElevenLabs Flash v2.5 is a text-to-speech model on WaveSpeedAI, billed at $0.05 per 1,000 characters of generated speech. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
eleven-v3
ElevenLabs eleven-v3 is a text-to-speech model available as a hosted endpoint; requests cost $0.10 per 1,000 characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
multilingual-v1
ElevenLabs Multilingual V1 provides natural-sounding text-to-speech across many languages. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
multilingual-v2
ElevenLabs Multilingual V2 is a multilingual text-to-speech model. Pricing: $0.10 per 1,000 characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
turbo-v2
ElevenLabs Turbo V2 is a text-to-speech model available via WaveSpeedAI, billed at $0.05 per 1,000 characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
turbo-v2.5
ElevenLabs Turbo V2.5 is a text-to-speech model available via WaveSpeedAI, billed at $0.05 per 1,000 characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
speech-2.6-turbo
MiniMax Speech 2.6 Turbo is a text-to-speech model offering ultra-human voice cloning, industry-leading text normalization, sub-250 ms latency, and support for 40+ languages. Pricing: $0.06 per 1,000 characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
speech-2.6-hd
MiniMax Speech 2.6 HD: Ultra-human, low-latency (under 250 ms) TTS with voice cloning, text normalization, and support for 40+ languages. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
speech-2.5-hd-preview
MiniMax Speech 2.5 HD Preview offers HD TTS with enhanced multilingual expressiveness, accurate voice cloning, and support for 40 languages. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
speech-2.5-turbo-preview
MiniMax Speech 2.5 Turbo Preview: HD TTS with multilingual support and accurate voice replication across 40 languages. Pricing: $0.04 per 1,000 characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
ace-step/prompt-to-audio
ACE-Step Prompt-to-Audio creates music from simple prompts, auto-generating genre tags and lyrics for quick song creation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
qwen3-tts-flash
Alibaba Qwen3 TTS Flash: Low-latency text-to-speech for English and Chinese with multiple voices, ideal for real-time dialogue. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
music-v1.5
MiniMax Music v1.5 is a text-to-audio model that turns text prompts into high-quality music across diverse styles. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
kling-v1-tts
Kling V1 TTS creates natural-sounding audio and supports KlingAI image, video, sound effect, virtual model, and custom AI workflows. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
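
Several of the text-to-speech listings above quote a price per 1,000 characters. The sketch below shows one way to compare them for a given script. It rests on assumptions that are not in the listing: that billing is strictly proportional to character count (no minimum charge, no rounding up to whole thousands) and that a "character" is one Unicode code point. The `PRICE_PER_1000_CHARS` table and the `estimate_cost` helper are hypothetical names introduced for illustration; only the prices themselves come from the descriptions above.

```python
from decimal import Decimal

# USD per 1,000 characters, copied from the listings that state a price.
# Models without a quoted price are deliberately left out.
PRICE_PER_1000_CHARS = {
    "speech-02-hd": Decimal("0.05"),
    "speech-02-turbo": Decimal("0.03"),
    "flash-v2.5": Decimal("0.05"),
    "eleven-v3": Decimal("0.1"),
    "multilingual-v2": Decimal("0.1"),
    "turbo-v2": Decimal("0.05"),
    "turbo-v2.5": Decimal("0.05"),
    "speech-2.6-turbo": Decimal("0.06"),
    "speech-2.5-turbo-preview": Decimal("0.04"),
}


def estimate_cost(model: str, text: str) -> Decimal:
    """Estimate the USD cost of synthesizing `text` with `model`.

    Assumption: cost scales linearly with len(text); the provider's real
    billing rules (rounding, minimums) may differ.
    """
    if model not in PRICE_PER_1000_CHARS:
        raise KeyError(f"no price listed for {model!r}")
    return PRICE_PER_1000_CHARS[model] * len(text) / Decimal(1000)


if __name__ == "__main__":
    script = "Hello, world! " * 500  # 7,000 characters
    for model in ("speech-02-turbo", "speech-2.6-turbo", "eleven-v3"):
        print(model, estimate_cost(model, script))
```

`Decimal` is used instead of `float` so that small per-character amounts add up without binary rounding noise. Treat the result as a rough estimate and check the provider's billing documentation for the actual rules.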
