OpenAI Whisper — Speech-to-Text
WaveSpeedAI’s Whisper deployment provides high-accuracy, production-ready speech recognition powered by OpenAI’s large-v3 model.
It converts audio (MP3, WAV, FLAC) into precise, punctuated text with automatic language detection and multilingual support.
⚡ Highlights
- Multilingual Recognition: supports over 50 languages, including English, Chinese, French, and Japanese.
- Accurate Punctuation & Capitalization: automatically generates clean, well-formatted text.
- Noise Robustness: performs reliably across noisy environments and varied accents.
- GPU-Optimized: fast and efficient transcription for real-world production workloads.
💰 Pricing
- Basic Service (No Timestamp)
- Advanced Service (With Timestamp)
🚀 Quick Start
- Upload your audio file (.mp3, .wav, .flac) or provide a valid HTTPS link.
- Choose between Basic or Advanced transcription modes.
- (Optional) Specify the language manually, or leave it set to Auto for automatic detection.
- Submit the request and receive your transcript in JSON format.
Example output:
{
  "outputs": {
    "text": "Hello everyone, welcome to the show."
  }
}
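The Quick Start steps and the example output above can be sketched in Python as follows. This is a minimal illustration, not the official client: the request field names (`audio`, `mode`, `language`) and the helper names `build_request` and `extract_text` are hypothetical, and no endpoint is called. Only the response shape (`outputs.text`) is taken from the example output on this page.

```python
from urllib.parse import urlparse

# File extensions mentioned on this page (the Notes section also lists M4A).
SUPPORTED_EXTENSIONS = (".mp3", ".wav", ".flac", ".m4a")


def build_request(audio_url, with_timestamps=False, language="auto"):
    """Validate an audio link and build a request payload.

    The payload keys are hypothetical placeholders; check the service's
    API reference for the real parameter names.
    """
    parsed = urlparse(audio_url)
    if parsed.scheme != "https":
        raise ValueError("audio link must use HTTPS")
    if not parsed.path.lower().endswith(SUPPORTED_EXTENSIONS):
        raise ValueError("unsupported audio format")
    return {
        "audio": audio_url,
        # Basic = no timestamps, Advanced = timestamped segments.
        "mode": "advanced" if with_timestamps else "basic",
        "language": language,
    }


def extract_text(response):
    """Pull the transcript out of a response shaped like the example output."""
    return response["outputs"]["text"]
```

For instance, `build_request("https://example.com/show.mp3")` yields a payload in Basic mode with automatic language detection, and `extract_text` applied to the example output returns the transcript string.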
📝 Notes
- The Advanced Service provides timestamped segments for subtitles and detailed analysis.
- For long audio (>10 minutes), split the file into shorter segments for optimal accuracy and speed.
- The model automatically handles punctuation, casing, and accent variations.
- Supported formats: MP3, WAV, FLAC, M4A.
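The note on long audio can be illustrated with a short planning helper. This is only a sketch: `plan_segments` is a hypothetical name, the 600-second default mirrors the 10-minute guideline above, and it only computes cut points; the actual cutting would be done with an audio tool of your choice.

```python
def plan_segments(duration_s, max_len_s=600.0):
    """Return (start, end) pairs in seconds that cover the whole recording.

    Every segment is at most max_len_s long; the last one may be shorter.
    """
    duration_s = float(duration_s)
    if duration_s <= 0 or max_len_s <= 0:
        raise ValueError("duration and segment length must be positive")
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_len_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


# A 25-minute recording becomes two 10-minute segments and one 5-minute segment.
print(plan_segments(1500))
# [(0.0, 600.0), (600.0, 1200.0), (1200.0, 1500.0)]
```

Fixed boundaries can cut a word in half, so in practice it may be worth moving each cut to a nearby pause. When using timestamped output, each segment's start value would need to be added back to the timestamps returned for that segment.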