OpenAI Whisper
Whisper Large v3 speech-to-text: instant, accurate multilingual transcripts with automatic language detection and punctuation. Upload audio to get transcripts. Ready-to-use REST API, no cold starts, affordable pricing.
Features
OpenAI Whisper — Speech-to-Text
WaveSpeedAI’s Whisper deployment provides high-accuracy, production-ready speech recognition powered by OpenAI’s large-v3 model. It converts audio (MP3, WAV, FLAC) into precise, punctuated text with automatic language detection and multilingual support.
⚡ Highlights
- Multilingual Recognition — supports nearly 100 languages (see the full list under Parameters), including English, Chinese, French, Japanese, and more.
- Accurate Punctuation & Capitalization — automatically generates clean, well-formatted text.
- Noise Robustness — performs reliably in diverse environments and accent variations.
- GPU-Optimized — fast and efficient transcription for real-world production workloads.
💰 Pricing
- Basic Service (no timestamps): $0.001 per second
- Advanced Service (with timestamps): $0.002 per second
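As a worked example (assuming billing is per second of input audio, which is how the rates above read), a 5-minute file costs $0.30 on the Basic tier and $0.60 on the Advanced tier:

```shell
# Cost sketch: 5 minutes = 300 seconds of audio.
seconds=300
basic=$(awk -v s="$seconds" 'BEGIN { printf "%.2f", s * 0.001 }')     # Basic: $0.001/s
advanced=$(awk -v s="$seconds" 'BEGIN { printf "%.2f", s * 0.002 }')  # Advanced: $0.002/s
echo "basic: \$$basic, advanced: \$$advanced"   # basic: $0.30, advanced: $0.60
```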
🚀 Quick Start
- Upload your audio file (.mp3, .wav, .flac) or provide a valid HTTPS link.
- Choose between Basic or Advanced transcription modes.
- (Optional) Specify the language manually, or leave Auto for automatic detection.
- Submit the request and receive your transcript in JSON format.
Example output:

```json
{
  "outputs": {
    "text": "Hello everyone, welcome to the show."
  }
}
```

📝 Notes
- The Advanced Service provides timestamped segments for subtitles and detailed analysis.
- For long audio (>10 minutes), split into segments for optimal accuracy and speed.
- The model automatically handles punctuation, casing, and accent variations.
- Supported formats: MP3, WAV, FLAC, M4A.
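To plan uploads for a long recording, you can compute how many 10-minute chunks you need before splitting. This is only a sketch of the arithmetic (the duration below is an assumed example); the actual splitting could be done with a tool such as ffmpeg's segment muxer.

```shell
# Sketch: plan 10-minute chunks for a long recording before uploading.
duration=1500   # assumed total length in seconds (25 minutes)
chunk=600       # 10-minute chunks, per the note above
n=$(( (duration + chunk - 1) / chunk ))   # ceiling division
echo "chunks needed: $n"
```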
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
```shell
# Submit the task
# "audio" is required; replace the placeholder URL with your own file.
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/openai-whisper" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "audio": "https://example.com/audio.mp3",
    "language": "Auto",
    "task": "transcribe",
    "enable_timestamps": false,
    "prompt": "",
    "enable_sync_mode": true
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
```
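When `enable_sync_mode` is false, the submit call returns immediately and you poll the result endpoint with the returned task ID. A minimal sketch of extracting `data.id` and `data.status` from the submit response without `jq`, using POSIX `sed` (the sample JSON below is illustrative; field names follow the Response Parameters table):

```shell
# Sketch: pull data.id and data.status out of a submit response.
# A real script would then poll the data.urls.get endpoint until
# status is "completed" or "failed".
response='{"code":200,"message":"success","data":{"id":"abc123","model":"wavespeed-ai/openai-whisper","status":"created"}}'
request_id=$(printf '%s' "$response" | sed -n 's/.*"id":"\([^"]*\)".*/\1/p')
status=$(printf '%s' "$response" | sed -n 's/.*"status":"\([^"]*\)".*/\1/p')
echo "id=$request_id status=$status"
```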
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| audio | string | Yes | - | - | Audio file to transcribe. Provide an HTTPS URL or upload a file (MP3, WAV, FLAC up to 60 minutes). |
| language | string | No | Auto | Afrikaans, Amharic, Arabic, Assamese, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Tibetan, Breton, Bosnian, Catalan, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Basque, Persian, Finnish, Faroese, French, Galician, Gujarati, Hausa, Hawaiian, Hebrew, Hindi, Croatian, Haitian Creole, Hungarian, Armenian, Indonesian, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Khmer, Kannada, Korean, Latin, Luxembourgish, Lingala, Lao, Lithuanian, Latvian, Malagasy, Maori, Macedonian, Malayalam, Mongolian, Marathi, Malay, Maltese, Myanmar, Nepali, Dutch, Nynorsk, Norwegian, Occitan, Punjabi, Polish, Pashto, Portuguese, Romanian, Russian, Sanskrit, Sindhi, Sinhala, Slovak, Slovenian, Shona, Somali, Albanian, Serbian, Sundanese, Swedish, Swahili, Tamil, Telugu, Tajik, Thai, Turkmen, Tagalog, Turkish, Tatar, Ukrainian, Urdu, Uzbek, Vietnamese, Yiddish, Yoruba, Chinese, Cantonese, Auto | Language spoken in the audio. Set to 'Auto' (the default) for automatic language detection. |
| task | string | No | transcribe | transcribe, translate | The task to perform: 'transcribe' outputs text in the audio's source language; 'translate' translates it into English. |
| enable_timestamps | boolean | No | false | - | Enable to generate word-level timestamps for the transcription. Note: This may increase processing time. |
| prompt | string | No | - | - | Optional text to guide the model's style or continue a previous audio segment. The prompt should be in the same language as the audio. |
| enable_sync_mode | boolean | No | true | - | If true, the request waits for the result to be generated and uploaded before returning, so the transcript is included directly in the response. This property is only available through the API. |
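Putting the table together, a hypothetical request body for the Advanced (timestamped) service, translating Japanese audio to English asynchronously, might look like this (the audio URL is a placeholder):

```json
{
  "audio": "https://example.com/interview.mp3",
  "language": "Japanese",
  "task": "translate",
  "enable_timestamps": true,
  "prompt": "",
  "enable_sync_mode": false
}
```

With `enable_sync_mode` set to false, the response carries only the task ID; fetch the transcript from the result endpoint once `data.status` is `completed`.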
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |