OpenAI Whisper (Large-v3)
OpenAI Whisper (Large-v3) is a production-ready speech recognition model that transcribes or translates audio into clean, readable text. With support for dozens of languages, optional word-level timestamps, and two task modes (transcribe and translate), it is well suited to subtitling, transcription, and multilingual workflows.
Why It Stands Out
- Transcribe or translate: Choose between same-language transcription or translation to English.
- Multilingual support: Transcribe audio in dozens of languages with automatic language detection.
- Word-level timestamps: Generate precise timing data for subtitle alignment and editing workflows.
- Prompt-guided output: Steer formatting, terminology, or punctuation with custom prompts.
- Prompt Enhancer: Built-in AI-powered prompt optimization for better transcription guidance.
- Flexible input: Supports direct audio upload or public URL.
Parameters
| Parameter | Required | Description |
|---|---|---|
| audio | Yes | Upload or link to an audio file (MP3 / WAV / M4A, etc.). |
| language | No | Language code for transcription; use "auto" for automatic detection. |
| task | No | Choose "transcribe" for same-language or "translate" for English output. |
| enable_timestamps | No | Generate word-level timestamps (may increase processing time). |
| prompt | No | Short guidance text to steer transcription style or terminology. |
| enable_sync_mode | No | Wait for the result before returning the response (API only). |
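
The parameters above can be assembled into a request body along the following lines. This is a minimal sketch, not the service's client: the helper name `build_whisper_payload`, the default values, and the example URL are assumptions made for illustration, and the endpoint and authentication are deliberately left out because this page does not specify them.

```python
from typing import Any, Dict, Optional

# The two task modes listed in the parameter table.
VALID_TASKS = {"transcribe", "translate"}


def build_whisper_payload(
    audio: str,
    language: str = "auto",          # assumed default: automatic detection
    task: str = "transcribe",        # assumed default: same-language output
    enable_timestamps: bool = False,  # assumed default: off
    prompt: Optional[str] = None,
    enable_sync_mode: bool = False,   # assumed default: off
) -> Dict[str, Any]:
    """Collect the documented parameters into a plain dict (hypothetical helper)."""
    if not audio:
        raise ValueError("audio is required (an uploaded file reference or a public URL)")
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")

    payload: Dict[str, Any] = {
        "audio": audio,
        "language": language,
        "task": task,
        "enable_timestamps": enable_timestamps,
        "enable_sync_mode": enable_sync_mode,
    }
    # prompt is optional, so it is omitted when no guidance text is given.
    if prompt:
        payload["prompt"] = prompt
    return payload


payload = build_whisper_payload(
    audio="https://example.com/interview.mp3",  # placeholder URL
    task="translate",
    prompt="Speakers: Dr. Okafor, Ms. Lindqvist.",
)
print(payload)
```

Validating `task` locally catches typos before any audio is submitted; the remaining fields are passed through unchanged.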
How to Use
- Upload your audio — drag and drop a file or paste a public URL.
- Select language — choose a specific language or use "auto" for detection.
- Choose task — select "transcribe" for same-language output or "translate" for English.
- Enable timestamps (optional) — turn on for word-level timing data.
- Add a prompt (optional) — guide formatting, punctuation, or specific terminology.
- Click Run and wait for transcription to complete.
- Copy or download the transcribed text.
Best Use Cases
- Subtitle Generation — Create accurate, timed transcripts for video subtitling.
- Meeting Transcription — Convert recorded meetings, interviews, and calls into searchable text.
- Translation Workflows — Translate foreign-language audio directly to English text.
- Content Repurposing — Turn podcasts, webinars, and lectures into written content.
- Accessibility — Generate transcripts to make audio content accessible.
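
For the subtitle use case, word-level timestamps can be grouped into SRT cues roughly as follows. This page does not document the shape of the timestamp output, so the input format here (a list of dicts with `word`, `start`, and `end` in seconds) is an assumption, as are the helper names and the fixed words-per-cue grouping.

```python
from typing import Dict, List, Union

# Assumed shape of one timestamped word: {"word": str, "start": float, "end": float}
Word = Dict[str, Union[str, float]]


def format_srt_time(seconds: float) -> str:
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: List[Word], max_words: int = 7) -> str:
    """Group consecutive words into cues of at most max_words words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_srt_time(float(chunk[0]["start"]))
        end = format_srt_time(float(chunk[-1]["end"]))
        text = " ".join(str(w["word"]).strip() for w in chunk)
        # Cue numbers start at 1; each cue ends with a newline.
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)


words = [
    {"word": "Hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.5, "end": 0.9},
    {"word": "again", "start": 1.0, "end": 1.5},
]
print(words_to_srt(words, max_words=2))
```

A fixed word count is the simplest grouping rule; splitting on pauses or punctuation would usually read better on screen but needs more logic than fits here.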
Pricing
| Mode | Price |
|---|---|
| Standard (no timestamps) | $0.001 / s |
| With timestamps enabled | $0.002 / s |
Total cost = duration of audio (in seconds) × price per second
Examples
- 60s audio (standard) → 60 × $0.001 = $0.06
- 60s audio (with timestamps) → 60 × $0.002 = $0.12
- 10 min (600s) audio (standard) → 600 × $0.001 = $0.60
- 10 min (600s) audio (with timestamps) → 600 × $0.002 = $1.20
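
The pricing formula can be checked with a short calculation. This is a sketch using the two per-second rates from the table; the function name `estimate_cost` is illustrative, and it assumes cost scales linearly with duration with no rounding or minimum charge, which this page does not confirm.

```python
from decimal import Decimal

# Per-second rates from the pricing table, in USD.
PRICE_STANDARD = Decimal("0.001")
PRICE_WITH_TIMESTAMPS = Decimal("0.002")


def estimate_cost(duration_seconds: float, enable_timestamps: bool = False) -> Decimal:
    """Total cost = duration of audio (in seconds) x price per second."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    rate = PRICE_WITH_TIMESTAMPS if enable_timestamps else PRICE_STANDARD
    # Decimal avoids binary floating-point error in currency arithmetic.
    return Decimal(str(duration_seconds)) * rate


for seconds, timestamps in [(60, False), (60, True), (600, False), (600, True)]:
    cost = estimate_cost(seconds, timestamps)
    print(f"{seconds}s, timestamps={timestamps}: ${cost:.2f}")
```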
Pro Tips for Best Quality
- Use clear audio with minimal background noise for optimal accuracy.
- Specify the language manually if auto-detection is inconsistent.
- Enable timestamps only when needed for subtitles or alignment: doing so doubles the per-second price from $0.001 to $0.002.
- Add a prompt to guide transcription — include names, jargon, or formatting preferences.
- Use "translate" task for non-English audio when you need English output.
Notes
- If you supply a URL instead of uploading a file, make sure it is publicly accessible.
- Timestamps are best for subtitles and editing, but may take longer to process.
- Processing time varies based on audio duration and current queue load.