
{ "text": "Distinguished guests and dear friends, good evening. Time flies, and here we are tonight, gathered together for this wonderful celebration. Thank you all for being here. Your presence truly makes this evening shine. Tonight, let's set aside our daily routines and embrace the joy and warmth around us. May the laughter, music, and memories we create become treasures we carry in our hearts. A magical and unforgettable night ahead." }
OpenAI Whisper — Video-to-Text is a production-ready speech recognition endpoint powered by Whisper large-v3. It transcribes or translates speech directly from video files by extracting audio and returning clean, readable text, with optional word-level timestamps for subtitle and alignment workflows.
The endpoint is built for stable production use: it is exposed through a ready-to-use REST API, has no cold starts, and uses predictable pay-per-second pricing.
| Parameter | Required | Description |
|---|---|---|
| video | Yes | Input video (upload or public URL). |
| language | No | Language code or auto (default). |
| task | No | transcribe or translate. |
| enable_timestamps | No | Generate word-level timestamps (may increase processing time). |
| prompt | No | Short guidance text to steer transcription/translation style. |
| enable_sync_mode | No | API only: wait for result and return it directly in the response. |
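As a rough illustration of how the parameters in the table fit together, the sketch below assembles a JSON request body and wraps it in a POST request object without sending it. The parameter names come from the table; everything else is an assumption made for the example: the `API_URL` placeholder, the Bearer-token header, the JSON body encoding, and the helper names `build_payload` and `build_request` are hypothetical and not taken from this page.

```python
import json
import urllib.request

# Hypothetical placeholder: the real endpoint URL is not given on this page.
API_URL = "https://api.example.com/openai-whisper/video-to-text"

VALID_TASKS = {"transcribe", "translate"}


def build_payload(video, language="auto", task=None,
                  enable_timestamps=False, prompt=None,
                  enable_sync_mode=False):
    """Assemble a request body from the documented parameters.

    Optional fields are only included when the caller sets them.
    """
    if not video:
        raise ValueError("video is required (upload reference or public URL)")
    if task is not None and task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}")

    payload = {"video": video, "language": language}
    if task is not None:
        payload["task"] = task
    if enable_timestamps:
        payload["enable_timestamps"] = True
    if prompt:
        payload["prompt"] = prompt
    if enable_sync_mode:
        payload["enable_sync_mode"] = True  # API only, per the table
    return payload


def build_request(payload, api_key):
    """Wrap the payload in a POST request object. Nothing is sent here."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; check the provider's API reference.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call once the placeholder URL and auth header are replaced with the real values.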
1. Upload a video, or paste a public video URL.
2. Set language: a language code, or auto (the default).
3. Choose task: transcribe or translate.
4. (Optional) Turn on enable_timestamps if you need subtitle timing or alignment.
5. (Optional) Add a prompt to guide formatting or terminology (names, jargon, punctuation).
6. Run the model and read the transcript output.
API note: enable_sync_mode is not shown as a normal UI option; it’s only available through the API.
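For the subtitle workflow that enable_timestamps is meant to support, a minimal sketch of turning word-level timestamps into SRT cues might look like the following. The shape of the timestamped response is not documented on this page, so the input format used here (a list of dicts with `word`, `start`, and `end` in seconds) is purely an assumption, and `format_srt_time` and `words_to_srt` are hypothetical helper names.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group words into fixed-size cues and render them as SRT text.

    `words` is ASSUMED to be a list of {"word", "start", "end"} dicts;
    adapt the field names to whatever the API actually returns.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        text = " ".join(w["word"].strip() for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Grouping by a fixed word count keeps the sketch short; a real subtitle pipeline would more likely break cues on pauses, punctuation, or a maximum line length.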
| Mode | enable_timestamps | Price per second |
|---|---|---|
| Standard | false | $0.001 / s |
| Timestamped | true | $0.002 / s |
Examples
| Video length | Standard | Timestamped |
|---|---|---|
| 60s | $0.06 | $0.12 |
| 600s (10 min) | $0.60 | $1.20 |
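The arithmetic behind the examples is simply video length in seconds multiplied by the per-second rate for the chosen mode. A small sketch, assuming billing is strictly linear in duration (how partial seconds are rounded is not stated here) and using a hypothetical helper name `estimate_cost`:

```python
from decimal import Decimal

# Per-second rates from the pricing table, keyed by enable_timestamps.
PRICE_PER_SECOND = {
    False: Decimal("0.001"),  # Standard
    True: Decimal("0.002"),   # Timestamped
}


def estimate_cost(duration_seconds, enable_timestamps=False):
    """Estimate the price in USD for a video of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    rate = PRICE_PER_SECOND[bool(enable_timestamps)]
    # Decimal avoids binary floating-point drift in currency math.
    return Decimal(str(duration_seconds)) * rate


if __name__ == "__main__":
    for length in (60, 600):
        standard = estimate_cost(length)
        timestamped = estimate_cost(length, enable_timestamps=True)
        print(f"{length}s: ${standard:.2f} standard, ${timestamped:.2f} timestamped")
```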