speech-to-text

OpenAI Whisper Large V3

wavespeed-ai/openai-whisper

Whisper Large v3 speech-to-text: fast, accurate multilingual transcripts with automatic language detection and punctuation. Upload audio to get a transcript. Ready-to-use REST API, no cold starts, affordable pricing.

Enable to generate word-level timestamps for the transcription. Note: this may increase processing time.
If set to true, the request waits until the result has been generated and uploaded before returning, so the result is available directly in the response. This property is only available through the API.

{ "text": "Every voice has the power to change people and every word is more valuable than we think." }

This request costs $0.001 per run.

With $1 you can run this model about 1000 times.

README

OpenAI Whisper — Speech-to-Text

WaveSpeedAI’s Whisper deployment provides high-accuracy, production-ready speech recognition powered by OpenAI’s large-v3 model. It converts audio (MP3, WAV, FLAC, M4A) into precise, punctuated text with automatic language detection and multilingual support.

⚡ Highlights

  • Multilingual Recognition — supports over 50 languages including English, Chinese, French, Japanese, and more.
  • Accurate Punctuation & Capitalization — automatically generates clean, well-formatted text.
  • Noise Robustness — performs reliably in diverse environments and accent variations.
  • GPU-Optimized — fast and efficient transcription for real-world production workloads.

💰 Pricing

  • Basic Service (No Timestamp): $0.001 per second
  • Advanced Service (With Timestamp): $0.002 per second
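
As a rough illustration of the per-second rates above, the following sketch estimates the cost of one file. It assumes billing scales linearly with audio duration, with no minimum charge or rounding, which this page does not confirm. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Per-second rates (USD) copied from the pricing list above.
RATE_BASIC = Decimal("0.001")      # Basic Service (No Timestamp)
RATE_ADVANCED = Decimal("0.002")   # Advanced Service (With Timestamp)


def estimate_cost(duration_seconds: float, timestamps: bool = False) -> Decimal:
    """Estimate the USD cost of transcribing one audio file.

    Assumption: cost = duration * rate, with no minimum fee or rounding.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    rate = RATE_ADVANCED if timestamps else RATE_BASIC
    # Convert via str() so float inputs do not carry binary rounding noise.
    return Decimal(str(duration_seconds)) * rate


if __name__ == "__main__":
    print(estimate_cost(90))                   # 90 s, Basic Service
    print(estimate_cost(90, timestamps=True))  # 90 s, Advanced Service
```

Under these assumptions, a 90-second clip comes to $0.09 on the Basic Service and $0.18 on the Advanced Service.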

🚀 Quick Start

  1. Upload your audio file (.mp3, .wav, .flac, .m4a) or provide a valid HTTPS link.
  2. Choose between the Basic and Advanced transcription modes.
  3. (Optional) Specify language manually, or leave Auto for automatic detection.
  4. Submit the request and receive your transcript in JSON format.

Example output:

{
  "outputs": {
    "text": "Hello everyone, welcome to the show."
  }
}
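
The steps above could be scripted roughly as follows. This is only a sketch: the endpoint URL and the request field names (`audio`, `language`, `enable_timestamps`, `enable_sync_mode`) are placeholders, since this README does not give them. Check the API reference for the real values. Only the response shape (`outputs.text`) comes from the example output above. The code builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL is not stated in this README.
API_URL = "https://api.example.com/v1/wavespeed-ai/openai-whisper"


def build_request(audio_url: str, api_key: str, language: str = "auto",
                  timestamps: bool = False, sync: bool = True) -> urllib.request.Request:
    """Build (but do not send) a transcription request.

    All payload field names below are assumptions for illustration.
    """
    if not audio_url.startswith("https://"):
        raise ValueError("audio_url must be an HTTPS link")
    payload = {
        "audio": audio_url,               # step 1: link to the audio file
        "enable_timestamps": timestamps,  # step 2: Basic (False) or Advanced (True)
        "language": language,             # step 3: manual language or automatic
        "enable_sync_mode": sync,         # wait for the result in the response
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response_body: str) -> str:
    """Step 4: pull the transcript out of a JSON response shaped like the example."""
    return json.loads(response_body)["outputs"]["text"]


if __name__ == "__main__":
    req = build_request("https://example.com/sample.mp3", api_key="YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    print(extract_text('{"outputs": {"text": "Hello everyone, welcome to the show."}}'))
```

To send the request, pass `req` to `urllib.request.urlopen` and feed the decoded body to `extract_text`.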

📝 Notes

  • The Advanced Service provides timestamped segments for subtitles and detailed analysis.
  • For long audio (>10 minutes), split the file into segments for optimal accuracy and speed.
  • The model automatically handles punctuation, casing, and accent variations.
  • Supported formats: MP3, WAV, FLAC, M4A.
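
To illustrate the note on long audio, here is a minimal sketch that computes segment boundaries for a file of known duration. The 600-second default mirrors the 10-minute threshold mentioned above. The helper name `segment_bounds` is hypothetical, and cutting the audio itself is left to whatever audio tool you use.

```python
from typing import List, Tuple


def segment_bounds(duration_seconds: float,
                   max_segment_seconds: float = 600.0) -> List[Tuple[float, float]]:
    """Return (start, end) pairs, in seconds, that cover the whole file.

    Each segment is at most max_segment_seconds long; the last one may be shorter.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    if max_segment_seconds <= 0:
        raise ValueError("max_segment_seconds must be positive")
    total = float(duration_seconds)
    bounds = []
    start = 0.0
    while start < total:
        end = min(start + max_segment_seconds, total)
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    # A 25-minute recording split into chunks of at most 10 minutes.
    for start, end in segment_bounds(1500):
        print(f"{start:.0f}s to {end:.0f}s")
```

Fixed-length cuts can land in the middle of a word, so cutting at a nearby pause is likely to give cleaner transcripts at the segment edges.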