Alibaba Qwen Translate – Image Understanding and Translation
Alibaba Qwen Translate is a multimodal model on Alibaba Cloud’s DashScope that combines high-accuracy OCR with multilingual translation. On WaveSpeedAI, it turns screenshots, documents, menus, and posters into clean, translated text in just a few seconds.
Why it stands out
- Accurate OCR: extracts printed text and much handwritten text from photos, scans, and UI screenshots.
- Strong multilingual support: detects and translates across English, Chinese, Japanese, Korean, French, German, Spanish, Russian, Arabic, and more.
- Terminology and sensitive-word control: lets you define custom terminology for domain-specific vocabulary and filter sensitive words in the output.
- Document and layout awareness: handles forms, receipts, signs, and scanned pages with automatic text-region detection.
- Fast, practical performance: suitable for real-world scenarios like menu translation, travel signage, study materials, and quick data capture.
Limits and performance
- Supported input formats: PNG, JPEG, WEBP
- Output: extracted text plus translation into the selected target language
- Typical processing time: around 3–6 seconds per image
- Segmentation: automatic text-region detection (can be disabled via the skip_image_segment option)
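Since only PNG, JPEG, and WEBP inputs are listed, a quick local check before uploading can save a failed job. The sketch below is illustrative only: the helper name `is_supported_image` and the mapping from formats to file extensions are assumptions, and it inspects the filename rather than the file contents.

```python
from pathlib import Path

# Assumed mapping of the listed formats (PNG, JPEG, WEBP) to common extensions.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def is_supported_image(filename: str) -> bool:
    """Return True if the file extension matches one of the listed input formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["menu.PNG", "receipt.jpeg", "poster.webp", "scan.tiff"]:
        status = "ok" if is_supported_image(name) else "convert before uploading"
        print(f"{name}: {status}")
```

An extension check is only a first filter; a renamed file can still fail on upload.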
Pricing
| Task type | Cost per image |
|---|---|
| OCR / Translation | $0.01 |
Flat pricing: every processed image is billed at $0.01, regardless of language pair or content length.
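Because pricing is flat, budgeting a batch is simple arithmetic. The following sketch estimates cost and a rough time range from the figures on this page; the function name `estimate_batch` is hypothetical, and the time range assumes images are processed one after another, which may not match how jobs are actually scheduled.

```python
from decimal import Decimal

PRICE_PER_IMAGE_USD = Decimal("0.01")  # flat rate from the pricing table
TYPICAL_SECONDS = (3, 6)               # typical per-image range from the limits list


def estimate_batch(image_count: int) -> dict:
    """Estimate total cost and a sequential processing-time range for a batch."""
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    low, high = TYPICAL_SECONDS
    return {
        "cost_usd": PRICE_PER_IMAGE_USD * image_count,
        "min_seconds": low * image_count,   # assumes sequential processing
        "max_seconds": high * image_count,  # assumes sequential processing
    }


if __name__ == "__main__":
    print(estimate_batch(250))
```

For example, 250 images come to $2.50, and would take roughly 12.5 to 25 minutes if run strictly in sequence.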
How to use
- Upload the image that contains the text you want to extract and translate.
- Set source_lang (for example: auto, en, zh, ja, ko, fr, de, es, ru, ar).
- Choose target_lang for the translation output.
- (Optional) Provide a terminologies list to enforce consistent translations for key terms.
- (Optional) Add sensitives to mask or filter sensitive words.
- (Optional) Enable skip_image_segment if you want to bypass automatic text-region segmentation.
- Run the job and view or download the extracted and translated text from the WaveSpeedAI interface.
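For scripted use, the same options can be collected into a request payload. This is a minimal sketch, not the documented API: the parameter names `source_lang`, `target_lang`, `terminologies`, `sensitives`, and `skip_image_segment` come from the steps above, while the `image` field name, the shape of each terminology entry (`source`/`target` pairs), and the helper `build_request` are assumptions. The endpoint and authentication are not described here, so the sketch stops at building the JSON body.

```python
import json
from typing import Optional


def build_request(
    image: str,
    target_lang: str,
    source_lang: str = "auto",
    terminologies: Optional[list] = None,
    sensitives: Optional[list] = None,
    skip_image_segment: bool = False,
) -> dict:
    """Assemble the job options into a dict (field layout is an assumption)."""
    if not image:
        raise ValueError("image is required")
    if not target_lang:
        raise ValueError("target_lang is required")

    payload = {
        "image": image,  # assumed field name for the image URL or upload
        "source_lang": source_lang,
        "target_lang": target_lang,
        "skip_image_segment": skip_image_segment,
    }
    # Optional fields are included only when provided.
    if terminologies:
        payload["terminologies"] = terminologies
    if sensitives:
        payload["sensitives"] = sensitives
    return payload


if __name__ == "__main__":
    body = build_request(
        image="https://example.com/menu.png",
        target_lang="en",
        # Assumed entry shape: one source/target pair per term.
        terminologies=[{"source": "宫保鸡丁", "target": "Kung Pao Chicken"}],
    )
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Defaulting `source_lang` to `auto` and leaving segmentation enabled mirrors the recommendations on this page; check the actual request schema before sending anything built this way.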
Pro tips for best quality
- Upload clear, high-resolution images; avoid heavy compression or motion blur.
- Use auto for source_lang when the input might contain mixed or unknown languages.
- Define terminologies for verticals like finance, medicine, or e-commerce to keep key phrases consistent.
- Use sensitives to redact or mask names, IDs, or other sensitive fields before downstream use.
- Keep segmentation enabled when working with complex layouts (tables, multi-column documents, or posters).
Works well with other WaveSpeedAI models
- Google Nano Banana Pro – generate or edit high-quality localized visuals after you extract and translate text.
- ByteDance Seedream v4 – create style-consistent localized posters, banners, and illustration sets based on the translated content.
Notes
- Ideal for document digitization, translation of signage and menus, multilingual education content, and accessibility tools.
- For URL-based images, make sure the link is publicly accessible; a valid image will show a preview in the WaveSpeedAI UI before you run the task.