image-to-image

Alibaba Qwen Image Translate

alibaba/qwen-image/translate

Alibaba Qwen Image Translate offers OCR-based image understanding and multilingual in-image text translation for context-aware results. Ready-to-use REST inference API with fast performance, no cold starts, and affordable pricing.

Your request will cost $0.01 per run.

For $1 you can run this model approximately 100 times.

README

Alibaba Qwen Translate – Image Understanding and Translation

Alibaba Qwen Translate is a multimodal model on Alibaba Cloud’s DashScope that combines high-accuracy OCR with multilingual translation. On WaveSpeedAI, it turns screenshots, documents, menus, and posters into clean, translated text in just a few seconds.

Why it stands out

  • Accurate OCR: extracts printed text and many styles of handwriting from photos, scans, and UI screenshots.

  • Strong multilingual support: detects and translates across English, Chinese, Japanese, Korean, French, German, Spanish, Russian, Arabic, and more.

  • Terminology and sensitive-word control: define custom terminologies for domain-specific vocabulary and filter sensitive words in the output.

  • Document and layout awareness: handles forms, receipts, signs, and scanned pages with automatic text-region detection.

  • Fast, practical performance: suitable for real-world scenarios like menu translation, travel signage, study materials, and quick data capture.

Limits and performance

  • Supported input formats: PNG, JPEG, WEBP (a quick client-side format check is sketched after this list)
  • Output: extracted text plus translation into the selected target language
  • Typical processing time: around 3–6 seconds per image
  • Segmentation: automatic text-region detection (can be disabled via the skip_image_segment option)
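
If you batch images from scripts, a quick client-side extension check can filter out unsupported files before they reach the API. The helper below is a minimal Python sketch; `is_supported_image` is a hypothetical name, not part of any official SDK, and the service ultimately validates the actual file contents.

```python
# Minimal pre-flight check against the supported input formats (PNG, JPEG, WEBP).
# Hypothetical helper for illustration only; the API itself is the source of truth.
from pathlib import Path

SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}

def is_supported_image(path: str) -> bool:
    """Return True if the file extension matches a supported input format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS

print(is_supported_image("menu.jpg"))   # True
print(is_supported_image("scan.tiff"))  # False -> convert to PNG/JPEG/WEBP first
```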

Pricing

Task type | Cost per image
OCR / Translation | $0.01

Flat pricing: every processed image is billed at $0.01, regardless of language pair or content length.
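
Because pricing is flat, a batch budget is simple multiplication: N images cost N × $0.01. The tiny sketch below is just that arithmetic wrapped in a hypothetical helper, not an official cost calculator.

```python
# Flat-rate cost estimate: every processed image is billed at $0.01,
# regardless of language pair or text length.
COST_PER_IMAGE_USD = 0.01

def batch_cost(num_images: int) -> float:
    """Estimated cost in USD for a batch of images."""
    return num_images * COST_PER_IMAGE_USD

print(batch_cost(250))  # ~2.5 -> 250 images cost about $2.50
```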

How to use

  1. Upload the image that contains the text you want to extract and translate.
  2. Set source_lang (for example: auto, en, zh, ja, ko, fr, de, es, ru, ar).
  3. Choose target_lang for the translation output.
  4. (Optional) Provide a terminologies list to enforce consistent translations for key terms.
  5. (Optional) Add sensitives to mask or filter sensitive words.
  6. (Optional) Enable skip_image_segment if you want to bypass automatic text-region segmentation.
  7. Run the job and view or download the extracted and translated text from the WaveSpeedAI interface. For programmatic use, see the API sketch after this list.
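
For programmatic use, the steps above map onto a single REST request. The Python sketch below is illustrative only: the endpoint path, header names, payload field names, and response handling are assumptions based on the parameters described on this page, so check the WaveSpeedAI API reference for the exact contract.

```python
# Illustrative request sketch using the requests library. The endpoint URL,
# auth header, and field names are assumptions, not a verified API contract.
import os
import requests

API_KEY = os.environ["WAVESPEED_API_KEY"]  # assumed environment variable name
ENDPOINT = "https://api.wavespeed.ai/api/v3/alibaba/qwen-image/translate"  # assumed path

payload = {
    "image": "https://example.com/menu.jpg",  # publicly accessible image URL
    "source_lang": "auto",                    # let the model detect the source language
    "target_lang": "en",                      # translate into English
    "terminologies": [                        # optional: enforce consistent key terms
        {"source": "套餐", "target": "set menu"},
    ],
    "sensitives": ["13800138000"],            # optional: words to mask or filter
    "skip_image_segment": False,              # keep automatic text-region detection
}

resp = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # inspect the extracted and translated text in the response
```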

Pro tips for best quality

  • Upload clear, high-resolution images; avoid heavy compression or motion blur (a simple resolution check is sketched after this list).
  • Use auto for source_lang when the input might contain mixed or unknown languages.
  • Define terminologies for verticals like finance, medicine, or e-commerce to keep key phrases consistent.
  • Use sensitives to redact or mask names, IDs, or other sensitive fields before downstream use.
  • Keep segmentation enabled when working with complex layouts (tables, multi-column documents, or posters).
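
For the resolution tip above, a lightweight local check can flag photos that are likely too small for reliable OCR before you spend a run. The sketch below uses Pillow; the 1000-pixel threshold is an arbitrary assumption, not a service requirement.

```python
# Optional pre-flight check with Pillow: flag images whose shorter side is
# below a chosen threshold, since heavy downscaling tends to hurt OCR quality.
from PIL import Image

def looks_sharp_enough(path: str, min_side: int = 1000) -> bool:
    """Return True if the shorter image side meets the minimum pixel threshold."""
    with Image.open(path) as img:
        width, height = img.size
        return min(width, height) >= min_side

if not looks_sharp_enough("receipt_photo.jpg"):
    print("Image may be too low-resolution for reliable OCR; consider a rescan.")
```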

Works well with other WaveSpeedAI models

  • Google Nano Banana Pro – generate or edit high-quality localized visuals after you extract and translate text.

  • ByteDance Seedream v4 – create style-consistent localized posters, banners, and illustration sets based on the translated content.

Notes

  • Ideal for document digitization, translation of signage and menus, multilingual education content, and accessibility tools.
  • For URL-based images, make sure the link is publicly accessible; a valid image will show a preview in the WaveSpeedAI UI before you run the task. A basic reachability check is sketched below.
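
For the URL note above, a quick reachability check before submitting can save a failed run. The sketch below uses a HEAD request as a heuristic; some hosts only answer GET, so treat the result as a hint rather than a guarantee.

```python
# Heuristic check that a URL-based image is publicly reachable before submitting.
import requests

def url_is_public(url: str) -> bool:
    """Return True if a HEAD request to the URL succeeds with HTTP 200."""
    try:
        resp = requests.head(url, timeout=10, allow_redirects=True)
        return resp.status_code == 200
    except requests.RequestException:
        return False

print(url_is_public("https://example.com/poster.png"))
```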