Qwen multimodal models, developed by Alibaba Cloud, offer advanced capabilities in image and video generation. These models excel at creating high-quality visual content from text descriptions, with a strong understanding of both Chinese and English prompts.
LoRA-ready Image Editing & Generation
- qwen-image/edit-plus-lora
Advanced image editing model with LoRA support, enabling precise style transfer, character customization, and high-fidelity local edits driven by text prompts.
- qwen-image/edit-lora
Lightweight edit model for LoRA-based style and character control, ideal for quick retouching, outfit changes, and consistent persona updates.
- qwen-image/text-to-image-lora
LoRA-enabled text-to-image generation that supports custom styles and characters while keeping strong prompt adherence and clean composition.
- jib-mix-qwen-image/text-to-image-lora
Mixed-style LoRA text-to-image (T2I) model tuned for vivid anime and illustration aesthetics, combining sharp linework with rich color and expressive characters.
- qwen-image-lora-trainer
Training endpoint for building your own Qwen Image LoRA adapters from reference images, enabling personalized styles and characters across all LoRA-capable Qwen models.
Base Image Editing
- qwen-image/edit-plus
Enhanced image editing model for high-quality global and local edits, improving lighting, realism, and detail while preserving subject identity.
- qwen-image/edit
General-purpose edit model for everyday photo and artwork adjustments, ideal for quick fixes, background tweaks, and light retouching.
Base Text-to-Image Generation
- qwen-image/text-to-image
Core T2I model that generates clean, realistic images from text prompts, suitable for product shots, portraits, and general creative use.
- jib-mix-qwen-image/text-to-image
Stylized T2I variant blending anime and illustration styles, producing vibrant, character-focused art with strong visual appeal.
Utilities & Audio
- qwen-image/translate
Image translation utility that reads charts, UI screenshots, and text-heavy graphics, then outputs translated content while preserving layout semantics.
- qwen3-tts-flash
Fast text-to-speech model for natural-sounding voice previews, optimized for low latency in assistants, demos, and real-time applications.
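When choosing between these models in code, it can help to keep the identifiers in one lookup table. The sketch below is only an illustration: the model identifiers are copied from the listing above, while the category keys and the helper names `category_of` and `lora_variant` are hypothetical and not part of any official SDK or API.

```python
from typing import Optional

# Identifiers copied from the listing above. The category keys are
# illustrative labels chosen for this sketch, not official names.
QWEN_MODELS = {
    "lora": [
        "qwen-image/edit-plus-lora",
        "qwen-image/edit-lora",
        "qwen-image/text-to-image-lora",
        "jib-mix-qwen-image/text-to-image-lora",
        "qwen-image-lora-trainer",
    ],
    "edit": [
        "qwen-image/edit-plus",
        "qwen-image/edit",
    ],
    "text-to-image": [
        "qwen-image/text-to-image",
        "jib-mix-qwen-image/text-to-image",
    ],
    "utilities-audio": [
        "qwen-image/translate",
        "qwen3-tts-flash",
    ],
}


def category_of(model_id: str) -> str:
    """Return the category key under which a model identifier is listed."""
    for category, model_ids in QWEN_MODELS.items():
        if model_id in model_ids:
            return category
    raise KeyError(f"unknown model identifier: {model_id}")


def lora_variant(base_id: str) -> Optional[str]:
    """Return the listed LoRA counterpart of a base model, or None.

    Assumes the '-lora' suffix convention visible in the listing above.
    """
    candidate = f"{base_id}-lora"
    return candidate if candidate in QWEN_MODELS["lora"] else None
```

For example, `lora_variant("qwen-image/edit-plus")` resolves to the LoRA-ready edit model, while `lora_variant("qwen-image/translate")` returns `None` because no LoRA counterpart is listed for it.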