Home/Explore/Wan 2.2 Models/wavespeed-ai/wan-2.2/text-to-image-lora
text-to-image

text-to-image

WAN 2.2 Text-to-Image With LoRA Support | High-Detail Image Generation | WaveSpeedAI

wavespeed-ai/wan-2.2/text-to-image-lora

WAN 2.2 generates super-detailed images from text prompts and supports custom LoRAs for fine-grained style and subject control. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

width
height
If set to true, the function will wait for the result to be generated and uploaded before returning the response. It allows you to get the result directly in the response. This property is only available through the API.
If enabled, the output will be encoded into a BASE64 string instead of a URL. This property is only available through the API.

Idle

An ethereal female elf with long platinum-blonde hair and pointed ears, wearing an elegant dress made of leaves and vines. She stands barefoot in an ancient forest illuminated by magical light spots, one hand gently touching a glowing mushroom. Her expression is otherworldly and curious. Dreamy soft focus, Tyndall effect light beams filtering through the canopy, magical realism photography.

Your request will cost $0.025 per run.

For $1 you can run this model approximately 40 times.

One more thing::

ExamplesView all

An ethereal female elf with long platinum-blonde hair and pointed ears, wearing an elegant dress made of leaves and vines. She stands barefoot in an ancient forest illuminated by magical light spots, one hand gently touching a glowing mushroom. Her expression is otherworldly and curious. Dreamy soft focus, Tyndall effect light beams filtering through the canopy, magical realism photography.
photo of a young chinese woman in her 20s, with shoulder-length wavy black hair and light makeup, sitting by the window in a vintage coffee shop, wearing a cozy beige sweater, holding a latte with both hands, gazing outside with a serene and focused expression, afternoon sunlight streams through the blinds creating soft light and shadow on her face, close-up shot, shallow depth of field, warm and healing mood, hyperrealistic, shot on Sony A7IV, 50mm f/1.8 lens.
a powerful close-up portrait of an 80-year-old Tibetan grandmother, her face is a canvas of deep wrinkles, weathered tan skin, her eyes are deep-set and kind, with a gentle smile, wearing a traditional Tibetan hat, dramatic Rembrandt lighting from the side, highlighting the texture and contour of her face against a dark, plain background, emotionally rich, hyper-detailed skin texture.
a capable 45-year-old female scientist, with her hair in a bun, wearing safety goggles and a white lab coat, standing in a modern, high-tech laboratory, holding a test tube and observing the liquid inside with a sharp, focused gaze, sophisticated instruments and data screens in the background, bright and clean lighting, professional and technological mood.
Wide-angle shot of a determined middle-aged male mountaineer standing on a snowy summit at sunrise, with a magnificent sea of clouds in the background. He is wearing professional climbing gear, his face is weathered, and his beard has frost on it, but his eyes show immense joy and accomplishment. He holds a trekking pole, gazing into the distance. Warm, epic, and holy lighting.
Documentary black and white street photography, **in the style of Henri Cartier-Bresson**. An elderly newspaper vendor is leaning against his stand, **lighting a cigarette in a fleeting moment between crowds**. He wears an old newsboy cap, **his weathered face has deep wrinkles**, his eyes are tired but sharp. The background shows a blurred New York street with pedestrians.
B0x13ng Boxing Video,Two male professional boxers in a brightly lit ring. The boxer on the left, an African man, throws a **powerful right hook**, his glove **making solid impact** with the Caucasian boxer's cheek. The impacted boxer's **facial muscles are distorted from the force**, with **sweat and spit flying from his mouth in a visible spray**. The **harsh overhead stadium lights** create **dramatic, high-contrast shadows**, highlighting the glistening muscle definition and sweat beads. The crowd in the background is blurred with camera flashes.
f4nt4sy_sc3n3 fantasy landscape,Epic landscape photography, a masterpiece. View from the summit of a high peak in the Swiss Alps, overlooking a **sea of rolling clouds** that fills the valley below. The **first light of sunrise** is breaking from behind a distant peak, casting **golden rays that rim-light the edges of the clouds** and create **dramatic crepuscular rays (Tyndall effect)**. The air is crisp and clear, with the sky graduating from deep blue to fiery orange. In the foreground, snow-dusted rocks add a sense of scale and depth.
l3g0_5ty13 Lego animation style, A cute 3D artwork. In a softly lit child's bedroom, a felt-textured teddy bear, a plush bunny rabbit, and a small wool felt lamb are sitting around a miniature table, having a tea party. Tiny teacups and cookies are on the table. Afternoon sunlight streams through a sheer curtain, casting warm light spots on the floor. The scene is filled with a dreamy and heartwarming feeling.

README

Wan-2.2-LoRA (Text-to-Image)

Wan-2.2-LoRA builds upon the acclaimed Wan 2.2 text-to-image model by introducing full LoRA (Low-Rank Adaptation) compatibility — empowering creators to fine-tune visuals with personalized styles, characters, or aesthetics. It combines Wan’s signature cinematic rendering and world-class detail synthesis with the flexibility of custom-trained LoRAs.

Why it looks great

  • LoRA-ready architecture – Import .safetensors LoRA weights directly from Civitai, Hugging Face.
  • Cinematic lighting engine – Advanced diffusion backbone that simulates depth, tone, and atmosphere with film-grade realism.
  • Text rendering excellence – Handles both English and Chinese typography seamlessly within the image, not as overlays.
  • Cross-style adaptability – From photorealism to anime, oil painting, 3D CG, or minimalism — one prompt can shift universes.
  • Consistent composition – Retains character identity and spatial coherence across multi-prompt workflows.

Limits and Performance

  • Max resolution per job: up to 1536 × 1536 pixels
  • LoRA path: supports <owner>/<model-name> or direct .safetensors URLs
  • LoRA scale: adjustable from 0.1 – 1.5 (default = 1.0)
  • Output formats: JPEG / PNG / WEBP
  • Processing speed: ~6–9 seconds per image
  • Prompt input: multi-line, bilingual, descriptive prompts supported

Pricing

  • $0.025 per image Each image is billed individually.

How to Use

  1. Write a detailed prompt (in English or Chinese).
  2. Set size — width and height (up to 1024×1024).
  3. Add LoRA(s) – paste LoRA path or URL; adjust scale for blending strength.
  4. (Optional) Set a seed for reproducibility (-1 = random).
  5. Choose output format (JPEG / PNG / WEBP).
  6. Run → preview result → iterate with different LoRAs or scales.

Pro tips for best quality

  • Mix multiple LoRAs for hybrid aesthetics (e.g., cyberpunk + watercolor).
  • Use 0.6–0.9 scale for realistic subtle blending.
  • Lock seed to maintain consistent faces or characters across styles.
  • Start from simple prompts; layer complexity gradually for control.

Reference

Note

  • LoRAs from Civitai or Hugging Face are also supported if exported in .safetensors format.
  • For multi-LoRA blending, ensure each LoRA file is stylistically aligned for optimal results.