image-to-image

wavespeed-ai/uno

An AI model that transforms input images into new ones based on text prompts, blending reference visuals with your creative directions.

Your request will cost $0.05 per run.

For $1 you can run this model approximately 20 times.

README

UNO – Universal In‑Context Diffusion Transformer 📸

A subject-driven image synthesis model developed by ByteDance Research. Built on diffusion transformers, it enables both single-subject and multi-subject image generation with high consistency and controllability.

Implementation ✨

This model leverages a two-stage progressive cross‑modal alignment strategy, combined with Universal Rotary Position Embedding (UnoPE):

  1. Stage I: Fine-tune a pretrained T2I (text-to-image) model using generated single-subject in-context data.
  2. Stage II: Further train on multi-subject paired data to support scenes with multiple specified subjects (see the conceptual sketch below).
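
The following is a minimal conceptual sketch of that two-stage schedule, written for illustration only: the stage names, dataset labels, and `train` helper are hypothetical placeholders, not ByteDance's actual training code.

```python
# Conceptual sketch of UNO's two-stage progressive cross-modal alignment.
# All names here (Stage, SCHEDULE, train) are hypothetical placeholders;
# the real training pipeline is in ByteDance's UNO release.
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    dataset: str          # in-context training pairs used in this stage
    max_ref_images: int   # reference subjects per training sample


SCHEDULE = [
    # Stage I: adapt the pretrained FLUX.1-dev T2I backbone on generated
    # single-subject in-context data.
    Stage("stage_i_single_subject", "single_subject_in_context_pairs", 1),
    # Stage II: continue on multi-subject paired data so several specified
    # subjects stay distinct in one scene (aided by UnoPE position offsets).
    Stage("stage_ii_multi_subject", "multi_subject_paired_data", 4),
]


def train(model: object, stage: Stage) -> None:
    """Placeholder for one fine-tuning stage of the DiT backbone."""
    print(f"{stage.name}: fine-tuning on {stage.dataset} "
          f"with up to {stage.max_ref_images} reference image(s)")


if __name__ == "__main__":
    model = object()  # stands in for the pretrained DiT weights
    for stage in SCHEDULE:
        train(model, stage)
```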

Highlights:

  • Built on Diffusion Transformers (DiT) with FLUX.1-dev backbone
  • UnoPE maintains subject identity and reduces confusion across multiple subjects
  • Input: 1–4 reference images plus a text prompt (see the request sketch below)
  • Output: synthesized image reflecting consistent subject(s) in context
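
As a usage illustration, here is a minimal Python sketch of submitting a prediction with the example prompt shown for this model. The endpoint URL, field names (`images`, `prompt`, `enable_safety_checker`), and response shape are assumptions made for this sketch; consult the WaveSpeed AI API documentation for the actual request format and authentication details.

```python
# Minimal sketch of calling wavespeed-ai/uno over HTTP.
# The endpoint URL, JSON field names, and response shape below are
# assumptions for illustration only; check the WaveSpeed AI API docs
# for the real request format and authentication scheme.
import os

import requests

API_KEY = os.environ["WAVESPEED_API_KEY"]                # hypothetical env var
ENDPOINT = "https://api.wavespeed.ai/wavespeed-ai/uno"   # hypothetical URL

payload = {
    # 1-4 reference images that define the subject(s) to preserve.
    "images": [
        "https://example.com/reference-dress.jpg",
        "https://example.com/reference-bag.jpg",
    ],
    # Text prompt describing the scene to synthesize around those subjects.
    "prompt": "A woman wears the dress and holds a bag, in the flowers.",
    "enable_safety_checker": True,   # assumed parameter name
}

response = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
response.raise_for_status()
print(response.json())  # assumed to contain a URL for the generated image
```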

Key Features

  • High-consistency multi-subject generation that preserves unique subject traits across images
  • 🔁 Single → multi subject scaling via staged training
  • 🔧 Controllable layout and reference identity handling
  • 📐 Handles varying aspect ratios and resolutions (512–704 px and above; see the sizing sketch below)
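
The snippet below is a small sketch of how one might derive an output size in that supported range for a target aspect ratio. Rounding to a multiple of 16 is an assumption borrowed from common latent-diffusion constraints, not a documented requirement of this model.

```python
# Sketch: pick an output width/height in the 512-704 px range for a given
# aspect ratio. The multiple-of-16 rounding is an assumption, not a
# documented UNO requirement.
def pick_resolution(aspect_ratio: float, lo: int = 512, hi: int = 704) -> tuple[int, int]:
    def clamp16(v: float) -> int:
        # Clamp into the supported range, then snap to a multiple of 16.
        return int(round(min(max(v, lo), hi) / 16) * 16)

    if aspect_ratio >= 1.0:           # landscape or square: cap the width
        width = hi
        height = clamp16(width / aspect_ratio)
    else:                             # portrait: cap the height
        height = hi
        width = clamp16(height * aspect_ratio)
    return width, height


print(pick_resolution(16 / 9))   # (704, 512): clamped, so the ratio is approximate
print(pick_resolution(1.0))      # (704, 704) square
print(pick_resolution(3 / 4))    # (528, 704) portrait
```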

Prediction Examples 🌟

  • Generating images of the same person in different settings
  • Placing multiple consistent products or characters in a single scene
  • Virtual try-on and identity-preserving e-commerce renders