Uno


Uno AI transforms input images into new visuals guided by text prompts, blending reference images with your creative direction for precise, style-aware edits. It is served through a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

UNO – Universal In-Context Diffusion Transformer

UNO is a subject-driven image generation model from ByteDance Research. It takes a small set of reference images plus a text prompt and synthesizes new scenes where the same subjects re-appear with high identity consistency and strong style control. It works for both single-subject and multi-subject prompts.

What UNO is good at

Subject-consistent generation – Keep the same person, character, or product recognizable across new scenes and poses.
Single → multi-subject scenes – Start from one subject or combine several references into a coherent group image.
Layout & style control – Use the prompt and image_size to steer framing, setting, and visual mood while preserving identity.
Flexible aspect ratios – Supports portrait, landscape, and square formats suitable for thumbnails, posts, key art, and ads.

Input Parameters

images (required)

1–5 reference images of your subject(s). These define identity, clothing, and overall look.

Use multiple angles or expressions for better robustness.
You can mix people, products, or characters, as long as the prompt makes their roles clear.

prompt (required)

Text description of the scene you want to generate, for example:

“Santa Claus is standing in front of the Christmas tree.”
“Two cartoon astronauts posing on the moon, product bottle in the center.”

UNO will combine the prompt with your references to place the subjects into the requested scene.

image_size

Controls aspect ratio and framing:

square_hd – high-res square
square – standard square
portrait_4_3, portrait_16_9
landscape_4_3, landscape_16_9

Choose based on where the image will be used (feed post, story, banner, thumbnail, etc.).

seed

Randomness control:

Empty / unset → a random seed each time.
Any integer → reproducible output for the same settings.

num_images

Number of images to generate per run (1–4, default 1). Higher values give more options at once.

num_inference_steps

Number of diffusion steps (1–50, default 28):

Fewer steps → faster, slightly less detailed.
More steps → slower, more refined and stable.

guidance_scale

Classifier-free guidance strength (1–20, default 3.5):

Lower values → more creative, looser interpretation of the prompt.
Higher values → closer adherence to the prompt and reference identity.
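
One way to see the effect of guidance_scale is to hold the seed fixed and vary only the guidance value, so that differences between outputs come from guidance rather than from randomness. The sketch below only builds the request payloads and sends nothing. The helper name guidance_sweep, the seed 1234, the three scale values, and the example.com image URL are illustrative assumptions, not part of the API.

```python
from copy import deepcopy


def guidance_sweep(base_payload, scales, seed=1234):
    """Return one payload per guidance_scale value, all sharing the same seed."""
    payloads = []
    for scale in scales:
        payload = deepcopy(base_payload)  # leave the caller's dict untouched
        payload["seed"] = seed            # fixed seed: only guidance changes
        payload["guidance_scale"] = scale
        payloads.append(payload)
    return payloads


base = {
    "images": ["https://example.com/reference.jpg"],  # placeholder URL
    "prompt": "Santa Claus is standing in front of the Christmas tree.",
}
sweep = guidance_sweep(base, [2.0, 3.5, 6.0])
```

Each payload in sweep can then be submitted as a separate task and the results compared side by side.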

output_format

File format of the generated images:

jpeg
png

Designed For

Character & IP creators – Keep mascots or VTuber avatars on-model across many scenes.
Product & e-commerce teams – Generate consistent hero shots and lifestyle scenes for the same item.
Brand & marketing – Multi-subject key art where specific people or products must stay recognizable.
Concept artists – Rapidly explore compositions using a small library of reference looks.

How to Use

Upload 1–5 images of your subject(s).
Choose an image_size that matches your target placement (square, portrait, or landscape).
Write a clear prompt describing the scene, style, and relationships between subjects.
Optionally set seed, num_images, num_inference_steps, guidance_scale, and output_format.
Run the model, review the generated images, and iterate by tweaking prompt or references to refine identity and style.
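
The steps above can be sketched as a small helper that assembles the request body. The parameter names and defaults follow this page. The function name build_payload and the example.com image URL are assumptions made for illustration. The seed is left out of the payload when unset, on the assumption that an absent field is how a request asks for a random seed.

```python
def build_payload(images, prompt, image_size="square_hd", seed=None,
                  num_images=1, num_inference_steps=28, guidance_scale=3.5,
                  output_format="jpeg"):
    """Assemble a request body for Uno from the documented parameters."""
    if not 1 <= len(images) <= 5:
        raise ValueError("provide between 1 and 5 reference image URLs")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    payload = {
        "images": list(images),
        "prompt": prompt,
        "image_size": image_size,
        "num_images": num_images,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "output_format": output_format,
    }
    if seed is not None:
        payload["seed"] = seed  # only sent when reproducibility is wanted
    return payload


payload = build_payload(
    ["https://example.com/reference.jpg"],  # placeholder URL
    "Santa Claus is standing in front of the Christmas tree.",
    image_size="portrait_4_3",
)
```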

Pricing

Each image costs $0.05.
The total price of a run is $0.05 × num_images.
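
As a worked example of the pricing rule, the sketch below computes the cost of a run. It uses Decimal so the result is exact to the cent. The function name estimate_cost is an assumption, and the 1–4 bound is the num_images range given on this page.

```python
from decimal import Decimal

PRICE_PER_IMAGE = Decimal("0.05")  # USD per image, as stated in the pricing note


def estimate_cost(num_images):
    """Return the price in USD of one run that generates num_images images."""
    if not 1 <= num_images <= 4:
        raise ValueError("num_images must be between 1 and 4")
    return PRICE_PER_IMAGE * num_images


print(estimate_cost(4))  # prints 0.20
```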

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


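# Submit the task
# "images" and "prompt" are required; the image URL below is a placeholder.
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/uno" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "images": ["https://example.com/reference.jpg"],
    "prompt": "Santa Claus is standing in front of the Christmas tree.",
    "image_size": "square_hd",
    "num_images": 1,
    "num_inference_steps": 28,
    "guidance_scale": 3.5,
    "output_format": "jpeg"
}'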

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
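
The same submit-then-query flow can be sketched in Python with only the standard library. The two URLs and headers mirror the curl commands above. Several things here are assumptions rather than documented behavior: that the requestId in the result URL is the data.id returned by the submit call, the polling interval and retry limit, and all function names.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key, payload):
    """Build the POST request that submits a Uno task."""
    return urllib.request.Request(
        f"{BASE_URL}/wavespeed-ai/uno",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key, request_id):
    """Build the GET request that fetches a task's result."""
    return urllib.request.Request(
        f"{BASE_URL}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def _send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def run_uno(api_key, payload, poll_seconds=1.0, max_polls=120):
    """Submit a task, then poll until it completes or fails."""
    task = _send(build_submit_request(api_key, payload))["data"]
    for _ in range(max_polls):
        # Assumption: data.id from the submit response is the requestId.
        result = _send(build_result_request(api_key, task["id"]))["data"]
        if result["status"] == "completed":
            return result["outputs"]
        if result["status"] == "failed":
            raise RuntimeError(result.get("error") or "task failed")
        time.sleep(poll_seconds)
    raise TimeoutError("task did not finish within the polling limit")
```

Calling run_uno with the API key from the WAVESPEED_API_KEY environment variable and a payload containing images and prompt would return the list of output URLs once the task status becomes completed.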

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
images	array	Yes	[]	-	URLs of the reference images (1–5) used to guide generation.
image_size	string	No	square_hd	square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9	The aspect ratio of the generated media.
prompt	string	Yes		-	The positive prompt for the generation.
seed	integer	No	-	-1 ~ 2147483647	The random seed to use for the generation.
num_images	integer	No	1	1 ~ 4	The number of images to generate.
num_inference_steps	integer	No	28	1 ~ 50	The number of inference steps to perform.
guidance_scale	number	No	3.5	1 ~ 20	The guidance scale to use for the generation.
output_format	string	No	jpeg	jpeg, png	The format of the output image.
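
Checking a payload against the ranges in this table before sending it can catch mistakes early. The following is a minimal client-side sketch. The allowed values and ranges are copied from the table, while the function name validate_payload and the wording of the messages are assumptions. It does not replace whatever validation the server performs.

```python
ALLOWED_IMAGE_SIZES = {
    "square_hd", "square", "portrait_4_3", "portrait_16_9",
    "landscape_4_3", "landscape_16_9",
}
ALLOWED_FORMATS = {"jpeg", "png"}
NUMERIC_RANGES = {
    "seed": (-1, 2147483647),
    "num_images": (1, 4),
    "num_inference_steps": (1, 50),
    "guidance_scale": (1, 20),
}


def validate_payload(payload):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    if not payload.get("images"):
        problems.append("images is required")
    if not payload.get("prompt"):
        problems.append("prompt is required")
    if payload.get("image_size", "square_hd") not in ALLOWED_IMAGE_SIZES:
        problems.append("image_size is not one of the allowed values")
    if payload.get("output_format", "jpeg") not in ALLOWED_FORMATS:
        problems.append("output_format must be jpeg or png")
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in payload and not low <= payload[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")
    return problems
```

Returning a list instead of raising on the first error lets a caller report every problem at once.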

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
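
The fields above can be combined to pull usable image URLs out of a response. The sketch below assumes the response has already been decoded into a Python dict and that has_nsfw_contents holds one flag per entry in outputs, as the table describes. The function name safe_outputs and the sample response are hypothetical.

```python
def safe_outputs(response):
    """Return output URLs not flagged as NSFW, or [] if the task is unfinished."""
    data = response["data"]
    status = data["status"]
    if status == "failed":
        raise RuntimeError(data.get("error") or "task failed")
    if status != "completed":
        return []  # still created or processing: outputs are empty
    outputs = data.get("outputs") or []
    # Treat a missing flag list as "nothing flagged".
    flags = data.get("has_nsfw_contents") or [False] * len(outputs)
    return [url for url, flagged in zip(outputs, flags) if not flagged]


# Hypothetical response, shaped after the table above.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "task-123",
        "status": "completed",
        "outputs": ["https://example.com/a.jpeg", "https://example.com/b.jpeg"],
        "has_nsfw_contents": [False, True],
        "error": "",
    },
}
urls = safe_outputs(sample)
```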

Result Request Parameters
