Vidu Reference To Image Q2
Vidu Reference-to-Image Q2 generates high-quality images from reference images and a customizable prompt. It supports 1–7 reference images with flexible aspect ratio and resolution options, and is available through a ready-to-use REST inference API with no cold starts and per-image pricing.
Features
vidu/reference-to-image-q2 — High-res reference-guided image generation
vidu/reference-to-image-q2 is the reference-guided sibling of vidu’s text-to-image model. It takes one or more reference images (up to 7) plus a prompt, and generates new, high-resolution images that keep the subject and composition while adjusting style, lighting, or scene details.
What it’s good for
- Keeping product, character, or actor identity consistent across many shots
- Creating new scenes from a small set of reference stills or keyframes
- Generating campaign variations while locking in pose, outfit, or layout
- Up-res, clean re-renders of storyboard / concept frames with cinematic quality
Key features
• Up to 7 reference images
Upload 1–7 images in the images field to steer identity, pose, outfit, or composition. The model blends information across them while following your text prompt.
• Cinematic aspect ratios
aspect_ratio supports:
- 1:1, 4:3, 3:4, 2:3, 3:2 – square and classic photo ratios
- 16:9, 21:9 – widescreen and banner formats
- 9:16 – vertical / mobile content
- auto – let the model pick the ratio; per the parameter table, the output matches the first reference image
• High resolutions (1080p → 4K)
resolution lets you pick:
- 1080p – fast preview / web use
- 2K – more detail and better crop flexibility
- 4K – maximum sharpness for key visuals and print-adjacent work
• Prompt-driven control
Combine references with a rich prompt (“dramatic studio lighting, cinematic close-up, 85mm lens, shallow depth of field”) to re-style while keeping the same subject.
• Seed-based reproducibility
Setting seed to -1 gives random variation; a fixed integer lets you rerun the same combination of prompt + references for consistent outputs.
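The aspect-ratio and seed options above can be handled client-side before a request is built. The following is a minimal sketch, not part of the official API: `closest_aspect_ratio` and `pin_seed` are hypothetical helper names, the "closest" ratio is measured by plain absolute difference, and picking a concrete seed locally (instead of sending -1) is an assumption made so that a run can be repeated later.

```python
import random

# Ratios listed for aspect_ratio in this document ("auto" is handled by the API).
SUPPORTED_RATIOS = ["1:1", "4:3", "3:4", "2:3", "3:2", "16:9", "21:9", "9:16"]
MAX_SEED = 2147483647  # upper bound of the seed range in the parameter table


def closest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio closest to width/height (hypothetical helper)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height

    def distance(ratio: str) -> float:
        w, h = ratio.split(":")
        return abs(int(w) / int(h) - target)

    return min(SUPPORTED_RATIOS, key=distance)


def pin_seed(seed: int = -1) -> int:
    """Turn -1 into a concrete seed chosen locally so the run can be repeated."""
    if seed == -1:
        return random.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be -1 or between 0 and {MAX_SEED}")
    return seed
```

Logging the value returned by `pin_seed` alongside the prompt and references makes it possible to resubmit the same combination later.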
How to use (Playground)
- prompt* – Describe what you want to change or keep: style, lighting, mood, background, camera angle, etc.
- images* – Click “Add Item” and upload 1–7 reference images (subject, pose, layout, or mood).
- aspect_ratio – Choose a ratio, or leave as auto and let the model decide.
- resolution – Select 1080p, 2K, or 4K depending on detail vs. speed needs.
- seed – Use -1 for randomness or a fixed integer for reproducible results.
- Run the job, inspect the result, then iterate on prompt / references as needed.
Pricing
Pricing depends on resolution and how many reference images you use. The base rate is $0.04 per 1k compute units, applied via an internal formula. The resulting per-image prices are listed below.
Up to 3 reference images (1–3 refs)
| Resolution | Price per image |
|---|---|
| 1080p | $1.60 |
| 2K | $2.40 |
| 4K | $2.80 |
4–7 reference images
| Resolution | Price per image |
|---|---|
| 1080p | $2.00 |
| 2K | $4.00 |
| 4K | $6.00 |
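The two price tables can be expressed as a small lookup for budgeting. This is a sketch under stated assumptions: `price_per_image` and `estimate_cost` are hypothetical helpers, the figures are copied from the tables above (not computed from the internal formula, which is not given here), and each generated image is assumed to be billed once at the listed price.

```python
from decimal import Decimal

# Per-image prices copied from the tables above, keyed by reference-count tier.
PRICES = {
    "1-3": {"1080p": Decimal("1.60"), "2K": Decimal("2.40"), "4K": Decimal("2.80")},
    "4-7": {"1080p": Decimal("2.00"), "2K": Decimal("4.00"), "4K": Decimal("6.00")},
}


def price_per_image(resolution: str, num_references: int) -> Decimal:
    """Look up the listed price for one image (hypothetical helper)."""
    if not 1 <= num_references <= 7:
        raise ValueError("num_references must be between 1 and 7")
    if resolution not in PRICES["1-3"]:
        raise ValueError(f"unknown resolution: {resolution!r}")
    tier = "1-3" if num_references <= 3 else "4-7"
    return PRICES[tier][resolution]


def estimate_cost(resolution: str, num_references: int, num_images: int = 1) -> Decimal:
    """Estimate the cost of a batch, assuming one charge per generated image."""
    return price_per_image(resolution, num_references) * num_images
```

Using `Decimal` rather than floats keeps the dollar amounts exact when multiplying across a batch.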
Tips for best quality
- Use clean, well-lit reference images; avoid heavy motion blur or extreme compression.
- Keep references stylistically consistent when possible (similar lighting / medium).
- In the prompt, clearly state both what must stay the same (“same person and outfit”) and what should change (“different background, golden-hour lighting”).
- For hero shots, generate at 2K or 4K, then downscale slightly for extra sharpness.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
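# Submit the task (prompt and images are required; the values below are placeholders)
curl --location --request POST "https://api.wavespeed.ai/api/v3/vidu/reference-to-image-q2" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"prompt": "same person and outfit, different background, golden-hour lighting",
"images": ["https://example.com/reference-1.jpg"],
"aspect_ratio": "auto",
"resolution": "1080p",
"seed": -1
}'
# Get the result (requestId is the task id, data.id in the submit response)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"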
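The same submit-then-query flow can be sketched in Python using only the standard library. The endpoint URLs, headers, and the status values (completed, failed) come from this page. The function names, the 2-second polling interval, the poll limit, and the use of `data.id` from the submit response as the request id are assumptions for illustration.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "vidu/reference-to-image-q2"


def build_submit_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a task."""
    return urllib.request.Request(
        f"{API_BASE}/{MODEL_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(request_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request that queries a task result."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def generate(payload: dict, api_key: str, poll_seconds: float = 2.0, max_polls: int = 150) -> list:
    """Submit a task and poll until it completes; interval and limit are assumptions."""
    task = send(build_submit_request(payload, api_key))["data"]
    for _ in range(max_polls):
        result = send(build_result_request(task["id"], api_key))["data"]
        if result["status"] == "completed":
            return result["outputs"]
        if result["status"] == "failed":
            raise RuntimeError(result.get("error") or "task failed")
        time.sleep(poll_seconds)
    raise TimeoutError("task did not finish within the polling limit")
```

A typical call would read the key from the `WAVESPEED_API_KEY` environment variable and pass a payload containing at least `prompt` and `images`. Splitting request construction from sending keeps the URL and header logic testable without network access.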
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| prompt | string | Yes | - | - | The text prompt for generating the image. |
| images | array | Yes | [] | 1 ~ 7 items | The reference images that guide the generation. |
| aspect_ratio | string | No | auto | auto, 1:1, 16:9, 9:16, 4:3, 3:4, 21:9, 2:3, 3:2 | The aspect ratio of the generated image. With auto, the output aspect ratio matches the first input image. |
| resolution | string | No | 1080p | 1080p, 2K, 4K | The output resolution quality: 1080p (1920x1080), 2K (2560x1440), or 4K (3840x2160). |
| seed | integer | No | -1 | -1 ~ 2147483647 | The random seed to use for the generation. -1 means a random seed will be used. |
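The constraints in the request table can be checked locally before a task is submitted. The sketch below mirrors the listed defaults and ranges. `build_payload` is a hypothetical helper, and it only validates shape and allowed values; it does not check that the image entries are reachable or in a format the API accepts.

```python
ASPECT_RATIOS = {"auto", "1:1", "16:9", "9:16", "4:3", "3:4", "21:9", "2:3", "3:2"}
RESOLUTIONS = {"1080p", "2K", "4K"}
MAX_SEED = 2147483647


def build_payload(prompt, images, aspect_ratio="auto", resolution="1080p", seed=-1) -> dict:
    """Validate inputs against the request table and return the JSON body as a dict."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required and must be a non-empty string")
    if not 1 <= len(images) <= 7:
        raise ValueError("images must contain between 1 and 7 items")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int) or not -1 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be an integer between -1 and {MAX_SEED}")
    return {
        "prompt": prompt,
        "images": list(images),
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "seed": seed,
    }
```

Failing fast on the client avoids spending a request on input the table already rules out.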
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
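The response fields above can be consumed with a small helper that pairs each output URL with its NSFW flag. This is a sketch: `extract_outputs` is a hypothetical name, and the choice to return an empty list for tasks that are still created or processing, and to use `None` when a flag is missing, is an assumption rather than documented behavior.

```python
def extract_outputs(response: dict) -> list:
    """Return [{"url": ..., "nsfw": ...}] for a completed task, [] if still pending."""
    data = response.get("data") or {}
    status = data.get("status")
    if status == "failed":
        raise RuntimeError(data.get("error") or "task failed")
    if status != "completed":
        # created / processing: outputs are documented as empty until completion.
        return []
    outputs = data.get("outputs") or []
    flags = data.get("has_nsfw_contents") or []
    results = []
    for index, url in enumerate(outputs):
        # Use None when no flag is present for this output.
        flag = flags[index] if index < len(flags) else None
        results.append({"url": url, "nsfw": flag})
    return results
```

Callers can then filter on the `nsfw` value before downloading or displaying any output.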