vidu/reference-to-image-q2 — High-res reference-guided image generation
vidu/reference-to-image-q2 is the reference-guided sibling of vidu’s text-to-image model. It takes one or more reference images (up to 7) plus a prompt, and generates new, high-resolution images that keep the subject and composition while adjusting style, lighting, or scene details.
What it’s good for
- Keeping product, character, or actor identity consistent across many shots
- Creating new scenes from a small set of reference stills or keyframes
- Generating campaign variations while locking in pose, outfit, or layout
- Up-res, clean re-renders of storyboard / concept frames with cinematic quality
Key features
• Up to 7 reference images
Upload 1–7 images via the images parameter to steer identity, pose, outfit, or composition. The model blends information across them while following your text prompt.
• Cinematic aspect ratios
aspect_ratio supports:
- 1:1, 4:3, 3:4, 2:3, 3:2 – square and classic photo ratios
- 16:9, 21:9 – widescreen and banner formats
- 9:16 – vertical / mobile content
- auto – let the model choose a ratio that best matches the references + prompt
• High resolutions (1080p → 4K)
resolution lets you pick:
- 1080p – fast preview / web use
- 2K – more detail and better crop flexibility
- 4K – maximum sharpness for key visuals and print-adjacent work
• Prompt-driven control
Combine references with a rich prompt (“dramatic studio lighting, cinematic close-up, 85mm lens, shallow depth of field”) to re-style while keeping the same subject.
• Seed-based reproducibility
Setting seed to -1 gives random variation; a fixed integer lets you rerun the same combination of prompt + references for consistent outputs.
How to use (Playground)
- prompt* – Describe what you want to change or keep: style, lighting, mood, background, camera angle, etc.
- images* – Click “Add Item” and upload 1–7 reference images (subject, pose, layout, or mood).
- aspect_ratio – Choose a ratio, or leave as auto and let the model decide.
- resolution – Select 1080p, 2K, or 4K depending on detail vs. speed needs.
- seed – Use -1 for randomness or a fixed integer for reproducible results.
- Run the job, inspect the result, then iterate on prompt / references as needed.
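The same parameters can be assembled and checked in code before submitting a job. The following is a minimal sketch, not the official client: build_request and the constants are hypothetical names, the resolution default is an assumption, and the way reference images are actually uploaded (URLs, file paths, or something else) depends on the API or SDK you use. Only the parameter names and allowed values come from this page.

```python
from typing import Any

# Allowed values as listed on this page.
ASPECT_RATIOS = {"auto", "1:1", "4:3", "3:4", "2:3", "3:2", "16:9", "21:9", "9:16"}
RESOLUTIONS = {"1080p", "2K", "4K"}
MAX_REFERENCES = 7


def build_request(
    prompt: str,
    images: list[str],
    aspect_ratio: str = "auto",
    resolution: str = "1080p",  # assumed default; the page does not state one
    seed: int = -1,             # -1 means random variation
) -> dict[str, Any]:
    """Validate the inputs and return them as a plain dict (hypothetical helper).

    `images` is assumed to hold URLs or paths to the reference images.
    """
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not 1 <= len(images) <= MAX_REFERENCES:
        raise ValueError(
            f"images must contain 1-{MAX_REFERENCES} items, got {len(images)}"
        )
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    return {
        "prompt": prompt,
        "images": list(images),
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "seed": seed,
    }


if __name__ == "__main__":
    request = build_request(
        "same person and outfit, different background, golden-hour lighting",
        ["ref_front.png", "ref_side.png"],
        resolution="2K",
        seed=42,
    )
    print(request)
```

Validating on the client side catches an empty reference list or an unsupported ratio before a job is queued; the resulting dict can then be passed to whichever client you use.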
Pricing
Pricing depends on resolution and how many reference images you use.
Base rate is $0.04 per 1k compute units, applied via an internal formula; the resulting per-image prices are:
Up to 3 reference images (1–3 refs)
| Resolution | Price per image |
|---|---|
| 1080p | $0.04 |
| 2K | $0.06 |
| 4K | $0.07 |
4–7 reference images
| Resolution | Price per image |
|---|---|
| 1080p | $0.05 |
| 2K | $0.10 |
| 4K | $0.15 |
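To budget a batch, the two tables above can be turned into a small lookup. This is a sketch under the assumption that the listed per-image prices are final; price_per_image and estimate_cost are hypothetical helper names, and the internal compute-unit formula is not modeled.

```python
from decimal import Decimal

# Per-image prices in USD, copied from the two tables above.
PRICES = {
    "1-3": {"1080p": Decimal("0.04"), "2K": Decimal("0.06"), "4K": Decimal("0.07")},
    "4-7": {"1080p": Decimal("0.05"), "2K": Decimal("0.10"), "4K": Decimal("0.15")},
}


def price_per_image(resolution: str, num_references: int) -> Decimal:
    """Look up the per-image price for a resolution and reference count."""
    if not 1 <= num_references <= 7:
        raise ValueError("num_references must be between 1 and 7")
    tier = "1-3" if num_references <= 3 else "4-7"
    try:
        return PRICES[tier][resolution]
    except KeyError:
        raise ValueError(f"unsupported resolution: {resolution!r}") from None


def estimate_cost(resolution: str, num_references: int, num_images: int) -> Decimal:
    """Estimate the total cost of generating num_images images."""
    if num_images < 0:
        raise ValueError("num_images must not be negative")
    return price_per_image(resolution, num_references) * num_images


if __name__ == "__main__":
    # Ten 4K images with five reference images each.
    print(estimate_cost("4K", 5, 10))
```

Decimal is used instead of float so that sums of cent amounts stay exact.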
Tips for best quality
- Use clean, well-lit reference images; avoid heavy motion blur or extreme compression.
- Keep references stylistically consistent when possible (similar lighting / medium).
- In the prompt, clearly state both what must stay the same (“same person and outfit”) and what should change (“different background, golden-hour lighting”).
- For hero shots, generate at 2K or 4K, then downscale slightly for extra sharpness.
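The tip about stating what stays and what changes can be made routine with a small prompt helper. This is only an illustrative sketch: build_prompt is a hypothetical function, and the "Keep unchanged / Change / Style" wording is an assumption, not a format the model is documented to require.

```python
def build_prompt(keep: list[str], change: list[str], style: str = "") -> str:
    """Compose a prompt that separates fixed elements from requested changes."""
    parts = []
    if keep:
        parts.append("Keep unchanged: " + ", ".join(keep) + ".")
    if change:
        parts.append("Change: " + ", ".join(change) + ".")
    if style:
        parts.append("Style: " + style + ".")
    return " ".join(parts)


if __name__ == "__main__":
    print(
        build_prompt(
            keep=["same person", "same outfit"],
            change=["different background", "golden-hour lighting"],
            style="cinematic close-up, 85mm lens, shallow depth of field",
        )
    )
```

Keeping the two lists separate makes it easy to vary only the change list across a campaign while the identity constraints stay fixed.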