image-generation

Vidu Reference To Image Q2 | Reference-Based Image Generation | WaveSpeedAI

vidu/reference-to-image-q2

Vidu Reference-to-Image Q2 generates high-quality images from reference images and a customizable prompt. It supports 1–7 reference images with flexible aspect ratio and resolution options, and is served through a ready-to-use REST inference API with no cold starts and affordable pricing.

Examples

Cinematic sci-fi scene in orbit above Earth.
Use image1 as the reference for the overall composition: the curved ring structure, the angle of the Earth below, the lighting and perspective.
Transform the ring into a colossal space ferris wheel and amusement park: along the outer edge of the ring add large transparent cabins, glowing roller-coaster tracks, small spinning rides and observation pods, all evenly spaced.
The cabins are lit with warm neon colors — cyan, magenta, orange — creating a halo of lights around the dark side of the ring.
Keep the realistic detail level and materials from image1, metal panels, vents and structures, but blend them with the new attractions so it feels like a single coherent design.
Deep black space in the background with a few stars, Earth below with soft blue atmosphere, high resolution, realistic cinematic sci-fi style, no text.

Realistic street photography in Japan at sunset, 35mm film look.
Use image1 as the reference for the alley: same buildings, shop signs, vending machines, bicycles, perspective and warm evening light on the wet pavement.
Replace the single person in the center with a three-member Japanese band performing in the street.
On the left side of the alley, place a keyboard player standing behind a portable electronic keyboard on a stand.
In the center, place the guitarist who is also the lead singer, facing the camera slightly, holding an electric guitar and singing into a microphone stand.
On the right side, near the vending machines, place the drummer sitting behind a compact drum kit.
Keep their outfits casual and modern, like an indie band.
Preserve the original color tones and soft lighting of image1, natural lens perspective, shallow contrast, subtle grain, realistic candid street photo style, no added text.

Surreal dreamcore landscape, soft focus, hazy atmosphere.
Use image1 as the reference for the overall scene: the rolling green hills, the wide striped field, the clear blue sky with a single large pink cloud, and the blue–pink color palette.
Remove the pink house in the center and replace it with a single astronaut standing front-facing in the exact middle of the field, small in scale, perfectly aligned with the central perspective lines. The spacesuit is simple and realistic, softly reflecting blue and pink light.
Add several white human hands emerging from the grass in the foreground and midground, like plants growing from the ground. Each hand has a single realistic eye on the palm, calmly staring toward the viewer.
Maintain the original minimal composition and calm mood of image1, but introduce a subtle collage feeling: slightly cut-out shapes, layered textures, edges that feel like paper collage blended into the scene.
Realistic photo style with dreamcore vibes, blue and pink tones, soft blur, gentle vignetting, light film grain, uncanny yet quiet atmosphere, no text.

Epic cinematic battle under the Eiffel Tower at night, 1:1 wide frame.
Use image1 as the reference for Godzilla: keep the same body shape, scales and overall silhouette, towering over the city.
Use image2 as the reference for Vecna from Stranger Things: keep his twisted organic body, vine-like growths and eerie posture, standing on the ground near the Eiffel Tower, facing Godzilla.
Use image3 as the reference for the Paris cityscape: clearly show the Eiffel Tower in the midground, with Paris streets and buildings around it, night sky above.
Godzilla and Vecna are locked in a dramatic clash: Godzilla roaring and charging a bright energy breath, Vecna raising one arm to summon dark red energy and crackling lightning in the sky.
Low-angle viewpoint from the street level, looking up at both giants, with broken cars and debris in the foreground, no visible civilians.
Strong contrast between cold blue light from Godzilla and ominous red light from Vecna, reflections on the metal structure of the Eiffel Tower, smoke and dust in the air, subtle film grain, ultra detailed, high resolution, cinematic concept art style.

Bold pop art poster, 4K resolution, vertical format.
Use image2 as the reference for Albert Einstein’s face and famous tongue-out expression, keeping his facial features clearly recognizable.
Place Einstein as the central figure in the composition, stylized in pop art with thick black outlines, simplified shading and graphic shapes.
Use image1 as the reference for the background: transform the starry sky into a vibrant pop art pattern with large graphic stars, cosmic shapes and halftone dots.
Strong contrasting colors: cyan, magenta, yellow, electric blue and hot pink, screen-print style.
Add abstract rays and comic-style bursts radiating from Einstein’s head to suggest genius and explosive ideas, no text.
Clean poster design, flat color blocks, sharp edges, slight halftone texture, retro pop art, Andy Warhol meets cosmic sci-fi, highly detailed.
1:1 frame

README

vidu/reference-to-image-q2 — High-res reference-guided image generation

vidu/reference-to-image-q2 is the reference-guided sibling of Vidu’s text-to-image model. It takes 1–7 reference images plus a prompt and generates new, high-resolution images that keep the subject and composition while adjusting style, lighting, or scene details.

What it’s good for

  • Keeping product, character, or actor identity consistent across many shots
  • Creating new scenes from a small set of reference stills or keyframes
  • Generating campaign variations while locking in pose, outfit, or layout
  • Upscaled, clean re-renders of storyboard and concept frames at cinematic quality

Key features

• Up to 7 reference images

Upload 1–7 images in the images field to steer identity, pose, outfit, or composition. The model blends information across them while following your text prompt.

• Cinematic aspect ratios

aspect_ratio supports:

  • 1:1, 4:3, 3:4, 2:3, 3:2 – square and classic photo ratios
  • 16:9, 21:9 – widescreen and banner formats
  • 9:16 – vertical / mobile content
  • auto – let the model choose a ratio that best matches the references + prompt
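If you would rather set aspect_ratio explicitly than rely on auto, one option is to pick the supported ratio closest to your main reference image. The sketch below is a client-side helper only: the function name nearest_aspect_ratio is hypothetical, and it does not describe how the model's own auto mode chooses a ratio.

```python
import math

# Ratios listed in this README for aspect_ratio (excluding "auto").
SUPPORTED_RATIOS = ["1:1", "4:3", "3:4", "2:3", "3:2", "16:9", "21:9", "9:16"]


def ratio_value(ratio: str) -> float:
    """Convert a 'W:H' string into width divided by height."""
    width, height = ratio.split(":")
    return int(width) / int(height)


def nearest_aspect_ratio(width: int, height: int) -> str:
    """Return the supported ratio closest to width/height.

    Distance is measured on a log scale so that landscape and portrait
    shapes are treated symmetrically (2:1 is as far from 1:1 as 1:2 is).
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)
    return min(
        SUPPORTED_RATIOS,
        key=lambda ratio: abs(math.log(ratio_value(ratio)) - target),
    )


if __name__ == "__main__":
    print(nearest_aspect_ratio(1920, 1080))
    print(nearest_aspect_ratio(1080, 1920))
```

Passing the pixel dimensions of a reference image returns one of the strings from the list above, which can then be used as the aspect_ratio value.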

• High resolutions (1080p → 4K)

resolution lets you pick:

  • 1080p – fast preview / web use
  • 2K – more detail and better crop flexibility
  • 4K – maximum sharpness for key visuals and print-adjacent work

• Prompt-driven control

Combine references with a rich prompt (“dramatic studio lighting, cinematic close-up, 85mm lens, shallow depth of field”) to re-style while keeping the same subject.

• Seed-based reproducibility

Set seed to -1 for random variation, or use a fixed integer to rerun the same combination of prompt + references and get consistent outputs.

How to use (Playground)

  1. prompt (required) – Describe what you want to change or keep: style, lighting, mood, background, camera angle, etc.
  2. images (required) – Click “Add Item” and upload 1–7 reference images (subject, pose, layout, or mood).
  3. aspect_ratio – Choose a ratio, or leave as auto and let the model decide.
  4. resolution – Select 1080p, 2K, or 4K depending on detail vs. speed needs.
  5. seed – Use -1 for randomness or a fixed integer for reproducible results.
  6. Run the job, inspect the result, then iterate on prompt / references as needed.
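The same fields can be assembled and checked in code before a job is submitted. The sketch below only builds and validates a request body using the field names and allowed values listed in this README. It makes several assumptions: the function name build_payload is hypothetical, reference images are passed as URL strings, the 1080p default is a guess, and seeds are taken to be -1 or a non-negative integer. It does not call the API; check the official API documentation for the endpoint and authentication details.

```python
from typing import Any

# Allowed values as listed in this README.
ASPECT_RATIOS = {"1:1", "4:3", "3:4", "2:3", "3:2", "16:9", "21:9", "9:16", "auto"}
RESOLUTIONS = {"1080p", "2K", "4K"}
MAX_REFERENCES = 7


def build_payload(
    prompt: str,
    images: list[str],
    aspect_ratio: str = "auto",
    resolution: str = "1080p",  # assumed default, not confirmed by the source
    seed: int = -1,
) -> dict[str, Any]:
    """Validate the playground fields and return them as a request body."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not 1 <= len(images) <= MAX_REFERENCES:
        raise ValueError(f"images must contain 1 to {MAX_REFERENCES} items")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    # Assumption: -1 means random, any other seed is a non-negative integer.
    if isinstance(seed, bool) or not isinstance(seed, int) or seed < -1:
        raise ValueError("seed must be -1 or a non-negative integer")
    return {
        "prompt": prompt,
        "images": list(images),
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "seed": seed,
    }


if __name__ == "__main__":
    payload = build_payload(
        "Same person and outfit, different background, golden-hour lighting.",
        ["https://example.com/reference-1.png"],  # placeholder URL
        resolution="2K",
        seed=42,
    )
    print(payload)
```

Validating locally catches an out-of-range image count or a mistyped ratio before a paid run is started.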

Pricing

Pricing depends on resolution and how many reference images you use. The base rate is $0.04 per 1k compute units, applied via an internal formula; the resulting per-image prices are listed below.

Up to 3 reference images (1–3 refs)

Resolution    Price per image
1080p         $0.04
2K            $0.06
4K            $0.07

4–7 reference images

Resolution    Price per image
1080p         $0.05
2K            $0.10
4K            $0.15
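For budgeting, the two price tables can be encoded as a small lookup. This is a minimal sketch that uses only the per-image prices listed above; the function names price_per_image and runs_for_budget are hypothetical, and the internal compute-unit formula is not modelled.

```python
from decimal import Decimal

# Per-image prices in USD, copied from the two tables above.
PRICES = {
    "1-3": {"1080p": Decimal("0.04"), "2K": Decimal("0.06"), "4K": Decimal("0.07")},
    "4-7": {"1080p": Decimal("0.05"), "2K": Decimal("0.10"), "4K": Decimal("0.15")},
}


def price_per_image(resolution: str, num_references: int) -> Decimal:
    """Look up the listed price for a resolution and reference count."""
    if not 1 <= num_references <= 7:
        raise ValueError("num_references must be between 1 and 7")
    tier = "1-3" if num_references <= 3 else "4-7"
    try:
        return PRICES[tier][resolution]
    except KeyError:
        raise ValueError(f"unsupported resolution: {resolution!r}") from None


def runs_for_budget(budget_usd: str, resolution: str, num_references: int) -> int:
    """Return how many whole runs fit into the given budget."""
    return int(Decimal(budget_usd) // price_per_image(resolution, num_references))


if __name__ == "__main__":
    print(price_per_image("2K", 5))
    print(runs_for_budget("1.00", "1080p", 1))
```

Decimal is used instead of float so that prices such as 0.10 are stored exactly and the division does not pick up rounding error.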

Tips for best quality

  • Use clean, well-lit reference images; avoid heavy motion blur or extreme compression.
  • Keep references stylistically consistent when possible (similar lighting / medium).
  • In the prompt, clearly state both what must stay the same (“same person and outfit”) and what should change (“different background, golden-hour lighting”).
  • For hero shots, generate at 2K or 4K, then downscale slightly for extra sharpness.
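The example prompts on this page refer to references as image1, image2, and so on, and say explicitly what to keep and what to change. A small helper can make that structure repeatable across many prompts. This is a sketch under assumptions: compose_prompt is a hypothetical name, the imageN numbering is assumed to follow upload order, and the wording is one possible template rather than a required syntax.

```python
def compose_prompt(
    scene: str,
    reference_roles: list[str],
    keep: list[str],
    change: list[str],
) -> str:
    """Build a prompt that names each reference and states keep/change rules.

    reference_roles[i] describes what image{i+1} is the reference for.
    """
    if not 1 <= len(reference_roles) <= 7:
        raise ValueError("expected between 1 and 7 reference roles")
    lines = [scene.strip()]
    # One line per reference, numbered from image1 (assumed upload order).
    for index, role in enumerate(reference_roles, start=1):
        lines.append(f"Use image{index} as the reference for {role.strip()}.")
    if keep:
        lines.append("Keep " + ", ".join(keep) + ".")
    if change:
        lines.append("Change " + ", ".join(change) + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        compose_prompt(
            "Cinematic portrait, 85mm lens, shallow depth of field.",
            ["the person and outfit"],
            keep=["the same person and outfit"],
            change=["the background to a beach", "the lighting to golden hour"],
        )
    )
```

Keeping the "stay the same" and "should change" parts on separate lines makes it easier to adjust one without accidentally rewriting the other between iterations.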