Vidu Reference-to-Image Q2 generates high-quality images from 1–7 reference images plus a text prompt, preserving style and composition while allowing controlled changes to subjects, backgrounds, and fine details. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

$0.04 per run · ~25 runs per $1

Cinematic sci-fi scene in orbit above Earth. Use image1 as the reference for the overall composition: the curved ring structure, the angle of the Earth below, the lighting and perspective. Transform the ring into a colossal space ferris wheel and amusement park: along the outer edge of the ring add large transparent cabins, glowing roller-coaster tracks, small spinning rides and observation pods, all evenly spaced. The cabins are lit with warm neon colors — cyan, magenta, orange — creating a halo of lights around the dark side of the ring. Keep the realistic detail level and materials from image1, metal panels, vents and structures, but blend them with the new attractions so it feels like a single coherent design. Deep black space in the background with a few stars, Earth below with soft blue atmosphere, high resolution, realistic cinematic sci-fi style, no text.

Realistic street photography in Japan at sunset, 35mm film look. Use image1 as the reference for the alley: same buildings, shop signs, vending machines, bicycles, perspective and warm evening light on the wet pavement. Replace the single person in the center with a three-member Japanese band performing in the street. On the left side of the alley, place a keyboard player standing behind a portable electronic keyboard on a stand. In the center, place the guitarist who is also the lead singer, facing the camera slightly, holding an electric guitar and singing into a microphone stand. On the right side, near the vending machines, place the drummer sitting behind a compact drum kit. Keep their outfits casual and modern, like an indie band. Preserve the original color tones and soft lighting of image1, natural lens perspective, shallow contrast, subtle grain, realistic candid street photo style, no added text.

Surreal dreamcore landscape, soft focus, hazy atmosphere. Use image1 as the reference for the overall scene: the rolling green hills, the wide striped field, the clear blue sky with a single large pink cloud, and the blue–pink color palette. Remove the pink house in the center and replace it with a single astronaut standing front-facing in the exact middle of the field, small in scale, perfectly aligned with the central perspective lines. The spacesuit is simple and realistic, softly reflecting blue and pink light. Add several white human hands emerging from the grass in the foreground and midground, like plants growing from the ground. Each hand has a single realistic eye on the palm, calmly staring toward the viewer. Maintain the original minimal composition and calm mood of image1, but introduce a subtle collage feeling: slightly cut-out shapes, layered textures, edges that feel like paper collage blended into the scene. Realistic photo style with dreamcore vibes, blue and pink tones, soft blur, gentle vignetting, light film grain, uncanny yet quiet atmosphere, no text.

Epic cinematic battle under the Eiffel Tower at night, 1:1 wide frame. Use image1 as the reference for Godzilla: keep the same body shape, scales and overall silhouette, towering over the city. Use image2 as the reference for Vecna from Stranger Things: keep his twisted organic body, vine-like growths and eerie posture, standing on the ground near the Eiffel Tower, facing Godzilla. Use image3 as the reference for the Paris cityscape: clearly show the Eiffel Tower in the midground, with Paris streets and buildings around it, night sky above. Godzilla and Vecna are locked in a dramatic clash: Godzilla roaring and charging a bright energy breath, Vecna raising one arm to summon dark red energy and crackling lightning in the sky. Low-angle viewpoint from the street level, looking up at both giants, with broken cars and debris in the foreground, no visible civilians. Strong contrast between cold blue light from Godzilla and ominous red light from Vecna, reflections on the metal structure of the Eiffel Tower, smoke and dust in the air, subtle film grain, ultra detailed, high resolution, cinematic concept art style.

Bold pop art poster, 4K resolution, vertical format. Use image2 as the reference for Albert Einstein’s face and famous tongue-out expression, keeping his facial features clearly recognizable. Place Einstein as the central figure in the composition, stylized in pop art with thick black outlines, simplified shading and graphic shapes. Use image1 as the reference for the background: transform the starry sky into a vibrant pop art pattern with large graphic stars, cosmic shapes and halftone dots. Strong contrasting colors: cyan, magenta, yellow, electric blue and hot pink, screen-print style. Add abstract rays and comic-style bursts radiating from Einstein’s head to suggest genius and explosive ideas, no text. Clean poster design, flat color blocks, sharp edges, slight halftone texture, retro pop art, Andy Warhol meets cosmic sci-fi, highly detailed. 1:1 frame
vidu/reference-to-image-q2 is the reference-guided sibling of Vidu’s text-to-image model. It takes 1–7 reference images plus a prompt and generates new, high-resolution images that keep the subject and composition while adjusting style, lighting, or scene details.
Pass 1–7 reference images in the images parameter to steer identity, pose, outfit, or composition. The model blends information across them while following your text prompt.
aspect_ratio supports:
auto – let the model choose a ratio that best matches the references + prompt
resolution lets you pick the output size; the pricing tables list 1080p, 2K, and 4K.
Combine references with a rich prompt (“dramatic studio lighting, cinematic close-up, 85mm lens, shallow depth of field”) to re-style while keeping the same subject.
seed set to -1 gives random variation; using a fixed integer lets you rerun the same combination of prompt + references for consistent outputs.
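The parameter notes above can be collected into a small request builder. This is a minimal sketch, not part of any SDK: `build_payload` is a hypothetical helper, and it assumes reference images are passed as URL strings, which this page does not state. Only the 1–7 image limit and the integer seed are checked, because the page does not list the full set of allowed `aspect_ratio` or `resolution` values.

```python
from typing import Any


def build_payload(
    prompt: str,
    images: list[str],
    aspect_ratio: str = "auto",
    resolution: str = "1080p",
    seed: int = -1,
) -> dict[str, Any]:
    """Assemble a JSON body for vidu/reference-to-image-q2 (hypothetical helper)."""
    # The model accepts between 1 and 7 reference images.
    if not 1 <= len(images) <= 7:
        raise ValueError(f"expected 1-7 reference images, got {len(images)}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise ValueError("seed must be an integer (-1 means random)")
    return {
        "prompt": prompt,
        "images": list(images),  # assumed to be URLs; not confirmed by the page
        "aspect_ratio": aspect_ratio,
        "resolution": resolution,
        "seed": seed,
    }


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    payload = build_payload(
        "Use image1 as the reference for the alley; add a three-member band.",
        ["https://example.com/alley.jpg"],
        seed=1234,  # fixed seed so the same prompt + references can be rerun
    )
    print(payload)
```

Keeping the seed in the payload explicitly makes it easy to log the exact combination that produced a result and rerun it later.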
Set aspect_ratio to auto and let the model decide.
Set seed to -1 for randomness, or to a fixed integer for reproducible results.
Pricing depends on resolution and how many reference images you use. Base rate is $0.04 per 1k compute units, applied via the internal formula:
| Resolution | Price per image |
|---|---|
| 1080p | $0.04 |
| 2K | $0.06 |
| 4K | $0.07 |
| Resolution | Price per image |
|---|---|
| 1080p | $0.05 |
| 2K | $0.10 |
| 4K | $0.15 |
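For budgeting, the per-image prices translate directly into images per dollar. The page shows two tables with different figures and does not say which one applies to a given request, so the sketch below takes the price table as an argument and uses the first table's values purely as an illustration. The helper names are hypothetical.

```python
from decimal import Decimal

# Prices copied from the first table; swap in the second table's figures
# if those are the ones that apply to your request.
PRICE_PER_IMAGE = {
    "1080p": Decimal("0.04"),
    "2K": Decimal("0.06"),
    "4K": Decimal("0.07"),
}


def images_per_budget(resolution: str, budget: Decimal, prices=PRICE_PER_IMAGE) -> int:
    """Return how many whole images fit in the budget at the given resolution."""
    return int(budget // prices[resolution])


def batch_cost(resolution: str, count: int, prices=PRICE_PER_IMAGE) -> Decimal:
    """Return the total price for `count` images at the given resolution."""
    return prices[resolution] * count


if __name__ == "__main__":
    for name in PRICE_PER_IMAGE:
        print(name, images_per_budget(name, Decimal("1")), "images per $1")
    print("10 images at 4K:", batch_cost("4K", 10))
```

`Decimal` is used instead of floats so that currency amounts such as 0.04 are represented exactly.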
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/vidu/reference-to-image-q2 with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Reference To Image Q2 below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/vidu/reference-to-image-q2" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $WAVESPEED_API_KEY" \
-d '{
"prompt": "A cinematic shot of a city at sunset, soft golden light",
"aspect_ratio": "auto",
"resolution": "1080p",
"seed": -1
}'
# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
-H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env
// CommonJS modules have no top-level await, so run the call inside an async function
async function main() {
  const result = await client.run("vidu/reference-to-image-q2", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "aspect_ratio": "auto",
    "resolution": "1080p",
    "seed": -1
  });
  console.log(result.outputs[0]); // → URL of the generated output
}
main().catch(console.error);

# pip install wavespeed
import wavespeed
output = wavespeed.run(
"vidu/reference-to-image-q2",
{
"prompt": "A cinematic shot of a city at sunset, soft golden light",
"aspect_ratio": "auto",
"resolution": "1080p",
"seed": -1
}
)
print(output["outputs"][0]) # → URL of the generated output

Reference To Image Q2 is a Vidu model for image editing, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/vidu/vidu-reference-to-image-q2.
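The submit-then-poll flow described above can be sketched with only the Python standard library. The two URLs and the `completed` status come from this page. Everything else is an assumption made for illustration: that responses wrap their fields in a top-level `data` object, that the prediction ID lives at `data.id`, and that a failed run reports the status `failed`. Check the API reference before relying on any of these.

```python
import json
import os
import time
import urllib.request
from typing import Any, Callable, Optional

API_BASE = "https://api.wavespeed.ai/api/v3"


def http_json(method: str, url: str, body: Optional[dict] = None) -> dict[str, Any]:
    """Send a JSON request authenticated with WAVESPEED_API_KEY from the environment."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Bearer {os.environ['WAVESPEED_API_KEY']}")
    request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def submit(payload: dict[str, Any], fetch: Callable[..., dict] = http_json) -> str:
    """Submit a prediction and return its ID."""
    response = fetch("POST", f"{API_BASE}/vidu/reference-to-image-q2", payload)
    return response["data"]["id"]  # assumed location of the prediction ID


def wait_for_output(
    request_id: str,
    fetch: Callable[..., dict] = http_json,
    interval: float = 5.0,
    timeout: float = 900.0,
) -> str:
    """Poll the result endpoint until status is "completed"; return outputs[0]."""
    url = f"{API_BASE}/predictions/{request_id}/result"
    deadline = time.monotonic() + timeout
    while True:
        data = fetch("GET", url)["data"]  # assumed top-level "data" wrapper
        if data["status"] == "completed":
            return data["outputs"][0]
        if data["status"] == "failed":  # assumed failure status, not from this page
            raise RuntimeError(f"prediction {request_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} not done after {timeout}s")
        time.sleep(interval)


if __name__ == "__main__":
    request_id = submit({
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "aspect_ratio": "auto",
        "resolution": "1080p",
        "seed": -1,
    })
    print(wait_for_output(request_id))
```

The `fetch` argument exists so the polling logic can be exercised with a stub instead of the network. The 900-second default timeout is an arbitrary choice meant to sit well above the average generation time quoted on this page.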
Reference To Image Q2 starts at $0.040 per run. That figure is the base price; the final charge scales with the parameters you set in the form (for this model, the resolution and the number of reference images), so a higher-resolution output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `images`, `aspect_ratio`, `resolution`, `seed`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/vidu/vidu-reference-to-image-q2.
Average end-to-end generation time on WaveSpeedAI is around 376 seconds per request, measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Vidu). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.