Vidu Reference To Video 2.0 | Multi-Entity Consistency Video Generation

Vidu Reference-to-Video 2.0 — vidu/reference-to-video-2.0

Vidu Reference-to-Video 2.0 generates a short video from a text prompt while using multiple reference images to guide subject identity, style, and scene consistency. Upload one or more reference images, describe the action and camera intent in the prompt, and the model synthesizes a coherent clip that follows your references. Movement intensity can be adjusted with movement_amplitude, and seed can be fixed for repeatable results.

Key capabilities

Prompt-driven video generation guided by reference images
Supports multiple reference images to keep identity/style consistent
Movement amplitude control: auto / small / medium / large
Seed control for reproducible generations
Good for “merge two references into one scene” style storytelling

Use cases

Character + scene blending (e.g., a person from one reference enters a room from another)
Style-consistent short clips based on an artwork reference
Multi-reference continuity across a mini story sequence
Product storytelling using a reference setup and a subject reference
Quick concept videos for ads, trailers, and social

Pricing

Duration	Price per video
5s	$0.20

Inputs

images (required): one or more reference images (add one item per image)
prompt (required): action + scene + camera direction

Parameters

aspect_ratio: output aspect ratio (e.g., 16:9)
movement_amplitude: motion intensity (auto, small, medium, large)
seed: random seed (set a number for reproducible results)
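
As a rough illustration of how the inputs and parameters above fit together, here is a minimal Python sketch that assembles and validates a request body. The field names (images, prompt, aspect_ratio, movement_amplitude, seed) and the four movement_amplitude values come from this page. Everything else is an assumption made for illustration: the build_payload helper is hypothetical, the images are assumed to be passed as URL strings, the JSON shape is assumed, and the defaults shown (16:9, auto) are not confirmed by this page. No endpoint is called, because the request URL and authentication scheme are not given here.

```python
import json

# Allowed values taken from the movement_amplitude parameter described above.
MOVEMENT_AMPLITUDES = {"auto", "small", "medium", "large"}


def build_payload(images, prompt, aspect_ratio="16:9",
                  movement_amplitude="auto", seed=None):
    """Assemble a request body from the documented inputs and parameters.

    Hypothetical helper: the defaults and the JSON shape are assumptions.
    """
    if not images:
        raise ValueError("images is required: pass at least one reference image")
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if movement_amplitude not in MOVEMENT_AMPLITUDES:
        raise ValueError(
            f"movement_amplitude must be one of {sorted(MOVEMENT_AMPLITUDES)}"
        )

    payload = {
        "images": list(images),  # assumed: one URL string per reference image
        "prompt": prompt.strip(),
        "aspect_ratio": aspect_ratio,
        "movement_amplitude": movement_amplitude,
    }
    # Only include seed when the caller wants a reproducible generation.
    if seed is not None:
        payload["seed"] = int(seed)
    return payload


# Placeholder image URLs; replace with your own reference images.
body = build_payload(
    images=["https://example.com/room.png", "https://example.com/girl.png"],
    prompt="Use reference image 1 for the room. Use reference image 2 for the girl.",
    movement_amplitude="small",
    seed=42,
)
print(json.dumps(body, indent=2))
```

Leaving seed out of the body when it is unset keeps the "random by default, fixed on request" behavior described above explicit in the request itself.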

Prompting guide (multi-reference)

When you provide multiple references, explicitly assign what each reference is used for:

Template: Use reference image 1 for the room and lighting. Use reference image 2 for the character’s appearance and clothing. The character steps out of the painting into the room, walks to the table, and places the coffee cup down. Smooth motion, consistent style, fixed camera, no flicker.

Example prompts

Use reference 1 as the room scene and table setup. Use reference 2 for the girl’s identity and painting style. The girl steps out of the painting into the room, walks to the table, and gently places the coffee cup down. Warm morning light, cinematic, smooth transition.
Combine both references into one coherent scene. The character crosses the room, interacts with the cup, subtle cloth movement, soft shadows, realistic contact with the table surface.
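
The template from the prompting guide can also be assembled programmatically, which helps keep role assignments consistent across a batch of prompts. The sketch below is only an illustration: build_multi_reference_prompt is a hypothetical helper, not part of any Vidu tooling, and it simply follows the "Use reference image N for ..." wording of the template.

```python
def build_multi_reference_prompt(roles, action, style_notes=()):
    """Build a prompt that assigns a role to each reference image.

    roles: one description per reference image, in upload order.
    action: one or more sentences describing what happens in the clip.
    style_notes: optional short phrases (motion, camera, lighting).
    """
    if not roles:
        raise ValueError("at least one reference role is required")

    # Number references from 1 so the wording matches the template.
    parts = [
        f"Use reference image {i} for {role}."
        for i, role in enumerate(roles, start=1)
    ]
    parts.append(action.strip())

    if style_notes:
        notes = ", ".join(style_notes)
        parts.append(notes[0].upper() + notes[1:] + ".")
    return " ".join(parts)


prompt = build_multi_reference_prompt(
    roles=[
        "the room and lighting",
        "the character's appearance and clothing",
    ],
    action=(
        "The character steps out of the painting into the room, "
        "walks to the table, and places the coffee cup down."
    ),
    style_notes=["smooth motion", "consistent style", "fixed camera", "no flicker"],
)
print(prompt)
```

The order of the roles list should match the order of the uploaded reference images, since the prompt refers to them by position.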

Vidu Reference-to-Video 2.0 turns references into videos that preserve characters, objects, and environments with Multi-Entity Consistency. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
