Vidu Text-to-Video 2.0 converts text prompts into high-quality 720p videos with exceptional visual detail and diverse motion dynamics. It is available as a ready-to-use REST inference API with best performance, no cold starts, and affordable pricing.
$0.30 per run (about 33 runs per $10)
An elderly librarian with glowing spectacles carefully mends an ancient book made of pure light and floating runes. He is in a vast library carved from giant, luminous crystals, with shimmering dust motes floating in the air. The camera pans slowly, revealing countless other glowing books on the shelves.
“At sunset inside a quiet bamboo forest, a swordsman dressed in flowing white robes stands motionless, his long blade catching a faint glimmer of light. The breeze rustles through bamboo leaves, creating a soft rhythm. The camera glides from above the forest canopy, weaving through the bamboo until it focuses on the determined gaze of the warrior. Birds soar across the fading sky as golden light spills through the trees. The visual mood is poetic and cinematic, evoking a traditional ink-painting style, with warm tones and elegant details.”
“Inside a colossal starship drifting near a glowing nebula, a captain in a sleek uniform stands before towering glass windows, staring at fleets of spacecraft preparing for battle. The camera begins with a wide panoramic shot of the nebula’s swirling colors, then pushes inward through the ship’s command deck, highlighting blinking holographic displays. The style is epic, cinematic, evoking the grandeur of a space opera, with vivid cosmic lighting and ultra-high-definition detail.”
On a vast battlefield at dawn, rows of armored knights charge forward on horseback, their banners whipping in the wind. The camera sweeps from the blood-red sky over the clashing armies, then dives to focus on one warrior gripping a massive sword, his armor glinting with sparks of firelight. Arrows rain down, shields splinter, and smoke fills the horizon. The mood is gritty yet majestic, styled like a high-budget medieval fantasy film, with warm golden highlights and cinematic depth.
“Amid the ruins of a collapsed city, a lone survivor wearing a torn coat and gas mask scavenges through abandoned streets. Weeds and broken cars litter the ground, and faint fires smolder in the distance. The camera tracks from behind, then circles around to reveal the survivor’s weary eyes through the cracked mask. The atmosphere is bleak, desaturated, with heavy shadows, ash drifting through the air, and a tense cinematic tone reminiscent of dystopian survival dramas.”
“An endless desert of mirrors stretches into the horizon beneath a sky filled with floating whales that glow with bioluminescent patterns. A young girl in a red dress walks slowly across the reflective ground, her footsteps echoing like ripples. The camera pans across the surreal landscape, shifting between wide surreal vistas and close-ups of her awestruck face. The aesthetic is painterly, surreal, inspired by dreamscapes and fine art, with vibrant colors and a haunting, ethereal mood.”
“Under flickering street lamps in a rain-soaked 1940s alley, a detective in a trench coat lights a cigarette, the smoke curling upward into the shadows. A femme fatale waits at the end of the alley, her red dress glowing faintly under neon signage. The camera begins with a slow dolly through the rain-slicked cobblestones, pausing on reflections in puddles before cutting to a sharp close-up of the detective’s grim expression. The style is classic film noir: moody, high-contrast black and white with dramatic shadows, cinematic suspense, and a tense atmosphere.”
“Neon streets blaze as a motorbike speeds through a crowded cyber-metropolis, weaving between hover-cars and dazzling holograms. A drone pursues at high velocity, shooting beams of light that scorch the pavement. The camera races alongside the rider, darting between wide sweeping views of skyscrapers and intense first-person close-ups of the chase. The style is fast, kinetic, with glowing neon trails and cinematic slow-motion bursts, reminiscent of a futuristic action blockbuster.”
“On a stormy mountaintop, a colossal stone titan awakens, its glowing eyes piercing through clouds as lightning strikes around it. A lone warrior with a flaming spear stands defiantly at the cliff’s edge. The camera swoops from the swirling storm down to the titan’s massive hand gripping the mountainside, then cuts to the warrior’s silhouette against the thunder. The atmosphere is epic and mythic, with dramatic lighting, golden fire against steel-gray skies, styled like a high-budget fantasy legend.”
“In a candlelit ballroom of the 18th century, couples in ornate gowns and tailored coats glide across the polished floor to the sound of a live string quartet. The camera begins with an overhead shot of glittering chandeliers, then flows downward, weaving through the dancers until it focuses on two lovers exchanging a secret glance. The atmosphere is elegant and intimate, with warm golden tones, rich detail in costumes, and a dreamy, cinematic romance.”
Cinematic close-up of a lone detective at a steaming noodle stall, rain slicking his trench coat. The background is filled with flickering neon signs and towering futuristic buildings. A holographic geisha advertisement glitches in the rain. The detective picks up his chopsticks, his eyes darting cautiously towards a street corner.
In a misty alien jungle, a giant biomechanical dragon unfolds its wings, made of metal and translucent energy membranes, for the first time. The circuit patterns on its wings pulse with blue light, startling the surrounding bioluminescent flora. Droplets of water slide off its metallic scales. Wide-angle, low-shot, emphasizing its immense scale.
A group of adorable capybaras are relaxing in a Japanese onsen, steam rising from the hot water. Several yuzu fruits float on the surface. One of the capybaras has a small towel folded on its head and lets out a lazy yawn. Studio Ghibli animation style, warm and healing.
Vidu Text-to-Video 2.0 turns your imagination into motion — a next-generation text-to-video model that produces cinematic 720p videos with smooth, expressive, and visually coherent motion. Now with flexible duration control, you can generate 5s or 8s clips for storytelling, concept visualization, or creative motion studies.
prompt — describe your scene (e.g., “A woman walking through a rainy street under neon lights”).
movement_amplitude — controls the motion strength of objects in the frame:
auto (default): automatically adjusts motion scale.
small: subtle, gentle movements (good for portraits or static shots).
medium: balanced motion, ideal for everyday scenes.
large: dramatic, cinematic camera or subject motion.
duration — select video length:
5: short clips, ideal for previews or teasers.
8: extended motion for storytelling or scene development.
seed — use a fixed number for reproducibility or leave empty for random generation.
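The parameters above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any WaveSpeedAI SDK: `build_payload` is a hypothetical helper, and leaving `seed` out of the body when none is given is an assumption about how the service treats an empty seed.

```python
from typing import Optional

# Allowed values as listed in the parameter descriptions above.
ALLOWED_AMPLITUDES = {"auto", "small", "medium", "large"}
ALLOWED_DURATIONS = {5, 8}


def build_payload(
    prompt: str,
    duration: int,
    movement_amplitude: str = "auto",
    seed: Optional[int] = None,
) -> dict:
    """Hypothetical helper: validate inputs and return a JSON-ready dict."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {sorted(ALLOWED_DURATIONS)}")
    if movement_amplitude not in ALLOWED_AMPLITUDES:
        raise ValueError(
            f"movement_amplitude must be one of {sorted(ALLOWED_AMPLITUDES)}"
        )
    payload = {
        "prompt": prompt,
        "duration": duration,
        "movement_amplitude": movement_amplitude,
    }
    # Assumption: omitting the seed lets the service pick a random one.
    if seed is not None:
        payload["seed"] = seed
    return payload
```

Rejecting a bad `duration` or `movement_amplitude` locally avoids paying a network round trip for a request that is likely to be refused.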
| Resolution | Duration | Cost per Clip |
|---|---|---|
| 720p | 5s | $0.60 |
| 720p | 8s | $0.60 |
Choose a movement amplitude (auto, small, medium, or large) and a duration (5 or 8 seconds).
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/vidu/text-to-video-2.0 with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Text To Video 2.0 below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/vidu/text-to-video-2.0" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "duration": 5,
    "resolution": "720p",
    "movement_amplitude": "auto",
    "seed": 0
  }'
# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS has no top-level await, so the call runs inside an async function.
async function main() {
  const result = await client.run("vidu/text-to-video-2.0", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "duration": 5,
    "resolution": "720p",
    "movement_amplitude": "auto",
    "seed": 0
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "vidu/text-to-video-2.0",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "duration": 5,
        "resolution": "720p",
        "movement_amplitude": "auto",
        "seed": 0,
    },
)
print(output["outputs"][0])  # → URL of the generated output

Text To Video 2.0 is a Vidu model for video generation, exposed as a REST API on WaveSpeedAI. It converts text prompts into 720p videos. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/vidu/vidu-text-to-video-2.0.
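The submit-then-poll flow described here can be sketched with the Python standard library alone. This is an illustrative sketch rather than the official client: the result URL is the one used in the cURL example on this page, while the location of `status` inside `data`, the `"failed"` status name, the polling interval, and the 15-minute timeout are assumptions. `fetch_prediction` and `wait_for_output` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Callable

# Result endpoint as given in the cURL example on this page.
RESULT_URL = "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result"


def fetch_prediction(request_id: str, api_key: str) -> dict:
    """GET the prediction record and return the parsed JSON body."""
    req = urllib.request.Request(
        RESULT_URL.format(request_id=request_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_output(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the prediction completes, then return data.outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: status and outputs both live under the "data" key.
        data = fetch().get("data", {})
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed name of the failure status
            raise RuntimeError(f"prediction failed: {data}")
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not complete in time")
        sleep(interval)
```

With a prediction id in hand, usage would look like `wait_for_output(lambda: fetch_prediction(request_id, api_key))`. Passing the fetcher in as a callable keeps the loop independent of the HTTP layer, so it can be exercised with canned responses.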
Text To Video 2.0 starts at $0.30 per run. That figure is the base price — the final charge scales with the parameters you set in the form (output size, length, count, references, or whatever knobs this model exposes), so a higher-quality or larger output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
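As a rough budgeting aid, the base price can be turned into a number of runs. The sketch below assumes a flat per-run price and ignores any parameter-dependent scaling; `runs_for_budget` is a hypothetical helper, and `Decimal` is used so that currency amounts are not subject to floating-point rounding.

```python
from decimal import Decimal


def runs_for_budget(budget: str, price_per_run: str = "0.30") -> int:
    """Whole number of runs a budget covers at a flat per-run price."""
    # Floor division on Decimal discards the fractional run.
    return int(Decimal(budget) // Decimal(price_per_run))


print(runs_for_budget("10"))  # prints 33
```

At the $0.30 base price, $10 covers 33 runs, which matches the estimate shown at the top of the page.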
Key inputs: `prompt`, `resolution`, `duration`, `seed`, `movement_amplitude`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/vidu/vidu-text-to-video-2.0.
Average end-to-end generation time on WaveSpeedAI is around 327 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Vidu). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.