Shengshu·video·From $0.25/run

Vidu Q3 API

Shengshu's Vidu Q3 — cinematic-quality video generation with text-to-video, image-to-video, reference-to-video (up to 7 subjects in one shot), and start-end frame interpolation. Three tiers: Standard, Pro, Turbo.

About the Vidu Q3 API

What Vidu Q3 does, how it fits in the Shengshu model lineup, and why teams reach for it.

Vidu Q3 is a video generation model from Shengshu, available through the WaveSpeedAI REST API. It supports text-to-video, image-to-video, reference-to-video (up to 7 subjects in one shot), and start-end frame interpolation, and comes in three tiers: Standard, Pro, and Turbo.

The Vidu Q3 family on WaveSpeedAI ships 11 REST endpoints covering image-to-video and text-to-video workflows. Each variant carries its own pricing, parameters, and example outputs. Pick the one that matches your input modality and production constraints, or call several from the same API key to compose multi-step pipelines.

Run Vidu Q3 through the same API key, billing account, and rate-limit envelope you use for the other 1,000+ AI models on WaveSpeedAI. No separate vendor setup, no per-provider SDKs, no per-vendor rate-limit envelopes — one integration covers everything from text-to-image and text-to-video through audio synthesis, 3D generation, upscaling, and editing.

All Vidu Q3 API endpoints

11 endpoints available now on WaveSpeedAI — pick the variant that matches your workflow.

Image To Video Spicy

Vidu Q3 Image-to-Video Spicy generates unlimited high-quality videos from images with smooth animations and diverse motion, optimized for scalable content generation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video · from $0.35

Text To Video

Vidu Q3 Pro Text to Video is a fast AI video generation model that creates high-quality, audio-capable videos from text prompts with support for 1–16 second outputs. Ready-to-use REST inference API for cinematic clips, advertising creatives, social media videos, product visuals, storytelling, and professional text-to-video workflows with simple integration, no coldstarts, and affordable pricing.

text-to-video · from $0.25

Start End To Video

Vidu Q3 Pro Start-End-to-Video creates smooth transitions between two keyframes with viduq3-pro (1–16s). Billing follows Vidu's published Q3-pro per-second rates by resolution. Ready-to-use REST inference API on WaveSpeed.

image-to-video · from $0.25

Start End To Video

Vidu Q3 Turbo Start-End-to-Video creates smooth transitions between two images with faster processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video · from $0.30

Start End To Video

Vidu Q3 Start-End-to-Video generates a high-quality video that transitions between a start image and an end image, with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video · from $0.35

Reference To Video

Vidu Q3 Reference-to-Video Mix generates multi-entity consistent videos from 1-4 reference images with text prompt guidance. Supports 360p to 1080p resolutions, up to 16 seconds duration, multiple aspect ratios, and optional audio generation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video · from $0.35

Image To Video Pro

Vidu Q3 Image-to-Video Pro generates high-resolution videos (720p/1080p/2K/4K) from images with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video · from $0.45

Image To Video

Vidu Q3 Pro Image-to-Video animates still images with high-quality motion via viduq3-pro (1–16s). Billing follows Vidu's published Q3-pro per-second rates by resolution. Ready-to-use REST inference API on WaveSpeed.

image-to-video · from $0.25

Image To Video

Vidu Q3 Turbo Image-to-Video animates static images with high-quality motion and faster processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video · from $0.30

Text To Video

Vidu Q3 Text-to-Video turns text prompts into high-quality videos with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-video · from $0.35

Image To Video

Vidu Q3 Image-to-Video turns still images into high-quality videos with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video · from $0.35

How to use the Vidu Q3 API

Four steps from signup to a finished generation. Full Python, Node.js, and cURL examples are in the API section below.

  1. Get an API key

    Sign up for a WaveSpeedAI account and copy your API key from the dashboard. New accounts come with free starter credits, enough to run the playground a few dozen times before billing kicks in.

  2. Submit a prediction

    POST your input as JSON to https://api.wavespeed.ai/api/v3/vidu/q3/text-to-video. The endpoint returns a prediction id immediately. Generations are asynchronous, so you don't hold an open connection during inference.

  3. Poll for completion

    GET https://api.wavespeed.ai/api/v3/predictions/{request_id}/result every 1-2 seconds, where {request_id} is the prediction id returned in step 2. The response includes a status field; keep polling until it changes from "queued" or "processing" to "completed".

  4. Read the output URL

    Once status is "completed", read the URL from data.outputs[0]. The URL points to your generated video on the WaveSpeedAI CDN.
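The submit-then-poll flow in the steps above can be sketched with only the Python standard library. This is a minimal sketch under stated assumptions, not the official client: the page confirms the two URLs, the status values, and data.outputs[0], while the location of the prediction id (data.id) and of the status field (data.status) is assumed here, as is treating any unlisted status as a terminal failure. The names http_json and generate are hypothetical helpers.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def http_json(method, url, api_key, body=None):
    """Send one JSON request with urllib and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def generate(api_key, payload, endpoint="vidu/q3/text-to-video",
             send=http_json, sleep=time.sleep, interval=2.0, max_polls=300):
    """Submit a prediction, poll until it completes, return the first output URL."""
    submitted = send("POST", f"{API_BASE}/{endpoint}", api_key, payload)
    request_id = submitted["data"]["id"]  # assumed location of the prediction id
    result_url = f"{API_BASE}/predictions/{request_id}/result"
    for _ in range(max_polls):
        result = send("GET", result_url, api_key)
        status = result["data"]["status"]  # assumed location of the status field
        if status == "completed":
            return result["data"]["outputs"][0]
        if status not in ("queued", "processing"):
            # Assumption: any other status (for example a failure) is terminal.
            raise RuntimeError(f"prediction ended with status {status!r}")
        sleep(interval)
    raise TimeoutError("prediction did not complete in time")
```

A call would look like generate(os.environ["WAVESPEED_API_KEY"], {"prompt": "..."}), with the input fields copied from the playground. The send and sleep parameters exist so the loop can be exercised without network access; max_polls bounds the wait so a stuck job cannot poll forever.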

What you can build with Vidu Q3

Common workflows developers and creators use the Vidu Q3 API for.

Multi-subject scene composition

Reference up to 7 subjects in a single shot — useful for ensemble cast scenes, group product demos, and storyboard panels with multiple consistent characters.

multi-subject · cast · composition

Start-end frame interpolation

Supply a start frame and an end frame, and Vidu Q3 generates the connecting motion. Useful for stitching key poses, animatic-style storyboards, and lock-down delivery from concept frames.

interpolation · keyframes · animatic

Music videos with rhythm-aware motion

Vidu Q3 handles musical pacing and beat-aligned motion better than most general-purpose video models — useful for short music-video sequences and rhythm-driven creative work.

music · rhythm · choreography

Multi-reference product showcases

Combine multiple product reference images — Vidu Q3 generates scenes that consistently include all referenced items, useful for catalog-style demos.

product · multi-reference · catalog

Cinematic shorts with consistent cast

Ensemble narrative shorts where multiple characters need to remain recognizable across shots — Vidu Q3 holds identity across all referenced subjects.

narrative · ensemble · short-film

Tips for prompting Vidu Q3

Practical advice for getting better outputs from Vidu Q3 — drawn from the patterns that work across video models in production pipelines.

Be specific about camera moves

Mention concrete cinematography vocabulary — orbit, dolly-in, push-in, pan-left, crane shot, handheld follow. Generic prompts produce static or arbitrary camera choices; named camera moves map directly to motion intent in the model's training data and dramatically improve shot quality.

Anchor character identity with reference images

If your prompt depends on a specific person, character, or product, upload a reference image alongside the prompt. Without a reference, identity drifts across frames and across shots — the same character ends up looking like a slightly different person each generation.

Describe lighting and time of day

Lighting cues like 'golden hour, soft warm directional light' or 'overcast diffused light, slate-grey sky' improve quality and consistency far more than vague quality modifiers. Lighting is one of the strongest priors the model conditions on.

Use negative prompts to suppress common failure modes

Useful negatives for video: 'frame flicker, motion blur, watermark, text artifacts, distorted hands, low resolution, jpeg compression'. Negative prompts cost nothing and noticeably reduce the rate of generations you'd otherwise re-roll.
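The camera, lighting, and negative-prompt tips can be combined into one small request builder. The sketch below is illustrative only: the field names "prompt" and "negative_prompt" are assumptions, so confirm them against the input schema shown in the playground for the variant you call. The camera vocabulary and the negative list are copied from the tips above, and build_payload is a hypothetical helper.

```python
# Camera vocabulary and negatives are taken from the tips above.
CAMERA_MOVES = (
    "orbit", "dolly-in", "push-in", "pan-left", "crane shot", "handheld follow",
)
DEFAULT_NEGATIVES = (
    "frame flicker", "motion blur", "watermark", "text artifacts",
    "distorted hands", "low resolution", "jpeg compression",
)


def build_payload(subject, camera_move, lighting, negatives=DEFAULT_NEGATIVES):
    """Compose a prompt from subject, a named camera move, and a lighting cue."""
    if camera_move not in CAMERA_MOVES:
        raise ValueError(f"unknown camera move: {camera_move!r}")
    prompt = f"{subject}, {camera_move}, {lighting}"
    return {
        "prompt": prompt,                         # assumed field name
        "negative_prompt": ", ".join(negatives),  # assumed field name
    }


payload = build_payload(
    "a chef plating dessert",
    "dolly-in",
    "golden hour, soft warm directional light",
)
print(payload["prompt"])
```

Rejecting unlisted camera moves is a deliberate choice for this sketch: it catches generic prompts before they cost a run. The list is not exhaustive, so extend CAMERA_MOVES with whatever vocabulary works for your shots.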

Pick the shortest duration that captures your beat

Most prompts work best at 5-8 seconds. Longer clips amplify temporal inconsistencies (subject morphing, environment drift). If you need a 20-second sequence, generate three 6-8 second clips and edit them together — quality stays higher than one long generation.
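The "several short clips instead of one long one" advice amounts to a small piece of arithmetic. The helper below is a hypothetical sketch that splits a target length into near-equal whole-second clips; the 8-second cap comes from the 5-8 second guidance above and is a default, not an API limit.

```python
import math


def plan_clips(total_seconds, max_clip=8):
    """Split a target length into near-equal clips no longer than max_clip seconds."""
    if total_seconds <= 0 or max_clip <= 0:
        raise ValueError("total_seconds and max_clip must be positive")
    count = math.ceil(total_seconds / max_clip)
    base, extra = divmod(total_seconds, count)
    # The first `extra` clips take one additional second so the total matches.
    return [base + 1 if i < extra else base for i in range(count)]


print(plan_clips(20))  # [7, 7, 6]
```

Each planned clip is then generated as its own prediction and the results are joined in an editor.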

Match aspect ratio to platform up front

9:16 for TikTok / Reels / Shorts, 16:9 for landscape feeds and YouTube, 1:1 for post grids. Models train slightly differently per aspect ratio — cropping a 16:9 to 9:16 after the fact loses both fidelity and the composition the model intended.
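One way to make the aspect ratio an up-front decision is to derive it from the target platform before building the request. This is a sketch only: the platform keys are hypothetical labels for the destinations named above, and the request field that carries the ratio is not shown because its name varies by variant.

```python
# Ratios follow the platform guidance above; the keys are illustrative labels.
PLATFORM_ASPECT_RATIOS = {
    "tiktok": "9:16",
    "reels": "9:16",
    "shorts": "9:16",
    "youtube": "16:9",
    "landscape": "16:9",
    "grid": "1:1",
}


def aspect_ratio_for(platform):
    """Return the aspect ratio to request for a given target platform."""
    key = platform.strip().lower()
    if key not in PLATFORM_ASPECT_RATIOS:
        raise ValueError(f"no aspect ratio configured for {platform!r}")
    return PLATFORM_ASPECT_RATIOS[key]


print(aspect_ratio_for("TikTok"))  # 9:16
```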

Vidu Q3 API pricing

Pricing is per-output. The final charge scales with the parameters you set in each variant's playground (resolution, duration, output count, references).

Endpoint                           Type             Starting price
vidu/q3/image-to-video-spicy       image-to-video   $0.35
vidu/q3-pro/text-to-video          text-to-video    $0.25
vidu/q3-pro/start-end-to-video     image-to-video   $0.25
vidu/q3-turbo/start-end-to-video   image-to-video   $0.30
vidu/q3/start-end-to-video         image-to-video   $0.35
vidu/q3/reference-to-video         image-to-video   $0.35
vidu/q3/image-to-video-pro         image-to-video   $0.45
vidu/q3-pro/image-to-video         image-to-video   $0.25
vidu/q3-turbo/image-to-video       image-to-video   $0.30
vidu/q3/text-to-video              text-to-video    $0.35
vidu/q3/image-to-video             image-to-video   $0.35
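For budgeting, the starting prices listed above can be kept in a lookup table. The sketch below copies those figures and computes a lower bound for a multi-step pipeline. It is only a floor: the final charge scales with resolution, duration, output count, and references, so treat the playground's cost preview as the source of truth. The helper names are hypothetical.

```python
from decimal import Decimal

# Endpoint -> (type, starting price in USD), copied from the pricing list above.
STARTING_PRICES = {
    "vidu/q3/image-to-video-spicy": ("image-to-video", Decimal("0.35")),
    "vidu/q3-pro/text-to-video": ("text-to-video", Decimal("0.25")),
    "vidu/q3-pro/start-end-to-video": ("image-to-video", Decimal("0.25")),
    "vidu/q3-turbo/start-end-to-video": ("image-to-video", Decimal("0.30")),
    "vidu/q3/start-end-to-video": ("image-to-video", Decimal("0.35")),
    "vidu/q3/reference-to-video": ("image-to-video", Decimal("0.35")),
    "vidu/q3/image-to-video-pro": ("image-to-video", Decimal("0.45")),
    "vidu/q3-pro/image-to-video": ("image-to-video", Decimal("0.25")),
    "vidu/q3-turbo/image-to-video": ("image-to-video", Decimal("0.30")),
    "vidu/q3/text-to-video": ("text-to-video", Decimal("0.35")),
    "vidu/q3/image-to-video": ("image-to-video", Decimal("0.35")),
}


def cheapest_endpoints(kind):
    """Return the endpoints of one type that share the lowest starting price."""
    prices = {name: price for name, (t, price) in STARTING_PRICES.items() if t == kind}
    if not prices:
        raise ValueError(f"no endpoints of type {kind!r}")
    low = min(prices.values())
    return sorted(name for name, price in prices.items() if price == low)


def minimum_cost(endpoints):
    """Lower bound for a pipeline: one run per endpoint at its starting price."""
    return sum((STARTING_PRICES[name][1] for name in endpoints), Decimal("0"))


print(cheapest_endpoints("text-to-video"))
print(minimum_cost(["vidu/q3-pro/text-to-video", "vidu/q3/reference-to-video"]))
```

Decimal is used instead of float so that sums of prices such as 0.25 and 0.35 stay exact.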

Call the Vidu Q3 API

Sign up for an API key at wavespeed.ai/accesskey, then submit a prediction via REST. The playground generates ready-to-paste samples for any combination of inputs.

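HTTP example
# 1. Submit a prediction. The "prompt" field name is an assumption;
#    copy the exact input schema from the playground.
curl -X POST "https://api.wavespeed.ai/api/v3/vidu/q3/text-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{"prompt": "A lighthouse at golden hour, slow dolly-in"}'

# 2. Poll the result until status = "completed".
#    Replace {request_id} with the prediction id returned by step 1.
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# 3. Read the output URL from data.outputs[0].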
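
Node.js example
// npm install wavespeed
const WaveSpeed = require('wavespeed');
const client = new WaveSpeed(); // reads WAVESPEED_API_KEY

// CommonJS has no top-level await, so run the call inside an async function.
async function main() {
  // "prompt" is an assumed field name; copy real inputs from the playground.
  const result = await client.run("vidu/q3/text-to-video", { prompt: "A lighthouse at golden hour" });
  console.log(result.outputs[0]); // → URL of the generated output
}
main().catch(console.error);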
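
Python example
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "vidu/q3/text-to-video",
    # "prompt" is an assumed field name; copy real inputs from the playground.
    {"prompt": "A lighthouse at golden hour, slow dolly-in"},
)
print(output["outputs"][0])  # → URL of the generated output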

Vidu Q3 vs alternatives

When to pick Vidu Q3 over similar models on WaveSpeedAI.

Vidu Q3 vs Seedance 2.0

Seedance 2.0 ships native audio synthesis and stronger explicit camera-language control. Vidu Q3 wins on multi-subject composition (up to 7 subjects) and start-end frame interpolation, which Seedance doesn't support.

Vidu Q3 vs Kling 3.0

Kling 3.0 generates longer takes (up to 30s). Vidu Q3 wins on multi-reference composition and the start-end frame workflow — pick Kling for length, Vidu for multi-character control.

Vidu Q3 vs Wan 2.7

Wan 2.7 has multi-modal input including audio references and open weights. Vidu Q3 wins on multi-subject scenes and start-end interpolation — Wan for flexibility, Vidu for ensemble work.

Vidu Q3 API — Frequently asked questions

Pricing, license, integration — common questions about running Vidu Q3 on WaveSpeedAI.

What is the Vidu Q3 API?

Vidu Q3 is a Shengshu video generation model exposed as a REST API on WaveSpeedAI. It covers text-to-video, image-to-video, reference-to-video (up to 7 subjects in one shot), and start-end frame interpolation, in three tiers: Standard, Pro, and Turbo. You can call it programmatically or try it from the playground linked above.

How do I call the Vidu Q3 API?

Sign up for a WaveSpeedAI account, copy your API key from wavespeed.ai/accesskey, then POST your input as JSON to https://api.wavespeed.ai/api/v3/vidu/q3/text-to-video. The endpoint returns a prediction id; poll https://api.wavespeed.ai/api/v3/predictions/{request_id}/result until status changes to "completed", then read the output URL from data.outputs[0]. Full Python, Node.js, and cURL examples are above.

How much does the Vidu Q3 API cost?

Vidu Q3 starts at $0.25 per run. The exact cost scales with the parameters you set (resolution, duration, output count, references). The live cost preview next to the Generate button in the playground shows the exact price for your current input.

Which Vidu Q3 variants are available?

WaveSpeedAI hosts 11 Vidu Q3 endpoints: vidu/q3/image-to-video-spicy, vidu/q3-pro/text-to-video, vidu/q3-pro/start-end-to-video, vidu/q3-turbo/start-end-to-video, vidu/q3/start-end-to-video, vidu/q3/reference-to-video, vidu/q3/image-to-video-pro, vidu/q3-pro/image-to-video, vidu/q3-turbo/image-to-video, vidu/q3/text-to-video, and vidu/q3/image-to-video. Each variant has its own playground page and pricing.

Can I use Vidu Q3 outputs commercially?

Commercial usage rights follow the Shengshu model license. Most Shengshu models permit commercial output use; see each model's playground page for the specific license summary, and WaveSpeedAI's Terms of Service for platform-level conditions.

Why use Vidu Q3 on WaveSpeedAI instead of going direct?

One API key and one billing account cover Vidu Q3 and 1,000+ other AI models from other providers. There is no per-vendor SDK setup, no separate rate-limit envelope, and no integration code to rewrite per vendor. Pricing is typically at parity with or below Shengshu's direct API.

About Shengshu

The team behind Vidu Q3 and the broader Shengshu model lineup on WaveSpeedAI.

Shengshu Technology is a Chinese AI lab spun out of Tsinghua University, behind the Vidu family of video generation models. Vidu was an early mover on multi-reference inputs (up to seven subjects composed into a single shot), strong cross-cut character consistency, and start-end frame interpolation that takes two stills and generates the connecting motion. The Q3 generation ships in Standard, Pro, and Turbo tiers to span the cost/quality range.

Start building with Vidu Q3 on WaveSpeedAI

Free starter credits on signup. One API key across 1,000+ AI models from Shengshu and every other provider.