Music Video Generator
AI Music Video Generator turns an audio track and a reference photo into a full music video with cinematic camera angles, smooth transitions, and lip sync. Videos can run up to 10 minutes at 480p or 720p. The model is available through a ready-to-use REST inference API with no cold starts and affordable pricing.
Features
AI Music Video (MV) Generator
The world’s best AI music video (MV) generator. Turn any song + a single photo into a professional-quality music video in minutes.
Why It’s the Best
- Blazing fast: A full 1-minute music video typically completes in 3-6 minutes, not hours.
- Perfect lip sync: Vocal-aware segmentation ensures the singer’s lips match the audio precisely throughout the entire video.
- Cinematic quality: AI director plans each scene with different camera angles, compositions, and natural lighting — like a real music video shoot.
- One photo is all you need: Upload a single portrait (up to 3 reference images are accepted) and the AI handles the rest: scene creation, angle variations, and smooth transitions.
- Up to 10 minutes: Create full-length music videos, not just short clips.
- Smart scene planning: Automatically detects vocal phrases and silence in the audio to create natural scene transitions at musically meaningful moments.
How It Works
- Upload your audio — any song, any genre, up to 10 minutes.
- Upload 1-3 reference images (optional) — the person who will appear in the video.
- Describe the scene (optional) — e.g. “A woman sings in a forest while playing a guitar”.
- Choose aspect ratio — 16:9 (landscape) or 9:16 (portrait/vertical).
- Select resolution — 480p or 720p.
- Get your music video — fully rendered with transitions, multiple angles, and synced audio.
What Happens Behind the Scenes
- Vocal isolation — Separates vocals from instruments to analyze singing patterns.
- Smart segmentation — Splits the audio at natural phrase boundaries (not arbitrary fixed intervals).
- AI directing — A vision-language model plans each scene: camera angles, compositions, expressions, and camera movements.
- Scene generation — Creates unique starting frames for each segment from different angles.
- Video synthesis — Generates lip-synced digital human video for each segment.
- Cinematic assembly — Smooth crossfade transitions between scenes, with the original audio layered on top for perfect sync.
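To make the "smart segmentation" idea concrete, here is a toy Python sketch of cutting at silent gaps between vocal phrases instead of at fixed intervals. It is an illustration only, not the service's actual algorithm, which this page does not document. The function name, the per-frame loudness input, and every threshold are assumptions.

```python
def find_cut_points(loudness, frame_sec=0.1, silence_level=0.05, min_gap_frames=3):
    """Return cut times (in seconds) at the midpoint of each silent gap.

    loudness: per-frame vocal loudness in [0, 1] (hypothetical input, e.g. the
    RMS level of an isolated vocal track sampled every `frame_sec` seconds).
    Only gaps that sit between two voiced stretches and last at least
    `min_gap_frames` frames produce a cut.
    """
    cuts = []
    run_start = None  # index of the first frame of the current silent run
    for i, level in enumerate(loudness):
        if level < silence_level:
            if run_start is None:
                run_start = i
            continue
        # A voiced frame closes any silent run that was open.
        if run_start is not None:
            run_len = i - run_start
            if run_start > 0 and run_len >= min_gap_frames:
                cuts.append((run_start + run_len / 2) * frame_sec)
            run_start = None
    # Leading and trailing silence are ignored: they do not separate phrases.
    return cuts
```

Short pauses (fewer than `min_gap_frames` frames) are skipped, so a quick breath in the middle of a phrase does not trigger a scene change.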
Pricing
| Output Resolution | Cost per 5 seconds | Max Length |
|---|---|---|
| 480p | $0.15 | 10 minutes |
| 720p | $0.30 | 10 minutes |
Billing Rules
- Standard (480p) Rate: $0.03 per second
- HD (720p) Rate: $0.06 per second
- Minimum Charge: 5 seconds ($0.15 at 480p, $0.30 at 720p)
- Billing Cap: 600 seconds (10 minutes)
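The billing rules can be sketched as a small cost estimator. This is a rough sketch rather than the billing system itself: the function name is hypothetical, and rounding fractional seconds up to a whole second is an assumption, since the rounding rule is not stated here. Amounts are kept in integer cents to avoid floating-point drift.

```python
import math

RATE_CENTS_PER_SECOND = {"480p": 3, "720p": 6}  # $0.03 and $0.06 per second
MIN_BILLED_SECONDS = 5    # minimum charge
MAX_BILLED_SECONDS = 600  # billing cap (10 minutes)


def estimate_cost_cents(duration_seconds, resolution="480p"):
    """Estimate the charge in cents for a video of the given length."""
    if resolution not in RATE_CENTS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Assumption: partial seconds are rounded up to the next whole second.
    seconds = math.ceil(duration_seconds)
    billed = min(max(seconds, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)
    return billed * RATE_CENTS_PER_SECOND[resolution]
```

Under these assumptions a 3-minute song at 720p comes to `estimate_cost_cents(180, "720p")`, that is 1080 cents or $10.80.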
Parameters
| Parameter | Required | Description |
|---|---|---|
| audio | Yes | URL of the audio/music file |
| images | No | Array of 1-3 reference image URLs |
| prompt | No | Scene/style description |
| aspect_ratio | No | “16:9” or “9:16” (auto if omitted) |
| resolution | No | “480p” (default) or “720p” |
Tips
- Best results with vocals: The AI uses vocal patterns for scene timing. Songs with clear vocals produce the best-timed transitions.
- Portrait photos work best: Clear, front-facing photos with visible face give the best identity preservation.
- Be descriptive: A good prompt like “A rock singer performing on a neon-lit stage” gives much better results than just “singer”.
- No photo? No problem: If you don’t provide images, the AI will generate a performer based on the detected voice (male/female).
Note
- Max audio length: 10 minutes (600 seconds)
- Processing speed: A 1-minute music video typically completes in 3-6 minutes
- Supported audio formats: MP3, WAV, AAC, and most other common formats
- The AI handles scene planning automatically, so you don’t need to specify individual scenes
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task ("audio" is required; the URL below is a placeholder)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/music-video-generator" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"audio": "https://example.com/song.mp3",
"resolution": "480p"
}'
# Get the result (set requestId to the data.id value returned by the submit call)
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
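For readers calling the API from code rather than the shell, the two curl commands above can be mirrored with Python's standard library. This is a minimal sketch, not an official client: the helper names are hypothetical, and only the URLs and headers come from the curl commands.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"


def build_submit_request(api_key, payload):
    """Prepare the POST request that submits a music video task."""
    return urllib.request.Request(
        f"{API_BASE}/wavespeed-ai/music-video-generator",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def build_result_request(api_key, request_id):
    """Prepare the GET request that fetches a task's result."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def send(request, timeout=60):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical call would be `send(build_submit_request(key, {"audio": audio_url, "resolution": "480p"}))`, with `key` read from the `WAVESPEED_API_KEY` environment variable and `audio_url` pointing at your own file.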
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| audio | string | Yes | - | - | The audio/music file URL for generating the music video. |
| images | array | No | [] | - | List of reference image URLs (1-3 images). The person in the images will appear throughout the video. |
| prompt | string | No | - | - | Style and scene description for the music video (e.g. "A woman sings in a forest while playing a guitar"). |
| aspect_ratio | string | No | - | 16:9, 9:16 | Aspect ratio of the output video. If not specified, auto-detected from input images. |
| resolution | string | No | 480p | 480p, 720p | The resolution of the output video. |
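The constraints in the table above can be checked on the client before a request is sent. The sketch below is one possible way to do that; the function name is hypothetical, and the server remains the authority on what it accepts.

```python
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16")
ALLOWED_RESOLUTIONS = ("480p", "720p")


def build_payload(audio, images=None, prompt=None, aspect_ratio=None, resolution="480p"):
    """Build the JSON body for a task submission, validating documented limits."""
    if not audio:
        raise ValueError("audio is required")
    images = list(images or [])
    if len(images) > 3:
        raise ValueError("at most 3 reference images are allowed")
    if aspect_ratio is not None and aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {ALLOWED_RESOLUTIONS}")

    payload = {"audio": audio, "resolution": resolution}
    # Optional fields are left out entirely so the server applies its defaults.
    if images:
        payload["images"] = images
    if prompt:
        payload["prompt"] = prompt
    if aspect_ratio:
        payload["aspect_ratio"] = aspect_ratio
    return payload
```

Omitting `aspect_ratio` rather than sending an empty value keeps the documented auto-detection from input images in effect.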
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
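As an illustration of how these fields fit together, the sketch below pulls the task ID and the polling URL out of a decoded submission response. The function name is hypothetical, and treating any `code` other than 200 as an error is an assumption based on the "200 for success" example in the table.

```python
def parse_submission(response):
    """Return (task_id, result_url) from a decoded submit response."""
    # Assumption: a successful submission reports code 200.
    if response.get("code") != 200:
        raise RuntimeError(f"submission rejected: {response.get('message')}")
    data = response["data"]
    if data.get("status") == "failed":
        raise RuntimeError(data.get("error") or "task failed")
    return data["id"], data["urls"]["get"]
```

The returned task ID is the value to pass as `id` when requesting the result.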
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was requested) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed). |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
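Because a task passes through `created` and `processing` before it reaches `completed` or `failed`, clients usually poll the result endpoint. The loop below is a minimal sketch of that pattern. The function name, the 5-second interval, and the poll budget are assumptions, and `fetch_result` stands for any zero-argument callable that returns the decoded result response.

```python
import time


def wait_for_outputs(fetch_result, poll_seconds=5.0, max_polls=720, sleep=time.sleep):
    """Poll until the task finishes and return its list of output URLs."""
    for _ in range(max_polls):
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"]
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # Status is still "created" or "processing": wait and try again.
        sleep(poll_seconds)
    raise TimeoutError("task did not finish within the polling budget")
```

The default budget of 720 polls at 5 seconds is about an hour, chosen on the assumption that long videos take proportionally longer than the 3-6 minutes quoted for a 1-minute video. Passing `sleep` as a parameter lets tests run the loop without real delays.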