Music Video Generator

AI Music Video Generator turns an audio track and a single photo into a full music video with cinematic camera angles, smooth transitions, and precise lip sync. Videos can run up to 10 minutes at 480p or 720p. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

Features

AI Music Video (MV) Generator

The world’s best AI music video (MV) generator. Turn any song + a single photo into a professional-quality music video in minutes.

Why It’s the Best

  • Blazing fast: Generate a full 1-minute music video in just a few minutes. No waiting hours.
  • Perfect lip sync: Vocal-aware segmentation ensures the singer’s lips match the audio precisely throughout the entire video.
  • Cinematic quality: AI director plans each scene with different camera angles, compositions, and natural lighting — like a real music video shoot.
  • One photo is all you need: Upload a single portrait and the AI handles the rest — scene creation, angle variations, and smooth transitions.
  • Up to 10 minutes: Create full-length music videos, not just short clips.
  • Smart scene planning: Automatically detects vocal phrases and silence in the audio to create natural scene transitions at musically meaningful moments.

How It Works

  1. Upload your audio — any song, any genre, up to 10 minutes.
  2. Upload 1-3 reference images (optional) — the person who will appear in the video.
  3. Describe the scene (optional) — e.g. “A woman sings in a forest while playing a guitar”.
  4. Choose aspect ratio — 16:9 (landscape) or 9:16 (portrait/vertical).
  5. Select resolution — 480p or 720p.
  6. Get your music video — fully rendered with transitions, multiple angles, and synced audio.

What Happens Behind the Scenes

  1. Vocal isolation — Separates vocals from instruments to analyze singing patterns.
  2. Smart segmentation — Splits the audio at natural phrase boundaries (not arbitrary fixed intervals).
  3. AI directing — A vision-language model plans each scene: camera angles, compositions, expressions, and camera movements.
  4. Scene generation — Creates unique starting frames for each segment from different angles.
  5. Video synthesis — Generates lip-synced digital human video for each segment.
  6. Cinematic assembly — Smooth crossfade transitions between scenes, with the original audio layered on top for perfect sync.
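
To make the segmentation step more concrete, here is a toy sketch in Python. It is not the service's actual algorithm, which this page does not document. It assumes the vocal phrases have already been detected as (start, end) times in seconds, and it places a scene cut at the midpoint of every silence gap that is at least `min_gap` seconds long. The function name `plan_scene_cuts` and the `min_gap` threshold are hypothetical.

```python
def plan_scene_cuts(phrases, total_duration, min_gap=0.5):
    """Return scene segments as (start, end) tuples covering the whole track.

    phrases: list of (start, end) vocal intervals in seconds, sorted by start.
    A cut is placed mid-silence whenever the gap between two consecutive
    phrases is at least min_gap seconds (an illustrative threshold).
    """
    cuts = []
    for (_, prev_end), (next_start, _) in zip(phrases, phrases[1:]):
        if next_start - prev_end >= min_gap:
            # Cut in the middle of the silence so no phrase is interrupted.
            cuts.append((prev_end + next_start) / 2)
    boundaries = [0.0] + cuts + [float(total_duration)]
    return list(zip(boundaries, boundaries[1:]))
```

With three phrases where only the second gap is long enough, the track splits into two scenes, and the short breath between the first two phrases does not trigger a cut.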

Pricing

| Output Resolution | Cost per 5 seconds | Max Length |
|---|---|---|
| 480p | $0.15 | 10 minutes |
| 720p | $0.30 | 10 minutes |

Billing Rules

  • Standard (480p) Rate: $0.03 per second
  • HD (720p) Rate: $0.06 per second
  • Minimum Charge: 5 seconds ($0.15 at 480p, $0.30 at 720p)
  • Billing Cap: 600 seconds (10 minutes)
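
As a rough illustration of these rules, the sketch below estimates the charge for a given audio length. It works in whole cents to avoid floating-point drift. Two points are assumptions rather than documented behavior: that partial seconds are rounded up, and that billing is per second rather than per 5-second block. The function name `estimate_cost` is hypothetical.

```python
import math

# Rates in cents per second, taken from the billing rules above.
RATE_CENTS_PER_SECOND = {"480p": 3, "720p": 6}
MIN_BILLED_SECONDS = 5
MAX_BILLED_SECONDS = 600

def estimate_cost(duration_seconds, resolution="480p"):
    """Estimate the charge in US dollars for one generated video."""
    if resolution not in RATE_CENTS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    # Assumption: partial seconds are rounded up to the next whole second.
    seconds = math.ceil(duration_seconds)
    # Apply the 5-second minimum and the 600-second cap.
    billed = min(max(seconds, MIN_BILLED_SECONDS), MAX_BILLED_SECONDS)
    return billed * RATE_CENTS_PER_SECOND[resolution] / 100
```

Under these assumptions, a 60-second song at 480p comes to $1.80 and a full 10-minute song at 720p comes to $36.00.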

Parameters

| Parameter | Required | Description |
|---|---|---|
| audio | Yes | URL of the audio/music file |
| images | No | Array of 1-3 reference image URLs |
| prompt | No | Scene/style description |
| aspect_ratio | No | "16:9" or "9:16" (auto-detected if omitted) |
| resolution | No | "480p" (default) or "720p" |

Tips

  • Best results with vocals: The AI uses vocal patterns for scene timing. Songs with clear vocals produce the best-timed transitions.
  • Portrait photos work best: Clear, front-facing photos with visible face give the best identity preservation.
  • Be descriptive: A good prompt like “A rock singer performing on a neon-lit stage” gives much better results than just “singer”.
  • No photo? No problem: If you don’t provide images, the AI will generate a performer based on the detected voice (male/female).

Note

  • Max audio length: 10 minutes (600 seconds)
  • Processing speed: A 1-minute music video typically completes in 3-6 minutes
  • Supported audio formats: MP3, WAV, AAC, and most common formats
  • The AI handles scene planning automatically; you don’t need to specify individual scenes.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task (the audio URL is a placeholder; replace it with your own file)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/music-video-generator" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "audio": "https://example.com/song.mp3",
    "resolution": "480p"
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
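
For readers who prefer Python over curl, the following is a minimal sketch of the same two calls using only the standard library. The endpoint URLs and headers are copied from the curl commands above; the helper names (`build_submit_request`, `build_result_request`, `send`) are hypothetical and not part of any official SDK.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
MODEL_PATH = "wavespeed-ai/music-video-generator"

def build_submit_request(api_key, payload):
    """Prepare the POST request that submits a generation task."""
    return urllib.request.Request(
        f"{API_BASE}/{MODEL_PATH}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

def build_result_request(api_key, request_id):
    """Prepare the GET request that fetches a task's result."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )

def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A typical call would read the key from the `WAVESPEED_API_KEY` environment variable, pass a payload containing at least `audio` to `build_submit_request`, and hand the result to `send`. Keeping request construction separate from sending makes the requests easy to inspect without touching the network.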

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| audio | string | Yes | - | - | The audio/music file URL for generating the music video. |
| images | array | No | [] | - | List of reference image URLs (1-3 images). The person in the images will appear throughout the video. |
| prompt | string | No | - | - | Style and scene description for the music video (e.g. "A woman sings in a forest while playing a guitar"). |
| aspect_ratio | string | No | - | 16:9, 9:16 | Aspect ratio of the output video. If not specified, auto-detected from input images. |
| resolution | string | No | 480p | 480p, 720p | The resolution of the output video. |
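
The constraints in this table can be checked on the client before a request is sent. The sketch below is one way to do that; the function name `build_payload` is hypothetical, and the server may apply further validation that this page does not describe.

```python
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
ALLOWED_RESOLUTIONS = {"480p", "720p"}

def build_payload(audio, images=None, prompt=None,
                  aspect_ratio=None, resolution="480p"):
    """Build the JSON body for a submission, enforcing the documented ranges."""
    if not audio:
        raise ValueError("audio is required")
    images = list(images or [])
    if len(images) > 3:
        raise ValueError("at most 3 reference images are allowed")
    if aspect_ratio is not None and aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")

    payload = {"audio": audio, "resolution": resolution}
    # Optional fields are left out entirely so the documented defaults apply.
    if images:
        payload["images"] = images
    if prompt:
        payload["prompt"] = prompt
    if aspect_ratio:
        payload["aspect_ratio"] = aspect_ratio
    return payload
```

Omitting `aspect_ratio` rather than sending an empty value lets the service auto-detect it from the input images, as the table describes.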

Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
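
A short sketch of how a client might read the fields listed above from a decoded submission response. The function name `parse_submission` is hypothetical, and the error handling is an assumption about what a caller would want rather than documented API behavior.

```python
def parse_submission(response):
    """Return (task_id, result_url) from a decoded submission response."""
    if response.get("code") != 200:
        raise RuntimeError(f"submission failed: {response.get('message')}")
    data = response["data"]
    if data.get("status") == "failed":
        raise RuntimeError(data.get("error") or "task failed at submission")
    # data.id is the task ID; data.urls.get is the URL to poll for the result.
    return data["id"], data["urls"]["get"]
```

The returned task ID is the value to substitute for `${requestId}` in the result request.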

Result Request Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |

Result Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was requested) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
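
Because `data.status` moves through created and processing before reaching completed or failed, a client usually polls the result endpoint until the task finishes. The sketch below shows one way to do that. The function name `wait_for_outputs`, the 5-second interval, and the 30-minute timeout are illustrative assumptions, not documented recommendations. `fetch_result` is any callable that returns the decoded result response, which keeps the loop independent of the HTTP library in use.

```python
import time

def wait_for_outputs(fetch_result, poll_interval=5.0, timeout=1800.0,
                     sleep=time.sleep):
    """Poll until the task completes, then return the list of output URLs."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_result()["data"]
        status = data["status"]
        if status == "completed":
            return data["outputs"]
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # Still "created" or "processing": wait and try again.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} seconds")
        sleep(poll_interval)
```

Longer songs take longer to render, so the timeout is worth scaling with the audio length.
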
© 2025 WaveSpeedAI. All rights reserved.