
Add music, voiceovers, and sound effects to your videos with WaveSpeedAI’s audio-for-video tools.

MMAudio v2 produces synchronized audio from video or text inputs, making it well suited to adding soundtracks to the output of video models. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

Kling Video-to-Audio auto-generates or extracts matching sound effects and audio tracks from video using KlingAI's audio generation model.

Kling Text-to-Audio turns text prompts into custom sound effects for videos, games, and multimedia using KlingAI's audio model.

HunyuanVideo-Foley generates realistic Foley and ambient audio for an uploaded video, guided by a text prompt that describes the desired sounds.

ACE-Step Prompt-to-Audio creates music from simple prompts, auto-generating genre tags and lyrics for quick song creation.

Mirelo SFX V1.5 generates sound effects and audio synchronized to any input video.

ElevenLabs Dubbing automatically translates and dubs video/audio content into different languages while preserving the original speakers' voices.

Mirelo SFX V1 Video-to-Audio generates synchronized sound effects from video input with text prompt guidance. It supports generating multiple samples per request and a customizable duration.
Run any model in the Audio for Video collection through a single REST API. Pay per generation, with no subscriptions or minimums, on low-latency infrastructure with 99.9% uptime.
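As a rough illustration of what calling a model over a REST API can look like, the sketch below builds (but does not send) a JSON POST request using only the Python standard library. The base URL, model slug, payload field names, and Bearer-token header are all placeholders chosen for this example, not values taken from this page; the real endpoint and parameters are documented on each model page.

```python
import json
import urllib.request

# Assumptions (not from this page): the base URL, endpoint layout, model slug,
# payload field names, and auth scheme are hypothetical placeholders.
API_BASE = "https://api.example.com/v1"        # hypothetical
MODEL_SLUG = "example/audio-for-video-model"   # hypothetical


def build_generation_request(api_key, video_url, prompt):
    """Build, without sending, a POST request for one audio generation."""
    payload = {"video": video_url, "prompt": prompt}  # hypothetical field names
    return urllib.request.Request(
        url=f"{API_BASE}/{MODEL_SLUG}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_generation_request(
    "YOUR_API_KEY", "https://example.com/clip.mp4", "rain on a tin roof"
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; that step is left out here because the endpoint above is not real.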
Every Audio for Video model has per-call pricing. The price is listed on each model page, with no platform fees on top.
Most Audio for Video image models complete in under 2 seconds. Video and 3D models run several times faster than self-hosted alternatives.
Multi-region failover and automatic retries keep your production traffic online, even during provider outages.
Each model has its own per-call price listed on the model page. We bill per successful generation, with no subscription fees or minimums.
Image models in this collection typically complete in under 2 seconds. Video and 3D models depend on duration and resolution but are usually several times faster than self-hosted runs.
Every account gets $1 in free credits on signup, enough to try most Audio for Video models without a credit card.
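To make the per-call billing and signup credit concrete, here is a small arithmetic sketch. The $0.05 price is a made-up figure for illustration only (actual prices are listed on each model page), and the helper names are hypothetical.

```python
from decimal import Decimal


def estimate_cost(price_per_call, outcomes):
    """Total cost when only successful generations are billed.

    `outcomes` is an iterable of booleans, True for a successful generation.
    """
    successful = sum(1 for ok in outcomes if ok)
    return Decimal(price_per_call) * successful


def calls_covered(credit, price_per_call):
    """Whole number of successful generations a credit balance covers."""
    return int(Decimal(credit) // Decimal(price_per_call))


# Hypothetical price of $0.05 per call: three successes and one failure.
print(estimate_cost("0.05", [True, True, False, True]))  # 0.15
# The $1 signup credit at that same hypothetical price.
print(calls_covered("1.00", "0.05"))  # 20
```

`Decimal` is used instead of floats so that currency amounts do not pick up binary rounding errors.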
Standard accounts have generous concurrent-job limits. Enterprise plans offer custom requests-per-minute (RPM) limits, higher concurrency, and dedicated capacity; contact sales for details.
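On the client side, one common way to stay within a concurrent-job limit is to cap the number of in-flight requests with a bounded worker pool. The sketch below assumes a limit of 3, which is a hypothetical number and not a documented figure, and uses a stand-in function that sleeps instead of calling any API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_JOBS = 3  # hypothetical; the real limit depends on your plan

_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0  # highest number of jobs observed running at once


def submit_job(prompt):
    """Stand-in for one generation call; sleeps instead of calling an API."""
    global _in_flight, peak_in_flight
    with _lock:
        _in_flight += 1
        peak_in_flight = max(peak_in_flight, _in_flight)
    time.sleep(0.01)  # simulate the time a generation takes
    with _lock:
        _in_flight -= 1
    return f"audio for: {prompt}"


prompts = [f"clip {i}" for i in range(10)]

# The pool never runs more than MAX_CONCURRENT_JOBS jobs at the same time;
# pool.map returns results in the same order as the inputs.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_JOBS) as pool:
    results = list(pool.map(submit_job, prompts))

print(len(results), "jobs done")
```

Replacing the body of `submit_job` with a real request keeps the same cap, since the pool size alone bounds how many calls run at once.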