Hunyuan Video Foley | AI Video Dubbing API

HunyuanVideo-Foley

What is HunyuanVideo-Foley?

HunyuanVideo-Foley is Tencent Hunyuan's video-to-audio model that synthesizes realistic Foley and ambient sound directly from video. It aligns on-screen actions and scene context to produce timing-accurate, high-quality audio tracks.

Why this?

Traditional audio generators struggle with generalization, semantic alignment, and clean quality. HunyuanVideo-Foley addresses these pain points head-on.

What it can do

Multi-scene synchronization – High-quality audio aligned to complex, fast-cut visuals.
Multi-modal balance – Blends visual cues with optional text prompts for intent-aware sound.
48 kHz hi-fi output – Professional clarity with low noise and artifacts.
SOTA performance – Leading results in fidelity, sync, and semantic alignment benchmarks.

From short clips to cinematic cuts

Whether you’re polishing a social clip or finishing an animated short, HunyuanVideo-Foley can help with you.

Example (ASMR):

Silent video description: close-up of hands slicing fresh kiwi on a wooden board; crisp macro textures; soft natural light.
Text prompt: Generate realistic kiwi cutting and peeling sounds; gentle tapping; calm ASMR ambience.

Designed for

Post & Studios – Fast Foley passes for animatics, rough cuts, and indie films.
Creators & Social Teams – Auto-sound shorts/reels with consistent timing.
Education & Prototyping – Demonstrate AV alignment or test sound design ideas quickly.

How to Use (HunyuanVideo-Foley)

Upload video (required) – Add the silent (or low-sound) clip you want to sound.
Write prompt (optional) – Briefly describe the mood or key sounds, e.g.

Rainy street ambience, soft footsteps, distant cars.
Kitchen ASMR: chopping vegetables, sizzling pan.

Set seed – use a fixed number to reproduce the same result; change it for variants.
Run – Click Run (the button shows the cost).
Review & iterate – If timing or tone isn't right, tweak the prompt or seed and run again.

Hunyuan Video Foley API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/wavespeed-ai/hunyuan-video-foley with your input as JSON. The endpoint returns a prediction id. Start polling the result endpoint around every 2 seconds, increase the interval for long-running tasks, and stop on any terminal status. On completed, read output values from data.outputs. Examples for Hunyuan Video Foley below.

HTTP example

set -euo pipefail

: "${WAVESPEED_API_KEY:?Set WAVESPEED_API_KEY}"

REQUEST_BODY=$(cat <<'JSON'
{
    "video": "https://interactive-examples.mdn.mozilla.net/media/cc0-videos/flower.mp4",
    "seed": -1
}
JSON
)

# 1. Submit the prediction.
SUBMIT_RESPONSE=$(curl --silent --show-error --fail-with-body \
  -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/hunyuan-video-foley" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d "$REQUEST_BODY")

TASK=$(printf '%s' "$SUBMIT_RESPONSE" | jq 'if has("data") then .data else . end')
PREDICTION_ID=$(printf '%s' "$TASK" | jq -r '.id')
if [ -z "$PREDICTION_ID" ] || [ "$PREDICTION_ID" = "null" ]; then
  printf 'Submission response did not contain a prediction id
' >&2
  exit 1
fi
RESULT_URL=$(printf '%s' "$TASK" | jq -r '.urls.get // empty')
if [ -z "$RESULT_URL" ]; then
  RESULT_URL="https://api.wavespeed.ai/api/v3/predictions/$PREDICTION_ID/result"
fi

# 2. Poll until the prediction finishes.
while true; do
  RESPONSE=$(curl --silent --show-error --fail-with-body "$RESULT_URL" \
    -H "Authorization: Bearer $WAVESPEED_API_KEY")
  RESULT=$(printf '%s' "$RESPONSE" | jq 'if has("data") then .data else . end')
  STATUS=$(printf '%s' "$RESULT" | jq -r '.status')
  case "$STATUS" in
    completed) printf '%s\n' "$RESULT" | jq '.outputs'; break ;;
    failed|cancelled|timeout) printf '%s\n' "$RESULT" | jq . >&2; exit 1 ;;
    created|processing) sleep 2 ;;
    *) printf 'Unexpected status: %s
' "$STATUS" >&2; exit 1 ;;
  esac
done

Node.js example

const submitUrl = "https://api.wavespeed.ai/api/v3/wavespeed-ai/hunyuan-video-foley";
const apiKey = process.env.WAVESPEED_API_KEY;
if (!apiKey) throw new Error('Set WAVESPEED_API_KEY');

async function requestJson(url, options = {}) {
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(await response.text());
  return response.json();
}

// 1. Submit the prediction.
const body = await requestJson(submitUrl, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
        "video": "https://interactive-examples.mdn.mozilla.net/media/cc0-videos/flower.mp4",
        "seed": -1
}),
});
const task = body.data ?? body;
if (!task.id) throw new Error("Submission response did not contain a prediction id");
const resultUrl = task.urls?.get ||
  `https://api.wavespeed.ai/api/v3/predictions/${task.id}/result`;

// 2. Poll until the prediction finishes.
while (true) {
  const resultBody = await requestJson(resultUrl, {
    headers: { "Authorization": `Bearer ${apiKey}` },
  });
  const result = resultBody.data ?? resultBody;
  if (result.status === "completed") {
    console.log(result.outputs);
    break;
  }
  if (["failed", "cancelled", "timeout"].includes(result.status)) throw new Error(JSON.stringify(result));
  if (!["created", "processing"].includes(result.status)) throw new Error("Unexpected status: " + result.status);
  await new Promise(resolve => setTimeout(resolve, 2000));
}

Python example

import json
import os
import time
from urllib.request import Request, urlopen

api_key = os.environ["WAVESPEED_API_KEY"]
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
payload = {
    "video": "https://interactive-examples.mdn.mozilla.net/media/cc0-videos/flower.mp4",
    "seed": -1
}

def request_json(url, data=None):
    request = Request(url, data=data, headers=headers, method="POST" if data else "GET")
    with urlopen(request) as response:
        return json.load(response)

# 1. Submit the prediction.
body = request_json("https://api.wavespeed.ai/api/v3/wavespeed-ai/hunyuan-video-foley", json.dumps(payload).encode())
task = body.get("data", body)
if not task.get("id"):
    raise RuntimeError("Submission response did not contain a prediction id")
result_url = task.get("urls", {}).get("get") or f"https://api.wavespeed.ai/api/v3/predictions/{task['id']}/result"

# 2. Poll until the prediction finishes.
while True:
    result_body = request_json(result_url)
    result = result_body.get("data", result_body)
    status = result.get("status")
    if status == "completed":
        print(result.get("outputs", []))
        break
    if status in {"failed", "cancelled", "timeout"}:
        raise RuntimeError(result)
    if status not in {"created", "processing"}:
        raise RuntimeError(f"Unexpected status: {status}")
    time.sleep(2)

Hunyuan Video Foley API — Frequently asked questions

What is the Hunyuan Video Foley API?

Hunyuan Video Foley is a WaveSpeedAI model for AI inference, exposed as a REST API on WaveSpeedAI. HunyuanVideo-Foley generates realistic Foley and ambient audio from an uploaded video using a text prompt to describe desired sounds. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing. You can call it programmatically or try it from the playground above.

How do I call the Hunyuan Video Foley API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID. Poll the result endpoint starting around every 2 seconds, increase the interval for long-running tasks, and stop on any terminal status. The playground generates production-oriented Python, JavaScript, and cURL examples with timeouts, transient-error handling, and safe GET retries. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/wavespeed-ai/hunyuan-video-foley.

How much does Hunyuan Video Foley cost per run?

Hunyuan Video Foley starts at $0.050 per run. That figure is the base price — the final charge scales with the parameters you set in the form (output size, length, count, references, or whatever knobs this model exposes), so a higher-quality or larger output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.

What inputs does Hunyuan Video Foley accept?

Key inputs: `prompt`, `video`, `seed`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/wavespeed-ai/hunyuan-video-foley.

How long does Hunyuan Video Foley take to generate?

Median end-to-end generation time on WaveSpeedAI is around 37 seconds per request, based on recent successful runs. Queue time varies with global demand; live status is visible in the prediction record.

Can I use Hunyuan Video Foley outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (WaveSpeedAI). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.

Ví dụXem tất cả

Mô hình liên quan

README