Hunyuan Video Foley | AI Video Dubbing API

HunyuanVideo-Foley generates realistic Foley and ambient audio from an uploaded video, using a text prompt to describe the desired sounds. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

video-dubbing

$0.05 per run · ~20 runs per $1

Examples

The sound of water splashing when a tiger jumps into the water.

Generate the sound of an acrobat performing acrobatics.

Generate a piece of tranquil music.

Related Models

hunyuan-image-3 – text-to-image
hunyuan-3d-v3.1/image-to-3d-rapid – image-to-3d
hunyuan-3d-v3.1/text-to-3d-rapid – text-to-3d
hunyuan3d-v3/text-to-3d – text-to-3d
hunyuan3d-v3/image-to-3d – image-to-3d
hunyuan3d-v3/sketch-to-3d – image-to-3d

README

HunyuanVideo-Foley

What is HunyuanVideo-Foley?

HunyuanVideo-Foley is Tencent Hunyuan's video-to-audio model that synthesizes realistic Foley and ambient sound directly from video. It aligns on-screen actions and scene context to produce timing-accurate, high-quality audio tracks.

Why this?

Traditional audio generators struggle with generalization, semantic alignment, and clean quality. HunyuanVideo-Foley addresses these pain points head-on.

What it can do

Multi-scene synchronization – High-quality audio aligned to complex, fast-cut visuals.
Multi-modal balance – Blends visual cues with optional text prompts for intent-aware sound.
48 kHz hi-fi output – Professional clarity with low noise and artifacts.
SOTA performance – Leading results in fidelity, sync, and semantic alignment benchmarks.

From short clips to cinematic cuts

Whether you’re polishing a social clip or finishing an animated short, HunyuanVideo-Foley can help.

Example (ASMR):

Silent video description: close-up of hands slicing fresh kiwi on a wooden board; crisp macro textures; soft natural light.
Text prompt: Generate realistic kiwi cutting and peeling sounds; gentle tapping; calm ASMR ambience.

Designed for

Post & Studios – Fast Foley passes for animatics, rough cuts, and indie films.
Creators & Social Teams – Auto-sound shorts/reels with consistent timing.
Education & Prototyping – Demonstrate AV alignment or test sound design ideas quickly.

How to Use (HunyuanVideo-Foley)

Upload video (required) – Add the silent (or low-sound) clip you want to add sound to.
Write prompt (optional) – Briefly describe the mood or key sounds, for example:

Rainy street ambience, soft footsteps, distant cars.
Kitchen ASMR: chopping vegetables, sizzling pan.

Set seed – Use a fixed number to reproduce the same result; change it for variants.
Run – Click Run (the button shows the cost).
Review & iterate – If timing or tone isn't right, tweak the prompt or seed and run again.
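
The same steps map onto the API inputs named on this page (`video`, `prompt`, `seed`). The sketch below is only an illustration: `build_foley_input` is a hypothetical helper, not part of any SDK, and treating `-1` as "no fixed seed" is an assumption based on the default used in the examples on this page.

```python
from typing import Optional


def build_foley_input(video_url: str, prompt: Optional[str] = None, seed: int = -1) -> dict:
    """Assemble the JSON input for one run (hypothetical helper)."""
    if not video_url:
        raise ValueError("video is required")
    payload = {"video": video_url, "seed": seed}
    # The prompt is optional, so only include it when one was written.
    if prompt:
        payload["prompt"] = prompt
    return payload


# Same clip and prompt with three fixed seeds: one payload per variant.
variants = [
    build_foley_input(
        "https://example.com/your-input.mp4",
        "Rainy street ambience, soft footsteps, distant cars.",
        seed=s,
    )
    for s in (1, 2, 3)
]
```

Re-submitting one of these payloads with its seed unchanged is how you would reproduce a result you liked.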

Hunyuan Video Foley API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/wavespeed-ai/hunyuan-video-foley with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Hunyuan Video Foley below.

HTTP example

# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/hunyuan-video-foley" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "video": "https://example.com/your-input.mp4",
    "prompt": "Rainy street ambience, soft footsteps, distant cars",
    "seed": -1
}'

# The response includes a prediction id. Store it in REQUEST_ID, then poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/$REQUEST_ID/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
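
The polling step can be sketched in Python as a small loop. This is a minimal sketch under stated assumptions: `wait_for_output` and `fetch_result` are hypothetical names, the caller supplies `fetch_result` to perform the GET above and return the parsed JSON, and the `"failed"` status and `error` field are assumptions not confirmed on this page. Only `status == "completed"` and `data.outputs[0]` come from the text above.

```python
import time
from typing import Callable


def wait_for_output(
    fetch_result: Callable[[], dict],
    interval: float = 1.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the prediction completes, then return data.outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_result().get("data", {})
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed failure status, not confirmed by the source
            raise RuntimeError(f"prediction failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError("prediction did not complete in time")
        sleep(interval)
```

Passing the fetch function in, rather than hard-coding the HTTP call, keeps the loop easy to test with canned responses.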

Node.js example

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS has no top-level await, so run the call inside an async function.
async function main() {
  const result = await client.run("wavespeed-ai/hunyuan-video-foley", {
    video: "https://example.com/your-input.mp4",
    prompt: "Rainy street ambience, soft footsteps, distant cars",
    seed: -1
  });

  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

Python example

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "wavespeed-ai/hunyuan-video-foley",
    {
        "video": "https://example.com/your-input.mp4",
        "prompt": "Rainy street ambience, soft footsteps, distant cars",
        "seed": -1,
    },
)

print(output["outputs"][0])  # → URL of the generated output

Hunyuan Video Foley API — Frequently asked questions

What is the Hunyuan Video Foley API?

Hunyuan Video Foley is Tencent Hunyuan's video-to-audio model, served by WaveSpeedAI as a REST API. It generates realistic Foley and ambient audio from an uploaded video, using a text prompt to describe the desired sounds. You can call it programmatically or try it from the playground above.

How do I call the Hunyuan Video Foley API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/wavespeed-ai/hunyuan-video-foley.
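
For readers who prefer no SDK, the submission request can be assembled with Python's standard library. This is a sketch, not the official client: `build_submit_request` is a hypothetical helper, and the endpoint URL and Bearer header are taken from the quick start on this page. The request is only built here, not sent.

```python
import json
import urllib.request

ENDPOINT = "https://api.wavespeed.ai/api/v3/wavespeed-ai/hunyuan-video-foley"


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a new prediction."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_submit_request(
    "YOUR_API_KEY",
    {"video": "https://example.com/your-input.mp4", "seed": -1},
)
# Sending it is a single call: urllib.request.urlopen(req)
```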

How much does Hunyuan Video Foley cost per run?

Hunyuan Video Foley starts at $0.05 per run. That figure is the base price; the final charge can scale with the parameters you set in the form, so a larger or longer job may cost more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
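
As a rough budgeting aid, the base price translates into run counts as follows. This is a sketch that assumes every run is billed at exactly the base price; since the actual charge can be higher, treat the run count as an upper bound and the cost as a lower bound. The helper names are hypothetical.

```python
from decimal import Decimal

BASE_PRICE = Decimal("0.05")  # base price per run quoted on this page, in USD


def runs_for_budget(budget_usd: str) -> int:
    """Whole runs a budget covers if every run costs the base price."""
    return int(Decimal(budget_usd) // BASE_PRICE)


def min_cost(runs: int) -> Decimal:
    """Lowest possible spend for a number of runs at the base price."""
    return BASE_PRICE * runs
```

`Decimal` is used instead of floats so that small currency amounts do not pick up rounding error.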

What inputs does Hunyuan Video Foley accept?

Key inputs: `prompt`, `video`, `seed`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/wavespeed-ai/hunyuan-video-foley.

How long does Hunyuan Video Foley take to generate?

Average end-to-end generation time on WaveSpeedAI is around 29 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.

Can I use Hunyuan Video Foley outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (WaveSpeedAI). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.