Avatar Lipsync Models

WaveSpeedAI's AI Avatars deliver lifelike virtual characters with advanced lip sync and realistic expressions.

Our pick

audio-to-video

wavespeed-ai/music-video-generator

AI Music Video Generator transforms audio + a single photo into a full music video with cinematic camera angles, smooth transitions, and perfect lip sync. Up to 10 minutes, 480p or 720p. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

All models

41 models

audio-to-video

wavespeed-ai/music-video-generator

digital-human

wavespeed-ai/infinitetalk

InfiniteTalk converts one photo + audio into audio-driven talking or singing avatar videos (Image-to-Video), up to 10 minutes, 720p tier $0.30/5s. Ready-to-use REST API, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk/video-to-video

Audio-driven InfiniteTalk turns one video plus audio into realistic talking or singing videos with lip-sync in 480p or 720p. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk-fast

InfiniteTalk fast converts one photo + audio into audio-driven talking or singing avatar videos (Image-to-Video), up to 10 minutes. Ready-to-use REST API, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk/multi

InfiniteTalk Multi converts a single image and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-lipsync/audio-to-video

Kling LipSync converts audio into talking head video by generating lifelike lip movements perfectly synced to the input audio. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-lipsync/text-to-video

Kling TextToVideo by Kwaivgi creates videos with lifelike lip movements that precisely sync to input text for natural speaking visuals. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-v2-ai-avatar-standard

Kling AI Avatar generates high-quality AI avatar videos for profiles, intros, and social content, delivering clean detail and cinematic motion with reliable prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk-fast/multi

InfiniteTalk fast multi converts a single image and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-v2-ai-avatar-pro

Kling V2 AI Avatar Pro generates high-quality AI avatar videos with clean detail, stable motion, and strong identity consistency—ideal for profiles, intros, and social content. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk/video-to-video-multi

InfiniteTalk Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk-fast/video-to-video

Audio-driven infinitetalk-fast turns one video plus audio into realistic talking or singing videos with lip-sync. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-v1-ai-avatar-standard

Kling AI Avatar produces stunning AI-generated video avatars for digital identity and content creation, with on-demand video billed at $0.25 per 5 seconds. Ready-to-use REST API, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk-fast/video-to-video-multi

InfiniteTalk fast video-to-video multi converts a video and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

kwaivgi/kling-v1-ai-avatar-pro

Kling AI Avatar Pro converts audio into talking video portraits; pricing is $1 for the first 5s then $0.20/s up to 600s. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
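The tiered price quoted for Kling AI Avatar Pro can be worked through as a small calculation. This is only a sketch under stated assumptions: that the first 5 seconds are a flat $1 even for shorter clips, that partial seconds are rounded up, and that durations above 600 s are rejected. None of these billing details are confirmed by the listing, and `avatar_pro_cost_cents` is a hypothetical helper name.

```python
import math

FIRST_BLOCK_SECONDS = 5       # the listing's "first 5s"
FIRST_BLOCK_CENTS = 100       # $1 for that first block
EXTRA_CENTS_PER_SECOND = 20   # $0.20 per second afterwards
MAX_SECONDS = 600             # the listing's upper bound


def avatar_pro_cost_cents(duration_seconds):
    """Estimate the price in cents for a clip of the given length.

    Assumptions (not confirmed by the listing): the first block is a flat
    minimum charge, and fractional seconds are billed as whole seconds.
    """
    if duration_seconds <= 0 or duration_seconds > MAX_SECONDS:
        raise ValueError("duration must be in the range (0, 600] seconds")
    billed_seconds = math.ceil(duration_seconds)
    extra_seconds = max(0, billed_seconds - FIRST_BLOCK_SECONDS)
    return FIRST_BLOCK_CENTS + EXTRA_CENTS_PER_SECOND * extra_seconds


for seconds in (5, 30, 600):
    print(f"{seconds:>3}s -> ${avatar_pro_cost_cents(seconds) / 100:.2f}")
```

Working in integer cents avoids floating-point drift when the per-second charge is multiplied out over long clips.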

video-to-video

heygen/video-translate

HeyGen Video Translate: AI video translation into 70+ languages and 175+ dialects with no voice actors or dubbing. Fast, accurate, easy to use at $0.0375/sec. Ready-to-use REST API, no coldstarts, affordable pricing.

digital-human

sync/lipsync-2-pro

Lipsync-2-pro creates studio-grade lip synchronization for video-to-video editing in minutes, not weeks. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

pixverse/lipsync

PixVerse LipSync converts audio into realistic lip-sync animations with advanced algorithms for precise mouth movements and timing for video avatars. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

bytedance/latentsync

LatentSync combines Stable Diffusion and TREPA for high-res end-to-end lip-sync, delivering precise, realistic mouth motions in generated videos. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/hunyuan-avatar

Hunyuan Avatar creates audio-driven talking or singing videos from one image + audio, in 480p/720p up to 120s (starts at $0.15/5s). Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video

wavespeed-ai/multitalk

MultiTalk converts one image and audio into audio-driven talking/singing videos (Image-to-Video), supporting up to 10 minutes. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

motion-control

wavespeed-ai/wan-2.2/animate

Wan2.2-Animate is a unified character animation and replacement model that replicates movement and expression; it generates 720p videos up to 120s. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

veed/fabric-1.0

VEED Fabric 1.0 turns one image into dynamic, talking videos and AI avatars in 480p or 720p (starts at $0.35/5s for 480p, $0.70/5s for 720p). Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-to-video

wavespeed-ai/wan-2.1/mocha

MoCha performs Video-To-Video character swaps using reference images, replacing a video's character without per-frame pose or depth maps. Ready-to-use REST inference API, no coldstarts, affordable pricing.

digital-human

bytedance/avatar-omni-human-1.5

OmniHuman 1.5 converts audio and visual cues into lifelike avatar animations for virtual humans, storytelling, and interactive agents. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

sync/lipsync-1.9.0-beta

Generate realistic lip-sync animations from audio using advanced algorithms for high-quality facial synchronization. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

sync/lipsync-2

Sync Lipsync-2 synchronizes lip movements in any video to supplied audio, enabling realistic mouth alignment for films, podcasts, games, or animations. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/latentsync

LatentSync synchronizes video and audio inputs to generate seamless synchronized content. Perfect for lip-syncing, audio dubbing, and video-audio alignment tasks.

digital-human

veed/lipsync

Generate realistic lip-sync animations from audio with high-quality synchronization using Veed LipSync; $0.15 per 5s of video. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/wan-2.2/speech-to-video

Wan-2.2-S2V turns images and speech into high-fidelity videos with realistic face and body motion; supports up to 10-minute clips in 480p, from $0.15/5s. Ready-to-use REST API, no coldstarts, affordable pricing.

motion-control

wavespeed-ai/steady-dancer

SteadyDancer is a 14B-parameter human image animation framework that transforms static images into coherent dance videos. Features first-frame preservation, robust identity consistency, and temporal coherence for realistic motion generation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

sync/react-1

Sync React-1 is a production-grade video-to-video lip-sync model. It maps any speech track to a target face, producing phoneme-accurate visemes and smooth timing while preserving identity, head pose, lighting, and background. Supports emotion and intensity control, multilingual speech, and long takes for talking-head content. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

digital-human

wavespeed-ai/longcat-avatar

LongCat Avatar generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity. It converts one photo + audio into audio-driven talking or singing avatar videos (Image-to-Video), up to 2 minutes, 720p tier $0.40/5s. Ready-to-use REST API, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/ltx-2-19b/lipsync

LTX-2 Lipsync generates synchronized talking head videos from a reference image and audio input. Powered by the 19B DiT architecture, it produces high-fidelity lip-synced videos with natural head movements. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/soulx-flashhead

SoulX FlashHead enables real-time streaming talking head video generation from a portrait image and audio with ultra-fast 96 FPS performance. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/skyreels-v3/talking-avatar

SkyReels V3 Talking Avatar is a 19B-parameter multimodal model that generates talking avatars from a portrait and audio with precise lip sync, supporting up to 20 seconds at 720p resolution. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/ltx-2.3/lipsync

LTX-2.3 Lipsync generates talking head videos from audio with synchronized lip movements and natural facial expressions. Built on DiT-based architecture with improved audio-visual alignment quality. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/wan-2.1/multitalk

MultiTalk (WAN 2.1) is an audio-driven AI that turns a single image and audio into talking or singing conversational videos. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

bytedance/lipsync/audio-to-video

LipSync turns audio into lifelike talking videos by generating precise lip movements fully synced to input audio. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

sync/lipsync-3

Sync Lipsync 3 synchronizes lip movements in any video to supplied audio using zero-shot lip-sync technology. Supports multiple sync modes for handling duration mismatches, works with live-action, 3D characters, and AI-generated avatars. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

bytedance/avatar-omni-human

OmniHuman turns a single portrait photo into avatar video with lifelike motion and expressions ($0.12/sec). Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
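The listings quote prices in mixed units (per 5 seconds or per second), which makes them hard to compare at a glance. The sketch below normalizes the quoted figures to a per-second rate and estimates the cost of a 60-second clip. It assumes linear proration; the actual billing may round to blocks, apply minimums, or vary by resolution tier, so treat the output as a rough comparison only. The dictionary labels and the `estimate_usd` helper are illustrative names, not part of any API.

```python
from decimal import Decimal

# (quoted price in USD, seconds that price covers), copied from the listings.
LISTED_RATES = {
    "wavespeed-ai/infinitetalk (720p)": (Decimal("0.30"), 5),
    "kwaivgi/kling-v1-ai-avatar-standard": (Decimal("0.25"), 5),
    "heygen/video-translate": (Decimal("0.0375"), 1),
    "wavespeed-ai/hunyuan-avatar (starting rate)": (Decimal("0.15"), 5),
    "veed/fabric-1.0 (480p)": (Decimal("0.35"), 5),
    "veed/fabric-1.0 (720p)": (Decimal("0.7"), 5),
    "veed/lipsync": (Decimal("0.15"), 5),
    "wavespeed-ai/wan-2.2/speech-to-video (starting rate)": (Decimal("0.15"), 5),
    "wavespeed-ai/longcat-avatar (720p)": (Decimal("0.40"), 5),
    "bytedance/avatar-omni-human": (Decimal("0.12"), 1),
}


def per_second_usd(label):
    """Convert a quoted price to USD per second of output video."""
    price, seconds = LISTED_RATES[label]
    return price / seconds


def estimate_usd(label, duration_seconds):
    """Estimate a clip's cost, assuming the rate is prorated linearly."""
    return per_second_usd(label) * duration_seconds


# Print the entries from cheapest to most expensive per second.
for label in sorted(LISTED_RATES, key=per_second_usd):
    print(f"{label:<55} ${per_second_usd(label):.4f}/s  60s: ${estimate_usd(label, 60):.2f}")
```

`Decimal` keeps the small per-second figures exact, which matters when a rate such as $0.0375/s is multiplied over a long clip.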

Avatar Lipsync Models API: pricing and performance

Run any model in the Avatar Lipsync Models collection through a single REST API. Pay per generation, with no subscriptions or minimums, and get industry-leading latency on infrastructure with 99.9% uptime.
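A request to a single REST API keyed by model ID might be assembled along the following lines. This is a sketch only: the base URL, the path layout, the bearer-token header, and the `image` and `audio` payload fields are all assumptions for illustration and are not taken from this page, so check each model's documentation for the real schema. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Assumption: illustrative base URL and path layout, not confirmed by this page.
API_BASE = "https://api.wavespeed.ai/api/v3"


def build_request(model_id, payload, api_key):
    """Build (but do not send) a JSON POST request for one model ID."""
    return urllib.request.Request(
        url=f"{API_BASE}/{model_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    "wavespeed-ai/infinitetalk",
    {
        # Hypothetical input fields; the real names depend on the model.
        "image": "https://example.com/portrait.png",
        "audio": "https://example.com/speech.mp3",
    },
    api_key="YOUR_API_KEY",
)
print(req.get_method(), req.full_url)
# Calling urllib.request.urlopen(req) would actually submit the job.
```

Because the model ID is just a path segment here, switching to another entry in the collection would only change the first argument and the payload.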

Why run Avatar Lipsync Models on WaveSpeedAI

Transparent pricing

Per-call pricing for every model in Avatar Lipsync Models. The price is shown on each model's page, with no platform surcharges.

Optimized for low latency

Most image models in Avatar Lipsync Models finish in under 2 seconds. Video and 3D models are several times faster than self-hosted alternatives.

99.9% uptime

Multi-region failover and automatic retries keep your production traffic online, even during provider outages.

Frequently asked questions

How much does the Avatar Lipsync Models API cost?

Each model has its own per-call price listed on its page. We charge per successful generation, with no subscription fees or minimums.

How fast are the Avatar Lipsync Models on WaveSpeedAI?

Image models in this collection typically complete in under 2 seconds. Video and 3D models depend on duration and resolution, but are typically several times faster than self-hosted runs.

Can I try the API without a credit card?

Yes. Every account receives $1 in free credit at sign-up, enough to try most Avatar Lipsync Models without a credit card.

Are there rate limits?

Standard accounts have generous limits on concurrent jobs. Enterprise plans offer custom RPM, higher concurrency, and dedicated capacity; contact sales for details.
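Since limits are expressed in concurrent jobs, a client can stay under them by capping how many jobs it runs at once. The sketch below does this with a bounded thread pool. The limit of 3 is an arbitrary assumption (the page does not state the real figure), and `run_job` is a hypothetical stand-in that sleeps instead of calling any API.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_JOBS = 3  # assumed value; the real limit depends on the account


class ConcurrencyProbe:
    """Tracks the peak number of jobs running at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc_info):
        with self._lock:
            self.active -= 1
        return False


probe = ConcurrencyProbe()


def run_job(job_id):
    """Hypothetical stand-in for submitting one job and waiting for it."""
    with probe:
        time.sleep(0.01)  # simulates the time a generation would take
        return f"job-{job_id}: done"


def run_all(job_ids, max_concurrent=MAX_CONCURRENT_JOBS):
    """Run every job while keeping at most max_concurrent in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_job, job_ids))


results = run_all(range(10))
print(len(results), "jobs finished; peak concurrency:", probe.peak)
```

The pool size is the only knob: raising `max_concurrent` would match a higher concurrency allowance without changing the job code.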

Explore more than 1,000 AI models

Browse our full catalog of state-of-the-art AI models: image, video, 3D, audio, LLM, and more.

wavespeed.ai/models →

Build with the API

Integrate AI into your own applications. RESTful API with client libraries, no cold starts, pay per use.

wavespeed.ai/docs →