HeyGen Avatar V Digital Twin is a fast AI avatar video generation model that creates natural digital twin videos from text or audio with lip-sync, optional captions, background removal, and MP4/WebM output. Ready-to-use REST inference API for digital humans, virtual presenters, product explainers, marketing videos, training content, social media clips, and professional avatar video workflows with simple integration, no cold starts, and affordable pricing.
$0.12 per run · ~83 runs per $10
HeyGen Avatar IV Digital Twin generates a talking avatar video from a selected HeyGen digital twin avatar and an uploaded audio clip. It is designed for presenter videos, spokesperson content, personalized avatar delivery, and other avatar-driven speaking workflows with flexible output controls.
Digital twin avatar workflow: Use a prebuilt digital twin avatar to generate speaking video from audio.
Audio-driven speech performance: Upload an audio clip to drive the avatar’s timing, expression, and speaking delivery.
Flexible framing controls: Choose aspect ratio, fit mode, and output resolution to match your target platform.
Optional background removal: Enable remove_background for supported matting-enabled avatars.
Optional caption export: Enable caption to generate a sidecar SRT subtitle file alongside the video.
Production-ready API: Suitable for personalized presenter videos, internal communications, ads, explainers, and virtual spokesperson workflows.
| Parameter | Required | Description |
|---|---|---|
| avatar | Yes | Selected HeyGen digital twin avatar. |
| audio | Yes | Audio clip used to drive the avatar video. |
| fit | No | How the avatar is framed in the output, such as cover. |
| remove_background | No | Remove the avatar background. Requires a matting-enabled avatar. |
| caption | No | Generate a sidecar SRT caption file alongside the video. |
| output_format | No | Output video format. Default: mp4. |
| resolution | No | Output resolution, such as 720p. |
| aspect_ratio | No | Output aspect ratio, such as 16:9. |
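The parameters in the table above can be assembled into a request body along the following lines. This is a minimal sketch, not the service's client code: the helper name `build_request_payload`, the value types (strings for `avatar` and `audio`, booleans for the two toggles), and the sample avatar ID and audio URL are all assumptions for illustration. The source does not state the endpoint or the authentication scheme, so no network call is made.

```python
from typing import Any, Dict, Optional


def build_request_payload(
    avatar: str,
    audio: str,
    fit: Optional[str] = None,
    remove_background: bool = False,
    caption: bool = False,
    output_format: str = "mp4",  # documented default
    resolution: Optional[str] = None,
    aspect_ratio: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a request body from the documented parameters (hypothetical helper)."""
    # avatar and audio are the only required parameters.
    if not avatar:
        raise ValueError("avatar is required")
    if not audio:
        raise ValueError("audio is required")

    payload: Dict[str, Any] = {
        "avatar": avatar,
        "audio": audio,
        "remove_background": remove_background,
        "caption": caption,
        "output_format": output_format,
    }
    # Include framing options only when the caller sets them explicitly.
    optional = {"fit": fit, "resolution": resolution, "aspect_ratio": aspect_ratio}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


payload = build_request_payload(
    avatar="example-avatar-id",             # hypothetical avatar identifier
    audio="https://example.com/voice.mp3",  # hypothetical audio location
    fit="cover",
    resolution="720p",
    aspect_ratio="16:9",
)
print(payload)
```

Leaving unset framing options out of the payload lets the service apply its own defaults, which the table does not list for `fit`, `resolution`, or `aspect_ratio`.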
Set fit, resolution, and aspect_ratio.
Enable remove_background or caption if needed.
Example use case: Generate a polished office presenter video from a digital twin avatar and a short voice clip for internal announcements or marketing content.
Pricing is based on the uploaded audio duration.
| Audio Duration | Cost |
|---|---|
| 5s | $0.60 |
| 6s | $0.72 |
| 7s | $0.84 |
| 8s | $0.96 |
| 10s | $1.20 |
| 15s | $1.80 |
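Every row in the table above works out to $0.12 per second of audio, so a rough estimate for other durations can be sketched as below. The per-second rate is inferred from the table rather than stated by the source, and the helper name `estimate_cost_usd` is hypothetical. How the service rounds fractional seconds, and whether a minimum billed duration applies below 5s, is not documented here, so treat the result as an approximation.

```python
PRICE_CENTS_PER_SECOND = 12  # inferred from the table: $0.60 / 5s = $0.12 per second


def estimate_cost_usd(audio_seconds: float) -> float:
    """Estimate the cost in USD for a clip, assuming linear per-second pricing."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    # Work in whole cents to avoid floating-point drift in the result.
    cents = round(audio_seconds * PRICE_CENTS_PER_SECOND)
    return cents / 100


# Reproduce the documented rows.
for seconds in (5, 6, 7, 8, 10, 15):
    print(f"{seconds}s -> ${estimate_cost_usd(seconds):.2f}")
```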
Enable caption when the final video needs accessible subtitles.
Enable remove_background only if the selected avatar supports matting.
Match aspect_ratio to your final platform, such as 16:9 for widescreen delivery.
avatar and audio are required.
remove_background requires a matting-enabled avatar.
caption generates a sidecar SRT file, not burned-in subtitles.