Pruna AI P-Video Avatar
Pruna AI P-Video Avatar is a fast AI avatar video generation model that creates high-quality avatar videos for digital humans, talking characters, social media content, marketing creatives, virtual presenters, and AI video workflows. It is available through a ready-to-use REST inference API with simple integration, no cold starts, and affordable pricing.
Features
Pruna AI P-Video Avatar
Pruna AI P-Video Avatar generates a talking or performing avatar video from a reference image and an audio clip, with optional prompt guidance for motion and expression. It is designed for character-driven video generation where the image defines the avatar and the audio drives the timing and delivery.
Why Choose This?
- Image + audio avatar generation: Combine a reference image with an audio track to generate a video avatar performance.
- Prompt-guided motion control: Use video_prompt to nudge expression, movement, or overall performance style.
- Simple output settings: Choose resolution and seed without a heavy configuration workflow.
- Audio-driven timing: Video length follows the uploaded audio duration, making it easier to generate synced outputs.
- Production-ready workflow: Useful for avatar clips, talking portraits, character presentations, and short-form content generation.
Parameters
| Parameter | Required | Description |
|---|---|---|
| image | Yes | Reference image used as the avatar source. |
| audio | Yes | Audio file used to drive the avatar video. |
| video_prompt | No | Optional prompt describing expression, motion, or overall video style. Keep it simple for better stability. |
| resolution | No | Output resolution: 720p or 1080p. |
| seed | No | Random seed for reproducibility. Use the same seed for more consistent results. |
How to Use
- Upload your image: provide the reference image you want to animate.
- Upload your audio: use a clear audio clip to drive the avatar performance.
- Add a simple video prompt (optional): describe only the key motion or mood you want.
- Choose resolution: use 720p for lower cost or 1080p for higher quality.
- Set a seed (optional): use a fixed seed for more reproducible outputs.
- Submit: run the model and download the generated avatar video.
Example Prompt
Natural head movement, subtle facial expression, stable identity, clean speaking performance, realistic motion
Pricing
Pricing is based on the audio duration and resolution.
720p
| Audio Duration | Cost |
|---|---|
| 5s | $0.125 |
| 10s | $0.25 |
| 30s | $0.75 |
| 60s | $1.50 |
| 600s | $15.00 |
1080p
| Audio Duration | Cost |
|---|---|
| 5s | $0.225 |
| 10s | $0.45 |
| 30s | $1.35 |
| 60s | $2.70 |
| 600s | $27.00 |
Billing Rules
- Pricing is based on the uploaded audio duration
- Minimum billed duration is 5 seconds
- Maximum billed duration is 600 seconds
- 720p uses a base rate of $0.025 per second
- 1080p costs 1.8× the 720p rate
- video_prompt and seed do not affect pricing
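The billing rules above can be sketched as a small cost estimator. This is a hedged illustration, not the provider's billing code: the function name `estimate_cost` is hypothetical, and it assumes billing is linear per second, that durations outside the 5 to 600 second range are clamped rather than rejected, and that fractional seconds are not rounded.

```python
from decimal import Decimal

BASE_RATE_720P = Decimal("0.025")    # USD per second, from the billing rules
MULTIPLIER_1080P = Decimal("1.8")    # 1080p costs 1.8x the 720p rate
MIN_BILLED_SECONDS = Decimal("5")    # minimum billed duration
MAX_BILLED_SECONDS = Decimal("600")  # maximum billed duration


def estimate_cost(audio_seconds, resolution="720p"):
    """Estimate the USD cost of one clip (assumes linear per-second billing)."""
    if resolution not in ("720p", "1080p"):
        raise ValueError(f"unsupported resolution: {resolution!r}")
    # Assumption: out-of-range durations are clamped to the billed range.
    billed = min(max(Decimal(str(audio_seconds)), MIN_BILLED_SECONDS),
                 MAX_BILLED_SECONDS)
    rate = BASE_RATE_720P
    if resolution == "1080p":
        rate *= MULTIPLIER_1080P
    return billed * rate
```

Under these assumptions, `estimate_cost(30, "720p")` and `estimate_cost(30, "1080p")` reproduce the 30s rows of the two pricing tables. `Decimal` is used instead of `float` so that the amounts compare exactly against the listed prices.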
Best Use Cases
- Talking avatar videos — Generate speaking portraits from a single image and audio track.
- Character presentation clips — Create short performance-based videos for storytelling or demos.
- Social media avatar content — Produce short avatar-driven clips for lightweight content workflows.
- Narration-driven character scenes — Pair a static character image with voice content for expressive video output.
- Prototype virtual presenters — Quickly test avatar-based presentation ideas without full animation workflows.
Pro Tips
- Keep the audio reasonably short for better reliability and easier iteration.
- Use a clear, front-facing image for better avatar stability.
- Keep video_prompt simple and direct; overly detailed prompts are more likely to fail.
- Focus the prompt on a few essentials, such as natural motion, subtle expression, or stable identity.
- Start with 720p for testing, then switch to 1080p for final-quality outputs.
- Reuse the same seed when you want more consistent variations.
Notes
- Both image and audio are required.
- Very long audio is not recommended; shorter clips are easier to generate successfully.
- The model works best when video_prompt is simple rather than highly detailed.
- Billing uses the audio duration, with a minimum of 5 seconds and a cap of 600 seconds.
- save_audio is not exposed in the current input settings shown here.
Related Models
- Pruna AI P-Video Text-to-Video — Generate videos directly from natural-language prompts.
- Pruna AI P-Video Image-to-Video — Animate a reference image into a video clip with prompt guidance.
- Pruna AI P-Image Text-to-Image — Generate still images for image-first creative workflows.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
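# Submit the task (the example.com URLs are placeholders; replace them with your own image and audio)
curl --location --request POST "https://api.wavespeed.ai/api/v3/pruna-ai/p-video/avatar" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"image": "https://example.com/avatar.png",
"audio": "https://example.com/speech.mp3",
"video_prompt": "The person is talking.",
"resolution": "720p",
"seed": -1
}'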
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
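The same submit-then-query flow can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the helper names (`build_payload`, `_call`, `generate_avatar`), the polling interval, the poll limit, and the request timeout are all illustrative choices, not part of the documented API. The endpoint URLs and the `data.id`, `data.status`, `data.outputs`, and `data.error` fields come from this page.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
SUBMIT_URL = f"{API_BASE}/pruna-ai/p-video/avatar"


def build_payload(image_url, audio_url, video_prompt="The person is talking.",
                  resolution="720p", seed=-1):
    """Assemble the JSON body; image and audio are the two required fields."""
    return {
        "image": image_url,
        "audio": audio_url,
        "video_prompt": video_prompt,
        "resolution": resolution,
        "seed": seed,
    }


def _call(method, url, api_key, body=None):
    """Send one authenticated request and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Bearer {api_key}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def generate_avatar(image_url, audio_url, api_key,
                    poll_seconds=2.0, max_polls=300, **options):
    """Submit a task, then poll the result endpoint until it completes or fails."""
    payload = build_payload(image_url, audio_url, **options)
    task_id = _call("POST", SUBMIT_URL, api_key, payload)["data"]["id"]
    result_url = f"{API_BASE}/predictions/{task_id}/result"
    for _ in range(max_polls):
        data = _call("GET", result_url, api_key)["data"]
        if data["status"] == "completed":
            return data["outputs"]
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        time.sleep(poll_seconds)  # status is still created or processing
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

A call might look like `generate_avatar(image_url, audio_url, os.environ["WAVESPEED_API_KEY"], resolution="1080p")`, which would return the list of output URLs once the task status reaches completed.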
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| image | string | Yes | - | - | Avatar image URL. |
| audio | string | Yes | - | - | Audio URL used to drive the avatar speech and lip sync. |
| video_prompt | string | No | The person is talking. | - | Prompt controlling body movement, framing behavior, and atmosphere. |
| resolution | string | No | 720p | 720p, 1080p | Output resolution. |
| seed | integer | No | -1 | -1 ~ 2147483647 | Random seed for reproducible generations. |
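The constraints in the table above can be checked on the client before a request is sent. The following is a rough sketch rather than the server's actual validation logic: `validate_request` is a hypothetical helper, and it only checks the types, allowed values, and range listed in the table.

```python
VALID_RESOLUTIONS = ("720p", "1080p")
SEED_MIN, SEED_MAX = -1, 2147483647


def validate_request(params):
    """Return a list of problems in a request body; an empty list means none found."""
    problems = []
    for name in ("image", "audio"):
        value = params.get(name)
        if not isinstance(value, str) or not value:
            problems.append(f"{name} is required and must be a non-empty string")
    if "video_prompt" in params and not isinstance(params["video_prompt"], str):
        problems.append("video_prompt must be a string")
    if params.get("resolution", "720p") not in VALID_RESOLUTIONS:
        problems.append("resolution must be 720p or 1080p")
    seed = params.get("seed", -1)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int) \
            or not SEED_MIN <= seed <= SEED_MAX:
        problems.append(f"seed must be an integer from {SEED_MIN} to {SEED_MAX}")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every issue with a request in a single pass.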
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (the task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (the task ID that was queried) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content. |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
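The fields in the table above could be mapped onto a small typed structure for easier handling in client code. This is only a sketch: `TaskResult` and `parse_result` are hypothetical names, and the sample response is hand-written from the field descriptions on this page (its ID and URL are placeholders, not real API output).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskResult:
    task_id: str
    status: str                                  # created, processing, completed, or failed
    outputs: List[str] = field(default_factory=list)
    error: str = ""
    inference_ms: Optional[int] = None           # data.timings.inference, in milliseconds

    @property
    def is_finished(self):
        """True once the task has reached a terminal status."""
        return self.status in ("completed", "failed")


def parse_result(response):
    """Convert a decoded result response (a dict) into a TaskResult."""
    data = response["data"]
    timings = data.get("timings") or {}
    return TaskResult(
        task_id=data["id"],
        status=data["status"],
        outputs=list(data.get("outputs") or []),
        error=data.get("error") or "",
        inference_ms=timings.get("inference"),
    )


# Illustrative, hand-written response; values are placeholders.
example_response = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "example-task-id",
        "status": "completed",
        "outputs": ["https://example.com/avatar.mp4"],
        "error": "",
        "timings": {"inference": 1234},
    },
}
result = parse_result(example_response)
```

Here `result.is_finished` is true and `result.outputs` holds the single placeholder URL; for a task that is still created or processing, `outputs` would be an empty list.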