Ovi is a Veo-3-like image-to-video model that generates synchronized video and audio from text or text+image prompts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
$0.15 per run · ~66 runs per $10
Bright evenly lit laboratory room with metallic walls and soft white light reflections. A human man in a suit stands face-to-face with a humanoid robot, both in perfect focus. Camera: static medium close-up, centered framing, high exposure with clear details on both faces. Mood: tense, thoughtful, futuristic. <S>We built you to understand us.<E> A Sign <S>But sometimes I wonder if you understand us too well.<E> The robot tilts its head slightly, eyes glowing faint blue, voice calm and precise. <S>Understanding is not the same as becoming.<E> <AUDCAP>Soft ambient hum of electronics, faint mechanical servo sounds, two clear voices — human and synthetic, calm and steady<ENDAUDCAP>
A 5-second, dynamic close-up of a sleek, advanced android's head and upper torso. Its armored plates are etched with neon circuit patterns that pulse with a soft blue light. Its face is a polished metal and dark glass visor. As it boots up, its articulated jaw and vocal synthesizer move with precise, mechanical motion to form the words. Mood: Technological, mysterious, and immersive. <S>System. Online.<E> <AUDCAP>The clear, synthetic voice of the android, the low hum of its internal systems, and the faint, distant sound of hovering vehicles and city rain.<ENDAUDCAP>
A 5-second, static shot of a kind old Ghibli-style man with a wrinkled face and gentle eyes. He is seated at his workbench, holding a small wooden toy. He looks up and speaks softly to the viewer, his mouth moving clearly to form the words. The style is soft watercolor and pastel. Mood: Peaceful, wise, and nostalgic. <S>Just a little more...<E> <AUDCAP>The soft, raspy voice of the old man, the gentle sound of a breeze, and the distant chime of a wind bell.<ENDAUDCAP>
Raised his hand and said hello
A bearded man wearing large dark sunglasses and a blue patterned cardigan sits in a studio, actively speaking into a large, suspended microphone. He has headphones on and gestures with his hands, displaying rings on his fingers. Behind him, a wall is covered with red, textured sound-dampening foam on the left, and a white banner on the right features the "CHOICE FM" logo and various social media handles like "@ilovechoicefm" with "RALEIGH" below it. The man intently addresses the microphone, articulating, <S>is talent. It's all about authenticity. You gotta be who you really are, especially if you're working<E>. He leans forward slightly as he speaks, maintaining a serious expression behind his sunglasses.. <AUDCAP>Clear male voice speaking into a microphone, a low background hum.<ENDAUDCAP>
A medium close-up of a young woman standing on a sun-drenched hilltop at golden hour. A gentle breeze blows through her hair. She is turning towards the camera with a radiant, genuine smile, her mouth perfectly formed in the middle of the word "beautiful". Her eyes are squinting slightly against the low sun, filled with contentment. Camera: Static shot, sharp focus on her face and mouth. Bright, natural daylight, high exposure with soft shadows that define facial features. Mood: Serene, joyful, cinematic realism. <S>What a beautiful day!<E> <AUDCAP>The gentle rustle of leaves in the wind, distant chirping of birds, her clear and happy voice, a faint sigh of contentment after speaking.<ENDAUDCAP>
Ovi is a Veo-3-like image-to-audio-video (I2AV) generation model that creates synchronized video and audio from a single image plus a descriptive text prompt.
It is designed for short-form storytelling, where a still image is brought to life with cinematic motion, dialogue, and sound.
| Video Length | Cost |
|---|---|
| 5 seconds | $0.15 |
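
As a quick sanity check on the flat rate in the table, the number of runs a budget covers can be estimated with exact decimal arithmetic. This is a minimal sketch: `runs_for_budget` is a hypothetical helper, and it assumes every run is billed at the listed $0.15 for a 5-second clip.

```python
from decimal import Decimal

PRICE_PER_RUN = Decimal("0.15")  # listed price for one 5-second video


def runs_for_budget(budget_usd: str) -> int:
    """Return how many whole runs fit in the budget at the flat per-run price."""
    return int(Decimal(budget_usd) // PRICE_PER_RUN)


print(runs_for_budget("10"))  # 66
```

Using `Decimal` rather than floats avoids rounding surprises when dividing currency amounts.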
Billing Rules
Describe scene motion, style, and atmosphere.
Use tags for sound:
<S>... <E> → Speech (converted into spoken audio)
<AUDCAP>... <ENDAUDCAP> → Background audio / effects
Seed: -1 = random output
A wide shot of a medieval knight standing in the rain, sword planted into the ground, glowing with mystical energy.
<S>I will defend this land until my last breath.<E>
<AUDCAP>Thunder rolls across the dark sky, distant war drums echo.<ENDAUDCAP>
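
The tag format described above can also be assembled programmatically. The sketch below is illustrative only: `build_ovi_prompt` is a hypothetical helper, not part of any WaveSpeedAI SDK. It simply concatenates a scene description, the `<S>...<E>` speech segments, and one `<AUDCAP>...<ENDAUDCAP>` caption in the order the example prompts use.

```python
def build_ovi_prompt(scene: str, speech: list[str], audio_caption: str = "") -> str:
    """Join a scene description, speech lines, and an audio caption into one prompt."""
    parts = [scene.strip()]
    # Each spoken line is wrapped in <S>...<E> so it is treated as speech.
    parts.extend(f"<S>{line.strip()}<E>" for line in speech)
    # Background audio and effects go in a single <AUDCAP>...<ENDAUDCAP> block.
    if audio_caption.strip():
        parts.append(f"<AUDCAP>{audio_caption.strip()}<ENDAUDCAP>")
    return " ".join(parts)


prompt = build_ovi_prompt(
    "A wide shot of a medieval knight standing in the rain, sword planted into the ground.",
    ["I will defend this land until my last breath."],
    "Thunder rolls across the dark sky, distant war drums echo.",
)
print(prompt)
```

Keeping the speech lines in a list makes it easy to interleave several short utterances, as the multi-speaker example prompts on this page do.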
If Ovi is useful, please ⭐ the repo and cite the paper:
@misc{low2025ovitwinbackbonecrossmodal,
      title={Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation},
      author={Chetwin Low and Weimin Wang and Calder Katyal},
      year={2025},
      eprint={2510.01284},
      archivePrefix={arXiv},
      primaryClass={cs.MM},
      url={https://arxiv.org/abs/2510.01284},
}
Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/character-ai/ovi/image-to-video with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status flips to completed, then read the output URL from data.outputs[0]. Examples for Ovi Image To Video below.
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/character-ai/ovi/image-to-video" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "seed": -1
  }'
# Response includes a prediction id. Poll for the result,
# replacing {request_id} with that id:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"
# When status is "completed", read the output from data.outputs[0].

// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not allow top-level await, so run inside an async function.
async function main() {
  const result = await client.run("character-ai/ovi/image-to-video", {
    "prompt": "A cinematic shot of a city at sunset, soft golden light",
    "image": "https://example.com/your-input.jpg",
    "seed": -1
  });
  console.log(result.outputs[0]); // → URL of the generated output
}

main().catch(console.error);

# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "character-ai/ovi/image-to-video",
    {
        "prompt": "A cinematic shot of a city at sunset, soft golden light",
        "image": "https://example.com/your-input.jpg",
        "seed": -1
    }
)
print(output["outputs"][0])  # → URL of the generated output

Ovi Image To Video is a Character Ai model for video generation from images, exposed as a REST API on WaveSpeedAI. You can call it programmatically or try it from the playground above.
POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status flips to "completed", then read the output URL from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/character-ai/character-ai-ovi-image-to-video.
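
The submit-then-poll flow described above might be sketched as follows using only the Python standard library. The result URL and the `data.outputs[0]` path come from this page; everything else is an assumption made for illustration: the helper names `fetch_result` and `wait_for_output`, the idea that the status sits at `data.status`, the `"failed"` status value, the `error` field, and the polling interval and timeout.

```python
import json
import os
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.wavespeed.ai/api/v3"


def fetch_result(request_id: str) -> dict:
    """GET the prediction result and return the parsed JSON body."""
    req = urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        headers={"Authorization": f"Bearer {os.environ['WAVESPEED_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_output(
    request_id: str,
    fetch: Callable[[str], dict] = fetch_result,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> str:
    """Poll until the status is "completed", then return data.outputs[0]."""
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: status and outputs both live under the "data" key.
        data = fetch(request_id).get("data", {})
        status = data.get("status")
        if status == "completed":
            return data["outputs"][0]
        if status == "failed":  # assumed failure status, not confirmed by this page
            raise RuntimeError(f"prediction {request_id} failed: {data.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction {request_id} still {status!r} after {timeout}s")
        time.sleep(interval)
```

Passing the fetch function as a parameter keeps the loop testable without network access, and the default timeout leaves headroom over the roughly one-minute generation time reported on this page.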
Ovi Image To Video starts at $0.15 per run. That figure is the base price — the final charge scales with the parameters you set in the form (output size, length, count, references, or whatever knobs this model exposes), so a higher-quality or larger output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
Key inputs: `prompt`, `image`, `seed`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/character-ai/character-ai-ovi-image-to-video.
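
A request body with these three inputs can be built and checked locally before submitting. The sketch below uses a hypothetical helper, `build_payload`. Its validation rules (non-empty prompt, http(s) image URL, integer seed where -1 means random) are assumptions inferred from the examples on this page, not the official schema.

```python
import json
from urllib.parse import urlparse


def build_payload(prompt: str, image: str, seed: int = -1) -> str:
    """Return the JSON request body for the three key inputs."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Assumption: the image is passed as an http(s) URL, as in the examples.
    if urlparse(image).scheme not in ("http", "https"):
        raise ValueError("image must be an http(s) URL")
    # Assumption: seed is an integer and -1 requests a random seed.
    if not isinstance(seed, int) or seed < -1:
        raise ValueError("seed must be -1 or a non-negative integer")
    return json.dumps({"prompt": prompt, "image": image, "seed": seed})


body = build_payload(
    "A cinematic shot of a city at sunset, soft golden light",
    "https://example.com/your-input.jpg",
)
print(body)
```

Fixing the seed to a non-negative value instead of -1 is the usual way to make a run repeatable, though whether Ovi output is fully deterministic for a given seed is not stated here.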
Average end-to-end generation time on WaveSpeedAI is around 68 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.
Commercial usage rights depend on the model's license, set by its provider (Character Ai). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.