Ovi Image to Video | Fast Image-to-Video API

Ovi is a Veo-3-like image-to-video model that generates synchronized video and audio from text or text+image prompts. The ready-to-use REST inference API offers fast performance, no cold starts, and affordable pricing.

$0.15 per run · ~66 runs per $10

Examples

Bright evenly lit laboratory room with metallic walls and soft white light reflections. A human man in a suit stands face-to-face with a humanoid robot, both in perfect focus. Camera: static medium close-up, centered framing, high exposure with clear details on both faces. Mood: tense, thoughtful, futuristic. <S>We built you to understand us.<E> A Sign <S>But sometimes I wonder if you understand us too well.<E> The robot tilts its head slightly, eyes glowing faint blue, voice calm and precise. <S>Understanding is not the same as becoming.<E> <AUDCAP>Soft ambient hum of electronics, faint mechanical servo sounds, two clear voices — human and synthetic, calm and steady<ENDAUDCAP>

A 5-second, dynamic close-up of a sleek, advanced android's head and upper torso. Its armored plates are etched with neon circuit patterns that pulse with a soft blue light. Its face is a polished metal and dark glass visor. As it boots up, its articulated jaw and vocal synthesizer move with precise, mechanical motion to form the words. Mood: Technological, mysterious, and immersive. <S>System. Online.<E> <AUDCAP>The clear, synthetic voice of the android, the low hum of its internal systems, and the faint, distant sound of hovering vehicles and city rain.<ENDAUDCAP>

A 5-second, static shot of a kind old Ghibli-style man with a wrinkled face and gentle eyes. He is seated at his workbench, holding a small wooden toy. He looks up and speaks softly to the viewer, his mouth moving clearly to form the words. The style is soft watercolor and pastel. Mood: Peaceful, wise, and nostalgic. <S>Just a little more...<E> <AUDCAP>The soft, raspy voice of the old man, the gentle sound of a breeze, and the distant chime of a wind bell.<ENDAUDCAP>

Raised his hand and said hello

A bearded man wearing large dark sunglasses and a blue patterned cardigan sits in a studio, actively speaking into a large, suspended microphone. He has headphones on and gestures with his hands, displaying rings on his fingers. Behind him, a wall is covered with red, textured sound-dampening foam on the left, and a white banner on the right features the "CHOICE FM" logo and various social media handles like "@ilovechoicefm" with "RALEIGH" below it. The man intently addresses the microphone, articulating, <S>is talent. It's all about authenticity. You gotta be who you really are, especially if you're working<E>. He leans forward slightly as he speaks, maintaining a serious expression behind his sunglasses. <AUDCAP>Clear male voice speaking into a microphone, a low background hum.<ENDAUDCAP>

A medium close-up of a young woman standing on a sun-drenched hilltop at golden hour. A gentle breeze blows through her hair. She is turning towards the camera with a radiant, genuine smile, her mouth perfectly formed in the middle of the word "beautiful". Her eyes are squinting slightly against the low sun, filled with contentment. Camera: Static shot, sharp focus on her face and mouth. Bright, natural daylight, high exposure with soft shadows that define facial features. Mood: Serene, joyful, cinematic realism. <S>What a beautiful day!<E> <AUDCAP>The gentle rustle of leaves in the wind, distant chirping of birds, her clear and happy voice, a faint sigh of contentment after speaking.<ENDAUDCAP>

Related Models

kling-v3-turbo-std/text-to-video – text-to-video
kling-v3-turbo-pro/text-to-video – text-to-video
gemini-omni-flash/text-to-video – text-to-video
ray-3.2/text-to-video – text-to-video
seedream-v5.0-pro – text-to-image
seedream-v5.0-pro/edit – image-to-image

README

Ovi (I2V Version)

Ovi is a Veo-3-like image-to-audio-video (I2AV) generation model that creates synchronized video and audio from a single image plus a descriptive text prompt.

It is designed for short-form storytelling, where a still image is brought to life with cinematic motion, dialogue, and sound.

🌟 Key Features

🎬 Image → Video+Audio – Bring a static image to life with synchronized audiovisual output.
📝 Prompt-driven – Use text prompts to control scene dynamics, style, and audio.
🗣️ Speech & Sound – Insert dialogue or sound effects using special tags.
⏱️ Short-form Output – Generates 5-second clips at 24 FPS.

💲 Pricing

Video Length	Cost
5 seconds	$0.15

Billing Rules

Minimum charge: 5 seconds
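
The flat per-run price makes budgeting simple arithmetic. A minimal sketch, working in integer cents to avoid floating-point rounding; the function names are illustrative and not part of any API:

```python
PRICE_PER_RUN_CENTS = 15  # $0.15 per 5-second clip, from the pricing table


def runs_for_budget(budget_dollars: int) -> int:
    """Return how many full runs a whole-dollar budget covers."""
    return (budget_dollars * 100) // PRICE_PER_RUN_CENTS


def cost_of_runs(n_runs: int) -> str:
    """Return the total cost of n_runs as a dollar string."""
    total_cents = n_runs * PRICE_PER_RUN_CENTS
    return f"${total_cents // 100}.{total_cents % 100:02d}"


print(runs_for_budget(10))  # 66
print(cost_of_runs(66))     # $9.90
```

A $10 budget covers 66 full runs ($9.90), which matches the roughly 66 runs per $10 quoted for this model.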

🎨 How to Use

Upload Image

Provide a reference image as the base frame.
Make sure the URL is valid and accessible (a preview should appear).
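
If you submit the image by URL from a script rather than the web form, you may want to check the URL before spending a run. A rough sketch using only the Python standard library; both helper names are hypothetical, and the rule "must be http(s) and return an image content type" is an assumption about what "valid and accessible" means here:

```python
import urllib.request
from urllib.parse import urlparse


def looks_like_image_url(url: str) -> bool:
    """Syntactic check only: http(s) scheme and a non-empty host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_reachable_image(url: str, timeout: float = 5.0) -> bool:
    """Send a HEAD request; True if the server reports an image content type."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.headers.get_content_type().startswith("image/")
    except (OSError, ValueError):
        # OSError covers network and HTTP errors; ValueError covers malformed URLs.
        return False
```

Some hosts reject HEAD requests, so a False result from `is_reachable_image` is a hint to look closer, not proof that the image is unusable.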

Enter Prompt

Describe scene motion, style, and atmosphere.
Use tags for sound:
<S>... <E> → Speech (converted into spoken audio)
<AUDCAP>... <ENDAUDCAP> → Background audio / effects
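
When prompts are assembled in code, a small helper keeps the tags paired correctly. This is a minimal sketch; `build_prompt` is a hypothetical helper, and placing all speech after the scene description is a simplification (the example prompts interleave speech with description):

```python
def build_prompt(scene: str, speech: list[str], audio_caption: str = "") -> str:
    """Join a scene description with tagged speech lines and an optional audio caption."""
    parts = [scene.strip()]
    # Each spoken line is wrapped in <S>...<E>.
    parts += [f"<S>{line.strip()}<E>" for line in speech]
    # Background audio and effects go in a single <AUDCAP>...<ENDAUDCAP> block.
    if audio_caption:
        parts.append(f"<AUDCAP>{audio_caption.strip()}<ENDAUDCAP>")
    return " ".join(parts)


prompt = build_prompt(
    "A young woman on a sunlit hilltop turns to the camera and smiles.",
    ["What a beautiful day!"],
    "Rustling leaves, distant birdsong.",
)
print(prompt)
```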

Set Seed

-1 = random output
Any fixed number = reproducible results
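
One way to keep random runs reproducible is to draw the seed yourself and record it, instead of sending -1. A sketch of that idea; `resolve_seed` is a hypothetical client-side helper, and the upper bound is an assumption because the valid seed range is not stated here:

```python
import random

MAX_SEED = 2**31 - 1  # assumed upper bound; the valid range is not documented here


def resolve_seed(seed: int = -1) -> int:
    """Map -1 to a freshly drawn seed; pass any other value through unchanged."""
    if seed == -1:
        return random.randint(0, MAX_SEED)
    return seed


seed = resolve_seed()  # random, but now known
print(f"using seed {seed}")
```

Logging the resolved value means a result you like can be regenerated later by passing the same number.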

Run

Click Run ($0.15) to generate your 5-second image-to-audio-video clip.
Preview and download the result.
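
The same inputs can be collected into a JSON body for a REST call. The sketch below only builds the body: every key name (`image`, `prompt`, `seed`, `enable_safety_checker`) is an assumption chosen to mirror the web form, and the endpoint URL and authentication scheme are not given here, so they are left out.

```python
import json


def build_request_body(image_url: str, prompt: str, seed: int = -1,
                       enable_safety_checker: bool = True) -> str:
    """Serialize the inputs into a JSON string.

    All key names are assumptions made for illustration.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    body = {
        "image": image_url,                              # reference image (base frame)
        "prompt": prompt,                                # scene text plus speech/audio tags
        "seed": seed,                                    # -1 = random, fixed = reproducible
        "enable_safety_checker": enable_safety_checker,  # assumed toggle name
    }
    return json.dumps(body)


print(build_request_body("https://example.com/knight.png",
                         "A knight in the rain. <S>Hold the line.<E>"))
```

Check the service's API reference for the real field names before sending a request like this.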

📝 Prompt Example

A wide shot of a medieval knight standing in the rain, sword planted into the ground, glowing with mystical energy. 
<S>I will defend this land until my last breath.<E> 
<AUDCAP>Thunder rolls across the dark sky, distant war drums echo.<ENDAUDCAP>
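
To sanity-check a prompt like the one above before submitting it, the tags can be pulled apart with regular expressions. This is an illustrative sketch, not how the model itself parses prompts; `split_prompt` and `tags_balanced` are hypothetical helpers:

```python
import re

SPEECH_RE = re.compile(r"<S>(.*?)<E>", re.DOTALL)
AUDCAP_RE = re.compile(r"<AUDCAP>(.*?)<ENDAUDCAP>", re.DOTALL)


def tags_balanced(prompt: str) -> bool:
    """True if every opening tag has a matching closing tag (by count)."""
    return (prompt.count("<S>") == prompt.count("<E>")
            and prompt.count("<AUDCAP>") == prompt.count("<ENDAUDCAP>"))


def split_prompt(prompt: str) -> dict:
    """Separate a prompt into scene text, speech lines, and audio captions."""
    speech = [s.strip() for s in SPEECH_RE.findall(prompt)]
    audio = [a.strip() for a in AUDCAP_RE.findall(prompt)]
    # Whatever remains after removing tagged spans is the scene description.
    scene = AUDCAP_RE.sub("", SPEECH_RE.sub("", prompt))
    return {"scene": " ".join(scene.split()), "speech": speech, "audio": audio}


example = (
    "A wide shot of a medieval knight standing in the rain, sword planted "
    "into the ground, glowing with mystical energy. \n"
    "<S>I will defend this land until my last breath.<E> \n"
    "<AUDCAP>Thunder rolls across the dark sky, distant war drums echo.<ENDAUDCAP>"
)
print(tags_balanced(example))          # True
print(split_prompt(example)["speech"])  # ['I will defend this land until my last breath.']
```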

🙏 Acknowledgements

Wan2.2 – Video backbone initialization
MMAudio – Audio encoder/decoder inspiration

⭐ Citation

If Ovi is useful, please ⭐ the repo and cite the paper:

@misc{low2025ovitwinbackbonecrossmodal,
 title={Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation}, 
 author={Chetwin Low and Weimin Wang and Calder Katyal},
 year={2025},
 eprint={2510.01284},
 archivePrefix={arXiv},
 primaryClass={cs.MM},
 url={https://arxiv.org/abs/2510.01284}, 
}
