Character-AI Ovi

character-ai/ovi/image-to-video

Ovi is a Veo-3-like image-to-video model that generates synchronized video and audio from a text prompt or a text-plus-image prompt. It comes with a ready-to-use REST inference API, best performance, no cold starts, and affordable pricing.

Your request will cost $0.15 per run.

For $10 you can run this model approximately 66 times.

README

Ovi (I2V Version)

Ovi is a Veo-3-like image-to-audio-video (I2AV) generation model that creates synchronized video and audio from a single image plus a descriptive text prompt.

It is designed for short-form storytelling, where a still image is brought to life with cinematic motion, dialogue, and sound.

🌟 Key Features

  • 🎬 Image → Video+Audio – Bring a static image to life with synchronized audiovisual output.
  • 📝 Prompt-driven – Use text prompts to control scene dynamics, style, and audio.
  • 🗣️ Speech & Sound – Insert dialogue or sound effects using special tags.
  • ⏱️ Short-form Output – Generates 5-second clips at 24 FPS.

💲 Pricing

Video Length | Cost
5 seconds    | $0.15

Billing Rules

  • Minimum charge: 5 seconds
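The flat per-run price makes budgeting a one-line calculation. The sketch below is only an illustration of that arithmetic: the function names are made up for this example, and it assumes every run is billed at the single $0.15 rate listed above.

```python
from decimal import Decimal

# Price of one 5-second clip, taken from the pricing table above.
COST_PER_RUN = Decimal("0.15")


def runs_for_budget(budget_usd: str) -> int:
    """Whole number of runs a budget covers at the flat per-run price."""
    return int(Decimal(budget_usd) // COST_PER_RUN)


def cost_for_runs(runs: int) -> Decimal:
    """Total cost in USD for a given number of runs."""
    return COST_PER_RUN * runs


print(runs_for_budget("10"))  # 66
print(cost_for_runs(66))      # 9.90
```

Decimal is used instead of float so that cent amounts stay exact.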

🎨 How to Use

  1. Upload Image
     • Provide a reference image as the base frame.
     • Make sure the URL is valid and accessible (a preview should appear).
  2. Enter Prompt
     • Describe scene motion, style, and atmosphere.
     • Use tags for sound:
       • <S>...<E> → Speech (converted into spoken audio)
       • <AUDCAP>...<ENDAUDCAP> → Background audio / effects
  3. Set Seed
     • -1 = random output
     • Any fixed number = reproducible results
  4. Run
     • Click Run ($0.15) to generate your 5s image-to-audio-video clip.
     • Preview and download the result.
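For scripted use, the same three inputs (image, prompt, seed) can be gathered into a JSON body before sending them to the REST API. This page does not document the request schema, so the sketch below is only an assumption: the field names `image`, `prompt`, and `seed`, the helper `build_payload`, and the example URL are all hypothetical, and the code deliberately makes no network call.

```python
import json
from urllib.parse import urlparse

# Model identifier as shown on this page.
MODEL_PATH = "character-ai/ovi/image-to-video"


def build_payload(image_url: str, prompt: str, seed: int = -1) -> dict:
    """Collect the inputs from the steps above into a JSON-ready dict.

    The keys are hypothetical; check the real API reference for the
    actual field names before sending anything.
    """
    parsed = urlparse(image_url)
    # Local sanity check only: it cannot tell whether the URL is reachable.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("image_url must be an absolute http(s) URL")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # seed = -1 asks for a random result; a fixed number is reproducible.
    return {"image": image_url, "prompt": prompt, "seed": int(seed)}


payload = build_payload(
    "https://example.com/knight.png",  # placeholder URL
    "A knight stands in the rain.",
    seed=42,
)
print(MODEL_PATH)
print(json.dumps(payload, indent=2))
```

Validating locally before the request avoids paying for a run that would fail on a malformed image URL or an empty prompt.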

šŸ“ Prompt Example

A wide shot of a medieval knight standing in the rain, sword planted into the ground, glowing with mystical energy. 
<S>I will defend this land until my last breath.<E> 
<AUDCAP>Thunder rolls across the dark sky, distant war drums echo.<ENDAUDCAP>
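A prompt in this shape can also be assembled programmatically. The sketch below is one possible helper, not part of Ovi itself: `build_prompt` and `tags_balanced` are hypothetical names, and joining the parts with newlines is an assumption based on the layout of the example above.

```python
from typing import Optional, Sequence


def build_prompt(scene: str,
                 speech: Sequence[str] = (),
                 audio: Optional[str] = None) -> str:
    """Combine a scene description with speech and audio-caption tags."""
    parts = [scene.strip()]
    # Each spoken line is wrapped in <S>...<E>.
    parts += [f"<S>{line.strip()}<E>" for line in speech]
    # Background audio / effects go inside <AUDCAP>...<ENDAUDCAP>.
    if audio:
        parts.append(f"<AUDCAP>{audio.strip()}<ENDAUDCAP>")
    return "\n".join(parts)


def tags_balanced(prompt: str) -> bool:
    """Cheap check that every opening tag has a matching closing tag."""
    return (prompt.count("<S>") == prompt.count("<E>")
            and prompt.count("<AUDCAP>") == prompt.count("<ENDAUDCAP>"))


prompt = build_prompt(
    "A wide shot of a medieval knight standing in the rain.",
    speech=["I will defend this land until my last breath."],
    audio="Thunder rolls across the dark sky, distant war drums echo.",
)
print(prompt)
print(tags_balanced(prompt))  # True
```

The balance check only counts tags; it does not verify their order or nesting.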

šŸ™ Acknowledgements

  • Wan2.2 – Video backbone initialization
  • MMAudio – Audio encoder/decoder inspiration

⭐ Citation

If Ovi is useful, please ⭐ the repo and cite the paper:

@misc{low2025ovitwinbackbonecrossmodal,
 title={Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation}, 
 author={Chetwin Low and Weimin Wang and Calder Katyal},
 year={2025},
 eprint={2510.01284},
 archivePrefix={arXiv},
 primaryClass={cs.MM},
 url={https://arxiv.org/abs/2510.01284}, 
}