Character-AI Ovi

character-ai/ovi/image-to-video

Ovi is a Veo-3-like image-to-video model that generates synchronized video and audio from text or text+image prompts. It is served through a ready-to-use REST inference API with no cold starts.


Your request will cost $0.15 per run.

With $10 you can run this model approximately 66 times.


README

Ovi (I2V Version)

Ovi is a Veo-3-like image-to-audio-video (I2AV) generation model that creates synchronized video and audio from a single image plus a descriptive text prompt.

It is designed for short-form storytelling, where a still image is brought to life with cinematic motion, dialogue, and sound.

🌟 Key Features

  • 🎬 Image → Video+Audio – Bring a static image to life with synchronized audiovisual output.
  • 📝 Prompt-driven – Use text prompts to control scene dynamics, style, and audio.
  • 🗣️ Speech & Sound – Insert dialogue or sound effects using special tags.
  • ⏱️ Short-form Output – Generates 5-second clips at 24 FPS.

💲 Pricing

Video Length    Cost
5 seconds       $0.15

Billing Rules

  • Minimum charge: 5 seconds
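The budget figure quoted for this model follows from simple division. A minimal sketch of that arithmetic, assuming the flat $0.15 per-run price from the table above; the function names are illustrative and not part of any API:

```python
# Price of one 5-second run, in cents, taken from the pricing table.
PRICE_PER_RUN_CENTS = 15


def runs_for_budget(budget_dollars: int) -> int:
    """Return how many whole runs a budget covers at the listed price."""
    budget_cents = budget_dollars * 100
    # Integer division: a partial run cannot be purchased.
    return budget_cents // PRICE_PER_RUN_CENTS


def total_cost_dollars(runs: int) -> float:
    """Return the total cost in dollars for a number of runs."""
    return runs * PRICE_PER_RUN_CENTS / 100


if __name__ == "__main__":
    print(runs_for_budget(10))
    print(total_cost_dollars(66))
```

Working in integer cents avoids floating-point rounding when dividing the budget by the per-run price.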

🎨 How to Use

  1. Upload Image
  • Provide a reference image as the base frame.
  • Make sure the URL is valid and accessible (a preview should appear).
  2. Enter Prompt
  • Describe scene motion, style, and atmosphere.
  • Use tags for sound:
    • <S>...<E> → Speech (converted into spoken audio)
    • <AUDCAP>...<ENDAUDCAP> → Background audio / effects
  3. Set Seed
  • -1 = random output
  • Any fixed number = reproducible results
  4. Run
  • Click Run $0.15 to generate your 5s image-to-audio-video clip.
  • Preview and download the result.
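The three inputs described in the steps above (image, prompt, seed) could be gathered and sanity-checked before submission, roughly as follows. This is only a sketch: the field names `image`, `prompt`, and `seed` and the helper `build_inputs` are assumptions made for illustration, not the documented request schema, and no network call is made.

```python
from urllib.parse import urlparse


def build_inputs(image_url: str, prompt: str, seed: int = -1) -> dict:
    """Collect the inputs for one run into a dictionary.

    The keys used here are hypothetical; check the API reference
    for the real parameter names before sending a request.
    """
    parsed = urlparse(image_url)
    # A usable image URL needs an http(s) scheme and a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"image URL does not look valid: {image_url!r}")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # seed = -1 requests a random output; a fixed number is intended
    # to give reproducible results.
    return {"image": image_url, "prompt": prompt, "seed": seed}


if __name__ == "__main__":
    inputs = build_inputs(
        "https://example.com/knight.png",
        "A knight stands in the rain.",
        seed=42,
    )
    print(inputs)
```

The URL check only inspects the URL's shape; it does not confirm that the image is actually reachable.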

📝 Prompt Example

A wide shot of a medieval knight standing in the rain, sword planted into the ground, glowing with mystical energy. 
<S>I will defend this land until my last breath.<E> 
<AUDCAP>Thunder rolls across the dark sky, distant war drums echo.<ENDAUDCAP>
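A prompt in this shape can also be assembled programmatically. The sketch below wraps the speech and audio descriptions in the tags shown above; the helper names `build_prompt` and `extract_speech` are illustrative, and placing each part on its own line simply mirrors the example rather than a documented requirement.

```python
import re


def build_prompt(scene: str, speech: str = "", audio_caption: str = "") -> str:
    """Join a scene description with optional speech and audio tags."""
    parts = [scene.strip()]
    if speech:
        # <S>...<E> marks text to be converted into spoken audio.
        parts.append(f"<S>{speech.strip()}<E>")
    if audio_caption:
        # <AUDCAP>...<ENDAUDCAP> marks background audio or effects.
        parts.append(f"<AUDCAP>{audio_caption.strip()}<ENDAUDCAP>")
    return "\n".join(parts)


def extract_speech(prompt: str) -> list[str]:
    """Return every speech segment found between <S> and <E>."""
    return re.findall(r"<S>(.*?)<E>", prompt, flags=re.DOTALL)


if __name__ == "__main__":
    prompt = build_prompt(
        "A wide shot of a medieval knight standing in the rain.",
        speech="I will defend this land until my last breath.",
        audio_caption="Thunder rolls across the dark sky.",
    )
    print(prompt)
    print(extract_speech(prompt))
```

Reading the speech segments back with `extract_speech` is a quick way to confirm that the tags are paired correctly before a run is paid for.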

🙏 Acknowledgements

  • Wan2.2 – Video backbone initialization
  • MMAudio – Audio encoder/decoder inspiration

⭐ Citation

If Ovi is useful, please ⭐ the repo and cite the paper:

@misc{low2025ovitwinbackbonecrossmodal,
 title={Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation}, 
 author={Chetwin Low and Weimin Wang and Calder Katyal},
 year={2025},
 eprint={2510.01284},
 archivePrefix={arXiv},
 primaryClass={cs.MM},
 url={https://arxiv.org/abs/2510.01284}, 
}