image-to-video
Your request will cost $0.15 per run.
For $10 you can run this model approximately 66 times.
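The run estimate above is plain floor division: at a flat $0.15 per run, a $10 budget covers floor(10 / 0.15) = 66 whole runs, with $0.10 left over. The sketch below checks that arithmetic. The function name runs_affordable is hypothetical and not part of any provider API. It uses Decimal so cent amounts stay exact.

```python
from decimal import Decimal


def runs_affordable(budget_usd: str, price_per_run_usd: str) -> int:
    """Number of whole runs a budget covers at a flat per-run price."""
    # Decimal keeps cent amounts exact (0.15 is not exact as a binary float).
    budget = Decimal(budget_usd)
    price = Decimal(price_per_run_usd)
    # Integer division drops the partial run the budget cannot cover.
    return int(budget // price)


runs = runs_affordable("10", "0.15")
leftover = Decimal("10") - runs * Decimal("0.15")
print(runs, leftover)  # 66 0.10
```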
Ovi is a Veo 3-like image-to-audio-video (I2AV) generation model that creates synchronized video and audio from a single image and a descriptive text prompt.
It is designed for short-form storytelling, where a still image is brought to life with cinematic motion, dialogue, and sound.
Billing Rules

| Video Length | Cost |
|---|---|
| 5 seconds | $0.15 |
How to use:
1. Upload an image.
2. Enter a prompt describing the scene motion, style, and atmosphere. Use tags for sound:
   - <S> ... <E> → speech (converted into spoken audio)
   - <AUDCAP> ... <ENDAUDCAP> → background audio / effects
3. Set the seed (-1 = random output).
4. Run.
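A minimal Python sketch of assembling a prompt in the tagged format described above. The helper name build_ovi_prompt is hypothetical. Placing each segment on its own line simply mirrors the example prompt, and accepting several <S> ... <E> segments in one prompt is an assumption of this sketch, not a documented guarantee.

```python
from typing import Iterable, Optional


def build_ovi_prompt(scene: str, speech: Iterable[str] = (), audio: Optional[str] = None) -> str:
    """Hypothetical helper: join a scene description with Ovi-style sound tags."""
    # Visual description first: motion, style, atmosphere.
    parts = [scene.strip()]
    # Each spoken line is wrapped in <S> ... <E> so it is voiced.
    parts.extend(f"<S>{line.strip()}<E>" for line in speech)
    # Optional background sound / effects caption.
    if audio:
        parts.append(f"<AUDCAP>{audio.strip()}<ENDAUDCAP>")
    return "\n".join(parts)


prompt = build_ovi_prompt(
    "A wide shot of a medieval knight standing in the rain, sword planted "
    "into the ground, glowing with mystical energy.",
    speech=["I will defend this land until my last breath."],
    audio="Thunder rolls across the dark sky, distant war drums echo.",
)
print(prompt)
```

Keeping the tags out of hand-typed text this way makes it harder to forget a closing <E> or <ENDAUDCAP>.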
Example prompt:

A wide shot of a medieval knight standing in the rain, sword planted into the ground, glowing with mystical energy.
<S>I will defend this land until my last breath.<E>
<AUDCAP>Thunder rolls across the dark sky, distant war drums echo.<ENDAUDCAP>
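To sanity-check a prompt like the example above before submitting it, the tagged segments can be separated with regular expressions. This is a hypothetical client-side helper, not something the service provides. It assumes tags are never nested and raises an error if any tag is left unmatched.

```python
import re
from typing import Dict, List, Union

# Non-greedy matches so consecutive segments are captured separately.
SPEECH_RE = re.compile(r"<S>(.*?)<E>", re.DOTALL)
AUDIO_RE = re.compile(r"<AUDCAP>(.*?)<ENDAUDCAP>", re.DOTALL)
TAGS = ("<S>", "<E>", "<AUDCAP>", "<ENDAUDCAP>")


def split_ovi_prompt(prompt: str) -> Dict[str, Union[str, List[str]]]:
    """Split a tagged prompt into scene text, speech lines, and audio captions."""
    # Collect the text inside each pair of tags.
    speech = [s.strip() for s in SPEECH_RE.findall(prompt)]
    audio = [a.strip() for a in AUDIO_RE.findall(prompt)]
    # Whatever remains after removing the tagged spans is the visual description.
    rest = AUDIO_RE.sub(" ", SPEECH_RE.sub(" ", prompt))
    # Any tag still present had no matching partner.
    leftover = [t for t in TAGS if t in rest]
    if leftover:
        raise ValueError(f"unmatched tag(s): {leftover}")
    return {"scene": " ".join(rest.split()), "speech": speech, "audio": audio}


example = (
    "A wide shot of a medieval knight standing in the rain, sword planted "
    "into the ground, glowing with mystical energy.\n"
    "<S>I will defend this land until my last breath.<E>\n"
    "<AUDCAP>Thunder rolls across the dark sky, distant war drums echo.<ENDAUDCAP>"
)
print(split_ovi_prompt(example))
```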
If Ovi is useful, please star the repo and cite the paper:
@misc{low2025ovitwinbackbonecrossmodal,
title={Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation},
author={Chetwin Low and Weimin Wang and Calder Katyal},
year={2025},
eprint={2510.01284},
archivePrefix={arXiv},
primaryClass={cs.MM},
url={https://arxiv.org/abs/2510.01284},
}