character-ai/ovi/image-to-video

Ovi is a Veo 3-like video+audio generation model that generates video and audio simultaneously from text or text+image inputs.

Your request will cost $0.15 per run.

For $10 you can run this model approximately 66 times.

Ovi (I2V Version)

Ovi is a Veo 3-like image-to-audio-video (I2AV) generation model that creates synchronized video and audio from a single image plus a descriptive text prompt.

It is designed for short-form storytelling, where a still image is brought to life with cinematic motion, dialogue, and sound.

🌟 Key Features

  • 🎬 Image → Video+Audio – Bring a static image to life with synchronized audiovisual output.
  • 📝 Prompt-driven – Use text prompts to control scene dynamics, style, and audio.
  • 🗣️ Speech & Sound – Insert dialogue or sound effects using special tags.
  • ⏱️ Short-form Output – Generates 5-second clips at 24 FPS.

💲 Pricing

Video Length | Cost
5 seconds    | $0.15

Billing Rules

  • Minimum charge: 5 seconds
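
As a rough illustration of the pricing arithmetic, the sketch below estimates how many runs a budget covers. It assumes the flat $0.15 per 5-second run from the table above and nothing else; the function names are hypothetical and not part of any Ovi tooling.

```python
from decimal import Decimal

# Flat price per 5-second clip, taken from the pricing table.
PRICE_PER_RUN = Decimal("0.15")


def runs_for_budget(budget_usd: str) -> int:
    """Return the whole number of runs a budget covers at the flat per-run price."""
    return int(Decimal(budget_usd) // PRICE_PER_RUN)


def cost_for_runs(runs: int) -> Decimal:
    """Return the total cost in USD for a given number of runs."""
    return PRICE_PER_RUN * runs


print(runs_for_budget("10"))  # 66
print(cost_for_runs(66))      # 9.90
```

Decimal arithmetic is used here so that small amounts such as 0.15 do not pick up binary floating-point rounding error.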

🎨 How to Use

  1. Upload Image

    • Provide a reference image as the base frame.
    • If you supply the image by URL, make sure the URL is valid and publicly accessible (a preview should appear).
  2. Enter Prompt

    • Describe scene motion, style, and atmosphere.

    • Use tags for sound:

      • <S> ... <E> → Speech (converted into spoken audio)
      • <AUDCAP> ... <ENDAUDCAP> → Background audio / effects
  3. Set Seed

    • -1 = random output
    • Any fixed number = reproducible results
  4. Run

    • Click Run ($0.15 per run) to generate your 5-second image-to-audio-video clip.
    • Preview and download the result.
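
The three inputs described in the steps above (image, prompt, seed) can be collected into a single request body before a run. The following is a minimal sketch only: the field names `image`, `prompt`, and `seed`, the helper `build_request`, and the example URL are all assumptions made for illustration, and no endpoint or client library is implied by this README.

```python
import json


def build_request(image_url: str, prompt: str, seed: int = -1) -> dict:
    """Assemble a hypothetical request body from the three inputs.

    Field names are illustrative assumptions, not a documented schema.
    seed=-1 asks for a random result; any fixed number is meant to be
    reproducible, as described in the steps above.
    """
    if not image_url.startswith(("http://", "https://")):
        raise ValueError("image_url must be an http(s) URL")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {"image": image_url, "prompt": prompt, "seed": seed}


payload = build_request(
    "https://example.com/knight.png",  # placeholder URL
    "A wide shot of a medieval knight standing in the rain.",
    seed=42,
)
print(json.dumps(payload, indent=2))
```

Checking the inputs locally before submitting is a cheap way to avoid spending a run on an unreachable image or an empty prompt.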

📝 Prompt Example

A wide shot of a medieval knight standing in the rain, sword planted into the ground, glowing with mystical energy.  
<S>I will defend this land until my last breath.<E>  
<AUDCAP>Thunder rolls across the dark sky, distant war drums echo.<ENDAUDCAP>
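
A prompt in this shape can also be assembled programmatically. The helper below is a hedged sketch: `build_prompt` is a hypothetical name, and joining the parts with newlines is an assumption based on the layout of the example, not a documented requirement. Only the `<S> ... <E>` and `<AUDCAP> ... <ENDAUDCAP>` tags come from this README.

```python
from typing import Optional, Sequence


def build_prompt(
    scene: str,
    speech: Sequence[str] = (),
    audio_caption: Optional[str] = None,
) -> str:
    """Combine a scene description with tagged speech and background audio."""
    parts = [scene.strip()]
    # Each spoken line is wrapped in its own <S> ... <E> pair.
    parts.extend(f"<S>{line.strip()}<E>" for line in speech)
    # Background audio or effects go inside <AUDCAP> ... <ENDAUDCAP>.
    if audio_caption:
        parts.append(f"<AUDCAP>{audio_caption.strip()}<ENDAUDCAP>")
    return "\n".join(parts)


prompt = build_prompt(
    "A wide shot of a medieval knight standing in the rain, sword planted "
    "into the ground, glowing with mystical energy.",
    speech=["I will defend this land until my last breath."],
    audio_caption="Thunder rolls across the dark sky, distant war drums echo.",
)
print(prompt)
```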

🙏 Acknowledgements

  • Wan2.2 – Video backbone initialization
  • MMAudio – Audio encoder/decoder inspiration

⭐ Citation

If Ovi is useful, please ⭐ the repo and cite the paper:

@misc{low2025ovitwinbackbonecrossmodal,
      title={Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation}, 
      author={Chetwin Low and Weimin Wang and Calder Katyal},
      year={2025},
      eprint={2510.01284},
      archivePrefix={arXiv},
      primaryClass={cs.MM},
      url={https://arxiv.org/abs/2510.01284}, 
}