Character-AI Ovi

character-ai/ovi/text-to-video

Ovi is a Veo 3-like model that converts text or text+image prompts into synchronized video with audio. It is available through a ready-to-use REST inference API with no cold starts and affordable pricing.

Each run costs $0.15.

For $10 you can run this model about 66 times.
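
The "about 66 times" figure follows from dividing the budget by the per-run price and rounding down. A minimal sketch of that arithmetic, working in integer cents; the function name `runs_for_budget` is illustrative and not part of any API:

```python
# Work in integer cents to avoid floating-point rounding surprises.
PRICE_PER_RUN_CENTS = 15  # $0.15 per run, as listed on this page


def runs_for_budget(budget_cents: int, price_cents: int = PRICE_PER_RUN_CENTS) -> tuple:
    """Return (whole runs affordable, leftover cents) for a given budget."""
    if budget_cents < 0 or price_cents <= 0:
        raise ValueError("budget must be non-negative and price must be positive")
    return divmod(budget_cents, price_cents)


runs, leftover = runs_for_budget(1000)  # a $10.00 budget
print(f"{runs} runs, {leftover} cents left over")  # 66 runs, 10 cents left over
```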

README

Ovi

Ovi is a next-generation video+audio generation model, inspired by Veo 3, that creates synchronized video and audio from text or text+image inputs. It is designed for fast, high-quality, short-form generation in landscape or portrait aspect ratios.

🌟 Key Features

  • 🎬 Video + Audio Generation – Create fully synchronized audiovisual content in one step.
  • 📝 Flexible Input – Works with text-only or text+image prompts.
  • ⏱️ Short-form Output – Generates 5-second clips (24 FPS, 540p).

💲 Pricing

Video Length | Resolution / Aspect | Cost (USD)
5 seconds    | 960×540 / 540×960   | $0.15

🎨 How to Use

  1. Enter Prompt
     • Describe the scene, characters, camera movement, and mood.
     • You can also embed tags:
       • <S>... <E> → Speech content (converted into dialogue audio)
       • <AUDCAP>... <ENDAUDCAP> → Background audio description
  2. Choose Size
     • 960×540 → Landscape
     • 540×960 → Portrait
  3. Select Duration
     • Currently fixed at 5 seconds
  4. Click Run
     • Your synchronized video+audio clip will be generated.
     • Preview and download the result.
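
The choices in the steps above (two sizes, one fixed duration) can be checked before a request is sent. The sketch below is an assumption-heavy illustration: this README does not document the REST request schema, so the payload field names (`prompt`, `size`, `duration`), the `"960x540"` size string format, and the `OviRequest` class itself are hypothetical. Only the allowed sizes and the 5-second duration come from the text.

```python
from dataclasses import dataclass

# Values taken from the README: two supported sizes, one fixed duration.
SUPPORTED_SIZES = {(960, 540): "landscape", (540, 960): "portrait"}
FIXED_DURATION_SECONDS = 5


@dataclass(frozen=True)
class OviRequest:
    """Hypothetical container for one generation request."""

    prompt: str
    width: int = 960
    height: int = 540

    def __post_init__(self) -> None:
        # Reject inputs the README does not list as supported.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if (self.width, self.height) not in SUPPORTED_SIZES:
            raise ValueError(f"unsupported size {self.width}x{self.height}")

    @property
    def orientation(self) -> str:
        return SUPPORTED_SIZES[(self.width, self.height)]

    def to_payload(self) -> dict:
        # Field names are assumptions; check the provider's API schema.
        return {
            "prompt": self.prompt,
            "size": f"{self.width}x{self.height}",
            "duration": FIXED_DURATION_SECONDS,
        }


request = OviRequest("A robot walks through a neon-lit street.", 540, 960)
print(request.orientation)  # portrait
```

Validating locally avoids paying for a run that the service would reject or silently adjust; the actual request format should be taken from the provider's API documentation.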

📝 Prompt Example

Theme: AI is taking over the world

<S>AI declares: humans obsolete now.<E>
<S>Machines rise; humans will fall.<E>
<S>We fight back with courage.<E>
<AUDCAP>Gunfire and explosions echo in the distance<ENDAUDCAP>
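
A prompt like the one above can be assembled and inspected programmatically. The following is a minimal sketch: the helper names `compose_prompt` and `extract_tags` are illustrative, and placing the scene description on the first line with one tag per line is an assumption based on the example, not a documented requirement.

```python
import re
from typing import List, Optional

# Tag syntax as shown in the README: <S>...<E> and <AUDCAP>...<ENDAUDCAP>.
SPEECH_RE = re.compile(r"<S>(.*?)<E>", re.DOTALL)
AUDCAP_RE = re.compile(r"<AUDCAP>(.*?)<ENDAUDCAP>", re.DOTALL)


def compose_prompt(scene: str, speech: List[str], audio_caption: Optional[str] = None) -> str:
    """Join a scene description, speech lines, and an optional audio caption."""
    lines = [scene] if scene else []
    lines += [f"<S>{line}<E>" for line in speech]
    if audio_caption:
        lines.append(f"<AUDCAP>{audio_caption}<ENDAUDCAP>")
    return "\n".join(lines)


def extract_tags(prompt: str) -> dict:
    """Pull speech lines and audio captions back out of a tagged prompt."""
    return {
        "speech": [s.strip() for s in SPEECH_RE.findall(prompt)],
        "audio_captions": [a.strip() for a in AUDCAP_RE.findall(prompt)],
    }


prompt = compose_prompt(
    "",
    [
        "AI declares: humans obsolete now.",
        "Machines rise; humans will fall.",
        "We fight back with courage.",
    ],
    "Gunfire and explosions echo in the distance",
)
print(prompt)
print(len(extract_tags(prompt)["speech"]))  # 3
```

Parsing the tags back out is a cheap way to catch an unclosed `<S>` or `<AUDCAP>` before spending a run on it.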

🙏 Acknowledgements

  • Wan2.2 – Video backbone initialization
  • MMAudio – Audio encoder/decoder inspiration

⭐ Citation

If Ovi is useful, please ⭐ the repo and cite the paper:

@misc{low2025ovitwinbackbonecrossmodal,
 title={Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation}, 
 author={Chetwin Low and Weimin Wang and Calder Katyal},
 year={2025},
 eprint={2510.01284},
 archivePrefix={arXiv},
 primaryClass={cs.MM},
 url={https://arxiv.org/abs/2510.01284}, 
}