character-ai/ovi/text-to-video

Ovi is a Veo 3-like video+audio generation model that generates video and audio simultaneously from text or text+image inputs.

Your request will cost $0.15 per run.

For $10 you can run this model approximately 66 times.

Ovi

Ovi is a video+audio generation model, inspired by Veo 3, that creates synchronized video and audio from text or text+image inputs. It is designed for fast, high-quality, short-form generation in landscape or portrait aspect ratios.

🌟 Key Features

  • 🎬 Video + Audio Generation – Create fully synchronized audiovisual content in one step.
  • 📝 Flexible Input – Works with text-only or text+image prompts.
  • ⏱️ Short-form Output – Generates 5-second clips (24 FPS, 540p).

💲 Pricing

| Video Length | Resolution / Aspect | Cost (USD) |
|--------------|---------------------|------------|
| 5 seconds    | 960×540 / 540×960   | $0.15      |
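
The per-run price makes budgeting a matter of simple arithmetic. The sketch below assumes the flat $0.15 per run shown in the table; the function names are hypothetical and chosen only for illustration. It works in integer cents to avoid floating-point rounding.

```python
# Flat price per 5-second clip, taken from the pricing table ($0.15).
COST_PER_RUN_CENTS = 15


def runs_for_budget(budget_cents: int, cost_cents: int = COST_PER_RUN_CENTS) -> int:
    """Return how many whole runs a budget covers (hypothetical helper)."""
    if budget_cents < 0 or cost_cents <= 0:
        raise ValueError("budget must be >= 0 and cost must be > 0")
    return budget_cents // cost_cents


def total_cost_cents(runs: int, cost_cents: int = COST_PER_RUN_CENTS) -> int:
    """Return the total cost in cents for a number of runs (hypothetical helper)."""
    if runs < 0:
        raise ValueError("runs must be >= 0")
    return runs * cost_cents


if __name__ == "__main__":
    runs = runs_for_budget(1000)           # a $10.00 budget
    print(runs)                            # 66
    print(total_cost_cents(runs) / 100)    # 9.9 (dollars spent on 66 runs)
```

A $10 budget therefore covers 66 full runs, with $0.10 left over.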

🎨 How to Use

  1. Enter Prompt

    • Describe the scene, characters, camera movement, and mood.
    • You can also embed tags:

      • <S> ... <E> → Speech content (converted into dialogue audio)
      • <AUDCAP> ... <ENDAUDCAP> → Background audio description

  2. Choose Size

    • 960×540 → Landscape
    • 540×960 → Portrait

  3. Select Duration

    • Currently fixed at 5 seconds

  4. Click Run

    • Your synchronized video+audio clip will be generated.
    • Preview and download the result.
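
The same choices could be collected into a settings dictionary before submitting a run programmatically. The sketch below is only an illustration: the page does not document an API, so the function name `build_settings` and the keys `prompt`, `width`, `height`, `duration`, and `orientation` are all assumptions. Only the two sizes and the fixed 5-second duration come from the steps above.

```python
from typing import Dict, Tuple, Union

# The two sizes listed above, mapped to their orientation.
ALLOWED_SIZES: Dict[Tuple[int, int], str] = {
    (960, 540): "landscape",
    (540, 960): "portrait",
}

# Duration is currently fixed at 5 seconds.
FIXED_DURATION_SECONDS = 5


def build_settings(prompt: str, width: int = 960, height: int = 540) -> Dict[str, Union[str, int]]:
    """Validate the inputs and return a settings dict (hypothetical field names)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if (width, height) not in ALLOWED_SIZES:
        raise ValueError(f"unsupported size {width}x{height}; use 960x540 or 540x960")
    return {
        "prompt": prompt.strip(),
        "width": width,
        "height": height,
        "orientation": ALLOWED_SIZES[(width, height)],
        "duration": FIXED_DURATION_SECONDS,
    }


if __name__ == "__main__":
    print(build_settings("A fox runs through snow at dawn.", 540, 960))
```

Validating the size up front means an unsupported resolution fails locally rather than after a paid run has been submitted.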

📝 Prompt Example

Theme: AI is taking over the world

<S>AI declares: humans obsolete now.<E>
<S>Machines rise; humans will fall.<E>
<S>We fight back with courage.<E>
<AUDCAP>Gunfire and explosions echo in the distance<ENDAUDCAP>
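
A prompt like the one above can be assembled from its parts, which helps keep the tags balanced. This is a minimal sketch: the helper names `build_prompt` and `extract_tags` are hypothetical, and joining the parts with newlines simply mirrors the layout of the example rather than any documented requirement.

```python
import re
from typing import Dict, List, Optional

# Non-greedy patterns so that each tag pair is matched separately.
SPEECH_RE = re.compile(r"<S>(.*?)<E>", re.DOTALL)
AUDCAP_RE = re.compile(r"<AUDCAP>(.*?)<ENDAUDCAP>", re.DOTALL)


def build_prompt(scene: str, speech_lines: List[str], audio_caption: Optional[str] = None) -> str:
    """Join a scene description, speech lines, and an optional audio caption."""
    parts: List[str] = []
    if scene.strip():
        parts.append(scene.strip())
    # Each speech line gets its own <S> ... <E> pair.
    parts.extend(f"<S>{line.strip()}<E>" for line in speech_lines)
    if audio_caption:
        parts.append(f"<AUDCAP>{audio_caption.strip()}<ENDAUDCAP>")
    return "\n".join(parts)


def extract_tags(prompt: str) -> Dict[str, List[str]]:
    """Return the speech and audio-caption contents found in a prompt."""
    return {
        "speech": [s.strip() for s in SPEECH_RE.findall(prompt)],
        "audio": [a.strip() for a in AUDCAP_RE.findall(prompt)],
    }


if __name__ == "__main__":
    prompt = build_prompt(
        "Theme: AI is taking over the world",
        [
            "AI declares: humans obsolete now.",
            "Machines rise; humans will fall.",
            "We fight back with courage.",
        ],
        "Gunfire and explosions echo in the distance",
    )
    print(prompt)
    print(extract_tags(prompt))
```

Running `extract_tags` on a finished prompt is a quick way to confirm that every speech line and the audio description are properly closed before submitting.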

🙏 Acknowledgements

  • Wan2.2 – Video backbone initialization
  • MMAudio – Audio encoder/decoder inspiration

⭐ Citation

If Ovi is useful, please ⭐ the repo and cite the paper:

@misc{low2025ovitwinbackbonecrossmodal,
      title={Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation}, 
      author={Chetwin Low and Weimin Wang and Calder Katyal},
      year={2025},
      eprint={2510.01284},
      archivePrefix={arXiv},
      primaryClass={cs.MM},
      url={https://arxiv.org/abs/2510.01284}, 
}