WaveSpeedAI APICharacter AICharacter AI Ovi Image To Video

Character Ai Ovi Image To Video

Character Ai Ovi Image To Video

Playground

Try it on WavespeedAI!

Ovi is a veo-3 like, video+audio generation model that simultaneously generates both video and audio content from text or text+image inputs.

Features

Ovi (I2V Version)

Ovi is a veo-3 like, image-to-audio-video (I2AV) generation model that creates synchronized video and audio from a single image plus a descriptive text prompt.

It is designed for short-form storytelling, where a still image is brought to life with cinematic motion, dialogue, and sound.


🌟 Key Features

  • 🎬 Image → Video+Audio – Bring a static image to life with synchronized audiovisual output.
  • 📝 Prompt-driven – Use text prompts to control scene dynamics, style, and audio.
  • 🗣️ Speech & Sound – Insert dialogue or sound effects using special tags.
  • ⏱️ Short-form Output – Generates 5-second clips at 24 FPS.

💲 Pricing

Video LengthCost
5 seconds$0.15

Billing Rules

  • Minimum charge: 5 seconds

🎨 How to Use

  1. Upload Image

    • Provide a reference image as the base frame.
    • Make sure the URL is valid and accessible (a preview should appear).
  2. Enter Prompt

    • Describe scene motion, style, and atmosphere.

    • Use tags for sound:

      • <S> ... <E> → Speech (converted into spoken audio)
      • <AUDCAP> ... <ENDAUDCAP> → Background audio / effects
  3. Set Seed

    • -1 = random output
    • Any fixed number = reproducible results
  4. Run

    • Click Run $0.15 to generate your 5s image-to-audio-video clip.
    • Preview and download the result.

📝 Prompt Example

A wide shot of a medieval knight standing in the rain, sword planted into the ground, glowing with mystical energy.  
<S>I will defend this land until my last breath.<E>  
<AUDCAP>Thunder rolls across the dark sky, distant war drums echo.<ENDAUDCAP>

🙏 Acknowledgements

  • Wan2.2 – Video backbone initialization
  • MMAudio – Audio encoder/decoder inspiration

⭐ Citation

If Ovi is useful, please ⭐ the repo and cite the paper:

@misc&#123;low2025ovitwinbackbonecrossmodal,
      title=&#123;Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation&#125;, 
      author=&#123;Chetwin Low and Weimin Wang and Calder Katyal&#125;,
      year=&#123;2025&#125;,
      eprint=&#123;2510.01284&#125;,
      archivePrefix=&#123;arXiv&#125;,
      primaryClass=&#123;cs.MM&#125;,
      url=&#123;https://arxiv.org/abs/2510.01284&#125;, 
&#125;

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/character-ai/ovi/image-to-video" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "seed": -1
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"

Parameters

Task Submission Parameters

Request Parameters

ParameterTypeRequiredDefaultRangeDescription
imagestringYes-The image for generating the output.
promptstringYes-The prompt for generating the output.
seedintegerNo-1-1 ~ 2147483647The random seed to use for the generation. -1 means a random seed will be used.

Response Parameters

ParameterTypeDescription
codeintegerHTTP status code (e.g., 200 for success)
messagestringStatus message (e.g., “success”)
data.idstringUnique identifier for the prediction, Task Id
data.modelstringModel ID used for the prediction
data.outputsarrayArray of URLs to the generated content (empty when status is not completed)
data.urlsobjectObject containing related API endpoints
data.urls.getstringURL to retrieve the prediction result
data.has_nsfw_contentsarrayArray of boolean values indicating NSFW detection for each output
data.statusstringStatus of the task: created, processing, completed, or failed
data.created_atstringISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.errorstringError message (empty if no error occurred)
data.timingsobjectObject containing timing details
data.timings.inferenceintegerInference time in milliseconds

Result Request Parameters

© 2025 WaveSpeedAI. All rights reserved.