Kling Omni Video O1 Standard Text-To-Video

kwaivgi/kling-video-o1-std/text-to-video

Kling Omni Video O1 (Standard) is Kuaishou's first unified multi-modal video model, built on MVL (Multi-modal Visual Language) technology. Text-to-Video mode generates cinematic videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding. It is available through a ready-to-use REST API with no cold starts and affordable pricing.

Your request will cost $0.42 per run.

For $10 you can run this model approximately 23 times.

README

Kling Omni Video O1 — Text-to-Video (Standard)

Kling Omni Video O1 is Kuaishou's unified multi-modal video generation model, optimized for stable production use and cost efficiency.
The Text-to-Video mode transforms natural language prompts into high-quality videos with coherent motion, accurate semantic understanding, and consistent visual output.

Why Kling Video O1 (Standard)

Unified Creative Engine

The model supports multiple video generation and editing workflows within a single system:

  • Text-to-video generation
  • Image-to-video transformation
  • Reference-based video creation
  • Video editing and modification
  • Shot extension and scene continuation

Multi-Modal Visual Language (MVL)

The model interprets instructions through MVL, enabling understanding of:

  • Natural language descriptions
  • Visual context and references
  • Subject identity and appearance
  • Scene structure and motion dynamics

Subject Consistency

The model keeps characters, objects, and scene attributes stable across frames, giving reliable and repeatable results suitable for production workflows.

Core Features

  • Cinematic-quality video generation with natural motion
  • Stable temporal consistency across the entire sequence
  • Accurate semantic understanding of text prompts
  • Support for multiple resolutions and output durations
  • Standard optimization for balanced quality, speed, and cost

How to Use

  1. Write Your Prompt
    Describe the scene, action, camera movement, and overall mood.

    Example: "A young woman walking through a neon-lit Tokyo street at night, rain reflecting city lights, cinematic tracking shot"

  2. Set Parameters
    Choose the desired duration and aspect ratio.

  3. Generate
    Submit the request and receive a coherent video generated from text.
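
The three steps above can be sketched as a small Python helper that validates the inputs and assembles a request. This is only a sketch under stated assumptions: the base URL, the field names `prompt`, `duration`, and `aspect_ratio`, the bearer-token header, and the listed aspect ratios are hypothetical and are not confirmed by this page. Only the model ID and the 5 s / 10 s durations come from the text, so check the provider's API reference before relying on any of it.

```python
import json
import urllib.request

MODEL_ID = "kwaivgi/kling-video-o1-std/text-to-video"  # from this page
# Assumption: placeholder base URL; replace with the one in the API docs.
API_BASE = "https://api.example.com/v1"
ALLOWED_DURATIONS = (5, 10)  # seconds, per the pricing table
# Assumption: illustrative aspect ratios, not listed on this page.
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16", "1:1")


def build_payload(prompt, duration=5, aspect_ratio="16:9"):
    """Validate inputs and return the JSON body (field names are assumed)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if duration not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ALLOWED_ASPECT_RATIOS}")
    return {
        "prompt": prompt.strip(),
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }


def build_request(api_key, payload):
    """Build (but do not send) a POST request for the model."""
    return urllib.request.Request(
        url=f"{API_BASE}/{MODEL_ID}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Step 1: write the prompt. Step 2: set parameters. Step 3: build the request.
payload = build_payload(
    "A young woman walking through a neon-lit Tokyo street at night, "
    "rain reflecting city lights, cinematic tracking shot",
    duration=5,
    aspect_ratio="16:9",
)
request = build_request("YOUR_API_KEY", payload)
# To submit, pass `request` to urllib.request.urlopen and parse the JSON reply.
```

Nothing is sent over the network here. Keeping validation separate from submission means a bad duration or an empty prompt is caught before a billable run starts.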

Pricing

duration    price
5s          $0.42
10s         $0.84

Billed based on the selected output duration. Pricing is optimized for standard production workloads.
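
For budgeting, the per-duration prices can be turned into a quick estimate. The sketch below uses only the two prices listed in this section; the function names are illustrative, and it assumes each run is billed at exactly the listed price with no other fees.

```python
# Prices in US cents per run, taken from the pricing table.
# Integer cents avoid floating-point rounding in the division below.
PRICE_CENTS = {5: 42, 10: 84}


def run_cost_cents(duration_s):
    """Return the price of one run, in cents, for a supported duration."""
    if duration_s not in PRICE_CENTS:
        raise ValueError(f"unsupported duration: {duration_s}s")
    return PRICE_CENTS[duration_s]


def runs_for_budget(budget_cents, duration_s):
    """Return how many whole runs fit in the budget."""
    return budget_cents // run_cost_cents(duration_s)


def batch_cost_dollars(n_runs, duration_s):
    """Return the total cost in dollars for n_runs of the given duration."""
    return n_runs * run_cost_cents(duration_s) / 100


print(runs_for_budget(1000, 5))     # 23 runs of 5s for $10
print(runs_for_budget(1000, 10))    # 11 runs of 10s for $10
print(batch_cost_dollars(100, 10))  # 84.0 dollars for 100 runs of 10s
```

The first result matches the "approximately 23 times" figure quoted for a $10 budget at the 5s price.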

Pro Tips

  • Use clear and descriptive prompts
  • Specify camera movement and framing for better motion quality
  • Include lighting, environment, and atmosphere details
  • Use the Standard tier for large-scale generation and cost-sensitive use cases
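
One way to apply these tips consistently across many prompts is to assemble each prompt from named parts. The helper below is a hypothetical convenience, not part of the model or its API. The model accepts free-form text, and the comma-separated ordering used here is an arbitrary choice.

```python
def compose_prompt(subject, environment="", lighting="", atmosphere="", camera=""):
    """Join the non-empty prompt parts into one comma-separated description."""
    parts = [subject, environment, lighting, atmosphere, camera]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = compose_prompt(
    subject="A young woman walking through a Tokyo street at night",
    environment="neon signs overhead",
    lighting="rain reflecting city lights",
    camera="cinematic tracking shot",
)
print(prompt)
```

Leaving a part empty simply omits it, so the same helper works for short and detailed prompts.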
