Recently Added | WaveSpeedAI

image-to-image

kwaivgi/kling-image-o3

Kling Image O3 is Kuaishou's latest image generation model based on the Kling 3.0 series. Supports text-to-image and multi-reference image generation with up to 10 reference images, element control with up to 3 elements, multiple resolutions (1K/2K/4K), and flexible aspect ratios. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-to-audio

mirelo-ai/sfx-v1/video-to-audio

Mirelo SFX V1 Video-to-Audio generates synchronized sound effects from video input with text prompt guidance. Supports multiple sample generation and customizable duration. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-to-video

wavespeed-ai/video-fps-increaser

AI Video FPS Increaser doubles your video frame rate for smoother motion and better playback quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-image

wavespeed-ai/ai-photo-colorizer

AI Photo Colorizer automatically adds color to black-and-white photos. Upload a grayscale image and get a colorized result. Ready-to-use REST inference API, no coldstarts, affordable pricing.

portrait-transfer

wavespeed-ai/video-body-swap

Video Body Swap replaces the body in a target video with your face. Upload a face image and a body video to get a seamless swap. Ready-to-use REST inference API, no coldstarts, affordable pricing.

portrait-transfer

wavespeed-ai/image-body-swap

Image Body Swap replaces the body in a target image with your face. Upload a face image and a body image to get a seamless swap. Ready-to-use REST inference API, no coldstarts, affordable pricing.

audio-to-audio

wavespeed-ai/ai-vocal-remover

AI Vocal Remover separates vocals from the instrumental in any audio track. Upload an audio file and choose to extract the vocals or the instrumental. Ready-to-use REST inference API, no cold starts, affordable pricing.

video-to-video

wavespeed-ai/rife/video-interpolation

RIFE Video Interpolation generates smooth intermediate frames between existing video frames for higher frame rates and smoother motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
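To make the idea of frame interpolation concrete, here is a minimal sketch of doubling a clip's frame rate by inserting one intermediate frame between each adjacent pair. It is an illustration only: the `Frame` type, `blend`, and `double_frame_rate` are hypothetical names, and the simple linear blend stands in for the learned motion estimation that a model such as RIFE performs. It is not the API of any listed model.

```python
from typing import List

# Illustrative assumption: a frame is a flat list of pixel intensities.
Frame = List[float]


def blend(a: Frame, b: Frame, t: float = 0.5) -> Frame:
    """Linearly blend two frames; a crude stand-in for a learned interpolator."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def double_frame_rate(frames: List[Frame]) -> List[Frame]:
    """Insert one blended frame between each adjacent pair of frames."""
    if not frames:
        return []
    out = [frames[0]]
    for prev, nxt in zip(frames, frames[1:]):
        out.append(blend(prev, nxt))  # synthesized in-between frame
        out.append(nxt)               # original frame is kept unchanged
    return out


clip = [[0.0, 0.0], [1.0, 2.0], [3.0, 2.0]]
smooth = double_frame_rate(clip)
print(len(clip), "->", len(smooth))
```

A clip of n frames becomes 2n - 1 frames, so playing it back at twice the original frame rate keeps roughly the same duration while motion appears smoother.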

image-to-image

wavespeed-ai/ai-gender-swap

AI Gender Swap transforms a portrait to show how you would look as the opposite gender. Upload a face photo and get an instant result. Ready-to-use REST inference API, no coldstarts, affordable pricing.

image-to-video

wavespeed-ai/ai-ghibli-filter-video

AI Ghibli Filter Video transforms a photo into a Studio Ghibli anime style video with customizable duration. Ready-to-use REST inference API, no coldstarts, affordable pricing.

image-to-image

wavespeed-ai/ai-ghibli-filter

AI Ghibli Filter transforms a photo into Studio Ghibli anime style. Upload an image and get a Ghibli-style result. Ready-to-use REST inference API, no coldstarts, affordable pricing.

image-to-image

wavespeed-ai/ai-age-filter

AI Age Filter transforms a portrait to show how you would look at different ages. Upload a face photo and select a target age. Ready-to-use REST inference API, no coldstarts, affordable pricing.

image-to-image

wavespeed-ai/ai-dog-selfie

AI Dog Selfie generates cute dog selfie images with customizable breed, style, expression, and more. Ready-to-use REST inference API, no coldstarts, affordable pricing.

image-to-video

wavespeed-ai/ai-dog-selfie-video

AI Dog Selfie Video generates cute dog selfie videos with customizable breed, style, expression, action, and duration. Ready-to-use REST inference API, no coldstarts, affordable pricing.

video-extend

vidu/q2-pro/extend-video

Vidu Q2 Pro Extend Video seamlessly extends existing videos by 1-7 seconds with high-quality motion and scene continuity. Supports optional end-frame image guidance for precise control. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-extend

vidu/q2-turbo/extend-video

Vidu Q2 Turbo Extend Video seamlessly extends existing videos by 1-7 seconds with consistent motion and scene continuity. Supports optional end-frame image guidance for precise control. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video

wavespeed-ai/ai-sketch-to-video

AI Sketch to Video converts a sketch image into an animated video with customizable duration (5-15s). Ready-to-use REST inference API, no coldstarts, affordable pricing.

video-to-text

openai/sora-2/characters

OpenAI Sora 2 Characters creates reusable character IDs from video references for consistent character appearance across Sora 2 generations. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk-fast/video-to-video-multi

InfiniteTalk Fast Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

digital-human

wavespeed-ai/infinitetalk/video-to-video-multi

InfiniteTalk Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-text

wavespeed-ai/ai-fortune-teller

AI Fortune Teller provides a personalized fortune reading based on your birth information, with optional palm or face photo analysis. Ready-to-use REST inference API, no cold starts, affordable pricing.

image-to-text

wavespeed-ai/ai-math-solver

AI Math Solver analyzes a math problem from an image and provides the solution. Upload a photo of a math problem and get step-by-step answers. Ready-to-use REST inference API, no coldstarts, affordable pricing.

image-to-image

wavespeed-ai/ai-clothes-changer

AI Clothes Changer swaps clothing on a person using reference clothing images. Upload a portrait and up to 8 clothing images to try on. Ready-to-use REST inference API, no coldstarts, affordable pricing.

image-to-image

wavespeed-ai/ai-celebrity-look-alike-finder

AI Celebrity Look-Alike Finder analyzes a portrait and finds the closest celebrity match. Upload a face photo and discover which celebrity you resemble. Ready-to-use REST inference API, no coldstarts, affordable pricing.

image-to-image

wavespeed-ai/ai-fat-filter

AI Fat Filter transforms a portrait image into a fun, exaggerated fat version. Upload a face photo and get an entertaining result. Ready-to-use REST inference API, no coldstarts, affordable pricing.

llm

wavespeed-ai/ai-story-generator

AI Story Generator creates stories from a theme or idea with customizable genre, length, perspective, audience, and format. Ready-to-use REST inference API, no coldstarts, affordable pricing.

image-to-video

openai/sora-2-pro/image-to-video

OpenAI Sora 2 Pro Image-to-Video creates physics-aware, realistic videos from reference images with synchronized audio and strong steerability. Supports 720p and 1080p resolutions with durations up to 20 seconds. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-video

openai/sora-2-pro/text-to-video

OpenAI Sora 2 Pro is a state-of-the-art text-to-video model with realistic physics, synchronized audio, and strong steerability. Supports multiple resolutions up to 1080p and durations up to 20 seconds. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-extend

wavespeed-ai/ltx-2.3/video-extend

LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model, with improved audio and visual quality as well as enhanced prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-image

wavespeed-ai/firered-image-v1.1/edit

FireRed Image Edit V1.1 enables precise image editing with natural-language instructions, supporting both English and Chinese prompts with multi-image references. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-video

wavespeed-ai/ugc-video-generator

WaveSpeed UGC Video Generator creates authentic, creator-style videos from text prompts and optional reference images with native audio, natural motion, and relatable aesthetics. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-video

wavespeed-ai/short-video-generator

WaveSpeed Short Video Generator creates professional short-form videos from text prompts and optional reference images with native audio, smooth motion, and versatile aspect ratios. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-video

wavespeed-ai/tiktok-video-generator

WaveSpeed TikTok Video Generator creates viral-ready videos from text prompts and optional reference images with native audio, dynamic transitions, and scroll-stopping motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-video

wavespeed-ai/cinematic-video-generator

WaveSpeed Cinematic Video Generator creates Hollywood-quality, Seedance 2.0-grade videos from text prompts and optional reference images with native audio, director-level camera control, and real-world physics. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

digital-human

wavespeed-ai/ltx-2.3/lipsync

LTX-2.3 Lipsync generates talking head videos from audio with synchronized lip movements and natural facial expressions. Built on a DiT-based architecture with improved audio-visual alignment quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video

wavespeed-ai/ltx-2.3/image-to-video

lora-support

wavespeed-ai/ltx-2.3/image-to-video-lora

LTX-2.3 with LoRA support is a DiT-based audio-video foundation model designed to generate synchronized video and audio with custom styles, motion, or likeness training. Improved audio and visual quality with enhanced prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

lora-support

wavespeed-ai/ltx-2.3/text-to-video-lora

text-to-video

wavespeed-ai/ltx-2.3/text-to-video

text-to-image

google/nano-banana-2/text-to-image-fast

Google Nano Banana 2 Fast (Gemini 3.1 Flash Image) is the cheapest Nano Banana 2 option, starting at just $0.045 per image. Delivers fast text-to-image generation with 2K default output and 4K support. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-image

google/nano-banana-2/edit-fast

Google Nano Banana 2 Edit Fast (Gemini 3.1 Flash Image) is the cheapest Nano Banana 2 editing option, starting at just $0.045 per image. Enables fast image editing with 2K default output and 4K support. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
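As a quick worked example of the quoted starting price, the sketch below estimates a lower bound on the cost of a batch of images. The $0.045 figure comes from the listing above; the function name `minimum_batch_cost` is hypothetical, and the assumption that cost scales linearly with image count is ours. Because the listing says "starting at", higher resolutions such as 4K may cost more, so treat the result as a floor rather than a quote.

```python
from decimal import Decimal

# "Starting at" price per image quoted in the listing (USD).
STARTING_PRICE_USD = Decimal("0.045")


def minimum_batch_cost(image_count: int,
                       unit_price: Decimal = STARTING_PRICE_USD) -> Decimal:
    """Return a lower-bound cost for a batch, assuming linear per-image pricing."""
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    return unit_price * image_count


# Example: a batch of 200 images at the starting price.
print(minimum_batch_cost(200))
```

Using `Decimal` rather than floats avoids rounding drift when many small per-image charges are summed.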

image-to-image

bria/embed-product

Bria Embed Product seamlessly integrates product images into scene backgrounds with natural lighting and perspective matching. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

motion-control

kwaivgi/kling-v3.0-std/motion-control

Kling 3.0 Standard Motion Control transfers motion from reference videos to animate still images. Upload a character image and a motion clip (dance, action, gesture), and the model extracts the movement to generate smooth, realistic video. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

motion-control

kwaivgi/kling-v3.0-pro/motion-control

video-extend

wavespeed-ai/ltx-2/video-extend

LTX Video 2.0 extends existing videos by generating new content at the start or end. Supports prompt-guided extension up to 20 seconds. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-image

wavespeed-ai/qwen-image-2.0/edit

Qwen Image 2.0 Edit is an advanced image-editing model with improved quality and better understanding of instructions. Supports output up to 2K. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-image

wavespeed-ai/qwen-image-2.0-pro/edit

Qwen Image 2.0 Pro Edit is a professional-grade image editing model with superior quality and advanced instruction understanding. Supports output up to 2K. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

wavespeed-ai/qwen-image-2.0/text-to-image

Qwen Image 2.0 is an advanced text-to-image model with enhanced image quality and improved prompt understanding. Supports output up to 2K. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

wavespeed-ai/qwen-image-2.0-pro/text-to-image

Qwen Image 2.0 Pro is a professional-grade text-to-image model with superior quality and advanced prompt understanding. Supports output up to 2K. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

digital-human

wavespeed-ai/skyreels-v3/talking-avatar

SkyReels V3 Talking Avatar is a 19B-parameter multimodal model that generates talking avatars from a portrait and audio with precise lip sync, supporting up to 20 seconds at 720p resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

wavespeed-ai/bitdance-14b/text-to-image

BitDance 14B is a 14B-parameter autoregressive text-to-image model using binary tokens for high-quality photorealistic image generation up to 1024px resolution. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

digital-human

wavespeed-ai/soulx-flashhead

SoulX FlashHead enables real-time streaming talking head video generation from a portrait image and audio with ultra-fast 96 FPS performance. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

video-to-video

wavespeed-ai/depth-anything/video

Depth Anything Video estimates depth maps from video input with temporal consistency. Supports multiple model sizes and colormaps. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

google/nano-banana-2/text-to-image

Google Nano Banana 2 (Gemini 3.1 Flash Image) delivers Pro-quality image generation at Flash speed with 512px to 4K resolution support. Features include improved text rendering, character consistency for up to 5 characters, and real-world knowledge integration. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-image

google/nano-banana-2/edit

Google Nano Banana 2 Edit (Gemini 3.1 Flash Image) enables advanced image editing with 4K-capable output, fast iteration, and precise instruction following. Supports text translation, localization within images, and maintains subject consistency during edits. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video

kwaivgi/kling-elements

Kling Elements creates custom AI elements from reference images for video generation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-video

wavespeed-ai/cosmos-predict-2.5/text-to-video

Cosmos Predict 2.5 Text-to-Video generates video from text prompts using NVIDIA's 2B Cosmos Post-Trained Model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video

wavespeed-ai/cosmos-predict-2.5/image-to-video

Cosmos Predict 2.5 Image-to-Video generates video from an image and text prompt using NVIDIA's 2B Cosmos Post-Trained Model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

video-extend

alibaba/wan-2.6/video-extend

Alibaba WAN 2.6 Video-Extend turns short clips into longer videos with preserved or generated synchronized audio for continuity. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video

decart/lucy-image-to-video

Lucy Image-to-Video generates cinematic videos from a single image and text prompt. Lightning-fast inference with commercial-use license. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-image

bytedance/seedream-v5.0-lite/edit-sequential

Seedream 5.0 Lite Edit Sequential performs multi-image editing while locking character and object identity across shots. It detects main subjects, preserves continuity, and applies controlled edits with up to 4K output. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-image

bytedance/seedream-v5.0-lite/sequential

Seedream 5.0 Lite Sequential generates multi-image sets with consistent characters and objects, unifying palette, lighting, and style across all outputs. Supports up to 4K results for campaigns, storyboards, and product lines. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video

vidu/q3/image-to-video-spicy

Vidu Q3 Image-to-Video Spicy generates unlimited high-quality videos from images with smooth animations and diverse motion, optimized for scalable content generation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video

bytedance/seedance-v1.5-pro/image-to-video-spicy

Seedance 1.5 Pro Spicy Image-to-Video generates unlimited high-quality cinematic clips from images, optimized for scalable content generation with smooth animations and stable aesthetics. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-image

bytedance/seedream-v5.0-lite

Seedream 5.0 Lite by ByteDance is a state-of-the-art text-to-image model with enhanced typography, clear text rendering for posters and brand visuals, superior prompt adherence, and up to 4K resolution. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-image

bytedance/seedream-v5.0-lite/edit

Seedream 5.0 Lite Edit by ByteDance is a state-of-the-art image editing model preserving facial features, lighting, and color tones from reference images. Features high-fidelity editing with professional quality, superior prompt adherence, and up to 4K resolution. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video

alibaba/wan-2.6/image-to-video-spicy

Alibaba WAN 2.6 Spicy converts images into unlimited high-quality videos with smooth animations optimized for scalable content generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

video-effects

wavespeed-ai/ai-twerk

AI Twerk generates a fun twerking dance video from a single input image. Upload a photo and the model animates the person into an energetic twerking dance with upbeat hip-hop music. Ready-to-use REST inference API, no coldstarts, affordable pricing.

video-effects

wavespeed-ai/ai-kissing

AI Kissing generates a romantic kissing video from one or two input images. Upload one image with two people, or two separate images to composite them together. Ready-to-use REST inference API, no coldstarts, affordable pricing.

image-to-image

wavespeed-ai/firered-image/edit

FireRed Image Edit enables precise image editing with natural-language instructions, supporting both English and Chinese prompts with multi-image references. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video

vidu/q3/image-to-video-pro

Vidu Q3 Image-to-Video Pro generates high-resolution videos (720p/1080p/2K/4K) from images with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-image

recraft-ai/recraft-v4-pro/text-to-vector

Recraft V4 Pro generates premium-quality SVG vector graphics from text prompts, designed for professional design and branding. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

recraft-ai/recraft-v4/text-to-vector

Recraft V4 generates native SVG vector graphics from text prompts, ideal for logos, icons, and design assets. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

recraft-ai/recraft-v4-pro/text-to-image

Recraft V4 Pro generates premium-quality images from text prompts, designed specifically for professional design and marketing use cases. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-image

recraft-ai/recraft-v4/text-to-image

Recraft V4 generates high-quality images from text prompts with color palette control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video

alibaba/wan-2.6/image-to-video-pro

Alibaba WAN 2.6 Image-to-Video Pro converts images into premium-quality videos with superior motion dynamics, enhanced visual fidelity, and professional cinematic output. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

upscaler

wavespeed-ai/ultimate-video-upscaler

Ultimate Video Upscaler converts low-resolution videos into crisp 4K footage with seamless motion dynamics and frame consistency. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

text-to-audio

google/gemini-2.5-flash/text-to-speech

Google Gemini 2.5 Flash Text-to-Speech delivers fast, natural multi-speaker voice synthesis with 30+ voices across 24 languages at lower cost. Perfect for dialogues, conversations, and multilingual content. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-audio

google/gemini-2.5-pro/text-to-speech

Google Gemini 2.5 Pro Text-to-Speech delivers natural multi-speaker voice synthesis with 30+ voices across 24 languages. Perfect for dialogues, conversations, and multilingual content. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-3d

wavespeed-ai/hunyuan-3d-v3.1/image-to-3d-rapid

Hunyuan 3D V3.1 Rapid is a fast image-to-3D generation model, quickly converting 2D images into 3D models. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-3d

wavespeed-ai/hunyuan-3d-v3.1/text-to-3d-rapid

Hunyuan 3D V3.1 Rapid is a fast text-to-3D generation model that quickly creates 3D models from text descriptions. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

motion-control

bytedance/dreamactor-v2

ByteDance DreamActor V2 transfers motion from a driving video to characters in an image. Performs well with non-human characters and with multiple characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-video

vidu/q3-turbo/start-end-to-video

Vidu Q3 Turbo Start-End-to-Video creates smooth transitions between two images with faster processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video

vidu/q3-turbo/image-to-video

Vidu Q3 Turbo Image-to-Video animates static images with high-quality motion and faster processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-video

vidu/q3-turbo/text-to-video

Vidu Q3 Turbo Text-to-Video generates high-quality videos from text prompts with faster processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-video

vidu/q3/start-end-to-video

Vidu Q3 Start End Image-to-Video turns a start image and an end image into high-quality videos with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

image-to-image

kwaivgi/kling-image-o3/edit

Kling O3 Edit is an AI image editing model with 4K resolution and multi-image reference support, enabling high-quality transformations with multiple reference inputs. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-image

kwaivgi/kling-image-v3/edit

Kling V3 Edit is an AI model for editing and transforming images via text prompts, enabling precise modifications with natural-language instructions. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-image

kwaivgi/kling-image-o3/text-to-image

Kling O3 is Kuaishou's advanced AI image generation model with support for 4K resolution, delivering ultra-high-quality visuals with exceptional detail. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-image

kwaivgi/kling-image-v3/text-to-image

Kling V3.0 is Kuaishou's latest AI image generation model with superior text-to-image capabilities, delivering high-quality visuals with accurate prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-audio

inworld/inworld-1.5-mini/text-to-speech

Inworld 1.5 Mini delivers high-quality text-to-speech synthesis with 56+ multilingual voices, adjustable speaking rate, and natural-sounding audio output. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

text-to-audio

inworld/inworld-1.5-max/text-to-speech

Inworld 1.5 Max delivers premium text-to-speech synthesis with 56+ multilingual voices, adjustable speaking rate, and high-fidelity natural-sounding audio output. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-to-video

kwaivgi/kling-video-o3-std/video-edit

Kling Omni Video O3 Video-Edit (Standard) enables natural-language video edits: remove or replace objects, swap backgrounds, restyle scenes, change weather/lighting, and apply localized 3-10s transformations with strong temporal consistency. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

text-to-video

kwaivgi/kling-video-o3-std/text-to-video

Kling Omni Video O3 (Standard) is Kuaishou's advanced unified multi-modal video model with MVL (Multi-modal Visual Language) technology. Text-to-Video mode generates cinematic videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding. Supports audio generation. Ready-to-use REST API, best performance, no coldstarts, affordable pricing.

image-to-video

kwaivgi/kling-video-o3-std/reference-to-video

Kling Omni Video O3 (Standard) Reference-to-Video generates creative videos using character, prop, or scene references from multiple viewpoints. Extracts subject features and creates new video content while maintaining identity consistency across frames. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

image-to-video

kwaivgi/kling-video-o3-std/image-to-video

Kling Omni Video O3 (Standard) Image-to-Video transforms static images into dynamic cinematic videos using MVL (Multi-modal Visual Language) technology. Maintains subject consistency while adding natural motion, physics simulation, and seamless scene dynamics. Supports audio generation. Ready-to-use REST API, best performance, no coldstarts, affordable pricing.

video-to-video

kwaivgi/kling-video-o3-pro/video-edit

Kling Omni Video O3 Video-Edit enables conversational video editing through natural language commands. Remove objects, change backgrounds, modify styles, adjust weather/lighting, and transform scenes with simple text instructions like 'remove pedestrians' or 'change daytime to dusk'. Ready-to-use REST API, best performance, no coldstarts, affordable pricing.

text-to-video

kwaivgi/kling-video-o3-pro/text-to-video

Kling Omni Video O3 is Kuaishou's advanced unified multi-modal video model with MVL (Multi-modal Visual Language) technology. Text-to-Video mode generates cinematic videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding. Supports audio generation. Ready-to-use REST API, best performance, no coldstarts, affordable pricing.

image-to-video

kwaivgi/kling-video-o3-pro/reference-to-video

Kling Omni Video O3 Reference-to-Video generates creative videos using character, prop, or scene references from multiple viewpoints. Extracts subject features and creates new video content while maintaining identity consistency across frames. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

image-to-video

kwaivgi/kling-video-o3-pro/image-to-video

Kling Omni Video O3 Image-to-Video transforms static images into dynamic cinematic videos using MVL (Multi-modal Visual Language) technology. Maintains subject consistency while adding natural motion, physics simulation, and seamless scene dynamics. Supports audio generation. Ready-to-use REST API, best performance, no coldstarts, affordable pricing.