vidu/q2-pro/extend-video
video-extend

Vidu Q2 Pro Extend Video seamlessly extends existing videos by 1-7 seconds with high-quality motion and scene continuity. Supports optional end-frame image guidance for precise control. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
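As a rough illustration of the 1-7 second limit and the optional end-frame image described above, the sketch below validates the inputs and assembles a request body. The function name and the field names (`video`, `duration`, `end_image`) are assumptions made for illustration, not the documented request schema, and whole-second durations are assumed as well; the model's API page is the authority on the real parameters.

```python
from typing import Optional


def build_extend_request(video_url: str, duration: int,
                         end_image_url: Optional[str] = None) -> dict:
    """Assemble a JSON-ready body for a video-extend call.

    All field names here are hypothetical placeholders.
    """
    # The listing states that videos can be extended by 1-7 seconds.
    if not 1 <= duration <= 7:
        raise ValueError("duration must be between 1 and 7 seconds")

    payload = {"video": video_url, "duration": duration}

    # End-frame guidance is optional, so only include it when provided.
    if end_image_url is not None:
        payload["end_image"] = end_image_url
    return payload
```

Checking the range on the client side avoids spending a request on a value the service would likely reject.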

vidu/q2-turbo/extend-video
video-extend

Vidu Q2 Turbo Extend Video seamlessly extends existing videos by 1-7 seconds with consistent motion and scene continuity. Supports optional end-frame image guidance for precise control. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/ai-sketch-to-video
image-to-video

AI Sketch to Video converts a sketch image into an animated video with customizable duration (5-15s). Ready-to-use REST inference API, no coldstarts, affordable pricing.

openai/sora-2/characters
video-to-text

OpenAI Sora 2 Characters creates reusable character IDs from video references for consistent character appearance across Sora 2 generations. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/infinitetalk-fast/video-to-video-multi
digital-human

InfiniteTalk Fast Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/infinitetalk/video-to-video-multi
digital-human

InfiniteTalk Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/ai-fortune-teller
image-to-text

AI Fortune Teller provides personalized fortune reading based on your birth info, with optional palm/face photo analysis. Ready-to-use REST inference API, no coldstarts, affordable pricing.

wavespeed-ai/ai-math-solver
image-to-text

AI Math Solver analyzes a math problem from an image and provides the solution. Upload a photo of a math problem and get step-by-step answers. Ready-to-use REST inference API, no coldstarts, affordable pricing.

wavespeed-ai/ai-clothes-changer
image-to-image

AI Clothes Changer swaps clothing on a person using reference clothing images. Upload a portrait and up to 8 clothing images to try on. Ready-to-use REST inference API, no coldstarts, affordable pricing.

wavespeed-ai/ai-celebrity-look-alike-finder
image-to-image

AI Celebrity Look-Alike Finder analyzes a portrait and finds the closest celebrity match. Upload a face photo and discover which celebrity you resemble. Ready-to-use REST inference API, no coldstarts, affordable pricing.

wavespeed-ai/ai-fat-filter
image-to-image

AI Fat Filter transforms a portrait image into a fun, exaggerated fat version. Upload a face photo and get an entertaining result. Ready-to-use REST inference API, no coldstarts, affordable pricing.

wavespeed-ai/ai-story-generator
llm

AI Story Generator creates stories from a theme or idea with customizable genre, length, perspective, audience, and format. Ready-to-use REST inference API, no coldstarts, affordable pricing.

openai/sora-2-pro/image-to-video
image-to-video

OpenAI Sora 2 Pro Image-to-Video creates physics-aware, realistic videos from reference images with synchronized audio and strong steerability. Supports 720p and 1080p resolutions with durations up to 20 seconds. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

openai/sora-2-pro/text-to-video
text-to-video

OpenAI Sora 2 Pro is a state-of-the-art text-to-video model with realistic physics, synchronized audio, and strong steerability. Supports multiple resolutions up to 1080p and durations up to 20 seconds. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/ltx-2.3/video-extend
video-extend

LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model, with improved audio and visual quality as well as enhanced prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/firered-image-v1.1/edit
image-to-image

FireRed Image Edit V1.1 enables precise image editing with natural-language instructions, supporting both English and Chinese prompts with multi-image references. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ugc-video-generator
text-to-video

WaveSpeed UGC Video Generator creates authentic, creator-style videos from text prompts and optional reference images with native audio, natural motion, and relatable aesthetics. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/short-video-generator
text-to-video

WaveSpeed Short Video Generator creates professional short-form videos from text prompts and optional reference images with native audio, smooth motion, and versatile aspect ratios. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/tiktok-video-generator
text-to-video

WaveSpeed TikTok Video Generator creates viral-ready videos from text prompts and optional reference images with native audio, dynamic transitions, and scroll-stopping motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/cinematic-video-generator
text-to-video

WaveSpeed Cinematic Video Generator creates Hollywood-quality, Seedance 2.0-grade videos from text prompts and optional reference images with native audio, director-level camera control, and real-world physics. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ltx-2.3/lipsync
digital-human

LTX-2.3 Lipsync generates talking head videos from audio with synchronized lip movements and natural facial expressions. Built on DiT-based architecture with improved audio-visual alignment quality. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/ltx-2.3/image-to-video
image-to-video

LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model, with improved audio and visual quality as well as enhanced prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/ltx-2.3/image-to-video-lora
lora-support

LTX-2.3 with LoRA support is a DiT-based audio-video foundation model designed to generate synchronized video and audio with custom styles, motion, or likeness training. Improved audio and visual quality with enhanced prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/ltx-2.3/text-to-video-lora
lora-support

LTX-2.3 with LoRA support is a DiT-based audio-video foundation model designed to generate synchronized video and audio with custom styles, motion, or likeness training. Improved audio and visual quality with enhanced prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/ltx-2.3/text-to-video
text-to-video

LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model, with improved audio and visual quality as well as enhanced prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

google/nano-banana-2/text-to-image-fast
text-to-image

Google Nano Banana 2 Fast (Gemini 3.1 Flash Image) is the cheapest Nano Banana 2 option, starting at just $0.045 per image. Delivers fast text-to-image generation with 2K default output and 4K support. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
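Because the listing quotes a starting price of $0.045 per image, a lower bound for a batch is simple multiplication. The sketch below is only that arithmetic; the function name is hypothetical, and actual charges may be higher depending on resolution or other options not stated here.

```python
from decimal import Decimal

# Starting price quoted in the listing; treat it as a floor, not a quote.
STARTING_PRICE_PER_IMAGE = Decimal("0.045")


def minimum_batch_cost(image_count: int) -> Decimal:
    """Return the lowest possible cost in USD for a batch of images."""
    if image_count < 0:
        raise ValueError("image_count must not be negative")
    # Decimal avoids binary floating-point rounding on currency values.
    return STARTING_PRICE_PER_IMAGE * image_count
```

For example, `minimum_batch_cost(1000)` gives 45 dollars as the floor for a thousand images.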

google/nano-banana-2/edit-fast
image-to-image

Google Nano Banana 2 Edit Fast (Gemini 3.1 Flash Image) is the cheapest Nano Banana 2 editing option, starting at just $0.045 per image. Enables fast image editing with 2K default output and 4K support. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

bria/embed-product
image-to-image

Bria Embed Product seamlessly integrates product images into scene backgrounds with natural lighting and perspective matching. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-v3.0-std/motion-control
motion-control

Kling 3.0 Standard Motion Control transfers motion from reference videos to animate still images. Upload a character image and a motion clip (dance, action, gesture), and the model extracts the movement to generate smooth, realistic video. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

kwaivgi/kling-v3.0-pro/motion-control
motion-control

Kling 3.0 Pro Motion Control transfers motion from reference videos to animate still images. Upload a character image and a motion clip (dance, action, gesture), and the model extracts the movement to generate smooth, realistic video. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ltx-2/video-extend
video-extend

LTX Video 2.0 extends existing videos by generating new content at the start or end. Supports prompt-guided extension up to 20 seconds. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/qwen-image-2.0/edit
image-to-image

Qwen Image 2.0 Edit is an advanced image-editing model with improved quality and better understanding of instructions. Supports output up to 2K resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen-image-2.0-pro/edit
image-to-image

Qwen Image 2.0 Pro Edit is a professional-grade image editing model with superior quality and advanced instruction understanding. Supports output up to 2K resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen-image-2.0/text-to-image
text-to-image

Qwen Image 2.0 is an advanced text-to-image model with enhanced image quality and improved prompt understanding. Supports output up to 2K resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen-image-2.0-pro/text-to-image
text-to-image

Qwen Image 2.0 Pro is a professional-grade text-to-image model with superior quality and advanced prompt understanding. Supports output up to 2K resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/skyreels-v3/talking-avatar
digital-human

SkyReels V3 Talking Avatar is a 19B-parameter multimodal model that generates talking avatars from a portrait and audio with precise lip sync, supporting up to 20 seconds at 720p resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/bitdance-14b/text-to-image
text-to-image

BitDance 14B is a 14B-parameter autoregressive text-to-image model using binary tokens for high-quality photorealistic image generation up to 1024px resolution. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/soulx-flashhead
digital-human

SoulX FlashHead enables real-time streaming talking head video generation from a portrait image and audio with ultra-fast 96 FPS performance. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/depth-anything/video
video-to-video

Depth Anything Video estimates depth maps from video input with temporal consistency. Supports multiple model sizes and colormaps. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

google/nano-banana-2/text-to-image
text-to-image

Google Nano Banana 2 (Gemini 3.1 Flash Image) delivers Pro-quality image generation at Flash speed with 512px to 4K resolution support. Features include improved text rendering, character consistency for up to 5 characters, and real-world knowledge integration. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

google/nano-banana-2/edit
image-to-image

Google Nano Banana 2 Edit (Gemini 3.1 Flash Image) enables advanced image editing with 4K-capable output, fast iteration, and precise instruction following. Supports text translation, localization within images, and maintains subject consistency during edits. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-elements
image-to-video

Kling Elements creates custom AI elements from reference images for video generation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/cosmos-predict-2.5/text-to-video
text-to-video

Cosmos Predict 2.5 Text-to-Video generates video from text prompts using NVIDIA's 2B Cosmos Post-Trained Model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/cosmos-predict-2.5/image-to-video
image-to-video

Cosmos Predict 2.5 Image-to-Video generates video from an image and text prompt using NVIDIA's 2B Cosmos Post-Trained Model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

alibaba/wan-2.6/video-extend
video-extend

Alibaba WAN 2.6 Video-Extend turns short clips into longer videos with preserved or generated synchronized audio for continuity. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

decart/lucy-image-to-video
image-to-video

Lucy Image-to-Video generates cinematic videos from a single image and text prompt. Lightning-fast inference with commercial-use license. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

bytedance/seedream-v5.0-lite/edit-sequential
image-to-image

Seedream 5.0 Lite Edit Sequential performs multi-image editing while locking character and object identity across shots. It detects main subjects, preserves continuity, and applies controlled edits with up to 4K output. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

bytedance/seedream-v5.0-lite/sequential
text-to-image

Seedream 5.0 Lite Sequential generates multi-image sets with consistent characters and objects, unifying palette, lighting, and style across all outputs. Supports up to 4K results for campaigns, storyboards, and product lines. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

vidu/q3/image-to-video-spicy
image-to-video

Vidu Q3 Image-to-Video Spicy generates unlimited high-quality videos from images with smooth animations and diverse motion, optimized for scalable content generation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

bytedance/seedance-v1.5-pro/image-to-video-spicy
image-to-video

Seedance 1.5 Pro Spicy Image-to-Video generates unlimited high-quality cinematic clips from images, optimized for scalable content generation with smooth animations and stable aesthetics. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

bytedance/seedream-v5.0-lite
text-to-image

Seedream 5.0 Lite by ByteDance is a state-of-the-art text-to-image model with enhanced typography, clear text rendering for posters and brand visuals, superior prompt adherence, and up to 4K resolution. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

bytedance/seedream-v5.0-lite/edit
image-to-image

Seedream 5.0 Lite Edit by ByteDance is a state-of-the-art image editing model preserving facial features, lighting, and color tones from reference images. Features high-fidelity editing with professional quality, superior prompt adherence, and up to 4K resolution. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

alibaba/wan-2.6/image-to-video-spicy
image-to-video

Alibaba WAN 2.6 Spicy converts images into unlimited high-quality videos with smooth animations optimized for scalable content generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ai-twerk
video-effects

AI Twerk generates a fun twerking dance video from a single input image. Upload a photo and the model animates the person into an energetic twerking dance with upbeat hip-hop music. Ready-to-use REST inference API, no coldstarts, affordable pricing.

wavespeed-ai/ai-kissing
video-effects

AI Kissing generates a romantic kissing video from one or two input images. Upload one image with two people, or two separate images to composite them together. Ready-to-use REST inference API, no coldstarts, affordable pricing.

wavespeed-ai/firered-image/edit
image-to-image

FireRed Image Edit enables precise image editing with natural-language instructions, supporting both English and Chinese prompts with multi-image references. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

vidu/q3/image-to-video-pro
image-to-video

Vidu Q3 Image-to-Video Pro generates high-resolution videos (720p/1080p/2K/4K) from images with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

recraft-ai/recraft-v4-pro/text-to-vector
text-to-image

Recraft V4 Pro generates premium-quality SVG vector graphics from text prompts, designed for professional design and branding. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

recraft-ai/recraft-v4/text-to-vector
text-to-image

Recraft V4 generates native SVG vector graphics from text prompts, ideal for logos, icons, and design assets. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

recraft-ai/recraft-v4-pro/text-to-image
text-to-image

Recraft V4 Pro generates premium-quality images from text prompts, designed specifically for professional design and marketing use cases. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

recraft-ai/recraft-v4/text-to-image
text-to-image

Recraft V4 generates high-quality images from text prompts with color palette control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

alibaba/wan-2.6/image-to-video-pro
image-to-video

Alibaba WAN 2.6 Image-to-Video Pro converts images into premium-quality videos with superior motion dynamics, enhanced visual fidelity, and professional cinematic output. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/ultimate-video-upscaler
upscaler

Ultimate Video Upscaler converts low-resolution videos into crisp 4K footage with seamless motion dynamics and frame consistency. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

google/gemini-2.5-flash/text-to-speech
text-to-audio

Google Gemini 2.5 Flash Text-to-Speech delivers fast, natural multi-speaker voice synthesis with 30+ voices across 24 languages at lower cost. Perfect for dialogues, conversations, and multilingual content. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

google/gemini-2.5-pro/text-to-speech
text-to-audio

Google Gemini 2.5 Pro Text-to-Speech delivers natural multi-speaker voice synthesis with 30+ voices across 24 languages. Perfect for dialogues, conversations, and multilingual content. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/hunyuan-3d-v3.1/image-to-3d-rapid
image-to-3d

Hunyuan 3D V3.1 Rapid is a fast image-to-3D generation model, quickly converting 2D images into 3D models. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/hunyuan-3d-v3.1/text-to-3d-rapid
text-to-3d

Hunyuan 3D V3.1 Rapid is a fast text-to-3D generation model that quickly creates 3D models from text descriptions. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

bytedance/dreamactor-v2
motion-control

ByteDance DreamActor V2 transfers motion from a driving video to characters in an image. It performs well with non-human characters and with multiple characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

vidu/q3-turbo/start-end-to-video
image-to-video

Vidu Q3 Turbo Start-End-to-Video creates smooth transitions between two images with faster processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

vidu/q3-turbo/image-to-video
image-to-video

Vidu Q3 Turbo Image-to-Video animates static images with high-quality motion and faster processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

vidu/q3-turbo/text-to-video
text-to-video

Vidu Q3 Turbo Text-to-Video generates high-quality videos from text prompts with faster processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

vidu/q3/start-end-to-video
image-to-video

Vidu Q3 Start-End-to-Video generates high-quality videos that transition between a start image and an end image, with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

kwaivgi/kling-image-o3/edit
image-to-image

Kling O3 Edit is an AI image editing model with 4K resolution and multi-image reference support, enabling high-quality transformations with multiple reference inputs. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-image-v3/edit
image-to-image

Kling V3 Edit is an AI model for editing and transforming images via text prompts, enabling precise modifications with natural-language instructions. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-image-o3/text-to-image
text-to-image

Kling O3 is Kuaishou's advanced AI image generation model with support for 4K resolution, delivering ultra-high-quality visuals with exceptional detail. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-image-v3/text-to-image
text-to-image

Kling V3.0 is Kuaishou's latest AI image generation model with superior text-to-image capabilities, delivering high-quality visuals with accurate prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

inworld/inworld-1.5-mini/text-to-speech
text-to-audio

Inworld 1.5 Mini delivers high-quality text-to-speech synthesis with 56+ multilingual voices, adjustable speaking rate, and natural-sounding audio output. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

inworld/inworld-1.5-max/text-to-speech
text-to-audio

Inworld 1.5 Max delivers premium text-to-speech synthesis with 56+ multilingual voices, adjustable speaking rate, and high-fidelity natural-sounding audio output. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-video-o3-std/video-edit
video-to-video

Kling Omni Video O3 Video-Edit (Standard) enables natural-language video edits: remove or replace objects, swap backgrounds, restyle scenes, change weather/lighting, and apply localized 3-10s transformations with strong temporal consistency. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

kwaivgi/kling-video-o3-std/text-to-video
text-to-video

Kling Omni Video O3 (Standard) is Kuaishou's advanced unified multi-modal video model with MVL (Multi-modal Visual Language) technology. Text-to-Video mode generates cinematic videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding. Supports audio generation. Ready-to-use REST API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-video-o3-std/reference-to-video
image-to-video

Kling Omni Video O3 (Standard) Reference-to-Video generates creative videos using character, prop, or scene references from multiple viewpoints. Extracts subject features and creates new video content while maintaining identity consistency across frames. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

kwaivgi/kling-video-o3-std/image-to-video
image-to-video

Kling Omni Video O3 (Standard) Image-to-Video transforms static images into dynamic cinematic videos using MVL (Multi-modal Visual Language) technology. Maintains subject consistency while adding natural motion, physics simulation, and seamless scene dynamics. Supports audio generation. Ready-to-use REST API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-video-o3-pro/video-edit
video-to-video

Kling Omni Video O3 Video-Edit enables conversational video editing through natural language commands. Remove objects, change backgrounds, modify styles, adjust weather/lighting, and transform scenes with simple text instructions like 'remove pedestrians' or 'change daytime to dusk'. Ready-to-use REST API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-video-o3-pro/text-to-video
text-to-video

Kling Omni Video O3 is Kuaishou's advanced unified multi-modal video model with MVL (Multi-modal Visual Language) technology. Text-to-Video mode generates cinematic videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding. Supports audio generation. Ready-to-use REST API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-video-o3-pro/reference-to-video
image-to-video

Kling Omni Video O3 Reference-to-Video generates creative videos using character, prop, or scene references from multiple viewpoints. Extracts subject features and creates new video content while maintaining identity consistency across frames. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.

kwaivgi/kling-video-o3-pro/image-to-video
image-to-video

Kling Omni Video O3 Image-to-Video transforms static images into dynamic cinematic videos using MVL (Multi-modal Visual Language) technology. Maintains subject consistency while adding natural motion, physics simulation, and seamless scene dynamics. Supports audio generation. Ready-to-use REST API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-v3.0-std/text-to-video
text-to-video

Kling 3.0 Standard delivers high-quality text-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

kwaivgi/kling-v3.0-std/image-to-video
image-to-video

Kling 3.0 Standard delivers high-quality image-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

kwaivgi/kling-v3.0-pro/text-to-video
text-to-video

Kling 3.0 Pro delivers top-tier text-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

kwaivgi/kling-v3.0-pro/image-to-video
image-to-video

Kling 3.0 Pro delivers top-tier image-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen-image/edit-2509-multiple-angles
image-to-image

Qwen Image Edit 2509 Multiple Angles is an AI image editing model that generates multiple-angle views of objects or scenes from a single image. Transform perspectives and create diverse viewpoints with text prompts. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-v1/ai-multi-shot
image-to-image

Kling V1 AI Multi-Shot delivers top-tier image-to-image generation with cinematic visuals, accurate prompt adherence, and multi-shot consistency for ready-to-share images. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

wavespeed-ai/ace-step-1.5
text-to-audio

ACE-Step 1.5 generates music with lyrics from text, up to 4 minutes long. Supports 50+ languages, high acoustic fidelity, and runs efficiently on consumer hardware. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

microsoft/vibevoice
text-to-audio

Microsoft VibeVoice text-to-speech model generates long-form speech from text with multi-speaker dialogue support. Choose from 9 voice presets across English, Chinese, and Hindi. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

elevenlabs/music
text-to-audio

ElevenLabs Music generates original songs from text descriptions. Create instrumentals or full compositions with customizable duration. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

sourceful/riverflow-2.0-pro/edit
image-to-image

Sourceful Riverflow 2.0 Pro Edit is an agentic image model optimized for robust, high-precision image editing and transformation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

sourceful/riverflow-2.0-pro/text-to-image
text-to-image

Sourceful Riverflow 2.0 Pro is an agentic image model optimized for robust, high-precision text-to-image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

alibaba/wan-2.6/reference-to-video-flash
image-to-video

Alibaba WAN 2.6 Reference-to-Video Flash turns character, prop, or scene references from images or videos into new video shots with preserved identity, style, and layout plus smooth, coherent motion. Flash version with faster generation speed. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-v2.6-std/image-to-video
image-to-video

Kling 2.6 Standard offers cost-effective image-to-video generation with smooth motion, cinematic visuals, and accurate prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

kwaivgi/kling-v2.6-std/text-to-video
text-to-video

Kling 2.6 Standard offers cost-effective text-to-video generation with smooth motion, cinematic visuals, and strong prompt adherence. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.