

video-extend
vidu/q2-pro/extend-video
Vidu Q2 Pro Extend Video seamlessly extends existing videos by 1-7 seconds with high-quality motion and scene continuity. Supports optional end-frame image guidance for precise control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


video-extend
vidu/q2-turbo/extend-video
Vidu Q2 Turbo Extend Video seamlessly extends existing videos by 1-7 seconds with consistent motion and scene continuity. Supports optional end-frame image guidance for precise control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
wavespeed-ai/ai-sketch-to-video
AI Sketch to Video converts a sketch image into an animated video with a customizable duration (5-15 seconds). Ready-to-use REST inference API, no cold starts, affordable pricing.


video-to-text
openai/sora-2/characters
OpenAI Sora 2 Characters creates reusable character IDs from video references for consistent character appearance across Sora 2 generations. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


digital-human
wavespeed-ai/infinitetalk-fast/video-to-video-multi
InfiniteTalk Fast Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


digital-human
wavespeed-ai/infinitetalk/video-to-video-multi
InfiniteTalk Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-text
wavespeed-ai/ai-fortune-teller
AI Fortune Teller provides personalized fortune readings based on your birth information, with optional palm or face photo analysis. Ready-to-use REST inference API, no cold starts, affordable pricing.


image-to-text
wavespeed-ai/ai-math-solver
AI Math Solver analyzes a math problem from an image and provides the solution. Upload a photo of a math problem and get step-by-step answers. Ready-to-use REST inference API, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/ai-clothes-changer
AI Clothes Changer swaps clothing on a person using reference clothing images. Upload a portrait and up to 8 clothing images to try on. Ready-to-use REST inference API, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/ai-celebrity-look-alike-finder
AI Celebrity Look-Alike Finder analyzes a portrait and finds the closest celebrity match. Upload a face photo and discover which celebrity you resemble. Ready-to-use REST inference API, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/ai-fat-filter
AI Fat Filter transforms a portrait image into a fun, exaggerated fat version. Upload a face photo and get an entertaining result. Ready-to-use REST inference API, no cold starts, affordable pricing.


llm
wavespeed-ai/ai-story-generator
AI Story Generator creates stories from a theme or idea with customizable genre, length, perspective, audience, and format. Ready-to-use REST inference API, no cold starts, affordable pricing.


image-to-video
openai/sora-2-pro/image-to-video
OpenAI Sora 2 Pro Image-to-Video creates physics-aware, realistic videos from reference images with synchronized audio and strong steerability. Supports 720p and 1080p resolutions with durations up to 20 seconds. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
openai/sora-2-pro/text-to-video
OpenAI Sora 2 Pro is a state-of-the-art text-to-video model with realistic physics, synchronized audio, and strong steerability. Supports multiple resolutions up to 1080p and durations up to 20 seconds. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


video-extend
wavespeed-ai/ltx-2.3/video-extend
LTX-2.3 Video Extend extends existing videos using LTX-2.3, a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model, with improved audio and visual quality as well as enhanced prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/firered-image-v1.1/edit
FireRed Image Edit V1.1 enables precise image editing with natural-language instructions, supporting both English and Chinese prompts with multi-image references. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
wavespeed-ai/ugc-video-generator
WaveSpeed UGC Video Generator creates authentic, creator-style videos from text prompts and optional reference images with native audio, natural motion, and relatable aesthetics. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
wavespeed-ai/short-video-generator
WaveSpeed Short Video Generator creates professional short-form videos from text prompts and optional reference images with native audio, smooth motion, and versatile aspect ratios. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
wavespeed-ai/tiktok-video-generator
WaveSpeed TikTok Video Generator creates viral-ready videos from text prompts and optional reference images with native audio, dynamic transitions, and scroll-stopping motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
wavespeed-ai/cinematic-video-generator
WaveSpeed Cinematic Video Generator creates Hollywood-quality, Seedance 2.0-grade videos from text prompts and optional reference images with native audio, director-level camera control, and real-world physics. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


digital-human
wavespeed-ai/ltx-2.3/lipsync
LTX-2.3 Lipsync generates talking-head videos from audio with synchronized lip movements and natural facial expressions. Built on a DiT-based architecture with improved audio-visual alignment. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
wavespeed-ai/ltx-2.3/image-to-video
LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model, with improved audio and visual quality as well as enhanced prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


lora-support
wavespeed-ai/ltx-2.3/image-to-video-lora
LTX-2.3 with LoRA support is a DiT-based audio-video foundation model that generates synchronized video and audio with custom styles, motion, or likenesses learned through LoRA training. Improved audio and visual quality with enhanced prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


lora-support
wavespeed-ai/ltx-2.3/text-to-video-lora
LTX-2.3 with LoRA support is a DiT-based audio-video foundation model that generates synchronized video and audio with custom styles, motion, or likenesses learned through LoRA training. Improved audio and visual quality with enhanced prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
wavespeed-ai/ltx-2.3/text-to-video
LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model, with improved audio and visual quality as well as enhanced prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
google/nano-banana-2/text-to-image-fast
Google Nano Banana 2 Fast (Gemini 3.1 Flash Image) is the cheapest Nano Banana 2 option, starting at just $0.045 per image. Delivers fast text-to-image generation with 2K default output and 4K support. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
google/nano-banana-2/edit-fast
Google Nano Banana 2 Edit Fast (Gemini 3.1 Flash Image) is the cheapest Nano Banana 2 editing option, starting at just $0.045 per image. Enables fast image editing with 2K default output and 4K support. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
bria/embed-product
Bria Embed Product seamlessly integrates product images into scene backgrounds with natural lighting and perspective matching. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


motion-control
kwaivgi/kling-v3.0-std/motion-control
Kling 3.0 Standard Motion Control transfers motion from reference videos to animate still images. Upload a character image and a motion clip (dance, action, gesture), and the model extracts the movement to generate smooth, realistic video. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


motion-control
kwaivgi/kling-v3.0-pro/motion-control
Kling 3.0 Pro Motion Control transfers motion from reference videos to animate still images. Upload a character image and a motion clip (dance, action, gesture), and the model extracts the movement to generate smooth, realistic video. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


video-extend
wavespeed-ai/ltx-2/video-extend
LTX Video 2.0 extends existing videos by generating new content at the start or end. Supports prompt-guided extension up to 20 seconds. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/qwen-image-2.0/edit
Qwen Image 2.0 Edit is an advanced image-editing model with improved quality and better instruction understanding. Supports output up to 2K resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/qwen-image-2.0-pro/edit
Qwen Image 2.0 Pro Edit is a professional-grade image-editing model with superior quality and advanced instruction understanding. Supports output up to 2K resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
wavespeed-ai/qwen-image-2.0/text-to-image
Qwen Image 2.0 is an advanced text-to-image model with enhanced image quality and improved prompt understanding. Supports output up to 2K resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
wavespeed-ai/qwen-image-2.0-pro/text-to-image
Qwen Image 2.0 Pro is a professional-grade text-to-image model with superior quality and advanced prompt understanding. Supports output up to 2K resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

digital-human
wavespeed-ai/skyreels-v3/talking-avatar
SkyReels V3 Talking Avatar is a 19B-parameter multimodal model that generates talking avatars from a portrait image and audio with precise lip sync, supporting up to 20 seconds at 720p resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
wavespeed-ai/bitdance-14b/text-to-image
BitDance 14B is a 14B-parameter autoregressive text-to-image model using binary tokens for high-quality photorealistic image generation up to 1024px resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


digital-human
wavespeed-ai/soulx-flashhead
SoulX FlashHead enables real-time streaming talking-head video generation from a portrait image and audio at an ultra-fast 96 FPS. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


video-to-video
wavespeed-ai/depth-anything/video
Depth Anything Video estimates depth maps from video input with temporal consistency. Supports multiple model sizes and colormaps. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
google/nano-banana-2/text-to-image
Google Nano Banana 2 (Gemini 3.1 Flash Image) delivers Pro-quality image generation at Flash speed with 512px to 4K resolution support. Features include improved text rendering, character consistency for up to 5 characters, and real-world knowledge integration. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
google/nano-banana-2/edit
Google Nano Banana 2 Edit (Gemini 3.1 Flash Image) enables advanced image editing with 4K-capable output, fast iteration, and precise instruction following. Supports translating and localizing text within images, and maintains subject consistency during edits. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
kwaivgi/kling-elements
Kling Elements creates custom AI elements from reference images for video generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
wavespeed-ai/cosmos-predict-2.5/text-to-video
Cosmos Predict 2.5 Text-to-Video generates video from text prompts using NVIDIA's 2B Cosmos Post-Trained Model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
wavespeed-ai/cosmos-predict-2.5/image-to-video
Cosmos Predict 2.5 Image-to-Video generates video from an image and text prompt using NVIDIA's 2B Cosmos Post-Trained Model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


video-extend
alibaba/wan-2.6/video-extend
Alibaba WAN 2.6 Video-Extend turns short clips into longer videos, preserving or generating synchronized audio for continuity. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
decart/lucy-image-to-video
Lucy Image-to-Video generates cinematic videos from a single image and text prompt. Lightning-fast inference with a commercial-use license. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
bytedance/seedream-v5.0-lite/edit-sequential
Seedream 5.0 Lite Edit Sequential performs multi-image editing while locking character and object identity across shots. It detects main subjects, preserves continuity, and applies controlled edits with up to 4K output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
bytedance/seedream-v5.0-lite/sequential
Seedream 5.0 Lite Sequential generates multi-image sets with consistent characters and objects, unifying palette, lighting, and style across all outputs. Supports up to 4K results for campaigns, storyboards, and product lines. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
vidu/q3/image-to-video-spicy
Vidu Q3 Image-to-Video Spicy generates unlimited high-quality videos from images with smooth animations and diverse motion, optimized for scalable content generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
bytedance/seedance-v1.5-pro/image-to-video-spicy
Seedance 1.5 Pro Spicy Image-to-Video generates unlimited high-quality cinematic clips from images, optimized for scalable content generation with smooth animations and stable aesthetics. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
bytedance/seedream-v5.0-lite
Seedream 5.0 Lite by ByteDance is a state-of-the-art text-to-image model with enhanced typography, clear text rendering for posters and brand visuals, superior prompt adherence, and up to 4K resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
bytedance/seedream-v5.0-lite/edit
Seedream 5.0 Lite Edit by ByteDance is a state-of-the-art image editing model that preserves facial features, lighting, and color tones from reference images. Features high-fidelity editing with professional quality, superior prompt adherence, and up to 4K resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
alibaba/wan-2.6/image-to-video-spicy
Alibaba WAN 2.6 Spicy converts images into unlimited high-quality videos with smooth animations optimized for scalable content generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


video-effects
wavespeed-ai/ai-twerk
AI Twerk generates a fun twerking dance video from a single input image. Upload a photo and the model animates the person into an energetic twerking dance with upbeat hip-hop music. Ready-to-use REST inference API, no cold starts, affordable pricing.


video-effects
wavespeed-ai/ai-kissing
AI Kissing generates a romantic kissing video from one or two input images. Upload one image with two people, or two separate images to composite them together. Ready-to-use REST inference API, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/firered-image/edit
FireRed Image Edit enables precise image editing with natural-language instructions, supporting both English and Chinese prompts with multi-image references. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
vidu/q3/image-to-video-pro
Vidu Q3 Image-to-Video Pro generates high-resolution videos (720p/1080p/2K/4K) from images with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
recraft-ai/recraft-v4-pro/text-to-vector
Recraft V4 Pro generates premium-quality SVG vector graphics from text prompts, designed for professional design and branding. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
recraft-ai/recraft-v4/text-to-vector
Recraft V4 generates native SVG vector graphics from text prompts, ideal for logos, icons, and design assets. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
recraft-ai/recraft-v4-pro/text-to-image
Recraft V4 Pro generates premium-quality images from text prompts, designed specifically for professional design and marketing use cases. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
recraft-ai/recraft-v4/text-to-image
Recraft V4 generates high-quality images from text prompts with color palette control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
alibaba/wan-2.6/image-to-video-pro
Alibaba WAN 2.6 Image-to-Video Pro converts images into premium-quality videos with superior motion dynamics, enhanced visual fidelity, and professional cinematic output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


upscaler
wavespeed-ai/ultimate-video-upscaler
Ultimate Video Upscaler converts low-resolution videos into crisp 4K footage with seamless motion dynamics and frame consistency. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
google/gemini-2.5-flash/text-to-speech
Google Gemini 2.5 Flash Text-to-Speech delivers fast, natural multi-speaker voice synthesis with 30+ voices across 24 languages at a lower cost than Gemini 2.5 Pro Text-to-Speech. Perfect for dialogues, conversations, and multilingual content. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
google/gemini-2.5-pro/text-to-speech
Google Gemini 2.5 Pro Text-to-Speech delivers natural multi-speaker voice synthesis with 30+ voices across 24 languages. Perfect for dialogues, conversations, and multilingual content. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-3d
wavespeed-ai/hunyuan-3d-v3.1/image-to-3d-rapid
Hunyuan 3D V3.1 Rapid is a fast image-to-3D generation model that quickly converts 2D images into 3D models. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-3d
wavespeed-ai/hunyuan-3d-v3.1/text-to-3d-rapid
Hunyuan 3D V3.1 Rapid is a fast text-to-3D generation model that quickly creates 3D models from text descriptions. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


motion-control
bytedance/dreamactor-v2
ByteDance DreamActor V2 transfers motion from a driving video to characters in an image. Performs especially well with non-human characters and scenes with multiple characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
vidu/q3-turbo/start-end-to-video
Vidu Q3 Turbo Start-End-to-Video creates smooth transitions between two images with faster processing. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
vidu/q3-turbo/image-to-video
Vidu Q3 Turbo Image-to-Video animates static images with high-quality motion and faster processing. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
vidu/q3-turbo/text-to-video
Vidu Q3 Turbo Text-to-Video generates high-quality videos from text prompts with faster processing. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
vidu/q3/start-end-to-video
Vidu Q3 Start-End-to-Video generates high-quality videos that transition from a start image to an end image, with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
kwaivgi/kling-image-o3/edit
Kling O3 Edit is an AI image editing model with 4K resolution and multi-image reference support, enabling high-quality transformations with multiple reference inputs. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
kwaivgi/kling-image-v3/edit
Kling V3 Edit is an AI model for editing and transforming images via text prompts, enabling precise modifications with natural-language instructions. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
kwaivgi/kling-image-o3/text-to-image
Kling O3 is Kuaishou's advanced AI image generation model with support for 4K resolution, delivering ultra-high-quality visuals with exceptional detail. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
kwaivgi/kling-image-v3/text-to-image
Kling V3.0 is Kuaishou's latest AI image generation model with superior text-to-image capabilities, delivering high-quality visuals with accurate prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
inworld/inworld-1.5-mini/text-to-speech
Inworld 1.5 Mini delivers high-quality text-to-speech synthesis with 56+ multilingual voices, adjustable speaking rate, and natural-sounding audio output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
inworld/inworld-1.5-max/text-to-speech
Inworld 1.5 Max delivers premium text-to-speech synthesis with 56+ multilingual voices, adjustable speaking rate, and high-fidelity natural-sounding audio output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


video-to-video
kwaivgi/kling-video-o3-std/video-edit
Kling Omni Video O3 Video-Edit (Standard) enables natural-language video edits: remove or replace objects, swap backgrounds, restyle scenes, change weather/lighting, and apply localized 3-10s transformations with strong temporal consistency. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.


text-to-video
kwaivgi/kling-video-o3-std/text-to-video
Kling Omni Video O3 (Standard) is Kuaishou's advanced unified multi-modal video model with MVL (Multi-modal Visual Language) technology. Text-to-Video mode generates cinematic videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.


image-to-video
kwaivgi/kling-video-o3-std/reference-to-video
Kling Omni Video O3 (Standard) Reference-to-Video generates creative videos using character, prop, or scene references from multiple viewpoints. Extracts subject features and creates new video content while maintaining identity consistency across frames. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.


image-to-video
kwaivgi/kling-video-o3-std/image-to-video
Kling Omni Video O3 (Standard) Image-to-Video transforms static images into dynamic cinematic videos using MVL (Multi-modal Visual Language) technology. Maintains subject consistency while adding natural motion, physics simulation, and seamless scene dynamics. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.


video-to-video
kwaivgi/kling-video-o3-pro/video-edit
Kling Omni Video O3 Video-Edit enables conversational video editing through natural language commands. Remove objects, change backgrounds, modify styles, adjust weather/lighting, and transform scenes with simple text instructions like 'remove pedestrians' or 'change daytime to dusk'. Ready-to-use REST API, best performance, no cold starts, affordable pricing.


text-to-video
kwaivgi/kling-video-o3-pro/text-to-video
Kling Omni Video O3 is Kuaishou's advanced unified multi-modal video model with MVL (Multi-modal Visual Language) technology. Text-to-Video mode generates cinematic videos from text prompts with subject consistency, natural physics simulation, and precise semantic understanding. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.


image-to-video
kwaivgi/kling-video-o3-pro/reference-to-video
Kling Omni Video O3 Reference-to-Video generates creative videos using character, prop, or scene references from multiple viewpoints. Extracts subject features and creates new video content while maintaining identity consistency across frames. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.


image-to-video
kwaivgi/kling-video-o3-pro/image-to-video
Kling Omni Video O3 Image-to-Video transforms static images into dynamic cinematic videos using MVL (Multi-modal Visual Language) technology. Maintains subject consistency while adding natural motion, physics simulation, and seamless scene dynamics. Supports audio generation. Ready-to-use REST API, best performance, no cold starts, affordable pricing.


text-to-video
kwaivgi/kling-v3.0-std/text-to-video
Kling 3.0 Standard delivers high-quality text-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
kwaivgi/kling-v3.0-std/image-to-video
Kling 3.0 Standard delivers high-quality image-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
kwaivgi/kling-v3.0-pro/text-to-video
Kling 3.0 Pro delivers top-tier text-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
kwaivgi/kling-v3.0-pro/image-to-video
Kling 3.0 Pro delivers top-tier image-to-video generation with smooth motion, cinematic visuals, accurate prompt adherence, and native audio for ready-to-share clips. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/qwen-image/edit-2509-multiple-angles
Qwen Image Edit 2509 Multiple Angles is an AI image editing model that generates multiple-angle views of objects or scenes from a single image. Transform perspectives and create diverse viewpoints with text prompts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
kwaivgi/kling-v1/ai-multi-shot
Kling V1 AI Multi-Shot delivers top-tier image-to-image generation with cinematic visuals, accurate prompt adherence, and multi-shot consistency for ready-to-share images. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
wavespeed-ai/ace-step-1.5
ACE-Step 1.5 generates music with lyrics from text, with tracks up to 4 minutes long. Supports 50+ languages, high acoustic fidelity, and runs efficiently on consumer hardware. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
microsoft/vibevoice
Microsoft VibeVoice text-to-speech model generates long-form speech from text with multi-speaker dialogue support. Choose from 9 voice presets across English, Chinese, and Hindi. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
elevenlabs/music
ElevenLabs Music generates original songs from text descriptions. Create instrumentals or full compositions with customizable duration. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
sourceful/riverflow-2.0-pro/edit
Sourceful Riverflow 2.0 Pro Edit is an agentic image model optimized for robust, high-precision image editing and transformation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
sourceful/riverflow-2.0-pro/text-to-image
Sourceful Riverflow 2.0 Pro is an agentic image model optimized for robust, high-precision text-to-image generations. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
alibaba/wan-2.6/reference-to-video-flash
Alibaba WAN 2.6 Reference-to-Video Flash turns character, prop, or scene references from images or videos into new video shots with preserved identity, style, and layout plus smooth, coherent motion. The Flash version offers faster generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
kwaivgi/kling-v2.6-std/image-to-video
Kling 2.6 Standard offers cost-effective image-to-video generation with smooth motion, cinematic visuals, and accurate prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
kwaivgi/kling-v2.6-std/text-to-video
Kling 2.6 Standard offers cost-effective text-to-video generation with smooth motion, cinematic visuals, and strong prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.