

image-to-video
vidu/q2-pro/start-end-to-video-fast
Vidu Q2 Pro Fast Start-End to Video generates smooth video transitions between start and end images with faster generation speed. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
vidu/q2-pro/image-to-video-fast
Vidu Q2 Pro Fast Image to Video generates high-quality videos from a single image with faster generation speed. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
wavespeed-ai/heartmula/generate-music
HeartMuLa is a state-of-the-art music generation model that creates high-quality songs from lyrics and style tags. Ready-to-use REST inference API with best performance, no cold starts, and affordable pricing.


speech-to-text
wavespeed-ai/heartmula/transcribe-lyrics
HeartMuLa Transcribe extracts lyrics from audio files and supports multilingual transcription. Ready-to-use REST inference API with best performance, no cold starts, and affordable pricing.


image-to-video
vidu/q3/image-to-video
Vidu Q3 Image-to-Video turns images into high-quality videos with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
vidu/q3/text-to-video
Vidu Q3 Text-to-Video turns text prompts into high-quality videos with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
minimax/music-2.5
MiniMax Music 2.5 is an AI music generation model offering high-fidelity audio, humanized vocals, and precise creative control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


upscaler
bria/fibo/video-upscaler
Bria Video Upscaler increases video resolution up to 8K with 2x or 4x upscaling. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
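The 2x/4x factors and the 8K ceiling interact: under the assumption below, a 4x upscale stays within 8K only for sources up to 1920×1080. The Python sketch checks that arithmetic before a job is submitted. It assumes "8K" means 7680×4320 (8K UHD) and that the cap applies in either orientation; `upscaled_size` is a hypothetical helper, not part of the Bria API.

```python
MAX_LONG_SIDE, MAX_SHORT_SIDE = 7680, 4320  # assumed 8K UHD ceiling


def upscaled_size(width: int, height: int, factor: int) -> tuple[int, int]:
    """Return the output size for a 2x or 4x upscale, rejecting results above the assumed 8K cap."""
    if factor not in (2, 4):
        raise ValueError("factor must be 2 or 4")
    out_w, out_h = width * factor, height * factor
    # Compare long and short sides so portrait clips are checked the same way as landscape ones.
    if max(out_w, out_h) > MAX_LONG_SIDE or min(out_w, out_h) > MAX_SHORT_SIDE:
        raise ValueError(f"{out_w}x{out_h} exceeds the assumed 8K limit")
    return out_w, out_h


print(upscaled_size(1920, 1080, 4))  # (7680, 4320): 1080p x4 lands exactly on 8K UHD
print(upscaled_size(1920, 1080, 2))  # (3840, 2160): 1080p x2 gives 4K UHD
```

A 4K source (3840×2160) can therefore only be upscaled 2x under this assumption; 4x would yield 15360×8640 and is rejected.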


ai-remover
bria/fibo/video-background-remover
Bria Video Background Remover removes the background from videos with support for transparency and custom background colors. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
bria/fibo/reseason
Bria Reseason changes the season or weather atmosphere of an image. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
bria/fibo/image-blend
Bria Image Blend merges objects, applies textures, or rearranges items within an image using natural language. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
bria/fibo/restore
Bria Restore renews old photos by removing noise, scratches, and blur. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
bria/fibo/relight
Bria Relight modifies the lighting setup (direction and atmosphere) of an image. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
bria/fibo/colorize
Bria Colorize adds vivid colors to B&W photos or converts color to B&W with various style presets. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


video-to-video
x-ai/grok-imagine-video/edit-video
X-AI Grok Imagine Video Edit enables video editing using xAI's Grok Imagine Video model. Transform and modify existing videos with text prompts for seamless AI-powered edits. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
x-ai/grok-imagine-video/image-to-video
X-AI Grok Imagine Video transforms images into videos using xAI's Grok Imagine Video model. Animate still images with natural motion, scene continuity, and synchronized audio. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
x-ai/grok-imagine-video/text-to-video
X-AI Grok Imagine Video generates videos from text descriptions using xAI's Grok Imagine Video model. Create high-quality videos with customizable duration, aspect ratio, and resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
x-ai/grok-imagine-image/text-to-image
X-AI Grok Imagine Image generates images from text prompts using xAI's Grok Imagine model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
x-ai/grok-imagine-image/edit
X-AI Grok Imagine Image enables precise image editing with xAI's Grok Imagine model. Transform and modify images using text prompts with AI-powered precision. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/hunyuan-image-3-instruct/edit
Hunyuan Image 3.0 Instruct Edit – instruction-based image editing with natural language prompts, supporting up to 2 reference images for precise modifications. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/qwen-image-max/edit
Qwen Image Max Edit is an AI model for image editing with text prompts, supporting both Chinese and English. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
wavespeed-ai/qwen-image-max/text-to-image
Qwen Image Max is a text-to-image model that generates high-quality images from Chinese and English prompts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
wavespeed-ai/hunyuan-image-3-instruct/text-to-image
Hunyuan Image 3.0 Instruct is Tencent's text-to-image model for high-quality image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
vidu/one-click-v2/mv
Vidu One-Click V2 MV transforms images and audio into videos with camera movements and subtitle support. Create professional video content with dynamic shots and text overlays in one click. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


training
wavespeed-ai/z-image/base-lora-trainer
Z-Image Base LoRA Trainer – train custom image LoRA models from your own dataset, with zip uploads, auto-tuned defaults and fast iteration for brand, character or IP looks. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


digital-human
wavespeed-ai/ltx-2-19b/lipsync
LTX-2 Lipsync generates synchronized talking-head videos from a reference image and audio input. Powered by the 19B DiT architecture, it produces high-fidelity lip-synced videos with natural head movements. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


lora-support
wavespeed-ai/z-image/base-lora
Z-Image-Base LoRA (6B) enables high-quality text-to-image generation with full CFG support and external LoRA support. It supports negative prompting while applying up to 3 LoRAs for custom styles. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
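For readers new to stacking adapters: in the standard LoRA formulation, each adapter contributes a low-rank update B·A to a weight matrix W, so applying several gives W' = W + s1·B1·A1 + s2·B2·A2 + s3·B3·A3, where each s is a per-LoRA scale. The plain-Python sketch below illustrates that formula on tiny matrices. It describes the general technique only; how this endpoint scales or merges its up to 3 LoRAs is not documented here, and `matmul`/`apply_loras` are hypothetical helpers.

```python
Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: (n x k) @ (k x m) -> (n x m)."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def apply_loras(w: Matrix, loras: list[tuple[Matrix, Matrix, float]]) -> Matrix:
    """Return W + sum(scale * B @ A) without modifying the base weight W."""
    if len(loras) > 3:
        raise ValueError("this sketch mirrors the listed limit of 3 LoRAs")
    out = [row[:] for row in w]  # copy so the base weight stays untouched
    for b, a, scale in loras:
        delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> same shape as W
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                out[i][j] += scale * value
    return out


base = [[1.0, 0.0], [0.0, 1.0]]
lora = ([[1.0], [0.0]], [[0.0, 1.0]], 0.5)  # rank-1 adapter as (B, A, scale)
print(apply_loras(base, [lora]))  # [[1.0, 0.5], [0.0, 1.0]]
```

Because the rank r is much smaller than the matrix dimensions, each adapter stores only B and A rather than a full copy of W, which is why several LoRAs can be combined cheaply at inference time.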


text-to-image
wavespeed-ai/z-image/base
Z-Image-Base is a 6-billion-parameter text-to-image model with full CFG support. It supports negative prompting and fine-tuning for maximum control over image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
wavespeed-ai/qwen3-tts/voice-design
Qwen3 TTS Voice Design: Generate speech with custom voice characteristics described in natural language. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


audio-to-audio
wavespeed-ai/qwen3-tts/voice-clone
Qwen3 TTS Voice Clone: Clone any voice from a reference audio and generate speech in that voice. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
wavespeed-ai/qwen3-tts/text-to-speech
Qwen3 TTS: Multi-language, multi-voice text-to-speech synthesis with style control. Supports 11 languages and 9 voice characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
pixverse/pixverse-v5.6/text-to-video
PixVerse V5.6 transforms text prompts into realistic videos with smooth motion and natural detail in seconds, ideal for stories, ads, and social clips. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
pixverse/pixverse-v5.6/image-to-video
PixVerse V5.6 Image-to-Video turns a single image into cinematic clips with smooth motion, clean detail, and strong subject fidelity—ideal for logo stingers, character motion, and social posts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
minimax/speech-2.8-hd
MiniMax Speech 2.8 HD is a high-definition text-to-speech model with natural and expressive voice synthesis for premium audio quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-audio
minimax/speech-2.8-turbo
MiniMax Speech 2.8 Turbo is a text-to-speech model with natural and expressive voice synthesis. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/ic-light
IC-Light V2 is an AI-powered image relighting model. Relight any image with customizable lighting direction. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-3d
wavespeed-ai/meshy6/text-to-3d
Meshy 6 generates high-quality 3D models from text descriptions with accurate geometry and superior texture quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-3d
wavespeed-ai/meshy6/image-to-3d
Meshy 6 converts 2D images into high-quality 3D models with accurate geometry reconstruction and superior texture quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-text
wavespeed-ai/sam3-image-rle
SAM 3 RLE is a unified foundation model for promptable image segmentation that uses text, points, or boxes to detect and segment objects. It returns masks in RLE (run-length encoding) format for efficient storage and processing. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
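Run-length encoding stores a binary mask as alternating run lengths instead of one value per pixel. The sketch below decodes and re-encodes such masks in plain Python. It assumes the COCO-style uncompressed convention (runs start with background pixels and traverse the mask in column-major order); the actual response schema of this endpoint may differ, and `rle_decode`/`rle_encode` are hypothetical helpers.

```python
Mask = list[list[int]]


def rle_decode(counts: list[int], height: int, width: int) -> Mask:
    """Expand alternating run lengths (background first) into a height x width 0/1 mask."""
    if sum(counts) != height * width:
        raise ValueError("run lengths must cover every pixel exactly once")
    mask = [[0] * width for _ in range(height)]
    pixel, value = 0, 0  # runs alternate 0, 1, 0, ... starting with background
    for run in counts:
        for p in range(pixel, pixel + run):
            # Column-major order: flat index p -> (row p % height, column p // height).
            mask[p % height][p // height] = value
        pixel += run
        value = 1 - value
    return mask


def rle_encode(mask: Mask) -> list[int]:
    """Inverse of rle_decode: scan column by column and record run lengths."""
    height, width = len(mask), len(mask[0])
    counts: list[int] = []
    current, run = 0, 0  # a mask whose first pixel is 1 yields a leading 0-length run
    for col in range(width):
        for row in range(height):
            if mask[row][col] == current:
                run += 1
            else:
                counts.append(run)
                current, run = mask[row][col], 1
    counts.append(run)
    return counts


counts = [1, 1, 1, 2, 1]
mask = rle_decode(counts, height=3, width=2)
print(mask)                        # [[0, 1], [1, 1], [0, 0]]
print(rle_encode(mask) == counts)  # True
```

Because runs alternate starting with background, the mask area is simply the sum of every second count (`sum(counts[1::2])`, here 3), which can be computed without decoding the mask at all.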


video-to-text
wavespeed-ai/sam3-video-rle
SAM 3 Video RLE is a unified foundation model for prompt-based video segmentation. Track and segment objects across frames using text, points, or boxes, with RLE-encoded masks returned for efficient processing. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/qwen-image/edit-multiple-angles
Generate specific camera angles from a single image using a 96-pose camera system. Control horizontal rotation, vertical tilt, and zoom to create front, side, back views and more. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
minimax/image-01/text-to-image
MiniMax Image-01 text-to-image model generates high-quality images from text descriptions. Create diverse visuals across multiple styles and scenarios with natural language prompts. Supports multiple aspect ratios and custom dimensions. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
minimax/image-01/image-to-image
MiniMax Image-01 image-to-image model transforms existing images using text prompts. Generate variations, apply style transfers, or modify images with character references. Supports multiple aspect ratios and custom dimensions. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
alibaba/wan-2.6/image-to-video-flash
Alibaba WAN 2.6 Flash converts images into videos (720p/1080p) with optional audio, optimized for speed and cost. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
topaz/image/sharpen
Topaz Image Sharpen brings clarity and crisp definition to soft or out-of-focus images. Perfect for fixing lens blur, motion blur, and general softness. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
topaz/image/restore
Topaz Image Restore improves old and degraded photos. Remove dust, scratches, and damage from vintage photos. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
topaz/image/lighting
Topaz Image Lighting corrects and balances images captured in suboptimal lighting. Fix exposure, white balance, and color temperature. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
topaz/image/denoise
Topaz Image Denoise removes grain and high-ISO noise while preserving detail. Perfect for low-light photography and high-ISO images. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
bria/fibo/edit
FIBO is an open-source JSON-native image-to-image model that maps intent to structured controls for precise enterprise image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


lora-support
wavespeed-ai/flux-2-klein-4b/edit-lora
FLUX.2 [klein] 4B Edit with LoRA support enables precise image-to-image editing with natural language instructions, multi-reference support, and LoRA customization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


lora-support
wavespeed-ai/flux-2-klein-4b/text-to-image-lora
FLUX.2 [klein] 4B with LoRA support is a compact 4-billion-parameter text-to-image model that delivers fast generation with quality results and LoRA customization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


lora-support
wavespeed-ai/flux-2-klein-9b/text-to-image-lora
FLUX.2 [klein] 9B with LoRA support is a high-quality text-to-image model with 9B parameters, offering enhanced realism, crisper text generation, and fast LoRA customization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


lora-support
wavespeed-ai/flux-2-klein-9b/edit-lora
FLUX.2 [klein] 9B Edit with LoRA support is a high-quality image editing model with 9B parameters, offering precise modifications using natural language instructions and personalized styles via custom LoRA adapters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/flux-2-klein-4b/edit
FLUX.2 [klein] 4B Edit enables precise image-to-image editing with natural language instructions and multi-reference support. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/flux-2-klein-9b/edit
FLUX.2 [klein] 9B Edit is a high-quality image editing model with 9B parameters, offering precise modifications using natural language instructions. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
wavespeed-ai/flux-2-klein-9b/text-to-image
FLUX.2 [klein] 9B is a high-quality text-to-image model with 9B parameters, offering enhanced realism and crisper text generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
wavespeed-ai/flux-2-klein-4b/text-to-image
FLUX.2 [klein] 4B is a compact 4-billion-parameter text-to-image model that delivers fast generation with quality results. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


video-dubbing
elevenlabs/dubbing
ElevenLabs Dubbing automatically translates and dubs video/audio content into different languages while preserving the original speakers' voices. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/sam3-image
SAM 3 is a unified foundation model for promptable image segmentation that uses text, points, or boxes to detect and segment objects. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
wavespeed-ai/z-image-turbo/controlnet
Z-Image-Turbo ControlNet generates images guided by structural control signals (depth, canny edge, pose) for precise composition control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


audio-to-audio
elevenlabs/voice-changer
ElevenLabs Voice Changer transforms any audio into speech with a different voice while preserving the original speech patterns and timing. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


training
wavespeed-ai/ltx-2-19b/ic-lora-trainer
LTX-2 IC-LoRA Trainer lets you train custom In-Context LoRA models for video-to-video transformations, including depth/pose adapters, video restoration, and style transfer. Upload a ZIP file containing paired videos to start. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


training
wavespeed-ai/ltx-2-19b/video-lora-trainer
LTX-2 Audio-Video LoRA Trainer lets you train custom LoRA models with synchronized audio-video generation support. Train action, motion, and video effect models by uploading a ZIP file containing videos with optional audio. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
z-ai/glm-image/edit
GLM-Image Edit is an image-to-image editing model that transforms images based on text prompts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
z-ai/glm-image/text-to-image
Z-AI GLM Image generates high-quality images from text prompts, with improved understanding of user descriptions for more precise, personalized results. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


upscaler
wavespeed-ai/ltx-2-19b/video-upscaler
LTX-2 19B Video Upscaler converts low-resolution videos into crisp 4K footage with seamless motion dynamics and frame consistency. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


motion-control
wavespeed-ai/ltx-2-19b/control
LTX-2 19B ControlNet generates synchronized audio-video (up to 20s) from video input with pose, depth, or canny edge guidance. Supports audio preservation, generation, or removal for flexible video transformation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


lora-support
wavespeed-ai/ltx-2-19b/image-to-video-lora
LTX-2 19B Image-to-Video LoRA builds on LTX-2, the first DiT-based audio-video foundation model with synchronized audio and video generation. This LoRA version supports custom style adapters for personalized video generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


lora-support
wavespeed-ai/ltx-2-19b/text-to-video-lora
LTX-2 19B Text-to-Video LoRA builds on LTX-2, the first DiT-based audio-video foundation model with synchronized audio and video generation. This LoRA version supports custom style adapters for personalized video generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-video
wavespeed-ai/ltx-2-19b/text-to-video
LTX-2 19B is the first DiT-based audio-video foundation model with synchronized audio and video, high fidelity, multiple performance modes, and production-ready outputs in one model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-video
wavespeed-ai/ltx-2-19b/image-to-video
LTX-2 19B Image-to-Video animates an input image with LTX-2, the first DiT-based audio-video foundation model with synchronized audio and video, high fidelity, multiple performance modes, and production-ready outputs in one model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


training
wavespeed-ai/qwen-image-2512-lora-trainer
Qwen-Image-2512 LoRA Trainer lets you train custom LoRA models 10x faster with style, character, and object training. Go from concept to model in minutes, not hours: upload a ZIP file containing images to start. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


lora-support
wavespeed-ai/z-image-turbo/image-to-image-lora
Z-Image-Turbo Image-to-Image LoRA transforms reference images with custom LoRA styles in sub-second time. Apply up to 3 LoRAs for personalized image transformation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


image-to-image
wavespeed-ai/z-image-turbo/image-to-image
Z-Image-Turbo Image-to-Image is a 6-billion-parameter model that enhances the quality of reference images (similar to upscaling) in sub-second time. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
z-ai/cogview-4
Z-AI CogView-4 generates high-quality images from text prompts, quickly and accurately interpreting user descriptions for more precise, personalized results. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


lora-support
wavespeed-ai/qwen-image/text-to-image-2512-lora
Qwen-Image-2512 LoRA is an enhanced 20B MMDiT text-to-image model with LoRA support for fast customization and refined image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


video-to-video
luma/modify-video
Luma Modify Video transforms and restyles existing videos with AI-powered modifications, enabling style changes, visual enhancements, and creative edits. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


text-to-image
wavespeed-ai/qwen-image/text-to-image-2512
Qwen Image 2512 is Alibaba Qwen's latest text-to-image model with enhanced prompt understanding, superior text rendering, and versatile aspect ratio support. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


content-moderation
wavespeed-ai/molmo2/video-content-moderator
Molmo2-4B Video Content Moderator analyzes video content for safety, appropriateness, and policy compliance. Detects violence, nudity, gore, and other harmful visual content in videos using an open-source vision-language model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


content-moderation
wavespeed-ai/molmo2/image-content-moderator
Molmo2-4B Image Content Moderator: Analyze image content for safety, appropriateness, and policy compliance. Detects violence, nudity, gore, and other harmful visual content. Open-source vision-language model. Ready-to-use REST API, no cold starts, affordable pricing.


content-moderation
wavespeed-ai/molmo2/text-content-moderator
Molmo2-4B Text Content Moderator: Analyze text content for safety, appropriateness, and policy compliance. Detects hate speech, violence, sexual content, and other harmful categories. Open-source vision-language model. Ready-to-use REST API, no cold starts, affordable pricing.


motion-control
kwaivgi/kling-v2.6-std/motion-control
Kling 2.6 Standard Motion Control transfers motion from reference videos to animate still images. Upload a character image and a motion clip (dance, action, gesture), and the model extracts the movement to generate smooth, realistic video. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


llm
wavespeed-ai/molmo2/prompt-optimizer
Molmo2-4B Prompt Optimizer: Enhance prompts for image and video generation with intelligent restructuring, style guidance, and context-aware improvements. Open-source vision-language model. Ready-to-use REST API, no cold starts, affordable pricing.


image-to-text
wavespeed-ai/molmo2/image-qa
Molmo2-4B Image QA: Answer questions about images with support for multi-image comparison (1-2 images). Open-source vision-language model. Ready-to-use REST API, no cold starts, affordable pricing.


video-to-text
wavespeed-ai/molmo2/video-understanding
Molmo2-4B Video Understanding: Analyze videos with specialized tasks (general, summary, analysis, counting, scene description). Open-source vision-language model with temporal understanding. Ready-to-use REST API, no cold starts, duration-based pricing.


video-to-text
wavespeed-ai/molmo2/video-qa
Molmo2-4B Video QA: Answer questions about video content with temporal understanding. Open-source vision-language model. Ready-to-use REST API, no cold starts, duration-based pricing.


video-to-text
wavespeed-ai/molmo2/video-captioner
Molmo2-4B Video Captioner: Generate detailed, accurate captions for videos with customizable detail levels (low, medium, high). Open-source vision-language model with temporal understanding capabilities. Ready-to-use REST API, no cold starts, duration-based pricing.


image-to-text
wavespeed-ai/molmo2/image-captioner
Molmo2-4B Image Captioner: Generate detailed, accurate captions for images with customizable detail levels (low, medium, high). Open-source vision-language model with object grounding capabilities. Ready-to-use REST API, no cold starts, affordable pricing.


image-to-text
wavespeed-ai/paddle-ocr
PaddleOCR-VL is an ultra-compact 0.9B parameter vision-language model for document parsing, supporting 109 languages with text, table, formula, and chart recognition in JSON or Markdown output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.


speech-to-text
wavespeed-ai/openai-whisper-with-video
OpenAI Whisper Large v3 (Video-to-Text) delivers high-accuracy multilingual transcription directly from video files, with automatic language detection and optional timestamped, subtitle-ready segments. Built for stable production use with a ready-to-use REST API, fast response, no cold starts, and predictable pricing.
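Subtitle-ready segments are typically written out as an SRT file, whose cues are numbered and timed with `HH:MM:SS,mmm` timestamps. The sketch below assumes each segment is a dict with `start` and `end` in seconds plus a `text` string; those field names are hypothetical, so map them to whatever the endpoint actually returns. `srt_timestamp` and `segments_to_srt` are likewise illustrative helpers.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build an SRT document from segments with hypothetical 'start'/'end'/'text' keys."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    # SRT separates consecutive cues with a blank line.
    return "\n".join(cues)


print(segments_to_srt([
    {"start": 0.0, "end": 2.5, "text": " Welcome back."},
    {"start": 2.5, "end": 4.0, "text": "Let's begin."},
]))
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids floating-point drift in the millisecond field.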


ai-remover
wavespeed-ai/video-background-remover
WaveSpeed Video Background Remover removes video backgrounds or replaces them with a custom image. Upload or paste a link to your video, then provide a background image by URL or file. Clean matting, edge-aware blending, and natural compositing keep subjects realistic. Built for creator workflows and batch jobs. Ready-to-use REST inference API with fast response, no cold starts, and predictable pricing.


text-to-video
wavespeed-ai/kandinsky5-pro/text-to-video
Kandinsky 5 Pro Text-to-Video turns natural-language prompts into coherent 5-second clips with strong prompt adherence and smooth motion. Choose 512p or 1024p output across common aspect ratios for social posts, ads, and concept shots. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.


image-to-video
wavespeed-ai/kandinsky5-pro/image-to-video
Kandinsky 5 Pro Image-to-Video turns a single image into a coherent 5-second video guided by a natural-language prompt. It preserves subject and composition while adding smooth motion and cinematic dynamics. Output at 512p or 1024p in common aspect ratios for social posts, ads, and concept previews. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.


lora-support
wavespeed-ai/qwen-image/edit-2511-lora
Qwen Image Edit 2511 LoRA adds custom LoRA support to Qwen Image Edit 2511 for personalized styles. It delivers stronger edit consistency, robust multi-person identity/pose consistency, enhanced industrial/product design, and improved geometric reasoning for structure-preserving edits. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.


digital-human
wavespeed-ai/longcat-avatar
LongCat Avatar generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity. It converts one photo plus audio into audio-driven talking or singing avatar videos (image-to-video) up to 2 minutes long; the 720p tier costs $0.40 per 5 s. Ready-to-use REST API, no cold starts, affordable pricing.
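The listed 720p rate makes cost estimates straightforward: a full 2-minute video is 120 s ÷ 5 s = 24 billed units, or 24 × $0.40 = $9.60. The sketch below generalizes that arithmetic. It assumes a partial 5-second unit is billed as a whole unit, which the listing does not state, and `estimate_cost_cents` is a hypothetical helper; confirm against the actual pricing before relying on it.

```python
import math

PRICE_PER_UNIT_CENTS = 40  # $0.40 per 5-second unit at the 720p tier (from the listing)
UNIT_SECONDS = 5
MAX_SECONDS = 120          # "up to 2 minutes"


def estimate_cost_cents(duration_s: float) -> int:
    """Estimate 720p cost in cents, assuming partial units round up to a whole unit."""
    if not 0 < duration_s <= MAX_SECONDS:
        raise ValueError("duration must be greater than 0 and at most 120 seconds")
    units = math.ceil(duration_s / UNIT_SECONDS)
    return units * PRICE_PER_UNIT_CENTS


for seconds in (5, 12, 120):
    cents = estimate_cost_cents(seconds)
    print(f"{seconds:>3} s -> ${cents / 100:.2f}")
```

Prices are kept in integer cents so that summing many jobs never accumulates floating-point rounding error.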


image-to-image
wavespeed-ai/qwen-image/edit-2511
Qwen Image Edit 2511 is a major upgrade over 2509 for real-world image editing and design. It delivers stronger edit consistency, robust multi-person identity/pose consistency, built-in LoRA styles, enhanced industrial/product design, and improved geometric reasoning for structure-preserving edits. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.


video-extend
bytedance/seedance-v1.5-pro/video-extend-fast
Seedance 1.5 Pro Fast Video Extend turns short shots into longer clips with natural motion continuation and strong temporal consistency. Supports 4–12 s extensions, 720p/1080p output with built-in upscaling, and seed-reproducible results for shot matching. Ideal for ads, trailers, and short-drama beats. Production-ready REST API with fast response, no cold starts, and affordable pricing.


video-extend
bytedance/seedance-v1.5-pro/video-extend
Seedance 1.5 Pro Video-Extend turns short video clips into longer videos with natural motion continuation, stable aesthetics, and upscaled output. It supports 4–12s duration control, multiple aspect ratios/resolutions, and seed-reproducible results—ideal for extending ad creatives and short-drama shots. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.


text-to-video
bytedance/seedance-v1.5-pro/text-to-video-fast
Seedance 1.5 Pro Fast (Text-to-Video) converts text prompts into cinematic, live-action-leaning videos with strong prompt adherence, expressive yet stable motion, and consistent aesthetics. It supports 4–12s duration control, multiple aspect ratios (9:16, 1:1, 16:9), and 720p/1080p output with seed-reproducible results—ideal for ads, trailers, and short-drama beats. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.


image-to-video
bytedance/seedance-v1.5-pro/image-to-video-fast
Seedance 1.5 Pro Fast Image-to-Video transforms a single image (plus optional text prompt) into cinematic, live-action-leaning clips while preserving subject identity, composition, and first-frame fidelity. It supports 4–12s duration control, adaptive aspect ratios that follow the input image, expressive yet stable motion, and seed-based reproducibility—ideal for ad creatives and short-drama beats anchored by a strong visual. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.