WaveSpeed.ai
vidu/q2-pro/start-end-to-video-fast
image-to-video

Vidu Q2 Pro Fast Start-End to Video generates smooth video transitions between start and end images with faster generation speed. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
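
Every entry on this page is a model ID served through a REST inference API. The Python sketch below shows one way to build a request for such an ID using only the standard library. It assumes a base URL of `https://api.wavespeed.ai/api/v3`, Bearer-token authentication, and a JSON body. The URL layout and the payload fields `start_image`, `end_image`, and `prompt` are assumptions for illustration and are not taken from this page, so check the official API reference before relying on them.

```python
import json
import urllib.request

# Assumed base URL; not stated on this page. Verify against the official docs.
BASE_URL = "https://api.wavespeed.ai/api/v3"


def build_request(model_id: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a catalog model ID.

    The URL layout and auth header are assumptions for illustration.
    """
    url = f"{BASE_URL}/{model_id.strip('/')}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical payload fields for a start/end-frame video model.
req = build_request(
    "vidu/q2-pro/start-end-to-video-fast",
    {
        "start_image": "https://example.com/start.png",
        "end_image": "https://example.com/end.png",
        "prompt": "smooth transition from day to night",
    },
    api_key="YOUR_API_KEY",
)
print(req.full_url)
```

The request is only built, not sent, so the sketch runs offline; passing it to `urllib.request.urlopen` would submit it. How results come back (directly, or via a task ID that must be polled) is not described here.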

vidu/q2-pro/image-to-video-fast
image-to-video

Vidu Q2 Pro Fast Image to Video generates high-quality videos from a single image with faster generation speed. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/heartmula/generate-music
text-to-audio

HeartMuLa is a state-of-the-art music generation model that creates high-quality songs from lyrics and style tags. Ready-to-use REST inference API with best performance, no cold starts, and affordable pricing.

wavespeed-ai/heartmula/transcribe-lyrics
speech-to-text

HeartMuLa Transcribe extracts lyrics from audio files and supports multilingual transcription. Ready-to-use REST inference API with best performance, no cold starts, and affordable pricing.

vidu/q3/image-to-video
image-to-video

Vidu Q3 Image-to-Video turns images into high-quality videos with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

vidu/q3/text-to-video
text-to-video

Vidu Q3 Text-to-Video turns text prompts into high-quality videos with exceptional visual fidelity and diverse motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

minimax/music-2.5
text-to-audio

MiniMax Music 2.5 is an AI music generation model offering high-fidelity audio, humanized vocals, and precise creative control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

bria/fibo/video-upscaler
upscaler

Bria Video Upscaler increases video resolution up to 8K with 2x or 4x upscaling. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
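
The 2x and 4x factors and the 8K ceiling combine simply: a 1920x1080 source at 4x gives 7680x4320, while a 4K (3840x2160) source can only go 2x. A minimal check, assuming "8K" means 7680x4320; the exact limit the service enforces is an assumption:

```python
# Assumption: "8K" means a 7680x4320 cap; the service's real limit may differ.
MAX_W, MAX_H = 7680, 4320


def upscaled_size(width: int, height: int, factor: int) -> tuple[int, int]:
    """Return the output frame size for a 2x or 4x upscale, or raise if it exceeds 8K."""
    if factor not in (2, 4):
        raise ValueError("factor must be 2 or 4")
    out_w, out_h = width * factor, height * factor
    # Compare long and short sides separately so portrait videos are handled too.
    if max(out_w, out_h) > MAX_W or min(out_w, out_h) > MAX_H:
        raise ValueError(f"{out_w}x{out_h} exceeds the assumed 8K cap")
    return out_w, out_h


print(upscaled_size(1920, 1080, 4))  # (7680, 4320)
print(upscaled_size(1280, 720, 2))   # (2560, 1440)
```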

bria/fibo/video-background-remover
ai-remover

Bria Video Background Remover removes the background from videos with support for transparency and custom background colors. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

bria/fibo/reseason
image-to-image

Bria Reseason changes the season or weather atmosphere of an image. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

bria/fibo/image-blend
image-to-image

Bria Image Blend merges objects, applies textures, or rearranges items within an image using natural language. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

bria/fibo/restore
image-to-image

Bria Restore renews old photos by removing noise, scratches, and blur. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

bria/fibo/relight
image-to-image

Bria Relight modifies the lighting setup (direction and atmosphere) of an image. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

bria/fibo/colorize
image-to-image

Bria Colorize adds vivid colors to B&W photos or converts color to B&W with various style presets. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

x-ai/grok-imagine-video/edit-video
video-to-video

X-AI Grok Imagine Video Edit enables video editing using xAI's Grok Imagine Video model. Transform and modify existing videos with text prompts for seamless AI-powered edits. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

x-ai/grok-imagine-video/image-to-video
image-to-video

X-AI Grok Imagine Video transforms images into videos using xAI's Grok Imagine Video model. Animate still images with natural motion, scene continuity, and synchronized audio. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

x-ai/grok-imagine-video/text-to-video
text-to-video

X-AI Grok Imagine Video generates videos from text descriptions using xAI's Grok Imagine Video model. Create high-quality videos with customizable duration, aspect ratio, and resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

x-ai/grok-imagine-image/text-to-image
text-to-image

X-AI Grok Imagine Image generates images from text prompts using xAI's Grok Imagine model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

x-ai/grok-imagine-image/edit
image-to-image

X-AI Grok Imagine Image enables precise image editing with xAI's Grok Imagine model. Transform and modify images using text prompts with AI-powered precision. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/hunyuan-image-3-instruct/edit
image-to-image

Hunyuan Image 3.0 Instruct Edit – instruction-based image editing with natural language prompts, supporting up to 2 reference images for precise modifications. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen-image-max/edit
image-to-image

Qwen Image Max Edit is an AI model for editing images with text prompts, supporting both Chinese and English. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen-image-max/text-to-image
text-to-image

Qwen Image Max is a text-to-image model for high-quality image generation that supports Chinese and English prompts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/hunyuan-image-3-instruct/text-to-image
text-to-image

Hunyuan Image 3.0 Instruct is Tencent's text-to-image model for high-quality image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

vidu/one-click-v2/mv
image-to-video

Vidu One-Click V2 MV transforms images and audio into videos with camera movements and subtitle support. Create professional video content with dynamic shots and text overlays in one click. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/z-image/base-lora-trainer
training

Z-Image Base LoRA Trainer – train custom image LoRA models from your own dataset, with zip uploads, auto-tuned defaults and fast iteration for brand, character or IP looks. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ltx-2-19b/lipsync
digital-human

LTX-2 Lipsync generates synchronized talking-head videos from a reference image and audio input. Built on the 19B-parameter DiT architecture, it produces high-fidelity lip-synced videos with natural head movements. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/z-image/base-lora
lora-support

Z-Image-Base LoRA (6B) enables high-quality text-to-image generation with full CFG support and external LoRA support. It supports negative prompting and can apply up to 3 LoRAs for custom styles. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
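
Because the model applies at most 3 LoRAs, a client can reject oversized requests before submitting them. A sketch of such a payload builder follows; the field names `prompt`, `negative_prompt`, `loras`, `path`, and `scale` are hypothetical and are not the documented request schema.

```python
from dataclasses import dataclass

MAX_LORAS = 3  # limit stated in the model description


@dataclass
class LoraSpec:
    path: str           # hypothetical field: URL or repo path of the LoRA weights
    scale: float = 1.0  # hypothetical field: blending strength


def build_payload(prompt: str, loras: list[LoraSpec], negative_prompt: str = "") -> dict:
    """Assemble a request body, rejecting more LoRAs than the model accepts.

    All field names are illustrative assumptions, not the documented schema.
    """
    if len(loras) > MAX_LORAS:
        raise ValueError(f"at most {MAX_LORAS} LoRAs are supported, got {len(loras)}")
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "loras": [{"path": lora.path, "scale": lora.scale} for lora in loras],
    }


payload = build_payload(
    "a watercolor fox in a snowy forest",
    [LoraSpec("https://example.com/watercolor.safetensors", 0.8)],
    negative_prompt="blurry, low quality",
)
print(payload["loras"])
```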

wavespeed-ai/z-image/base
text-to-image

Z-Image-Base is a 6-billion-parameter text-to-image model with full CFG support. It supports negative prompting and fine-tuning for maximum control over image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen3-tts/voice-design
text-to-audio

Qwen3 TTS Voice Design: Generate speech with custom voice characteristics described in natural language. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen3-tts/voice-clone
audio-to-audio

Qwen3 TTS Voice Clone: Clone any voice from a reference audio and generate speech in that voice. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen3-tts/text-to-speech
text-to-audio

Qwen3 TTS: Multi-language, multi-voice text-to-speech synthesis with style control. Supports 11 languages and 9 voice characters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

pixverse/pixverse-v5.6/text-to-video
text-to-video

PixVerse V5.6 transforms text prompts into realistic videos with smooth motion and natural detail in seconds, ideal for stories, ads, and social clips. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

pixverse/pixverse-v5.6/image-to-video
image-to-video

PixVerse V5.6 Image-to-Video turns a single image into cinematic clips with smooth motion, clean detail, and strong subject fidelity—ideal for logo stingers, character motion, and social posts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

minimax/speech-2.8-hd
text-to-audio

MiniMax Speech 2.8 HD is a high-definition text-to-speech model with natural and expressive voice synthesis for premium audio quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

minimax/speech-2.8-turbo
text-to-audio

MiniMax Speech 2.8 Turbo is a high-definition text-to-speech model with natural and expressive voice synthesis. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ic-light
image-to-image

IC-Light V2 is an AI-powered image relighting model. Relight any image with customizable lighting direction. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/meshy6/text-to-3d
text-to-3d

Meshy 6 generates high-quality 3D models from text descriptions with accurate geometry and superior texture quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/meshy6/image-to-3d
image-to-3d

Meshy 6 converts 2D images into high-quality 3D models with accurate geometry reconstruction and superior texture quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/sam3-image-rle
image-to-text

SAM 3 RLE is a unified foundation model for promptable image segmentation that uses text, points, or boxes to detect and segment objects. It returns masks encoded with RLE (run-length encoding) for efficient storage and processing. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
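
RLE stores a binary mask as alternating run lengths of background and foreground pixels, which is far smaller than a full pixel array for typical object masks. The sketch below decodes an uncompressed, COCO-style RLE (`{"size": [height, width], "counts": [...]}`, runs starting with background, column-major order). Whether this endpoint returns exactly that layout, or the compressed string form that `pycocotools.mask.decode` handles, is an assumption to verify against real output.

```python
def decode_rle(rle: dict) -> list[list[int]]:
    """Decode an uncompressed, COCO-style RLE mask into a 2-D list of 0/1 values.

    Assumes rle = {"size": [height, width], "counts": [n0, n1, n2, ...]} where
    counts alternate runs of 0s and 1s (starting with 0s) in column-major order.
    """
    height, width = rle["size"]
    flat: list[int] = []
    value = 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value  # runs alternate between background and foreground
    if len(flat) != height * width:
        raise ValueError("run lengths do not match mask size")
    # Column-major: pixel (row r, col c) is stored at index c * height + r.
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]


mask = decode_rle({"size": [2, 3], "counts": [1, 3, 2]})
print(mask)  # [[0, 1, 0], [1, 1, 0]]
```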

wavespeed-ai/sam3-video-rle
video-to-text

SAM 3 Video RLE is a unified foundation model for prompt-based segmentation in video. Track and segment objects across frames using text, points, or boxes, returning RLE-encoded masks for efficient processing. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen-image/edit-multiple-angles
image-to-image

Generate specific camera angles from a single image using a 96-pose camera system. Control horizontal rotation, vertical tilt, and zoom to create front, side, and back views, and more. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

minimax/image-01/text-to-image
text-to-image

MiniMax Image-01 text-to-image model generates high-quality images from text descriptions. Create diverse visuals across multiple styles and scenarios with natural language prompts. Supports multiple aspect ratios and custom dimensions. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

minimax/image-01/image-to-image
image-to-image

MiniMax Image-01 image-to-image model transforms existing images using text prompts. Generate variations, apply style transfers, or modify images with character references. Supports multiple aspect ratios and custom dimensions. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

alibaba/wan-2.6/image-to-video-flash
image-to-video

Alibaba WAN 2.6 Flash converts images into videos (720p/1080p) with optional audio, optimized for speed and cost. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

topaz/image/sharpen
image-to-image

Topaz Image Sharpen brings clarity and crisp definition to soft or out-of-focus images. Perfect for fixing lens blur, motion blur, and general softness. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

topaz/image/restore
image-to-image

Topaz Image Restore restores old and low-quality photos. Remove dust, scratches, and damage from vintage photos. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

topaz/image/lighting
image-to-image

Topaz Image Lighting adjusts and balances images to improve quality despite sub-optimal lighting. Fix exposure, white balance, and color temperature. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

topaz/image/denoise
image-to-image

Topaz Image Denoise removes grain and high-ISO noise while preserving detail. Perfect for low-light photography and high-ISO images. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

bria/fibo/edit
image-to-image

FIBO is an open-source JSON-native image-to-image model that maps intent to structured controls for precise enterprise image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/flux-2-klein-4b/edit-lora
lora-support

FLUX.2 [klein] 4B Edit with LoRA support enables precise image-to-image editing with natural language instructions, multi-reference support, and LoRA customization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/flux-2-klein-4b/text-to-image-lora
lora-support

FLUX.2 [klein] 4B with LoRA support is a compact 4-billion-parameter text-to-image model that delivers fast, high-quality generation with LoRA customization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/flux-2-klein-9b/text-to-image-lora
lora-support

FLUX.2 [klein] 9B with LoRA support is a high-quality text-to-image model with 9B parameters, offering enhanced realism, crisper text generation, and fast LoRA customization. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/flux-2-klein-9b/edit-lora
lora-support

FLUX.2 [klein] 9B Edit with LoRA support is a high-quality image editing model with 9B parameters, offering precise modifications using natural language instructions and personalized styles via custom LoRA adapters. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/flux-2-klein-4b/edit
image-to-image

FLUX.2 [klein] 4B Edit enables precise image-to-image editing with natural language instructions and multi-reference support. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/flux-2-klein-9b/edit
image-to-image

FLUX.2 [klein] 9B Edit is a high-quality image editing model with 9B parameters, offering precise modifications using natural language instructions. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/flux-2-klein-9b/text-to-image
text-to-image

FLUX.2 [klein] 9B is a high-quality text-to-image model with 9B parameters, offering enhanced realism and crisper text generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/flux-2-klein-4b/text-to-image
text-to-image

FLUX.2 [klein] 4B is a compact 4-billion-parameter text-to-image model that delivers fast, high-quality generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

elevenlabs/dubbing
video-dubbing

ElevenLabs Dubbing automatically translates and dubs video/audio content into different languages while preserving the original speakers' voices. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/sam3-image
image-to-image

SAM 3 is a unified foundation model for promptable image segmentation that uses text, points, or boxes to detect and segment objects. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/z-image-turbo/controlnet
text-to-image

Z-Image-Turbo ControlNet generates images guided by structural control signals (depth, canny edge, pose) for precise composition control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

elevenlabs/voice-changer
audio-to-audio

ElevenLabs Voice Changer transforms any audio into speech with a different voice while preserving the original speech patterns and timing. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ltx-2-19b/ic-lora-trainer
training

LTX-2 IC-LoRA Trainer lets you train custom In-Context LoRA models for video-to-video transformations, including depth/pose adapters, video restoration, and style transfer. Upload a ZIP file containing paired videos to start. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ltx-2-19b/video-lora-trainer
training

LTX-2 Audio-Video LoRA Trainer lets you train custom LoRA models with synchronized audio-video generation support. Train action, motion, and video effect models by uploading a ZIP file containing videos with optional audio. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

z-ai/glm-image/edit
image-to-image

GLM-Image Edit is a powerful image-to-image editing model that transforms images based on text prompts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

z-ai/glm-image/text-to-image
text-to-image

Z-AI GLM Image generates high-quality images from text prompts, with improved understanding of user descriptions for more precise, personalized results. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ltx-2-19b/video-upscaler
upscaler

LTX-2 19B Video Upscaler converts low-resolution videos into crisp 4K footage with seamless motion dynamics and frame consistency. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ltx-2-19b/control
motion-control

LTX-2 19B ControlNet generates synchronized audio-video (up to 20s) from video input with pose, depth, or canny edge guidance. Supports audio preservation, generation, or removal for flexible video transformation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ltx-2-19b/image-to-video-lora
lora-support

LTX-2 19B Image-to-Video LoRA builds on LTX-2, the first DiT-based audio-video foundation model with synchronized audio and video generation. This LoRA version supports custom style adapters for personalized video generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ltx-2-19b/text-to-video-lora
lora-support

LTX-2 19B Text-to-Video LoRA builds on LTX-2, the first DiT-based audio-video foundation model with synchronized audio and video generation. This LoRA version supports custom style adapters for personalized video generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ltx-2-19b/text-to-video
text-to-video

LTX-2 19B is the first DiT-based audio-video foundation model with synchronized audio and video, high fidelity, multiple performance modes, and production-ready outputs in one model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/ltx-2-19b/image-to-video
image-to-video

LTX-2 19B is the first DiT-based audio-video foundation model with synchronized audio and video, high fidelity, multiple performance modes, and production-ready outputs in one model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen-image-2512-lora-trainer
training

Qwen-Image-2512 LoRA Trainer lets you train custom LoRA models 10x faster with style, character, and object training. From concept to model in minutes, not hours—upload a ZIP file containing images to start. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
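
Several trainers on this page start from a ZIP upload. A standard-library sketch for packing a folder of images into such an archive follows. The flat layout and the accepted extensions are assumptions, since the expected archive structure (for example, whether caption files are allowed) is not described here.

```python
import io
import zipfile
from pathlib import Path

# Assumed set of accepted image extensions; check the trainer's docs.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def zip_training_images(folder: str) -> bytes:
    """Pack the images in `folder` into an in-memory ZIP archive.

    Files are stored flat at the archive root (an assumed layout); files with
    other extensions are skipped.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(folder).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTS:
                zf.write(path, arcname=path.name)
    return buffer.getvalue()
```

For example, `Path("dataset.zip").write_bytes(zip_training_images("my_images"))` writes the archive to disk; `my_images` is a placeholder folder name.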

wavespeed-ai/z-image-turbo/image-to-image-lora
lora-support

Z-Image-Turbo Image-to-Image LoRA transforms reference images with custom LoRA styles in sub-second time. Apply up to 3 LoRAs for personalized image transformation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/z-image-turbo/image-to-image
image-to-image

Z-Image-Turbo Image-to-Image is a 6-billion-parameter model that enhances the quality of reference images (similar to upscaling) in sub-second time. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

z-ai/cogview-4
text-to-image

Z-AI CogView-4 generates high-quality images from text prompts, with quick and accurate understanding of user descriptions for more precise, personalized results. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen-image/text-to-image-2512-lora
lora-support

Qwen-Image-2512 LoRA is an enhanced 20B MMDiT text-to-image model with LoRA support for fast customization and refined image generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

luma/modify-video
video-to-video

Luma Modify Video transforms and restyles existing videos with AI-powered modifications, enabling style changes, visual enhancements, and creative edits. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/qwen-image/text-to-image-2512
text-to-image

Qwen Image 2512 is Alibaba Qwen's latest text-to-image model with enhanced prompt understanding, superior text rendering, and versatile aspect ratio support. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/molmo2/video-content-moderator
content-moderation

Molmo2-4B Video Content Moderator analyzes video content for safety, appropriateness, and policy compliance. Detects violence, nudity, gore, and other harmful visual content in videos using an open-source vision-language model. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/molmo2/image-content-moderator
content-moderation

Molmo2-4B Image Content Moderator: Analyze image content for safety, appropriateness, and policy compliance. Detects violence, nudity, gore, and other harmful visual content. Open-source vision-language model. Ready-to-use REST API, no cold starts, affordable pricing.

wavespeed-ai/molmo2/text-content-moderator
content-moderation

Molmo2-4B Text Content Moderator: Analyze text content for safety, appropriateness, and policy compliance. Detects hate speech, violence, sexual content, and other harmful categories. Open-source vision-language model. Ready-to-use REST API, no cold starts, affordable pricing.

kwaivgi/kling-v2.6-std/motion-control
motion-control

Kling 2.6 Standard Motion Control transfers motion from reference videos to animate still images. Upload a character image and a motion clip (dance, action, gesture), and the model extracts the movement to generate smooth, realistic video. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/molmo2/prompt-optimizer
llm

Molmo2-4B Prompt Optimizer: Enhance prompts for image and video generation with intelligent restructuring, style guidance, and context-aware improvements. Open-source vision-language model. Ready-to-use REST API, no cold starts, affordable pricing.

wavespeed-ai/molmo2/image-qa
image-to-text

Molmo2-4B Image QA: Answer questions about images with support for multi-image comparison (1-2 images). Open-source vision-language model. Ready-to-use REST API, no cold starts, affordable pricing.

wavespeed-ai/molmo2/video-understanding
video-to-text

Molmo2-4B Video Understanding: Analyze videos with specialized tasks (general, summary, analysis, counting, scene description). Open-source vision-language model with temporal understanding. Ready-to-use REST API, no cold starts, duration-based pricing.

wavespeed-ai/molmo2/video-qa
video-to-text

Molmo2-4B Video QA: Answer questions about video content with temporal understanding. Open-source vision-language model. Ready-to-use REST API, no cold starts, duration-based pricing.

wavespeed-ai/molmo2/video-captioner
video-to-text

Molmo2-4B Video Captioner: Generate detailed, accurate captions for videos with customizable detail levels (low, medium, high). Open-source vision-language model with temporal understanding capabilities. Ready-to-use REST API, no cold starts, duration-based pricing.

wavespeed-ai/molmo2/image-captioner
image-to-text

Molmo2-4B Image Captioner: Generate detailed, accurate captions for images with customizable detail levels (low, medium, high). Open-source vision-language model with object grounding capabilities. Ready-to-use REST API, no cold starts, affordable pricing.

wavespeed-ai/paddle-ocr
image-to-text

PaddleOCR-VL is an ultra-compact 0.9B parameter vision-language model for document parsing, supporting 109 languages with text, table, formula, and chart recognition in JSON or Markdown output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

wavespeed-ai/openai-whisper-with-video
speech-to-text

OpenAI Whisper Large v3 (Video-to-Text) delivers high-accuracy multilingual transcription directly from video files, with automatic language detection and optional timestamped, subtitle-ready segments. Built for stable production use with a ready-to-use REST API, fast response, no cold starts, and predictable pricing.

wavespeed-ai/video-background-remover
ai-remover

WaveSpeed Video Background Remover removes video backgrounds or replaces them with a custom image. Upload or paste a link to your video, then provide a background image by URL or file; clean matting, edge-aware blending, and natural compositing keep subjects realistic. Built for creator workflows and batch jobs. Ready-to-use REST inference API with fast response, no cold starts, and predictable pricing.

wavespeed-ai/kandinsky5-pro/text-to-video
text-to-video

Kandinsky 5 Pro Text-to-Video turns natural-language prompts into coherent 5-second clips with strong prompt adherence and smooth motion. Choose 512p or 1024p output across common aspect ratios for social posts, ads, and concept shots. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

wavespeed-ai/kandinsky5-pro/image-to-video
image-to-video

Kandinsky 5 Pro Image-to-Video turns a single image into a coherent 5-second video guided by a natural-language prompt. It preserves subject and composition while adding smooth motion and cinematic dynamics. Output at 512p or 1024p in common aspect ratios for social posts, ads, and concept previews. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

wavespeed-ai/qwen-image/edit-2511-lora
lora-support

Qwen Image Edit 2511 LoRA is a version of Qwen Image Edit 2511 with custom LoRA support for personalized styles. It delivers stronger edit consistency, robust multi-person identity/pose consistency, custom LoRA styles, enhanced industrial/product design, and improved geometric reasoning for structure-preserving edits. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

wavespeed-ai/longcat-avatar
digital-human

LongCat Avatar generates super-realistic, lip-synchronized long videos with natural dynamics and consistent identity. It converts one photo plus audio into audio-driven talking or singing avatar videos (image-to-video) up to 2 minutes long; the 720p tier costs $0.40 per 5 seconds. Ready-to-use REST API, no cold starts, affordable pricing.
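
At the stated 720p rate of $0.40 per 5 seconds, a full 2-minute video is 24 five-second blocks, or $9.60. A small estimator follows, assuming partial blocks are rounded up to a whole 5 seconds; the actual billing granularity is not stated.

```python
import math

PRICE_CENTS_PER_5S = 40  # 720p tier: $0.40 per 5 seconds (from the description)
MAX_SECONDS = 120        # videos up to 2 minutes


def estimate_cost_usd(duration_s: float) -> float:
    """Estimate the 720p cost of a LongCat Avatar video.

    Assumes billing rounds up to whole 5-second blocks; this is an assumption,
    not documented behavior.
    """
    if not 0 < duration_s <= MAX_SECONDS:
        raise ValueError("duration must be greater than 0 and at most 120 seconds")
    blocks = math.ceil(duration_s / 5)
    return blocks * PRICE_CENTS_PER_5S / 100


print(estimate_cost_usd(120))  # 9.6
print(estimate_cost_usd(7))    # 0.8
```

Working in integer cents until the final division keeps the per-block arithmetic exact.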

wavespeed-ai/qwen-image/edit-2511
image-to-image

Qwen Image Edit 2511 is a major upgrade over 2509 for real-world image editing and design. It delivers stronger edit consistency, robust multi-person identity/pose consistency, built-in LoRA styles, enhanced industrial/product design, and improved geometric reasoning for structure-preserving edits. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

bytedance/seedance-v1.5-pro/video-extend-fast
video-extend

Seedance 1.5 Pro Fast Video Extend turns short shots into longer clips with natural motion continuation and strong temporal consistency. Supports 4–12 s extensions, 720p/1080p output with built-in upscaling, and seed-reproducible results for shot matching. Ideal for ads, trailers, and short-drama beats. Production-ready REST API with fast response, no cold starts, and affordable pricing.

bytedance/seedance-v1.5-pro/video-extend
video-extend

Seedance 1.5 Pro Video-Extend turns short video clips into longer videos with natural motion continuation, stable aesthetics, and upscaled output. It supports 4–12s duration control, multiple aspect ratios/resolutions, and seed-reproducible results—ideal for extending ad creatives and short-drama shots. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

bytedance/seedance-v1.5-pro/text-to-video-fast
text-to-video

Seedance 1.5 Pro Fast (Text-to-Video) converts text prompts into cinematic, live-action-leaning videos with strong prompt adherence, expressive yet stable motion, and consistent aesthetics. It supports 4–12s duration control, multiple aspect ratios (9:16, 1:1, 16:9), and 720p/1080p output with seed-reproducible results—ideal for ads, trailers, and short-drama beats. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.

bytedance/seedance-v1.5-pro/image-to-video-fast
image-to-video

Seedance 1.5 Pro Fast Image-to-Video transforms a single image (plus optional text prompt) into cinematic, live-action-leaning clips while preserving subject identity, composition, and first-frame fidelity. It supports 4–12s duration control, adaptive aspect ratios that follow the input image, expressive yet stable motion, and seed-based reproducibility—ideal for ad creatives and short-drama beats anchored by a strong visual. Built for stable production use with a ready-to-use REST API, no cold starts, and predictable pricing.