Explore AI Models

wavespeed-ai/hidream-i1-full

HiDream-I1 is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.

text-to-image

wavespeed-ai/wan-2.1/i2v-480p-lora-ultra-fast

Wan-2.1 i2v model with LoRA. Generates high-quality videos with superior visual quality and motion diversity.

image-to-video

hot

wavespeed-ai/wan-2.1/i2v-720p-lora

Wan-2.1 i2v model with LoRA. Generates high-quality videos with superior visual quality and motion diversity.

image-to-video

hot

wavespeed-ai/flux-dev-ultra-fast

Flux-dev text-to-image model: an ultra-fast 12-billion-parameter rectified flow transformer.

text-to-image

image-to-image

wavespeed-ai/wan-2.1/i2v-480p

The Wan2.1 14B model is an advanced image-to-video model that offers accelerated inference capabilities, enabling high-res video generation with high visual quality and motion diversity

image-to-video

hot

wavespeed-ai/flux-schnell-lora

FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.

text-to-image

hot

image-to-image

wavespeed-ai/wan-2.1/i2v-720p-lora-ultra-fast

Wan-2.1 i2v model with LoRA. Generates high-quality videos with superior visual quality and motion diversity.

image-to-video

hot

wavespeed-ai/wan-2.1/i2v-480p-ultra-fast

The Wan2.1 14B model is an advanced image-to-video model that offers accelerated inference capabilities, enabling high-res video generation with high visual quality and motion diversity

image-to-video

hot

wavespeed-ai/flux-kontext-pro

Flux Kontext is a state-of-the-art image editing model that aims to deliver performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.

image-to-image

new

wavespeed-ai/wan-2.1/t2v-480p-lora

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res 480p text-to-video prowess with cutting-edge suite of video foundation models, LoRA effect added

text-to-video

wavespeed-ai/wan-2.1/i2v-720p

Wan2.1 I2V-14B model is capable of generating 720P high-definition videos from images

image-to-video

hot

wavespeed-ai/real-esrgan

Real-ESRGAN with optional face correction and adjustable upscale

image-to-image

new

wavespeed-ai/ghibli

Reimagine and transform your ordinary photos into enchanting Studio Ghibli style artwork

image-to-image

hot

wavespeed-ai/wan-2.1/t2v-480p-ultra-fast

The Wan2.1 14B model is an advanced text-to-video model that offers accelerated inference capabilities, enabling high-res video generation with high visual quality and motion diversity

text-to-video

wavespeed-ai/hunyuan-custom-ref2v-480p

HunyuanCustom, a multi-modal, conditional, and controllable generation model centered on subject consistency, built upon the Hunyuan Video generation framework. It enables the generation of subject-consistent videos conditioned on text, images, audio, and video inputs.

image-to-video

new

featured

wavespeed-ai/wan-2.1/t2v-720p

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res text-to-video prowess with cutting-edge suite of video foundation models

text-to-video

wavespeed-ai/wan-2.1/t2v-480p

The Wan2.1 14B model is an advanced text-to-video model that offers accelerated inference capabilities, enabling high-res video generation with high visual quality and motion diversity

text-to-video

kwaivgi/kling-v1.6-i2v-standard

Generate 5s videos in 720p resolution from an image

image-to-video

wavespeed-ai/wan-2.1-14b-vace

VACE is an all-in-one model designed for video creation and editing. It encompasses various tasks, including reference-to-video generation (R2V), video-to-video editing (V2V), and masked video-to-video editing (MV2V), allowing users to compose these tasks freely. This functionality enables users to explore diverse possibilities and streamlines their workflows effectively, offering a range of capabilities, such as Move-Anything, Swap-Anything, Reference-Anything, Expand-Anything, Animate-Anything, and more.

image-to-video

new

wavespeed-ai/flux-kontext-max

Flux Kontext is a state-of-the-art image editing model that aims to deliver performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.

image-to-image

new

wavespeed-ai/video-upscaler

The Upscale Model API is a powerful tool designed to enhance the resolution and quality of videos. Whether you're working with low-resolution videos that need a boost or aiming to improve the clarity of existing footage, this API leverages advanced machine learning models to deliver high-quality, upscaled videos.

video-to-video

new

wavespeed-ai/flux-schnell

FLUX.1 [schnell], a 12-billion-parameter rectified flow transformer, is the fastest image generation model tailored for local development and personal use.

text-to-image

hot

image-to-image

minimax/video-01

Generate 6-second videos with prompts or images (also known as Hailuo). Use the T2V-01 model to create a video with images and text.

image-to-video

text-to-video

wavespeed-ai/wan-2.1/i2v-720p-ultra-fast

Wan2.1 I2V-14B model is capable of generating 720P high-definition videos from images

image-to-video

hot

wavespeed-ai/hunyuan-video/i2v

Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. This endpoint generates videos from an image and a text description.

image-to-video

wavespeed-ai/flux-kontext-pro/multi

Experimental version of FLUX.1 Kontext [pro] with multi image handling capabilities

image-to-image

new

test/test-model

Please don't use this model; it is only for the development team's debugging purposes.

text-to-image

kwaivgi/kling-v1.6-i2v-pro

Generate 5s videos in 1080p resolution from an image

image-to-video

wavespeed-ai/hunyuan-video/t2v

Hunyuan Video is an open video generation model with high visual quality, motion diversity, text-video alignment, and generation stability. This endpoint generates videos from text descriptions.

text-to-video

wavespeed-ai/flux-kontext-max/multi

Experimental version of FLUX.1 Kontext [max] with multi image handling capabilities

image-to-image

wavespeed-ai/hidream-e1-full

HiDream-E1 is an image editing model built on HiDream-I1.

image-to-image

new

bytedance/seedance-v1-pro-i2v-480p

ByteDance’s Seedance 1.0 is the new SOTA video generation model—outperforming KLING 2.1 with ultra‑fast generation, superior prompt‑following, cinematic multi‑shot coherence, and unmatched motion realism.

image-to-video

wavespeed-ai/sdxl

SDXL is a text-to-image generative AI model developed by Stability AI that creates beautiful images. It is the successor to Stable Diffusion.

text-to-image

wavespeed-ai/wan-flf2v

Wan-2.1 flf2v generates dynamic videos by intelligently bridging a given first frame to a desired end frame through smooth, coherent motion sequences.

image-to-video

new

kwaivgi/kling-v1.6-t2v-standard

Generate 5s videos in 720p resolution

text-to-video

wavespeed-ai/wan-2.1/t2v-720p-lora

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res 720p text-to-video prowess with cutting-edge suite of video foundation models, LoRA effect added

text-to-video

wavespeed-ai/framepack

Framepack is an efficient Image-to-video model that autoregressively generates videos.

image-to-video

new

wavespeed-ai/flux-kontext-pro/text-to-image

The FLUX.1 Kontext [pro] text-to-image delivers state-of-the-art image generation results with unprecedented prompt following, photorealistic rendering, and flawless typography.

text-to-image

new

wavespeed-ai/step1x-edit

Step1X-Edit transforms your photos with simple instructions into stunning, professional-quality edits—rivaling top proprietary tools.

text-to-image

image-to-image

new

wavespeed-ai/flux-control-lora-depth

FLUX Control LoRA Depth is a high-performance endpoint that uses a control image to transfer structure to the generated image, using a depth map.

image-to-image

wavespeed-ai/flux-kontext-max/text-to-image

FLUX.1 Kontext [max] text-to-image is a new premium model that brings maximum performance across all aspects, with greatly improved prompt adherence.

text-to-image

new

wavespeed-ai/wan-2.1/v2v-480p

Inference for Wan 2.1 14B. Unleashing high-res 480p video-to-video prowess with cutting-edge suite of video foundation models

video-to-video

new

wavespeed-ai/wan-2.1/t2v-720p-ultra-fast

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res text-to-video prowess with cutting-edge suite of video foundation models

text-to-video

wavespeed-ai/instant-character

InstantCharacter creates high-quality, consistent characters from text prompts, supporting diverse poses, styles, and appearances with strong identity control.

image-to-image

new

wavespeed-ai/wan-2.1/t2v-480p-lora-ultra-fast

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res 480p text-to-video prowess with cutting-edge suite of video foundation models, LoRA effect added

text-to-video

wavespeed-ai/flux-pro-redux

FLUX.1 [pro] Redux is a high-performance endpoint for the FLUX.1 [pro] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

image-to-image

wavespeed-ai/flux-control-lora-canny

FLUX Control LoRA Canny is a high-performance endpoint that uses a control image to transfer structure to the generated image, using a Canny edge map.

image-to-image

wavespeed-ai/mmaudio-v2

MMAudio generates synchronized audio given video and/or text inputs. It can be combined with video models to get videos with audio.

video-to-video

new

wavespeed-ai/uno

An AI model that transforms input images into new ones based on text prompts, blending reference visuals with your creative directions.

image-to-image

bytedance/seedance-v1-pro-i2v-720p

image-to-video

wavespeed-ai/wan-2.1/t2v-720p-lora-ultra-fast

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res 720p text-to-video prowess with cutting-edge suite of video foundation models, LoRA effect added

text-to-video

wavespeed-ai/flux-dev-fill

FLUX.1 [dev] Fill is a high-performance endpoint for the FLUX.1 [dev] model that enables rapid transformation of existing images, delivering high-quality style transfers and image modifications with the core FLUX capabilities.

image-to-image

wavespeed-ai/hunyuan3d-v2-multi-view

Generate 3D models from your images using Hunyuan 3D. A native 3D generative model enabling versatile and high-quality 3D asset creation.

image-to-3d

wavespeed-ai/imagen4

Google’s highest quality image generation model

text-to-image

new

wavespeed-ai/flux-redux-dev

Open-weight image variation model. Create new versions while preserving key elements of your original.

image-to-image

wavespeed-ai/hunyuan-custom-ref2v-720p

image-to-video

new

wavespeed-ai/magi-1-24b

MAGI-1 is a video generation model with exceptional understanding of physical interactions and cinematic prompts

image-to-video

text-to-video

new

bytedance/seedance-v1-pro-i2v-1080p

image-to-video

bytedance/seedance-v1-lite-i2v-480p

ByteDance's Seedance 1.0 Lite is an optimized video generation model offering fast generation, superior prompt‑following, and quality motion realism at an affordable price.

image-to-video

wavespeed-ai/sdxl-lora

SDXL is a text-to-image generative AI model developed by Stability AI that creates beautiful images. It is the successor to Stable Diffusion.

text-to-image

kwaivgi/kling-v2.1-i2v-pro

Kling 2.1 Pro is an advanced endpoint for the Kling 2.1 model, offering professional-grade videos with enhanced visual fidelity, precise camera movements, and dynamic motion control, perfect for cinematic storytelling.

image-to-video

new

bytedance/seedance-v1-lite-t2v-480p

ByteDance's Seedance 1.0 Lite is an optimized video generation model offering fast generation, superior prompt‑following, and quality motion realism at an affordable price.

text-to-video

wavespeed-ai/wan-2.1/v2v-720p

Inference for Wan 2.1 14B. Unleashing high-res 720p video-to-video prowess with cutting-edge suite of video foundation models

video-to-video

vidu/image-to-video-2.0

Bring your images to life by turning them into dynamic videos that capture your vision and action.

image-to-video

new

vidu/start-end-to-video-2.0

Create dynamic videos using just the first and last frame images, enhanced with text descriptions for seamless storytelling.

image-to-video

new

vidu/reference-to-video-2.0

Create videos that align with reference subjects—like characters, objects, and environments—using the world’s first Multi-Entity Consistency feature.

image-to-video

new

bytedance/seedance-v1-pro-t2v-480p

text-to-video

wavespeed-ai/wan-2.1/v2v-480p-ultra-fast

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res 480p video-to-video prowess with cutting-edge suite of video foundation models

video-to-video

wavespeed-ai/wan-2.1/v2v-720p-ultra-fast

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res 720p video-to-video prowess with cutting-edge suite of video foundation models

video-to-video

kwaivgi/kling-v2.0-i2v-master

Kling AI is a powerful AI Text to Video and Image to Video model family developed by Kuaishou, the company behind one of China’s largest video-sharing platforms.

text-to-image

image-to-video

new

wavespeed-ai/flux-dev-lora-trainer

A FLUX dev LoRA trainer for subjects and styles.

training

new

wavespeed-ai/SkyReels-V1

SkyReels V1 is the first and most advanced open-source human-centric video foundation model. By fine-tuning HunyuanVideo on O(10M) high-quality film and television clips

image-to-video

new

bytedance/seedance-v1-pro-t2v-1080p

Generate 5s or 10s videos in 1080p resolution from text using the ByteDance Seedance Pro T2V model

text-to-video

wavespeed-ai/wan-2.1/v2v-720p-lora-ultra-fast

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res 720p video-to-video prowess with cutting-edge suite of video foundation models

video-to-video

wavespeed-ai/dia-tts

Dia directly generates realistic dialogue from transcripts. Audio conditioning enables emotion control. Produces natural nonverbals like laughter and throat clearing. Costs $0.04 per 1,000 characters.

text-to-audio
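The per-character rate quoted for Dia can be turned into a rough cost estimate. The sketch below is illustrative only: the function name `estimate_dia_tts_cost` is hypothetical, and it assumes billing scales linearly with transcript length with no rounding or minimum charge, which the listing does not confirm.

```python
from decimal import Decimal

# Rate quoted in the listing: $0.04 per 1,000 characters.
RATE_PER_1000_CHARS = Decimal("0.04")


def estimate_dia_tts_cost(transcript: str) -> Decimal:
    """Estimate the cost in USD of synthesizing a transcript.

    Assumption: cost is strictly proportional to the character count.
    The actual billing rules (rounding, minimums) are not stated in the listing.
    """
    char_count = Decimal(len(transcript))
    return (char_count / Decimal(1000)) * RATE_PER_1000_CHARS


if __name__ == "__main__":
    sample = "Hello there. " * 100  # 1,300 characters
    print(f"{len(sample)} characters -> ${estimate_dia_tts_cost(sample)}")
```

Decimal arithmetic is used here so that small currency amounts are not distorted by binary floating-point rounding.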

wavespeed-ai/wan-2.1/v2v-480p-lora-ultra-fast

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res 480p video-to-video prowess with cutting-edge suite of video foundation models

video-to-video

wavespeed-ai/wan-2.1/v2v-720p-lora

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res 720p video-to-video prowess with cutting-edge suite of video foundation models

video-to-video

bytedance/seedance-v1-pro-t2v-720p

text-to-video

wavespeed-ai/wan-14b-trainer

To train a Wan LoRA, you need at least 10 images to achieve good results. The trainer outputs a LoRA URL, a temporary storage URL that is valid for 7 days, so download the LoRA to your own storage before it expires.

training

new
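The two constraints in the trainer description (at least 10 images, and a LoRA URL valid for 7 days) can be checked with a few lines of code. This is a minimal sketch with hypothetical helper names; it assumes the 7-day window starts when the trainer returns the URL, which the listing does not specify.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the trainer description.
MIN_TRAINING_IMAGES = 10
LORA_URL_LIFETIME = timedelta(days=7)


def has_enough_images(image_paths: list[str]) -> bool:
    """Return True if the dataset meets the recommended minimum of 10 images."""
    return len(image_paths) >= MIN_TRAINING_IMAGES


def lora_url_expiry(issued_at: datetime) -> datetime:
    """Return when the temporary LoRA URL stops being valid.

    Assumption: the 7-day lifetime is counted from the moment the URL is issued.
    """
    return issued_at + LORA_URL_LIFETIME


if __name__ == "__main__":
    issued = datetime.now(timezone.utc)
    print("Download the LoRA before:", lora_url_expiry(issued).isoformat())
```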

wavespeed-ai/wan-2.1/v2v-480p-lora

Turbo-charged inference for Wan 2.1 14B. Unleashing high-res 480p video-to-video prowess with cutting-edge suite of video foundation models

video-to-video

kwaivgi/kling-v2.1-i2v-standard

Kling 2.1 Standard is a cost-efficient endpoint for the Kling 2.1 model, delivering high-quality image-to-video generation

image-to-video

new

image-to-image

bytedance/seedance-v1-lite-t2v-720p

ByteDance's Seedance 1.0 Lite is an optimized video generation model offering fast generation, superior prompt‑following, and quality motion realism at an affordable price.

text-to-video

bytedance/seedance-v1-lite-i2v-1080p

ByteDance's Seedance 1.0 Lite is an optimized video generation model offering fast generation, superior prompt‑following, and quality motion realism at an affordable price.

image-to-video

bytedance/seedance-v1-lite-i2v-720p

ByteDance's Seedance 1.0 Lite is an optimized video generation model offering fast generation, superior prompt‑following, and quality motion realism at an affordable price.

image-to-video

kwaivgi/kling-v2.0-t2v-master

Kling AI is a powerful AI Text to Video and Image to Video model family developed by Kuaishou, the company behind one of China’s largest video-sharing platforms.

text-to-video

new

wavespeed-ai/veo2-t2v

Veo 2 creates videos with realistic motion and high quality output. Explore different styles and find your own with extensive camera controls.

text-to-video

new

bytedance/seedance-v1-lite-t2v-1080p

ByteDance's Seedance 1.0 Lite is an optimized video generation model offering fast generation, superior prompt‑following, and quality motion realism at an affordable price.

text-to-video

google/veo3

Sound on: Google’s flagship Veo 3 text to video model, with audio

text-to-video

new

kwaivgi/kling-v2.1-i2v-master

Kling 2.1 Master: The premium endpoint for Kling 2.1, designed for top-tier image-to-video generation with unparalleled motion fluidity, cinematic visuals, and exceptional prompt precision.

image-to-video

new

wavespeed-ai/veo2-i2v

Veo 2 creates videos from images with realistic motion and very high quality output.

image-to-video

new

wavespeed-ai/ltx-video-v097/i2v-720p

Generate videos from prompts and images using LTX Video-0.9.7

image-to-video

wavespeed-ai/ltx-video-v097/i2v-480p

Generate videos from prompts and images using LTX Video-0.9.7

image-to-video