text-to-audio
MiniMax Music Cover transforms existing songs into completely different styles — new arrangement, new vocal character, same melody. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
Seedance 2.0 (Image-to-Video Turbo) generates cinematic 720p/1080p videos from reference images, delivering high-resolution output at near-480p speed with native audio-visual synchronization, director-level control, and exceptional motion stability.
text-to-video
Seedance 2.0 (Text-to-Video Turbo) generates cinematic 720p/1080p videos from text prompts, delivering high-resolution output at near-480p speed with native audio-visual synchronization, director-level control, and exceptional motion stability.
image-to-video
Seedance 2.0 Fast (Image-to-Video Turbo) generates cinematic 720p/1080p videos from reference images using speed-optimized inference. It is the fastest and most affordable Seedance image-to-video option, with native audio-visual synchronization and director-level control.
text-to-video
Seedance 2.0 Fast (Text-to-Video Turbo) generates cinematic 720p/1080p videos from text prompts using speed-optimized inference. It is the fastest and most affordable Seedance text-to-video option, with native audio-visual synchronization and director-level control.
audio-to-audio
OmniVoice Voice Clone clones any voice from a short 3-10 second audio sample. Supports 600+ languages with zero-shot voice cloning. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-audio
OmniVoice is a massively multilingual zero-shot TTS model supporting 600+ languages. Generate speech with auto voice or design custom voices using natural language descriptions. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
AI Virtual Outfit Try-On generates videos of a person wearing uploaded clothing. Upload a portrait and clothing images, add an optional prompt, and get a try-on video. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-video
AI Parkour Video generates dynamic parkour action videos from a portrait image. Choose from 6 parkour styles or provide a reference video. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-video
AI Video Ads generates product advertisement videos. Provide a person photo, a product name, and an optional product image or script, and the model creates a professional ad video. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-video
AI Talking Photos brings your photos to life: upload a portrait and text, and watch the person speak. Supports durations of 5-15 seconds. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Travel Trends generates travel-style photos set at 30 iconic destinations worldwide. Upload a photo, write a prompt, and pick a destination such as Paris, Tokyo, Bali, or New York. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Breast Xpansion transforms portrait photos with an exaggerated breast enlargement effect. Upload a photo and get an entertaining result. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Instagram Model generates Instagram-style photos from your image and prompt. Choose from 10 style presets: influencer, street fashion, beach, fitness, luxury, casual chic, night glam, anime, cyberpunk, and vintage retro. Ready-to-use REST inference API, no cold starts, affordable pricing.
text-to-audio
MiniMax Music 2.6 generates complete songs with vocals and instrumentals from text prompts and lyrics. Supports instrumental-only mode, auto lyrics generation, structure tags for song arrangement, and configurable audio quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-image
JoyAI Image Edit transforms images based on text instructions, allowing you to modify backgrounds, adjust colors, add or remove elements, and more. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
Vidu Q3 Reference-to-Video Mix generates multi-entity consistent videos from 1-4 reference images with text prompt guidance. Supports 360p to 1080p resolutions, up to 16 seconds duration, multiple aspect ratios, and optional audio generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
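The limits quoted for Vidu Q3 Reference-to-Video Mix (1-4 reference images, up to 16 seconds, 360p to 1080p) can be checked on the client before a request is sent. The following is a minimal sketch only: the function name and every field name in the returned dictionary are hypothetical and are not taken from the actual API schema, and the exact set of supported resolutions between 360p and 1080p is not stated in the listing, so only the range is checked.

```python
MAX_REFERENCE_IMAGES = 4            # "1-4 reference images" per the listing
MAX_DURATION_SECONDS = 16           # "up to 16 seconds" per the listing
MIN_HEIGHT, MAX_HEIGHT = 360, 1080  # "360p to 1080p" per the listing


def build_reference_to_video_payload(prompt, image_urls, duration_seconds,
                                     resolution, generate_audio=False):
    """Validate inputs against the listed limits and return a request body.

    All keys in the returned dict are hypothetical placeholders.
    """
    if not 1 <= len(image_urls) <= MAX_REFERENCE_IMAGES:
        raise ValueError("expected between 1 and 4 reference images")
    if not 0 < duration_seconds <= MAX_DURATION_SECONDS:
        raise ValueError("duration must be positive and at most 16 seconds")
    # Resolutions are written like "720p"; parse the numeric height.
    if not (resolution.endswith("p") and resolution[:-1].isdigit()):
        raise ValueError("resolution must look like '720p'")
    height = int(resolution[:-1])
    if not MIN_HEIGHT <= height <= MAX_HEIGHT:
        raise ValueError("resolution must be between 360p and 1080p")
    return {
        "prompt": prompt,
        "images": list(image_urls),
        "duration": duration_seconds,
        "resolution": resolution,
        "generate_audio": generate_audio,
    }


if __name__ == "__main__":
    body = build_reference_to_video_payload(
        "two friends walking through a market",
        ["person_a.png", "person_b.png"],
        duration_seconds=8,
        resolution="720p",
    )
    print(body)
```

Validating locally like this only avoids obviously invalid requests; the service itself remains the authority on which combinations it accepts.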
text-to-image
Ideogram V3 Generate Transparent creates high-quality images with transparent backgrounds from text prompts, perfect for logos, stickers, and design assets. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-image
Ideogram V3 Layerize Text separates flat graphic images into editable layers, extracting text and background for professional design workflows. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
digital-human
Sync Lipsync 3 synchronizes lip movements in any video to supplied audio using zero-shot lip-sync technology. Supports multiple sync modes for handling duration mismatches and works with live-action footage, 3D characters, and AI-generated avatars. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-text
Kling Advanced Elements creates custom AI elements from reference images or videos for consistent character and object appearance across Kling video generations. Supports multi-image elements with frontal and reference images, video character elements, and optional voice binding. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
Seedance 2.0 Fast (Image-to-Video) generates cinematic videos from reference images and text prompts with native audio-visual synchronization, director-level control, and exceptional motion stability — optimized for faster generation at lower cost. Built on Seed's unified multimodal architecture.
text-to-video
Seedance 2.0 Fast (Text-to-Video) generates cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability — optimized for faster generation at lower cost. Built on Seed's unified multimodal architecture.
image-to-video
Seedance 2.0 (Image-to-Video) generates Hollywood-grade cinematic videos from reference images and text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it preserves the input image's subject and composition while adding expressive, physically accurate motion.
text-to-video
Seedance 2.0 (Text-to-Video) generates Hollywood-grade cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it leads on instruction adherence, motion quality, and visual aesthetics.
video-extend
WAN 2.7 Video Extend extends existing videos with optional last frame control and audio support, supporting 720p/1080p output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
video-to-video
WAN 2.7 Video Edit performs prompt-driven video editing with multi-image reference support and 720p/1080p output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
Google Veo 3.1 Lite Start-End-to-Video generates high-fidelity videos by interpolating between a start image and an optional end image. Supports 720p and 1080p resolutions, landscape and portrait aspect ratios, and native audio generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-video
Google Veo 3.1 Lite Text-to-Video generates high-fidelity 720p or 1080p videos with natively generated audio from text prompts. Lightweight variant optimized for cost efficiency. Supports landscape and portrait aspect ratios, dialogue with lip-sync, and customizable duration. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
Google Veo 3.1 Lite Image-to-Video transforms static images into high-fidelity 720p or 1080p videos with natively generated audio. Supports many interpolation use cases, landscape and portrait aspect ratios, and customizable duration. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
video-effects
VACE Video Joiner seamlessly joins multiple video clips into one using AI-powered transition generation. Upload 2 to 4 videos and get a smoothly joined result. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Image Face Blur automatically detects and blurs faces in images for privacy protection. Upload an image and get a result with all faces blurred. Ready-to-use REST inference API, no cold starts, affordable pricing.
video-to-video
AI Video Converter converts videos between formats. Upload a video and specify the target format to get a converted result. Ready-to-use REST inference API, no cold starts, affordable pricing.
audio-to-audio
AI Audio Converter converts audio files between formats. Upload an audio file and specify the target format to get a converted result. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Image Converter converts images between formats. Upload an image and specify the target format to get a converted result. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
WAN 2.7 Image Edit performs prompt-driven image editing with support for multiple-image references. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-image
WAN 2.7 Image Edit Pro performs prompt-driven image editing with multi-image reference support and up to 2K output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-image
WAN 2.7 Text-to-Image Pro generates high-quality images up to 4K from text prompts with thinking mode for enhanced image quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-image
WAN 2.7 Text-to-Image generates high-quality images from text prompts with thinking mode for enhanced image quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
WAN 2.7 Reference-to-Video turns character, prop, or scene references from images or videos into new video shots with preserved identity, style, and layout plus smooth, coherent motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
WAN 2.7 converts images into videos (720p/1080p) with optional audio, supporting first and last frame control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
PixVerse V6 Transition creates smooth AI-generated video transitions between a start image and an optional end image. Supports 360p to 1080p resolutions, 1-15 second duration, multiple aspect ratios, optional audio generation, and multi-clip mode. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-video
WAN 2.7 Text-to-Video turns plain prompts into coherent, cinematic clips with crisp detail, stable motion, and strong instruction-following—great for ads, explainers, and social posts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
video-extend
PixVerse V6 Extend continues and enhances existing video content by analyzing the ending segment and generating new frames forward. Supports 360p to 1080p resolutions, 1-15 second extension duration, optional audio generation, and multiple styles. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
PixVerse V6 generates high-quality videos from images with flexible duration (1-15s), multiple resolutions up to 1080p, and optional audio generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-video
PixVerse V6 generates high-quality videos from text prompts with flexible duration (1-15s), multiple resolutions up to 1080p, and optional audio generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
upscaler
Phota Enhance improves image quality and detail. Supports batch enhancement of up to 4 images with JPEG, PNG, or WebP output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-image
Phota Edit transforms existing images using natural language instructions. Supports up to 10 reference images, 1K and 4K resolutions, and batch output of up to 4 images. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-image
Phota Text-to-Image generates high-quality personalized photographs from text prompts. Supports 1K and 4K resolutions, multiple aspect ratios, and batch generation of up to 4 images. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
X-AI Grok Imagine Video Reference-to-Video generates videos from multiple reference images with preserved identity, style, and scene composition. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
video-extend
X-AI Grok Imagine Video Extend turns short clips into longer videos with smooth motion continuity and natural scene extension. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-video
daVinci MagiHuman Text-to-Video API is a 15B-parameter open-source omni video generation model, billed as on par with WAN 2.5. Generates high-quality AI videos from text prompts with optional audio input. Supports digital humans, talking heads, flexible aspect ratios, durations, and resolutions. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
daVinci MagiHuman Image-to-Video API is a 15B-parameter open-source omni video generation model, billed as on par with WAN 2.5. Generates high-quality AI videos from reference images with optional audio input. Supports digital humans, talking heads, and general video generation. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-audio
Google Lyria 3 Clip generates novel music tracks from text prompts and optional image input. Produces complete songs with lyrics, descriptions, and audio output. Supports negative prompts and seed control for reproducible results. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-audio
Google Lyria 3 Pro generates high-quality music tracks from text prompts and optional image input. The Pro tier delivers enhanced audio quality and richer compositions. Produces complete songs with lyrics, descriptions, and audio output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-image
AI Smile Filter adds a natural smile to any portrait. Upload a face photo and get an instant smiling result. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Girl Filter transforms a portrait into a cute girl style. Upload a face photo and get an instant result. Ready-to-use REST inference API, no cold starts, affordable pricing.
video-to-audio
Mirelo SFX V1 Video-to-Audio generates synchronized sound effects from video input with text prompt guidance. Supports multiple sample generation and customizable duration. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
video-to-video
AI Video FPS Increaser doubles your video frame rate for smoother motion and better playback quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-image
AI Photo Colorizer automatically adds color to black-and-white photos. Upload a grayscale image and get a colorized result. Ready-to-use REST inference API, no cold starts, affordable pricing.
portrait-transfer
Video Body Swap places your face onto the body in a target video. Upload a face image and a body video to get a seamless swap. Ready-to-use REST inference API, no cold starts, affordable pricing.
portrait-transfer
Image Body Swap places your face onto the body in a target image. Upload a face image and a body image to get a seamless swap. Ready-to-use REST inference API, no cold starts, affordable pricing.
audio-to-audio
AI Vocal Remover separates vocals from the instrumental in any audio track. Upload an audio file and choose whether to extract the vocals or the instrumental. Ready-to-use REST inference API, no cold starts, affordable pricing.
video-to-video
RIFE Video Interpolation generates smooth intermediate frames between existing video frames for higher frame rates and smoother motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
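To make the idea of intermediate frames concrete, the sketch below inserts one synthetic frame between each consecutive pair, so n input frames become 2n - 1 output frames over the same duration. It uses plain linear blending purely as a stand-in: this is not how RIFE computes its frames (RIFE uses a learned model), and the function name and the list-of-floats frame representation are assumptions made for illustration.

```python
def interpolate_frames(frames):
    """Insert one blended frame between each consecutive pair of frames.

    Each frame is a flat list of pixel intensities. Linear blending is a
    simple placeholder for a learned interpolation model.
    """
    if len(frames) < 2:
        return [list(f) for f in frames]
    result = []
    for current, following in zip(frames, frames[1:]):
        result.append(list(current))
        # Midpoint frame: average the two neighbours pixel by pixel.
        result.append([(a + b) / 2 for a, b in zip(current, following)])
    result.append(list(frames[-1]))
    return result


if __name__ == "__main__":
    clip = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
    smoother = interpolate_frames(clip)
    print(len(clip), "frames in,", len(smoother), "frames out")
```

Because the output covers the same time span with roughly twice as many frames, playing it back at about double the original frame rate keeps the clip's duration unchanged.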
image-to-image
AI Gender Swap transforms a portrait to show how you would look as the opposite gender. Upload a face photo and get an instant result. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-video
AI Ghibli Filter Video transforms a photo into a Studio Ghibli anime-style video with customizable duration. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Ghibli Filter transforms a photo into Studio Ghibli anime style. Upload an image and get a Ghibli-style result. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Age Filter transforms a portrait to show how you would look at different ages. Upload a face photo and select a target age. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Dog Selfie generates cute dog selfie images with customizable breed, style, expression, and more. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-video
AI Dog Selfie Video generates cute dog selfie videos with customizable breed, style, expression, action, and duration. Ready-to-use REST inference API, no cold starts, affordable pricing.
video-extend
Vidu Q2 Pro Extend Video seamlessly extends existing videos by 1-7 seconds with high-quality motion and scene continuity. Supports optional end-frame image guidance for precise control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
video-extend
Vidu Q2 Turbo Extend Video seamlessly extends existing videos by 1-7 seconds with consistent motion and scene continuity. Supports optional end-frame image guidance for precise control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
AI Sketch to Video converts a sketch image into an animated video with customizable duration (5-15s). Ready-to-use REST inference API, no cold starts, affordable pricing.
video-to-text
OpenAI Sora 2 Characters creates reusable character IDs from video references for consistent character appearance across Sora 2 generations. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
digital-human
InfiniteTalk Fast Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
digital-human
InfiniteTalk Video-to-Video Multi converts a video and two audio inputs into multi-character talking or singing videos at up to 720p. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-text
AI Fortune Teller provides a personalized fortune reading based on your birth information, with optional palm or face photo analysis. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-text
AI Math Solver analyzes a math problem from an image and provides the solution. Upload a photo of a math problem and get a step-by-step answer. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Clothes Changer swaps clothing on a person using reference clothing images. Upload a portrait and up to 8 clothing images to try on. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Celebrity Look-Alike Finder analyzes a portrait and finds the closest celebrity match. Upload a face photo and discover which celebrity you resemble. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-image
AI Fat Filter transforms a portrait image into a fun, exaggerated fat version. Upload a face photo and get an entertaining result. Ready-to-use REST inference API, no cold starts, affordable pricing.
llm
AI Story Generator creates stories from a theme or idea with customizable genre, length, perspective, audience, and format. Ready-to-use REST inference API, no cold starts, affordable pricing.
image-to-video
OpenAI Sora 2 Pro Image-to-Video creates physics-aware, realistic videos from reference images with synchronized audio and strong steerability. Supports 720p and 1080p resolutions with durations up to 20 seconds. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-video
OpenAI Sora 2 Pro is a state-of-the-art text-to-video model with realistic physics, synchronized audio, and strong steerability. Supports multiple resolutions up to 1080p and durations up to 20 seconds. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
video-extend
LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model, with improved audio and visual quality as well as enhanced prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-image
FireRed Image Edit V1.1 enables precise image editing with natural-language instructions, supporting both English and Chinese prompts with multi-image references. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-video
WaveSpeed UGC Video Generator creates authentic, creator-style videos from text prompts and optional reference images with native audio, natural motion, and relatable aesthetics. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-video
WaveSpeed Short Video Generator creates professional short-form videos from text prompts and optional reference images with native audio, smooth motion, and versatile aspect ratios. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-video
WaveSpeed TikTok Video Generator creates viral-ready videos from text prompts and optional reference images with native audio, dynamic transitions, and scroll-stopping motion. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-video
WaveSpeed Cinematic Video Generator creates Hollywood-quality videos from text prompts and optional reference images with native audio, director-level camera control, and real-world physics. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
digital-human
LTX-2.3 Lipsync generates talking-head videos from audio with synchronized lip movements and natural facial expressions. Built on a DiT-based architecture with improved audio-visual alignment quality. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-video
LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model, with improved audio and visual quality as well as enhanced prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
lora-support
LTX-2.3 with LoRA support is a DiT-based audio-video foundation model designed to generate synchronized video and audio with custom styles, motion, or likeness training. Offers improved audio and visual quality with enhanced prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-video
LTX-2.3 is a DiT-based audio-video foundation model designed to generate synchronized video and audio within a single model, with improved audio and visual quality as well as enhanced prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
text-to-image
Google Nano Banana 2 Fast (Gemini 3.1 Flash Image) is the cheapest Nano Banana 2 option, starting at $0.045 per image. Delivers fast text-to-image generation with 2K default output and 4K support. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
image-to-image
Google Nano Banana 2 Edit Fast (Gemini 3.1 Flash Image) is the cheapest Nano Banana 2 editing option, starting at $0.045 per image. Enables fast image editing with 2K default output and 4K support. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
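As a rough budgeting aid, the quoted starting price of $0.045 per image gives a lower bound on the cost of a batch. The sketch below assumes every image is billed at that starting rate; the listing says only "starting at", so other settings (for example 4K output) may cost more. The function and constant names are illustrative and not part of any API.

```python
from decimal import Decimal

# Quoted "starting at" price in USD; the actual per-image price may be higher.
STARTING_PRICE_PER_IMAGE = Decimal("0.045")


def minimum_batch_cost(image_count):
    """Return the lowest possible cost in USD for generating image_count images."""
    if image_count < 0:
        raise ValueError("image_count must not be negative")
    # Decimal avoids the rounding drift of binary floats for currency amounts.
    return STARTING_PRICE_PER_IMAGE * image_count


if __name__ == "__main__":
    print(minimum_batch_cost(200))  # prints 9.000
```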
image-to-image
Bria Embed Product seamlessly integrates product images into scene backgrounds with natural lighting and perspective matching. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
motion-control
Kling 3.0 Standard Motion Control transfers motion from reference videos to animate still images. Upload a character image and a motion clip (dance, action, gesture), and the model extracts the movement to generate smooth, realistic video. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.