text-to-video
Alibaba Happy Horse 1.0 (Text-to-Video) generates cinematic 720p / 1080p videos from text prompts with smooth camera movement, expressive motion, and strong prompt fidelity. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
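A minimal sketch of what a text-to-video request body for an entry like this might look like. The field names and the endpoint URL are assumptions for illustration; only the advertised 720p / 1080p resolutions come from the listing — consult the provider's API reference for the real schema.

```python
import json

# Hypothetical endpoint; not confirmed by the listing.
ENDPOINT = "https://api.example.com/v1/happy-horse-1.0/text-to-video"

def build_t2v_request(prompt: str, resolution: str = "720p") -> dict:
    """Assemble a text-to-video request body (illustrative field names)."""
    if resolution not in ("720p", "1080p"):
        raise ValueError("listing advertises 720p or 1080p output")
    return {"prompt": prompt, "resolution": resolution}

body = build_t2v_request("a horse galloping across a misty beach at dawn", "1080p")
payload = json.dumps(body)  # JSON body you would POST to the endpoint
```

The request would then be sent with any HTTP client, with the provider's auth header attached.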
video-extend
Alibaba Happy Horse 1.0 (Video Extend) extends existing videos with seamless AI-generated continuation, supporting 720p/1080p output. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
video-to-video
Alibaba Happy Horse 1.0 (Video Edit) performs prompt-driven video editing with multi-image reference support, supporting 720p/1080p output. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
image-to-video
Alibaba Happy Horse 1.0 (Reference-to-Video) generates new video scenes guided by reference images, maintaining consistent characters, styles, and visual identity. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
Alibaba Happy Horse 1.0 (Image-to-Video) animates a reference image into a cinematic 720p / 1080p video, optionally guided by a text prompt. Smooth camera movement and expressive, stable motion. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
image-to-image
OpenAI's GPT Image 2 Edit enables image editing from natural-language instructions with one or more reference images. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
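An instruction-plus-references edit request could be assembled as below. The function and field names (`prompt`, `image_urls`) are hypothetical; the only detail taken from the listing is that the model accepts an instruction and one or more reference images.

```python
# Sketch of an image-edit request body; field names are assumptions
# for illustration, not the provider's documented schema.
def build_edit_request(instruction: str, image_urls: list) -> dict:
    """Pair a natural-language edit instruction with reference images."""
    if not image_urls:
        raise ValueError("the model expects one or more reference images")
    return {"prompt": instruction, "image_urls": list(image_urls)}

req = build_edit_request(
    "replace the background with a snowy mountain range",
    ["https://example.com/photo.png"],
)
```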
Seedance 2.0 (Image-to-Video) generates Hollywood-grade cinematic videos from reference images and text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it preserves the input image's subject and composition while adding expressive, physically accurate motion.
text-to-image
OpenAI's GPT Image 2 Text-to-Image generates high-quality images from natural-language prompts. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
Seedance 2.0 (Video-Edit) edits an input video from a natural-language prompt. The reference video drives subject identity, composition, and motion while the model rewrites lighting, style, weather, environment, or specific elements as instructed. Built on ByteDance Seed's unified multimodal architecture for cinematic, motion-stable output. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
Seedance 2.0 (Video-Edit Turbo) is the turbo tier for editing an input video from a natural-language prompt: faster, more affordable high-resolution output while preserving subject identity, composition, and motion. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
Seedance 2.0 Fast (Video-Edit Turbo) is the fastest, cheapest turbo tier for editing an input video from a natural-language prompt: high-resolution output with optimized cost and speed. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
Seedance 2.0 Fast (Video-Edit) edits an input video from a natural-language prompt at a faster, cheaper tier. Built on ByteDance Seed's unified multimodal architecture, it preserves subject identity, composition, and motion while rewriting lighting, style, weather, environment, or specific elements as instructed. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
Seedance 2.0 (Video-Extend) extends an input video with a new cinematic continuation generated from its last frame and a natural-language prompt. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
Seedance 2.0 Fast (Video-Extend) extends an input video with a new cinematic continuation generated from its last frame and a natural-language prompt, at the faster, cheaper Seedance 2.0 Fast tier. Ready-to-use REST API, best performance, no cold starts, affordable pricing.
Seedance 2.0 (Text-to-Video) generates Hollywood-grade cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability. Built on Seed's unified multimodal architecture, it leads on instruction adherence, motion quality, and visual aesthetics.
Seedance 2.0 Fast (Image-to-Video) generates cinematic videos from reference images and text prompts with native audio-visual synchronization, director-level control, and exceptional motion stability — optimized for faster generation at lower cost. Built on Seed's unified multimodal architecture.
Seedance 2.0 Fast (Text-to-Video) generates cinematic videos from text prompts with native audio-visual synchronization, director-level camera and lighting control, and exceptional motion stability — optimized for faster generation at lower cost. Built on Seed's unified multimodal architecture.
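Video generation at this scale is normally an asynchronous job: the client submits a request, receives a job handle, and polls a status resource until the output is ready. The status names and helper below are illustrative assumptions, not any provider's documented protocol.

```python
import time

def poll_until_done(fetch_status, interval_s=2.0, timeout_s=600.0):
    """Call fetch_status() until it reports a terminal state.

    fetch_status is any callable returning a dict with a "state" key;
    in practice it would GET the job's status URL.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status["state"] in ("completed", "failed"):
            return status
        time.sleep(interval_s)
    raise TimeoutError("video job did not finish in time")

# Simulated queue: pending twice, then completed with a video URL.
states = iter([
    {"state": "pending"},
    {"state": "pending"},
    {"state": "completed", "video_url": "https://example.com/out.mp4"},
])
result = poll_until_done(lambda: next(states), interval_s=0.0)
```

A real client would also back off between polls and distinguish a `failed` terminal state from a timeout.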
Google Nano Banana Pro (Gemini 3.0 Pro Image) Edit enables image editing with 4K-capable output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
Google Nano Banana 2 Edit (Gemini 3.1 Flash Image) enables advanced image editing with 4K-capable output, fast iteration, and precise instruction following. Supports text translation, localization within images, and maintains subject consistency during edits. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
Google Nano Banana Pro (Gemini 3.0 Pro Image) Edit enables image editing with high-resolution output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
Google Nano Banana 2 Edit Fast (Gemini 3.1 Flash Image) is the cheapest Nano Banana 2 editing option, starting at just $0.045 per image. Enables fast image editing with 2K default output and 4K support. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
Seedream 4.5 Edit preserves facial features, lighting, and color tone from reference images, delivering professional, high-fidelity edits up to 4K with strong prompt adherence. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
Seedream 4.5 Edit Sequential performs multi-image editing while locking character and object identity across shots. It detects main subjects, preserves continuity, and applies controlled edits with up to 4K output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
Seedream 5.0 Lite Edit is a state-of-the-art image editing model that preserves facial features, lighting, and color tones from reference images. Features high-fidelity editing with professional quality, superior prompt adherence, and up to 4K resolution. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
Seedream 5.0 Lite Edit Sequential performs multi-image editing while locking character and object identity across shots. It detects main subjects, preserves continuity, and applies controlled edits with up to 4K output. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
WAN 2.7 converts images into videos (720p/1080p) with optional audio, supporting first and last frame control. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
WAN 2.6 converts text or images into videos (720p/1080p) with synced audio, faster and more affordable than Google Veo3. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
WAN 2.5 converts text or images into videos (480p/720p/1080p) with synced audio, faster and more affordable than Google Veo3. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
digital-human
AI Music Video Generator transforms audio plus a single photo into a full music video with cinematic camera angles, smooth transitions, and perfect lip sync. Up to 10 minutes, 480p or 720p. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
WAN 2.7 Image Edit performs prompt-driven image editing with support for multiple-image references. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
InfiniteTalk converts one photo plus audio into audio-driven talking or singing avatar videos (Image-to-Video), up to 10 minutes, with the 720p tier at $0.30 per 5 s. Ready-to-use REST API, no cold starts, affordable pricing.
Audio-driven InfiniteTalk turns one video plus audio into realistic talking or singing videos with lip sync in 480p or 720p. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
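The InfiniteTalk entry quotes a 720p tier at $0.30 per 5 s for clips of up to 10 minutes. A quick cost estimate for a given clip length can be sketched as below; billing in whole 5-second increments is an assumption — the provider's actual rounding rules may differ.

```python
import math

def estimate_cost_720p(duration_s: float) -> float:
    """Estimate 720p-tier cost at the listed $0.30 per 5 s rate.

    Assumes billing rounds up to whole 5 s increments (unconfirmed).
    """
    if not 0 < duration_s <= 600:
        raise ValueError("listing says clips run up to 10 minutes")
    increments = math.ceil(duration_s / 5)
    return round(increments * 0.30, 2)

cost = estimate_cost_720p(27)  # 27 s -> 6 increments of 5 s
```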