Latest news on AI image and video generation models
GPT Image 1.5 Edit is OpenAI’s image model for precise, natural-language edits. Add/remove objects, swap backgrounds, retouch faces, adjust colors/lighting, edit text/graphics, crop/resize, and apply hex color control. Ready-to-use REST inference API, best performance, no cold starts, affordable pr
LongCat Avatar generates highly realistic, lip-synchronized long videos with natural dynamics and consistent identity. It converts one photo plus an audio track into talking or singing avatar videos (Image-to-Video) up to 1 minute long; the 720p tier costs $0.30 per 5 s. Ready-to-use REST API, no cold starts, aff
Qwen Image Edit 2511 LoRA is an enhanced version with custom LoRA support for personalized styles. It delivers stronger edit consistency, robust multi-person identity/pose consistency, custom LoRA styles, enhanced industrial/product design, and improved geometric reasoning for structure-preserving e
Qwen Image Edit 2511 is a major upgrade over 2509 for real-world image editing and design. It delivers stronger edit consistency, robust multi-person identity/pose consistency, built-in LoRA styles, enhanced industrial/product design, and improved geometric reasoning for structure-preserving edits.
Alibaba WAN 2.6 converts text or images into videos (720p/1080p) with synced audio, faster and more affordable than Google Veo3. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
Seedance 1.5 Pro Fast Image-to-Video transforms a single image (plus optional text prompt) into cinematic, live-action-leaning clips while preserving subject identity, composition, and first-frame fidelity. It supports 4–12 s duration control, adaptive aspect ratios that follow the input image, exp
Seedance 1.5 Pro Fast Video Extend turns short shots into longer clips with natural motion continuation and strong temporal consistency. Supports 4–12 s extensions, 720p/1080p output with built-in upscaling, and seed-reproducible results for shot matching. Ideal for ads, trailers, and short-drama
ByteDance Seedream 4.5 is a next-gen text-to-image model optimized for typography—crisper text rendering, stronger prompt adherence, and up to 4K output for posters and brand visuals. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
SkyReels V1 is an open-source, human-centric video foundation model fine-tuned from HunyuanVideo on ~10M high-quality film and TV clips to deliver realistic human motion and scene synthesis. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
Alibaba WAN 2.6 Image-Edit turns prompts into precise photo edits—adjusting color and lighting, restyling aesthetics, replacing backgrounds, removing objects, and refining details while preserving subject identity. Built for stable, repeatable image-to-image pipelines. Ready-to-use REST API, best
FLUX 2 Max Edit delivers production-grade image-to-image editing from Black Forest Labs—apply natural-language instructions and exact hex color control for consistent, studio-quality results. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
FLUX 2 Max from Black Forest Labs delivers production-grade text-to-image generation with enhanced realism, sharper text rendering, and native editing for reliable, repeatable results. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.
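As a rough illustration of the LongCat Avatar pricing quoted above ($0.30 per 5 s at the 720p tier, clips up to 1 minute), the following sketch estimates the cost of a clip. The function name `estimate_cost_usd` is hypothetical, and rounding partial 5-second blocks up is an assumption; the listing does not say how partial blocks are billed.

```python
import math

PRICE_PER_BLOCK_CENTS = 30  # from the listing: 720p tier, $0.30 per 5 s
BLOCK_SECONDS = 5
MAX_SECONDS = 60            # from the listing: "up to 1 minute"


def estimate_cost_usd(duration_s: float) -> float:
    """Estimate the 720p cost of one clip (hypothetical helper).

    Assumption: a partial 5-second block is billed as a full block.
    """
    if not 0 < duration_s <= MAX_SECONDS:
        raise ValueError("duration must be in (0, 60] seconds")
    blocks = math.ceil(duration_s / BLOCK_SECONDS)
    # Work in integer cents to avoid floating-point drift, then convert.
    return blocks * PRICE_PER_BLOCK_CENTS / 100


if __name__ == "__main__":
    for seconds in (5, 12, 60):
        print(f"{seconds:>2} s -> ${estimate_cost_usd(seconds):.2f}")
```

Under these assumptions a full 1-minute clip is 12 blocks, or $3.60.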