xAI's Grok Imagine is a suite of AI-native image and video generation models that covers the full creative workflow: text-to-image generation, image editing, and multi-modal video generation. Built on xAI's frontier reasoning capabilities, Grok Imagine delivers exceptional prompt understanding, cinematic quality, and production-ready outputs.
🎬 Grok Imagine Video — Edit, Animate & Generate
Grok Imagine Video provides three specialized endpoints for video creation: generate from text, animate images, or transform existing videos with AI-powered editing.
- Grok Imagine Video Text-to-Video — Generate high-quality videos from text prompts with strong motion coherence and cinematic framing.
- x-ai/grok-imagine-video/text-to-video
- Grok Imagine Video Image-to-Video — Bring still images to life with natural, fluid motion while preserving subject identity and composition.
- x-ai/grok-imagine-video/image-to-video
- Grok Imagine Video Edit — Transform and remix existing videos with AI-powered editing for style transfer, scene modification, and creative effects.
- x-ai/grok-imagine-video/edit-video
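As a rough illustration of how the three video endpoints divide the work, the sketch below maps the inputs a caller has on hand to one of the model identifiers listed above. The identifiers are copied from this page; the `VIDEO_MODELS` table and the `pick_video_model` helper are hypothetical names introduced only for this example and are not part of any xAI or hosting-platform SDK. No request is sent, because the request format is not described here.

```python
from typing import Optional

# Model identifiers copied from the list above.
VIDEO_MODELS = {
    "text-to-video": "x-ai/grok-imagine-video/text-to-video",
    "image-to-video": "x-ai/grok-imagine-video/image-to-video",
    "edit-video": "x-ai/grok-imagine-video/edit-video",
}


def pick_video_model(image: Optional[str] = None, video: Optional[str] = None) -> str:
    """Hypothetical helper: choose a model identifier from the available inputs.

    `image` and `video` stand in for whatever reference (path, URL) the caller
    holds; only their presence matters here.
    """
    if image and video:
        raise ValueError("pass either a source image or a source video, not both")
    if video:
        # An existing clip is being transformed or remixed.
        return VIDEO_MODELS["edit-video"]
    if image:
        # A still image is being animated.
        return VIDEO_MODELS["image-to-video"]
    # Only a text prompt is available.
    return VIDEO_MODELS["text-to-video"]


if __name__ == "__main__":
    print(pick_video_model())                    # prompt only
    print(pick_video_model(image="poster.png"))  # animate a still
    print(pick_video_model(video="clip.mp4"))    # edit an existing clip
```

Keeping the choice in one small function makes the rule explicit: a source video implies editing, a source image implies animation, and a prompt alone implies generation from text.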
🖼️ Grok Imagine Image — Create & Edit
Grok's image generation models deliver stunning visuals with exceptional prompt adherence and artistic versatility.
- Grok Imagine Image Text-to-Image — Generate detailed, photorealistic or stylized images from text with superior prompt understanding.
- x-ai/grok-imagine-image/text-to-image
- Grok Imagine Image Edit — Precisely edit and refine images with controlled modifications while maintaining visual consistency.
- x-ai/grok-imagine-image/edit
- Grok 2 Image — xAI's flagship text-to-image model with frontier-level quality and creative flexibility.
- x-ai/grok-2-image
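The identifiers on this page appear to follow a slash-separated pattern: a vendor prefix, a model family, and an optional task suffix. The sketch below splits an identifier into those parts, which can be handy for grouping or validating model names in client code. The three-part reading is an assumption inferred from the names listed here, not a documented naming scheme, and `ModelId` and `parse_model_id` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional

# Image model identifiers copied from the list above.
IMAGE_MODELS = [
    "x-ai/grok-imagine-image/text-to-image",
    "x-ai/grok-imagine-image/edit",
    "x-ai/grok-2-image",
]


class ModelId(NamedTuple):
    vendor: str
    family: str
    task: Optional[str]  # None when the identifier has no task suffix


def parse_model_id(identifier: str) -> ModelId:
    """Hypothetical helper: split 'vendor/family[/task]' into its parts."""
    parts = identifier.split("/")
    if len(parts) == 2:
        return ModelId(parts[0], parts[1], None)
    if len(parts) == 3:
        return ModelId(parts[0], parts[1], parts[2])
    raise ValueError(f"unexpected identifier shape: {identifier!r}")


if __name__ == "__main__":
    for name in IMAGE_MODELS:
        print(parse_model_id(name))
```

Under this reading, `x-ai/grok-2-image` has no task suffix, which matches its description as a single text-to-image model rather than a family with several endpoints.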
✨ Highlights
- Frontier Prompt Understanding: Powered by Grok's advanced reasoning for exceptional text comprehension and creative interpretation.
- Cinematic Video Quality: Smooth motion, consistent subjects, and professional-grade output.
- Versatile Image Generation: From photorealistic to artistic styles with precise control.
- Full Creative Pipeline: Text-to-image, image editing, and multi-modal video generation in a single suite.
- Production-Ready: Fast inference with reliable, consistent results for commercial workflows.