text-to-image

Grok 2 Image

x-ai/grok-2-image

Grok 2 Image is xAI’s latest image generation model. It turns simple text prompts into sharp, photorealistic visuals in seconds. From product shots to social posts and concept art, it follows your instructions closely, so you can go from idea to production-ready image with a single prompt. It is served through a ready-to-use REST inference API with fast inference, no cold starts, and affordable pricing.

Two request options are available only through the API:

  • Synchronous mode: if set to true, the request waits until the result has been generated and uploaded, then returns it directly in the response.

  • BASE64 output: if enabled, the output is encoded as a BASE64 string instead of being returned as a URL.
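
When BASE64 output is enabled, the encoded string has to be decoded before the image can be saved or displayed. The following is a minimal sketch in Python using only the standard library. The helper names are hypothetical, and whether the API returns a bare BASE64 string or a data URI is an assumption, so the sketch accepts both.

```python
import base64
import binascii


def decode_image(encoded: str) -> bytes:
    """Decode a BASE64 image string into raw bytes.

    Accepts a bare BASE64 string or a data URI such as
    "data:image/jpeg;base64,..." (which form the API returns is an assumption).
    """
    # Drop a data-URI header if one is present.
    if encoded.startswith("data:") and "," in encoded:
        encoded = encoded.split(",", 1)[1]
    try:
        return base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("output is not valid BASE64") from exc


def looks_like_jpeg(data: bytes) -> bool:
    """Check for the JPEG start-of-image marker (FF D8 FF)."""
    return data[:3] == b"\xff\xd8\xff"
```

The decoded bytes can then be written to disk with an ordinary binary file write, for example `open("out.jpg", "wb").write(decode_image(encoded))`.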

Your request will cost $0.07 per run.

For $1 you can run this model approximately 14 times.

Examples

Cinematic aerial shot of a colossal biomechanical city-ship drifting through a nebula at golden hour, intricate organic-metallic architecture covered in bioluminescent veins, massive translucent wings made of light, thousands of tiny ships swarming like fireflies, warm rim lighting against cold cosmic background, shot on 65mm IMAX film, anamorphic lens flares, insane detail, photorealistic, 8K
Hyper-realistic classical oil painting portrait of a 24-year-old East Asian woman with porcelain skin and subtle freckles, wearing 18th century European aristocratic attire with intricate lace and pearls, soft Rembrandt lighting, dramatic chiaroscuro, individual strands of hair, micro skin texture, in the style of John Singer Sargent and Bouguereau, museum quality, 8K
Cinematic aerial view of a post-apocalyptic Tokyo at sunrise, overgrown with massive glowing cherry blossom trees that emit pink particles, abandoned Shibuya crossing completely covered in petals, giant broken holographic billboards still flickering, golden rays piercing through thick fog, thousands of crows flying overhead, ultra-realistic, shot on 70mm IMAX, anamorphic lens flares, emotional masterpiece, 8K
Breathtaking underwater scene of an ancient sunken cyberpunk city, massive skyscrapers covered in coral and bioluminescent algae, schools of whales swimming between buildings, rays of sunlight piercing through the surface creating god rays, abandoned mecha lying on the seabed, ethereal beauty, hyper-realistic, national geographic style meets blade runner 2049, 8K
Dramatic panoramic view of Shanghai Bund 200 years after apocalypse, iconic skyline completely overtaken by massive glowing mushrooms and vines, Oriental Pearl Tower wrapped in bioluminescent flora, aurora borealis in the sky, abandoned ships floating in the Huangpu River covered in moss, lone figure standing on the bund, emotional and hauntingly beautiful, hyper-realistic, 8K

What is Grok 2 Image?

Grok 2 Image turns a natural-language text prompt into vivid, realistic images.
It’s xAI’s flagship image generation model, tuned for marketing creatives, social posts, product visuals, concept art, and more.

In the API, you select it with the model name grok-2-image. A single request can generate multiple images, making it easy to explore variations on a single idea.

Why it looks great

  • Photorealistic, high-fidelity imagery
    Trained to produce detailed textures, convincing lighting, and sharp compositions that work well for ads, hero images, and product renders.

  • Strong prompt following
    Optimized for following descriptive prompts closely, capturing objects, layouts, and styles specified in your text while minimizing “prompt drift.”

  • Flexible visual styles
    Handles realistic photography, digital illustration, stylized artwork, and concept sketches, making it useful for storyboards, thumbnails, and creative exploration.

  • Multi-image generation in one shot
    A single request can generate up to 10 JPG images, so you can explore multiple creative directions from one prompt.

  • Competitive per-image pricing
    Images are billed per output image, keeping costs predictable for batch runs and A/B creative testing.

  • Prompt refinement under the hood
    Before reaching the image model, your text prompt can be lightly revised by a chat model to improve clarity, often leading to more accurate results without extra work on your side.

Pricing

  • Billing is based on the number of images generated.

  • Each image will cost $0.07.
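
Because billing is a flat per-image rate, batch costs can be estimated up front. A small sketch, assuming only the $0.07 price stated above; the function names are illustrative and not part of any API. `Decimal` is used to avoid floating-point rounding on currency values.

```python
from decimal import Decimal

PRICE_PER_IMAGE = Decimal("0.07")  # USD per output image, from the pricing above


def estimate_cost(num_images: int) -> Decimal:
    """Return the estimated cost in USD for a number of generated images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return PRICE_PER_IMAGE * num_images


def images_for_budget(budget_usd: str) -> int:
    """Return how many whole images fit into a budget given in USD."""
    return int(Decimal(budget_usd) // PRICE_PER_IMAGE)


print(estimate_cost(10))       # 0.70
print(images_for_budget("1"))  # 14
```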

How to Use

  1. Write your prompt

    • Describe the subject, scene, style, and mood, for example:
      • “ultra-wide shot of a neon city at night, rainy streets, cinematic”
      • “product photo of wireless earbuds on a marble surface, soft studio lighting”
  2. Send the generation job

    • Call the image API with model: "grok-2-image" (or "grok-2-image-1212") and your prompt.
    • Optionally specify how many variations to generate (up to 10 images per request).
  3. Download or display the results

    • The API returns JPG images via URLs or encoded data, which you can save, display in an app, or feed into downstream editing/compositing tools.
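
The steps above can be sketched as a request builder. This is an illustration under stated assumptions, not the documented API: the endpoint URL is a placeholder, and the JSON field names `prompt` and `n` as well as the Bearer-token header follow common image-API conventions rather than anything confirmed in this README. Only the model name and the 10-image limit come from the text above.

```python
import json
import urllib.request

MAX_IMAGES_PER_REQUEST = 10  # limit stated in this README

# Hypothetical placeholder; substitute the endpoint from the API reference.
API_URL = "https://api.example.com/v1/images/generations"


def build_request(prompt: str, api_key: str, n: int = 1,
                  model: str = "grok-2-image") -> urllib.request.Request:
    """Build (but do not send) a POST request for an image generation job."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not 1 <= n <= MAX_IMAGES_PER_REQUEST:
        raise ValueError(f"n must be between 1 and {MAX_IMAGES_PER_REQUEST}")
    # Field names "prompt" and "n" are assumptions, not confirmed by the README.
    payload = {"model": model, "prompt": prompt, "n": n}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Once the URL and key are real, the request could be sent with `urllib.request.urlopen(build_request(...))` and the JSON response parsed for image URLs or encoded data.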

Notes

  • Output format:
    Images are returned in JPG format.

  • Per-job limits:

    • Up to 10 images per request
    • Additional throughput limits depend on your account/plan.
  • Prompt tips:

    • Be concrete about objects, layout, and style (e.g., “centered product on plain background”).
    • Avoid contradictory instructions in a single prompt.
    • Iterate: start simple, then gradually add details once you like the base composition.
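
Given the limit of 10 images per request, a larger job has to be spread over several requests. One way to plan that split is sketched below; the helper is hypothetical and only computes the per-request counts, leaving the actual API calls and any plan-specific throughput limits to the caller.

```python
def split_into_batches(total_images: int, max_per_request: int = 10) -> list[int]:
    """Split a total image count into per-request counts within the limit."""
    if total_images < 0:
        raise ValueError("total_images must be non-negative")
    if max_per_request < 1:
        raise ValueError("max_per_request must be at least 1")
    full_batches, remainder = divmod(total_images, max_per_request)
    # Full batches first, then one smaller batch for whatever is left over.
    return [max_per_request] * full_batches + ([remainder] if remainder else [])


print(split_into_batches(25))  # [10, 10, 5]
```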

More Image Generation Model Choices

  • Nano Banana Pro High-quality text-to-image generation from Google, suitable for product shots, concept art, and creative visuals.

  • Seedream v4.5 A versatile image generation model from ByteDance, tuned for detailed scenes, characters, and stylized compositions.

  • Kling Image O1 A flagship image model from Kwaivgi/Kuaishou’s Kling series, focused on sharp, high-fidelity visuals and strong prompt adherence.

  • Qwen Image An Alibaba Qwen-based generator hosted by WaveSpeedAI, delivering robust semantic understanding and reliable text-to-image rendering across diverse styles.
