FLUX Kontext Max (Text-to-Image) — wavespeed-ai/flux-kontext-max/text-to-image
FLUX Kontext Max is a high-quality text-to-image model built for prompt-faithful generation with strong composition control and cinematic detail. Write a clear scene description (subject, setting, action, camera, lighting, mood), and it produces polished, production-ready images. It suits story frames, concept art, marketing visuals, and realistic or stylized illustration work.
Key capabilities
- High-fidelity text-to-image generation with strong prompt adherence
- Cinematic composition control (framing, lens feel, lighting, atmosphere)
- Handles complex scenes (multiple elements, detailed environments, coherent styling)
- Consistent output quality suitable for professional creative workflows
Pricing
$0.08 per image.
Total cost = num_images × $0.08
Example: num_images = 4 → $0.32
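The pricing rule above can be sketched as a small helper. This is only an illustration of the arithmetic: the function name total_cost is hypothetical, and the per-image price is copied from this page. Decimal is used so the result is exact rather than a rounded float.

```python
from decimal import Decimal

# Per-image price taken from the Pricing section above.
PRICE_PER_IMAGE = Decimal("0.08")


def total_cost(num_images: int) -> Decimal:
    """Return the total cost in USD for num_images generated images."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return PRICE_PER_IMAGE * num_images


if __name__ == "__main__":
    print(total_cost(4))  # 0.32
```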
How to use
- Write a prompt describing the scene and desired look.
- Choose an aspect ratio for your target layout (e.g., 16:9 for cinematic, 1:1 for avatars).
- Raise guidance_scale if you need stricter prompt following; lower it if results look rigid.
- Set a seed for repeatable results (optional), then generate.
Parameters
- prompt (required): The text description of what to generate
- seed: Fixed value for reproducibility; leave empty or use a random value for variation
- guidance_scale: Higher values follow the prompt more strongly (too high may look rigid or overcooked)
- aspect_ratio: Output aspect ratio (e.g., 16:9, 1:1, 9:16)
- enable_sync_mode: Wait for generation and return results directly (API only)
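As a rough sketch, the parameters above could be assembled into a request body like this. The field names come from the list above; the helper name build_payload, the "W:H" format check, and the choice to omit unset optional fields are assumptions made for illustration. The endpoint URL, authentication, and any allowed value ranges are not covered here, so check the API reference before sending anything.

```python
import re
from typing import Any, Dict, Optional


def build_payload(
    prompt: str,
    seed: Optional[int] = None,
    guidance_scale: Optional[float] = None,
    aspect_ratio: Optional[str] = None,
    enable_sync_mode: Optional[bool] = None,
) -> Dict[str, Any]:
    """Assemble a request body from the documented parameters (hypothetical helper)."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    # Assumption: aspect ratios are written as "W:H" (e.g. 16:9). The set of
    # ratios the service actually accepts is not checked here.
    if aspect_ratio is not None and not re.fullmatch(r"[1-9]\d*:[1-9]\d*", aspect_ratio):
        raise ValueError("aspect_ratio should look like '16:9'")

    payload: Dict[str, Any] = {"prompt": prompt.strip()}
    optional = {
        "seed": seed,
        "guidance_scale": guidance_scale,
        "aspect_ratio": aspect_ratio,
        "enable_sync_mode": enable_sync_mode,
    }
    # Leave out anything the caller did not set, so service defaults apply.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


if __name__ == "__main__":
    print(build_payload("A lighthouse at dusk", seed=42, aspect_ratio="16:9"))
```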
Prompting guide
A reliable prompt structure:
- Subject: who/what is in the image
- Action: what is happening
- Scene: where + time + atmosphere
- Camera: framing, lens vibe, movement (if relevant)
- Lighting: key light, contrast, color temperature
- Style: realistic, film still, illustration, etc.
Example pattern:
A [shot type] of [subject] [action] in [scene]. [Lighting + mood]. [Camera/framing]. [Style cues].
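The pattern above can be filled in programmatically when you generate many prompts. The sketch below is one possible way to do it; compose_prompt and its argument names are hypothetical and simply mirror the bracketed slots in the pattern.

```python
def compose_prompt(
    shot_type: str,
    subject: str,
    action: str,
    scene: str,
    lighting_mood: str = "",
    camera: str = "",
    style: str = "",
) -> str:
    """Fill the example pattern; empty optional slots are skipped."""
    first = f"A {shot_type} of {subject} {action} in {scene}."
    # Each optional slot becomes its own sentence with exactly one trailing period.
    extras = [
        part.strip().rstrip(".") + "."
        for part in (lighting_mood, camera, style)
        if part.strip()
    ]
    return " ".join([first, *extras])


if __name__ == "__main__":
    print(
        compose_prompt(
            "wide shot",
            "a lighthouse keeper",
            "climbing a spiral staircase",
            "a storm-battered lighthouse at night",
            lighting_mood="Warm lantern light, tense mood",
            camera="Low angle, 24mm feel",
            style="Film still look",
        )
    )
```

Keeping the first sentence fixed and the remaining slots optional matches the advice to start simple and concrete, then layer on constraints and style cues.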
Example prompts
- A cinematic close-up of a detective under flickering fluorescent lights, rain streaks on the window behind, shallow depth of field, tense atmosphere, film still look.
- Wide shot of a futuristic street market at dusk, neon reflections on wet pavement, dense crowd silhouettes, volumetric light beams, high detail, cinematic composition.
- Studio product photo of a minimalist smartwatch on a glossy black surface, softbox lighting, crisp reflections, premium advertising style, ultra clean background.
Best practices
- Keep the first sentence simple and concrete, then add constraints and style cues.
- Use camera and lighting language to steer composition; it works more reliably than abstract adjectives.
- For consistent iteration, fix seed and adjust one variable at a time (prompt wording or guidance_scale).
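The last tip, fixing the seed and changing one variable at a time, might look like the following sketch. It only prepares request bodies and does not call any API. guidance_sweep is a hypothetical helper, and the guidance_scale values shown are illustrative, not a documented range.

```python
from typing import Any, Dict, Iterable, List


def guidance_sweep(
    prompt: str, seed: int, guidance_values: Iterable[float]
) -> List[Dict[str, Any]]:
    """Build one request body per guidance_scale value, holding prompt and seed fixed."""
    return [
        {"prompt": prompt, "seed": seed, "guidance_scale": value}
        for value in guidance_values
    ]


if __name__ == "__main__":
    # Illustrative values only; consult the API reference for the valid range.
    for body in guidance_sweep(
        "Studio product photo of a minimalist smartwatch", seed=1234,
        guidance_values=[2.5, 3.5, 4.5],
    ):
        print(body)
```

Because only guidance_scale differs between the bodies, any change in the resulting images can be attributed to that one setting.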