image-to-image

FLUX.1 Kontext Max Multi

wavespeed-ai/flux-kontext-max/multi

Experimental FLUX.1 Kontext [max] (multi) supports multi-image context handling for combined inputs. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

Input
If set to true, the request waits for the result to be generated and uploaded before the response is returned, so the result is included directly in the response. This property is only available through the API.
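
As a sketch of how this option might look in an API call, the snippet below sends a request with a wait-for-result flag set. The flag name enable_sync_mode, the endpoint URL, and the auth header are assumptions for illustration (the parameter name is not shown on this page); verify them against the WaveSpeed.ai API reference.

  import requests

  API_KEY = "YOUR_API_KEY"
  # The endpoint URL and field names below are assumptions for illustration only;
  # consult the WaveSpeed.ai API reference for the exact values.
  URL = "https://api.wavespeed.ai/api/v3/wavespeed-ai/flux-kontext-max/multi"

  payload = {
      "prompt": "A girl holding up flowers.",
      "images": ["https://example.com/ref1.png"],
      "enable_sync_mode": True,  # hypothetical name for the wait-for-result option
  }

  resp = requests.post(URL, json=payload,
                       headers={"Authorization": f"Bearer {API_KEY}"})
  resp.raise_for_status()
  print(resp.json())  # with the flag set, the result should be included in this response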


Your request will cost $0.08 per run.

With $1 you can run this model approximately 12 times.


Examples

A girl holding up flowers.
Doll in the crystal ball.

README

FLUX Kontext Max Multi — wavespeed-ai/flux-kontext-max/multi

FLUX Kontext Max Multi is a high-end multi-image model for context-rich generation and editing. Provide a text prompt plus up to 5 reference images, and the model uses them as visual grounding to improve identity consistency, style matching, and scene coherence—ideal for premium creative work where one image is not enough.

Key capabilities

  • Multi-image contextual generation with up to 5 reference images
  • Strong identity and style consistency by grounding outputs in references
  • Handles complex scenes and cinematic composition with high detail
  • Great for iterative workflows: refine results while keeping the same visual target

Typical use cases

  • Character consistency using multiple portraits/outfits/angles
  • Product and branding consistency (packaging + logo + lighting references)
  • Style steering with multiple exemplars (art style + texture + lighting mood)
  • Scene creation or recomposition guided by reference frames
  • High-fidelity creative direction for storyboards and marketing visuals

Pricing

$0.08 per image.

Total cost = num_images × $0.08. Example: num_images = 4 → $0.32.
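
As a quick sanity check on the numbers above, here is a minimal sketch of the cost math in Python; the price constant comes straight from this page.

  PRICE_PER_IMAGE = 0.08  # USD per generated image, as listed above

  def total_cost(num_images: int) -> float:
      # Total cost = num_images x $0.08
      return num_images * PRICE_PER_IMAGE

  print(total_cost(4))             # 0.32
  print(int(1 / PRICE_PER_IMAGE))  # 12 -> roughly 12 runs per $1 at one image per run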

Inputs and outputs

Input:

  • prompt (required): The generation or edit instruction
  • images (required): Up to 5 reference images (upload or public URLs)

Output:

  • One or more generated images (controlled by num_images, if available in your interface)
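
The sketch below submits a prompt with two reference images and polls for the generated image URLs. The endpoint paths, auth header, and response fields ("id", "status", "outputs") are assumptions for illustration, not a confirmed schema; adjust them to match the WaveSpeed.ai API reference.

  import time
  import requests

  API_KEY = "YOUR_API_KEY"
  BASE = "https://api.wavespeed.ai/api/v3"  # assumed base URL; verify in the API docs
  MODEL = "wavespeed-ai/flux-kontext-max/multi"
  HEADERS = {"Authorization": f"Bearer {API_KEY}"}

  payload = {
      "prompt": "Use image 1 for the face identity and image 2 for the outfit. "
                "Create a clean studio portrait with softbox lighting.",
      "images": [  # up to 5 reference images (public URLs or uploads)
          "https://example.com/identity.png",
          "https://example.com/outfit.png",
      ],
  }

  # Submit the task.
  submit = requests.post(f"{BASE}/{MODEL}", json=payload, headers=HEADERS)
  submit.raise_for_status()
  request_id = submit.json()["data"]["id"]  # assumed response shape

  # Poll until the generated image URLs are available.
  while True:
      result = requests.get(f"{BASE}/predictions/{request_id}/result", headers=HEADERS)
      result.raise_for_status()
      data = result.json()["data"]
      if data["status"] == "completed":
          print(data["outputs"])  # list of generated image URLs
          break
      if data["status"] == "failed":
          raise RuntimeError(data.get("error", "generation failed"))
      time.sleep(1)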

Parameters

  • prompt (required): Instruction describing what to generate and how to use references
  • images (required): Up to 5 reference images
  • guidance_scale: Prompt adherence strength (higher = stricter; too high may over-constrain)
  • aspect_ratio: Output aspect ratio (e.g., 16:9, 1:1, 9:16)
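
To show how these fields fit together, here is a sketch of a single request payload using the parameters listed above; num_images and the exact accepted values are assumptions and may differ in your interface.

  # Field names follow the list in this README; defaults and accepted ranges
  # are assumptions, so verify them against the model's parameter schema.
  payload = {
      "prompt": "Use image 1 for identity and image 2 for style. "
                "Generate a 16:9 cinematic medium shot at dusk.",
      "images": [
          "https://example.com/identity.png",  # role: identity
          "https://example.com/style.png",     # role: style
      ],
      "guidance_scale": 3.5,   # moderate adherence; higher = stricter prompt following
      "aspect_ratio": "16:9",  # match your target layout to avoid awkward cropping
      "num_images": 2,         # if exposed: 2 x $0.08 = $0.16 total
  }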

Prompting guide (multi-reference)

Assign roles to your references to reduce ambiguity:

Template: Use image 1 for [identity]. Use image 2 for [outfit]. Use image 3 for [style]. Use image 4 for [lighting]. Use image 5 for [background/scene]. Generate [shot description]. Keep [constraints].
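
If you assemble these role-based prompts programmatically, a small helper like the hypothetical sketch below (not part of the API) keeps the template consistent across requests.

  def build_multi_ref_prompt(roles, shot, constraints):
      """Compose a prompt following the role-assignment template above.

      `roles` maps a 1-based reference image index to its role,
      e.g. {1: "the face identity", 2: "the outfit"}.
      """
      parts = [f"Use image {i} for {role}." for i, role in sorted(roles.items())]
      parts.append(f"Generate {shot}.")
      parts.append(f"Keep {constraints}.")
      return " ".join(parts)

  print(build_multi_ref_prompt(
      {1: "the face identity", 2: "the outfit", 3: "the illustration style"},
      "a 16:9 cinematic medium shot in a rainy city street at night",
      "the same person identity and a consistent color palette",
  ))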

Example prompts

  • Use image 1 for the face identity, image 2 for outfit, image 3 for illustration style. Create a 16:9 cinematic medium shot in a rainy city street at night, neon reflections, shallow depth of field.
  • Use images 1–2 to keep the same person identity from different angles. Generate a clean studio portrait with softbox lighting, neutral background, natural skin texture.
  • Use image 4 for lighting mood (sunset) and image 5 for environment. Keep the subject identity from image 1 and maintain consistent color palette.

Best practices

  • Use high-quality references: sharp subjects, minimal occlusion, clear lighting.
  • Avoid conflicting references (e.g., drastically different styles) unless you explicitly say which one dominates.
  • Keep guidance_scale moderate; let references do most of the steering.
  • Pick an aspect_ratio that matches your target layout to avoid awkward cropping.