Flux Controlnet Union Pro 2.0
FLUX.1 ControlNet Union Pro 2.0 is a high-performance endpoint for the FLUX.1 model with advanced ControlNet capabilities, supporting multiple control modes including Canny, Depth, Pose, and more for precise image generation and control.
Features
FLUX.1-dev-ControlNet-Union-Pro-2.0 is an enhanced unified ControlNet for FLUX.1-dev model released by Shakker Labs. This version offers significant improvements over the previous Pro version with better performance and control capabilities.
Key Improvements (vs Pro 1.0)
- Smaller Model Size: Removed mode embedding for reduced memory footprint
- Enhanced Control: Improved Canny and Pose control with better aesthetics
- New Soft Edge Support: Added AnylineDetector-based soft edge control
- Streamlined Architecture: Simplified from 12 modes to 5 optimized modes
Technical Specifications
- Architecture: 6 double blocks + 0 single blocks (mode embedding removed)
- Training: 300k steps on 20M high-quality general and human images
- Resolution: 512x512 training resolution
- Precision: BFloat16
- Batch Size: 128
- Learning Rate: 2e-5
- Guidance Range: [1, 7] uniformly sampled
- Text Drop Ratio: 0.20
Supported Control Modes
1. Canny Edge Detection
- Detector: cv2.Canny
- Recommended Settings: conditioning_scale=0.7, guidance_end=0.8
- Use Case: Precise edge-based control for structural guidance
2. Soft Edge Detection
- Detector: AnylineDetector
- Recommended Settings: conditioning_scale=0.7, guidance_end=0.8
- Use Case: Softer, more natural edge detection for artistic control
3. Depth Control
- Detector: depth-anything
- Recommended Settings: conditioning_scale=0.8, guidance_end=0.8
- Use Case: 3D depth-aware image generation
4. Human Pose Control
- Detector: DWPose
- Recommended Settings: conditioning_scale=0.9, guidance_end=0.65
- Use Case: Precise human pose and body structure control
5. Grayscale Control
- Detector: cv2.cvtColor
- Recommended Settings: conditioning_scale=0.9, guidance_end=0.8
- Use Case: Grayscale-to-color generation with structural preservation
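The recommended settings above can be collected into a small lookup table, which is a convenient starting point when tuning. This is an illustrative sketch only: the mode keys are informal labels for this example, not documented API values.

```python
# Recommended starting settings per control mode, transcribed from the
# list above. The mode keys are informal labels used for this sketch.
RECOMMENDED_SETTINGS = {
    "canny":     {"conditioning_scale": 0.7, "guidance_end": 0.8},
    "soft_edge": {"conditioning_scale": 0.7, "guidance_end": 0.8},
    "depth":     {"conditioning_scale": 0.8, "guidance_end": 0.8},
    "pose":      {"conditioning_scale": 0.9, "guidance_end": 0.65},
    "grayscale": {"conditioning_scale": 0.9, "guidance_end": 0.8},
}

def settings_for(mode: str) -> dict:
    """Return the documented starting point for a control mode."""
    return RECOMMENDED_SETTINGS[mode]
```

From these starting points, adjust conditioning_scale up for stricter adherence to the control image, or down for more creative freedom.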
Usage Guidelines
- Detailed Prompts: Use detailed text prompts for better stability
- Multi-Condition Support: Can be combined with other ControlNets
- Parameter Tuning: Adjust conditioning_scale and control_guidance_end for optimal results
- Quality Input: Higher quality control images produce better results
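Since control_guidance_end is expressed as a fraction of total steps, it helps to see how it maps onto an actual step count. A minimal sketch, assuming simple truncation (the exact rounding used by the endpoint is not documented):

```python
def guidance_end_step(num_inference_steps: int, control_guidance_end: float) -> int:
    """Step index (exclusive) at which ControlNet guidance stops.

    control_guidance_end is a fraction of the total steps; truncation
    here is an assumption about how the fraction maps to a step count.
    """
    return int(num_inference_steps * control_guidance_end)
```

With the defaults (28 steps, control_guidance_end=0.8), guidance applies for roughly the first 22 steps; the pose preset's 0.65 releases control earlier, leaving more steps for the model to refine details freely.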
Performance Optimizations
Our WavespeedAI implementation includes:
- Memory Optimization: Efficient GPU memory management for the streamlined architecture
- Pipeline Acceleration: Optimized inference pipeline leveraging the simplified model structure
- Dynamic Batching: Intelligent batching for improved throughput
- Model Compilation: XeLerate-powered model compilation for faster inference
Professional Applications
- Architectural Visualization: Depth and edge control for building renders
- Character Design: Pose control for consistent character positioning
- Art Direction: Soft edge control for concept development
- Photography Enhancement: Grayscale colorization and structure preservation
- Digital Art Creation: Combined control modes for artistic workflows
Limitations
- Control Quality Dependency: Output quality depends on control image precision
- Prompt Sensitivity: Results are influenced by both control inputs and text prompts
- Removed Modes: No longer supports Tile mode (removed in Pro 2.0)
- Memory Requirements: Still requires significant GPU memory for high-resolution outputs
This model represents the state-of-the-art in unified ControlNet technology, offering professional-grade control with improved efficiency and quality.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/flux-controlnet-union-pro2.0" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"prompt": "A robot is giving a speech.",
"control_image": "https://d1q70pf5vjeyhc.wavespeed.ai/media/images/1751204120011542164_h8qnjgc9.png",
"size": "1024*1024",
"num_inference_steps": 28,
"guidance_scale": 3.5,
"controlnet_conditioning_scale": 0.7,
"control_guidance_end": 0.8,
"seed": 0,
"num_images": 1,
"enable_safety_checker": true,
"enable_base64_output": false
}'
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
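The two curl calls above can be wrapped in a small stdlib-only Python client. This is a sketch under assumptions: build_payload mirrors the JSON body from the curl example, while submit is one plausible way a client might wrap the POST (the function names are not part of any official SDK).

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"

def build_payload(prompt: str, control_image: str, **overrides) -> dict:
    """Assemble the JSON body, mirroring the defaults in the curl example."""
    payload = {
        "prompt": prompt,
        "control_image": control_image,
        "size": "1024*1024",
        "num_inference_steps": 28,
        "guidance_scale": 3.5,
        "controlnet_conditioning_scale": 0.7,
        "control_guidance_end": 0.8,
        "num_images": 1,
        "enable_safety_checker": True,
        "enable_base64_output": False,
    }
    payload.update(overrides)  # e.g. seed=0 for reproducible output
    return payload

def submit(api_key: str, payload: dict) -> str:
    """POST the task and return the request id found at data.id."""
    req = urllib.request.Request(
        f"{API_BASE}/wavespeed-ai/flux-controlnet-union-pro2.0",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]["id"]
```

The returned id is then interpolated into the `/predictions/{id}/result` URL shown in the second curl call.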
Parameters
Task Submission Parameters
Request Parameters
Parameter | Type | Required | Default | Range | Description |
---|---|---|---|---|---|
prompt | string | Yes | - | - | The prompt to generate an image from. |
control_image | string | No | https://d1q70pf5vjeyhc.wavespeed.ai/media/images/1751204120011542164_h8qnjgc9.png | - | The URL of the control image for ControlNet guidance. |
size | string | No | 1024*1024 | 256 ~ 1536 per dimension | The size of the generated image. |
num_inference_steps | integer | No | 28 | 1 ~ 50 | The number of inference steps to perform. |
guidance_scale | number | No | 3.5 | 0 ~ 20 | The CFG (Classifier-Free Guidance) scale; higher values make the output adhere more closely to the prompt. |
controlnet_conditioning_scale | number | No | 0.7 | 0 ~ 2 | The conditioning scale for ControlNet. Higher values make the output follow the control image more closely. |
control_guidance_end | number | No | 0.8 | 0 ~ 1 | The fraction of total steps at which ControlNet guidance ends. |
seed | integer | No | - | -1 ~ 2147483647 | Random seed for reproducibility: the same seed, prompt, and model version produce the same image. |
num_images | integer | No | 1 | 1 ~ 4 | The number of images to generate. |
enable_safety_checker | boolean | No | true | - | If set to true, the safety checker will be enabled. |
enable_base64_output | boolean | No | false | - | Enable base64 encoded output. |
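The size parameter is a single "width*height" string with each dimension constrained to 256–1536. A small client-side validator (hypothetical helper, not part of the API) can catch out-of-range values before submission:

```python
def parse_size(size: str) -> tuple[int, int]:
    """Parse a "width*height" size string and enforce the documented
    256-1536 per-dimension range."""
    width, height = (int(part) for part in size.split("*"))
    for dim in (width, height):
        if not 256 <= dim <= 1536:
            raise ValueError(f"dimension {dim} outside the 256-1536 range")
    return width, height
```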
Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data.id | string | Unique identifier for the prediction (task ID) |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: created, processing, completed, or failed |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
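The fields a client typically needs from this response can be pulled out in one place. A minimal sketch; extract_result is a hypothetical helper that simply reads the documented fields:

```python
def extract_result(response: dict) -> dict:
    """Collect the commonly used fields from a prediction response."""
    data = response["data"]
    return {
        "id": data["id"],
        "status": data["status"],
        "outputs": data.get("outputs", []),   # empty until completed
        "poll_url": data.get("urls", {}).get("get"),
        "error": data.get("error", ""),
    }
```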
Result Query Parameters
Result Request Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
id | string | Yes | - | Task ID |
Result Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data | object | The prediction data object containing all details |
data.id | string | Unique identifier for the prediction |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: created, processing, completed, or failed |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
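Since data.status moves through created and processing before reaching a terminal state, clients generally poll the result endpoint until the task is completed or failed. A sketch of that loop, with the fetch callable standing in for whatever function performs the GET request (an assumption, not an official SDK interface):

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}

def wait_for_result(fetch, poll_interval: float = 1.0, timeout: float = 120.0) -> dict:
    """Poll fetch() (a callable returning the parsed result response)
    until data.status reaches a terminal state, then return data."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = fetch()["data"]
        if data["status"] in TERMINAL_STATUSES:
            return data
        time.sleep(poll_interval)
    raise TimeoutError("prediction did not finish before the timeout")
```

On completion, data.outputs holds the image URLs; on failure, data.error explains what went wrong.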