Flux Controlnet Union Pro 2.0
FLUX.1 ControlNet Union Pro 2.0 is a high-performance endpoint for the FLUX.1 model with advanced ControlNet capabilities, supporting multiple control modes including Canny, Depth, Pose, and more for precise image generation and control.
Features
FLUX.1-dev-ControlNet-Union-Pro-2.0 is an enhanced unified ControlNet for FLUX.1-dev model released by Shakker Labs. This version offers significant improvements over the previous Pro version with better performance and control capabilities.
Key Improvements (vs Pro 1.0)
- Smaller Model Size: Removed mode embedding for reduced memory footprint
- Enhanced Control: Improved Canny and Pose control with better aesthetics
- New Soft Edge Support: Added AnylineDetector-based soft edge control
- Streamlined Architecture: Simplified from 12 modes to 5 optimized modes
Technical Specifications
- Architecture: 6 double blocks + 0 single blocks (mode embedding removed)
- Training: 300k steps on 20M high-quality general and human images
- Resolution: 512x512 training resolution
- Precision: BFloat16
- Batch Size: 128
- Learning Rate: 2e-5
- Guidance Range: [1, 7] uniformly sampled
- Text Drop Ratio: 0.20
Supported Control Modes
1. Canny Edge Detection
- Detector: cv2.Canny
- Recommended Settings: conditioning_scale=0.7, guidance_end=0.8
- Use Case: Precise edge-based control for structural guidance
2. Soft Edge Detection
- Detector: AnylineDetector
- Recommended Settings: conditioning_scale=0.7, guidance_end=0.8
- Use Case: Softer, more natural edge detection for artistic control
3. Depth Control
- Detector: depth-anything
- Recommended Settings: conditioning_scale=0.8, guidance_end=0.8
- Use Case: 3D depth-aware image generation
4. Human Pose Control
- Detector: DWPose
- Recommended Settings: conditioning_scale=0.9, guidance_end=0.65
- Use Case: Precise human pose and body structure control
5. Grayscale Control
- Detector: cv2.cvtColor
- Recommended Settings: conditioning_scale=0.9, guidance_end=0.8
- Use Case: Grayscale-to-color generation with structural preservation
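The recommended settings above can be collected into a small lookup table, which is a convenient starting point when tuning. This is an illustrative sketch only: the mode keys are informal labels for this example, not documented API values.

```python
# Recommended starting settings per control mode, transcribed from the
# list above. The mode keys are informal labels used for this sketch.
RECOMMENDED_SETTINGS = {
    "canny":     {"conditioning_scale": 0.7, "guidance_end": 0.8},
    "soft_edge": {"conditioning_scale": 0.7, "guidance_end": 0.8},
    "depth":     {"conditioning_scale": 0.8, "guidance_end": 0.8},
    "pose":      {"conditioning_scale": 0.9, "guidance_end": 0.65},
    "grayscale": {"conditioning_scale": 0.9, "guidance_end": 0.8},
}

def settings_for(mode: str) -> dict:
    """Return the documented starting point for a control mode."""
    return RECOMMENDED_SETTINGS[mode]
```

From these starting points, adjust conditioning_scale up for stricter adherence to the control image, or down for more creative freedom.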
Usage Guidelines
- Detailed Prompts: Use detailed text prompts for better stability
- Multi-Condition Support: Can be combined with other ControlNets
- Parameter Tuning: Adjust conditioning_scale and control_guidance_end for optimal results
- Quality Input: Higher quality control images produce better results
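Since control_guidance_end is expressed as a fraction of total steps, it helps to see how it maps onto an actual step count. A minimal sketch, assuming simple truncation (the exact rounding used by the endpoint is not documented):

```python
def guidance_end_step(num_inference_steps: int, control_guidance_end: float) -> int:
    """Step index (exclusive) at which ControlNet guidance stops.

    control_guidance_end is a fraction of the total steps; truncation
    here is an assumption about how the fraction maps to a step count.
    """
    return int(num_inference_steps * control_guidance_end)
```

With the defaults (28 steps, control_guidance_end=0.8), guidance applies for roughly the first 22 steps; the pose preset's 0.65 releases control earlier, leaving more steps for the model to refine details freely.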
Performance Optimizations
Our WavespeedAI implementation includes:
- Memory Optimization: Efficient GPU memory management for the streamlined architecture
- Pipeline Acceleration: Optimized inference pipeline leveraging the simplified model structure
- Dynamic Batching: Intelligent batching for improved throughput
- Model Compilation: XeLerate-powered model compilation for faster inference
Professional Applications
- Architectural Visualization: Depth and edge control for building renders
- Character Design: Pose control for consistent character positioning
- Art Direction: Soft edge control for concept development
- Photography Enhancement: Grayscale colorization and structure preservation
- Digital Art Creation: Combined control modes for artistic workflows
Limitations
- Control Quality Dependency: Output quality depends on control image precision
- Prompt Sensitivity: Results are influenced by both control inputs and text prompts
- Removed Modes: No longer supports Tile mode (removed in Pro 2.0)
- Memory Requirements: Still requires significant GPU memory for high-resolution outputs
This model represents the state-of-the-art in unified ControlNet technology, offering professional-grade control with improved efficiency and quality.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/flux-controlnet-union-pro2.0" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
"prompt": "A robot is giving a speech.",
"control_image": "https://d1q70pf5vjeyhc.wavespeed.ai/media/images/1751204120011542164_h8qnjgc9.png",
"size": "1024*1024",
"num_inference_steps": 28,
"guidance_scale": 3.5,
"controlnet_conditioning_scale": 0.7,
"control_guidance_end": 0.8,
"seed": 0,
"num_images": 1,
"enable_safety_checker": true,
"enable_base64_output": false
}'
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
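The two curl calls above can be wrapped in a small stdlib-only Python client. This is a sketch under assumptions: build_payload mirrors the JSON body from the curl example, while submit is one plausible way a client might wrap the POST (the function names are not part of any official SDK).

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"

def build_payload(prompt: str, control_image: str, **overrides) -> dict:
    """Assemble the JSON body, mirroring the defaults in the curl example."""
    payload = {
        "prompt": prompt,
        "control_image": control_image,
        "size": "1024*1024",
        "num_inference_steps": 28,
        "guidance_scale": 3.5,
        "controlnet_conditioning_scale": 0.7,
        "control_guidance_end": 0.8,
        "num_images": 1,
        "enable_safety_checker": True,
        "enable_base64_output": False,
    }
    payload.update(overrides)  # e.g. seed=0 for reproducible output
    return payload

def submit(api_key: str, payload: dict) -> str:
    """POST the task and return the request id found at data.id."""
    req = urllib.request.Request(
        f"{API_BASE}/wavespeed-ai/flux-controlnet-union-pro2.0",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]["id"]
```

The returned id is then interpolated into the `/predictions/{id}/result` URL shown in the second curl call.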
Parameters
Task Submission Parameters
Request Parameters
Parameter | Type | Required | Default | Range | Description |
---|---|---|---|---|---|
prompt | string | Yes | - | - | The prompt to generate an image from. |
control_image | string | No | https://d1q70pf5vjeyhc.wavespeed.ai/media/images/1751204120011542164_h8qnjgc9.png | - | The URL of the control image for ControlNet guidance. |
size | string | No | 1024*1024 | 256 ~ 1536 per dimension | The size of the generated image. |
num_inference_steps | integer | No | 28 | 1 ~ 50 | The number of inference steps to perform. |
guidance_scale | number | No | 3.5 | 0 ~ 20 | The CFG (Classifier-Free Guidance) scale; higher values make the output adhere more closely to the prompt. |
controlnet_conditioning_scale | number | No | 0.7 | 0 ~ 2 | The conditioning scale for ControlNet. Higher values make the output follow the control image more closely. |
control_guidance_end | number | No | 0.8 | 0 ~ 1 | The fraction of total steps at which ControlNet guidance ends. |
seed | integer | No | - | -1 ~ 2147483647 | Random seed for reproducibility: the same seed, prompt, and model version produce the same image. |
num_images | integer | No | 1 | 1 ~ 4 | The number of images to generate. |
enable_safety_checker | boolean | No | true | - | If set to true, the safety checker will be enabled. |
enable_base64_output | boolean | No | false | - | Enable base64 encoded output. |
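The size parameter is a single "width*height" string with each dimension constrained to 256–1536. A small client-side validator (hypothetical helper, not part of the API) can catch out-of-range values before submission:

```python
def parse_size(size: str) -> tuple[int, int]:
    """Parse a "width*height" size string and enforce the documented
    256-1536 per-dimension range."""
    width, height = (int(part) for part in size.split("*"))
    for dim in (width, height):
        if not 256 <= dim <= 1536:
            raise ValueError(f"dimension {dim} outside the 256-1536 range")
    return width, height
```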
Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data.id | string | Unique identifier for the prediction (task ID) |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: created, processing, completed, or failed |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
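The fields a client typically needs from this response can be pulled out in one place. A minimal sketch; extract_result is a hypothetical helper that simply reads the documented fields:

```python
def extract_result(response: dict) -> dict:
    """Collect the commonly used fields from a prediction response."""
    data = response["data"]
    return {
        "id": data["id"],
        "status": data["status"],
        "outputs": data.get("outputs", []),   # empty until completed
        "poll_url": data.get("urls", {}).get("get"),
        "error": data.get("error", ""),
    }
```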
Result Query Parameters
Result Request Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
id | string | Yes | - | Task ID |
Result Response Parameters
Parameter | Type | Description |
---|---|---|
code | integer | HTTP status code (e.g., 200 for success) |
message | string | Status message (e.g., “success”) |
data | object | The prediction data object containing all details |
data.id | string | Unique identifier for the prediction |
data.model | string | Model ID used for the prediction |
data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
data.urls | object | Object containing related API endpoints |
data.urls.get | string | URL to retrieve the prediction result |
data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
data.status | string | Status of the task: created, processing, completed, or failed |
data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
data.error | string | Error message (empty if no error occurred) |
data.timings | object | Object containing timing details |
data.timings.inference | integer | Inference time in milliseconds |
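Since data.status moves through created and processing before reaching a terminal state, clients generally poll the result endpoint until the task is completed or failed. A sketch of that loop, with the fetch callable standing in for whatever function performs the GET request (an assumption, not an official SDK interface):

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}

def wait_for_result(fetch, poll_interval: float = 1.0, timeout: float = 120.0) -> dict:
    """Poll fetch() (a callable returning the parsed result response)
    until data.status reaches a terminal state, then return data."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        data = fetch()["data"]
        if data["status"] in TERMINAL_STATUSES:
            return data
        time.sleep(poll_interval)
    raise TimeoutError("prediction did not finish before the timeout")
```

On completion, data.outputs holds the image URLs; on failure, data.error explains what went wrong.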