VOID Video Inpainting removes objects or people from video footage and fills the background with realistic, temporally consistent content. Describe what to remove and what the background should look like — the model handles the rest, with optional mask video input for precise control.
- Text-driven object removal: Describe the object or person to remove in plain language, with no manual masking required. The model uses SAM-3 to auto-generate a mask from your text description.
- Custom mask video support: Upload a pre-prepared VOID-style quadmask or simple binary mask video for precise, frame-accurate removal control.
- Background inpainting: Describe the desired background after removal, and the model fills the gap with contextually appropriate, motion-consistent content.
- Pass 2 refinement: Set enable_pass2_refinement to true for additional warped-noise refinement that improves temporal consistency on longer clips.
- Fine-grained generation control: Adjust inference steps, guidance scale, denoising strength, and temporal window size for precise output control.
| Parameter | Required | Description |
|---|---|---|
| video | Yes | Input video containing the object to remove (URL). |
| prompt | Yes | Text description of the desired background after object removal. |
| mask_video | No | Mask video URL. Supports VOID quadmask (4 grayscale values) or simple binary mask. Auto-generated if omitted. |
| mask_prompt | No | Text description of what to mask/remove. Used to auto-generate a mask when mask_video is not provided. |
| enable_pass2_refinement | No | Run Pass 2 warped-noise refinement for improved temporal consistency. Slower but higher quality. Default: false. |
| negative_prompt | No | Negative prompt to guide generation away from undesired outputs. |
| num_inference_steps | No | Number of denoising steps. Range: 1–50. Default: 30. Higher values improve quality but run slower. |
| guidance_scale | No | Classifier-free guidance scale. Range: 0–20. Default: 1. |
| strength | No | Denoising strength. Range: 0–1. Default: 1 (full denoising). |
| num_frames | No | Temporal window size. Valid values: 69, 77, 85, …, 197. Default: 85. |
| seed | No | Random seed for reproducible results. |
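The parameter table can be turned into a small client-side check before a request is sent. The sketch below is only an illustration: build_void_payload is a hypothetical helper, not part of any official SDK, and it only assembles and validates a dictionary without calling any endpoint. It also assumes that the elided num_frames values continue in steps of 8, which matches the listed values 69, 77, 85 and 197 but is not stated in the table.

```python
from typing import Any, Dict, Optional

# Ranges and defaults copied from the parameter table above.
STEPS_RANGE = (1, 50)
GUIDANCE_RANGE = (0.0, 20.0)
STRENGTH_RANGE = (0.0, 1.0)
# Assumption: the elided num_frames values continue in steps of 8 (69, 77, ..., 197).
VALID_NUM_FRAMES = range(69, 198, 8)


def build_void_payload(
    video: str,
    prompt: str,
    mask_video: Optional[str] = None,
    mask_prompt: Optional[str] = None,
    enable_pass2_refinement: bool = False,
    negative_prompt: Optional[str] = None,
    num_inference_steps: int = 30,
    guidance_scale: float = 1.0,
    strength: float = 1.0,
    num_frames: int = 85,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and return a request payload dict."""
    if not video or not prompt:
        raise ValueError("video and prompt are required")
    if not STEPS_RANGE[0] <= num_inference_steps <= STEPS_RANGE[1]:
        raise ValueError("num_inference_steps must be between 1 and 50")
    if not GUIDANCE_RANGE[0] <= guidance_scale <= GUIDANCE_RANGE[1]:
        raise ValueError("guidance_scale must be between 0 and 20")
    if not STRENGTH_RANGE[0] <= strength <= STRENGTH_RANGE[1]:
        raise ValueError("strength must be between 0 and 1")
    if num_frames not in VALID_NUM_FRAMES:
        raise ValueError("num_frames must be one of 69, 77, 85, ..., 197")

    payload: Dict[str, Any] = {
        "video": video,
        "prompt": prompt,
        "enable_pass2_refinement": enable_pass2_refinement,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
        "strength": strength,
        "num_frames": num_frames,
    }
    # Optional fields are included only when the caller sets them.
    optional = {
        "mask_video": mask_video,
        "mask_prompt": mask_prompt,
        "negative_prompt": negative_prompt,
        "seed": seed,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


# Example: text-driven removal with default generation settings.
# The URL and prompts are placeholders.
example = build_void_payload(
    video="https://example.com/street.mp4",
    prompt="an empty cobblestone street",
    mask_prompt="the person walking a dog",
)
```

Validating locally keeps out-of-range values from reaching the service, though the service's own validation remains the authority on what is accepted.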
The mask_video parameter supports two formats: a VOID-style quadmask (four grayscale values) or a simple binary mask.
If mask_video is not provided, a mask is auto-generated from mask_prompt using SAM-3.
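The choice of mask source described above can be written as a small decision function. This is a minimal sketch with a hypothetical name, resolve_mask_source. It assumes a supplied mask_video takes precedence over mask_prompt, as the parameter descriptions suggest. The behavior when neither is given is not documented here, so the sketch reports that case instead of guessing.

```python
from typing import Optional


def resolve_mask_source(mask_video: Optional[str], mask_prompt: Optional[str]) -> str:
    """Hypothetical helper: report which mask source a request would use."""
    if mask_video:
        # A supplied mask video (quadmask or binary) is used directly.
        return "mask_video"
    if mask_prompt:
        # No mask video: the mask is auto-generated from the text via SAM-3.
        return "auto_from_mask_prompt"
    # Neither input given: behavior is not documented, so flag it.
    return "unspecified"
```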
| Pass 2 Refinement | Mask Video | Cost |
|---|---|---|
| No | No (auto) | $0.05 |
| Yes | No (auto) | $0.10 |
| No | Yes | $0.10 |
| Yes | Yes | $0.15 |
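The four rows of the pricing table are consistent with a $0.05 base price plus $0.05 for each of the two options. The sketch below encodes that reading. The additive breakdown is inferred from the table, not stated by it, and the function names are hypothetical. Amounts are kept in integer cents to avoid floating-point rounding.

```python
BASE_CENTS = 5                  # no Pass 2, auto-generated mask
PASS2_SURCHARGE_CENTS = 5       # inferred from the table rows
MASK_VIDEO_SURCHARGE_CENTS = 5  # inferred from the table rows


def run_cost_cents(enable_pass2_refinement: bool, has_mask_video: bool) -> int:
    """Hypothetical helper: cost of one run in cents, per the pricing table."""
    cost = BASE_CENTS
    if enable_pass2_refinement:
        cost += PASS2_SURCHARGE_CENTS
    if has_mask_video:
        cost += MASK_VIDEO_SURCHARGE_CENTS
    return cost


def runs_per_budget(budget_cents: int, enable_pass2_refinement: bool, has_mask_video: bool) -> int:
    """Number of whole runs that fit in a budget given in cents."""
    return budget_cents // run_cost_cents(enable_pass2_refinement, has_mask_video)
```

For example, runs_per_budget(100, False, False) gives 20 runs for $1 at the base price, while enabling both options drops that to 6.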