
{ "rle_url": "https://d2p7pge43lyniu.cloudfront.net/output/2a4b36a7-77c5-4b7c-9c42-d28eef15705c-u1_b5159ba1-08a2-4e29-9dd9-66a30506fa42_rle.json", "video_url": "https://d2p7pge43lyniu.cloudfront.net/output/2a4b36a7-77c5-4b7c-9c42-d28eef15705c-u1_f517c2b2-1e6c-4c63-b432-de43b9fb2e1f.mp4" }
Your request will cost $0.05 per run.
For $1 you can run this model approximately 20 times.
SAM3 Video Segmentation RLE is a video segmentation model based on Meta's Segment Anything Model 3. It tracks and segments objects across video frames and returns the masks in RLE (Run-Length Encoding) format, which suits programmatic processing, automated pipelines, and integration with downstream workflows.
- Video object tracking: Segment and track objects consistently across all video frames.
- RLE output format: Returns compact Run-Length Encoded mask data for efficient storage and processing.
- Multiple prompt types: Segment objects using text prompts, point prompts, box prompts, or any combination.
- Multi-object tracking: Track multiple objects using comma-separated prompts (e.g., "person, cloth").
- Prompt Enhancer: Built-in tool to automatically improve your text prompts for better results.
- Optional mask visualization: Toggle apply_mask to preview segmentation on the video.
| Parameter | Required | Description |
|---|---|---|
| video | Yes | Source video to segment (upload or URL) |
| prompt | Yes | Text description of the object(s) to segment |
| point_prompts | No | Point coordinates to identify the target object |
| box_prompts | No | Bounding box coordinates to identify the target object |
| apply_mask | No | Apply mask overlay to the video output |
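As a rough illustration of how the parameters in the table fit together, the sketch below assembles a request payload. The parameter names come from the table; the helper name `build_request`, the example video URL, and the decision to pass `point_prompts` and `box_prompts` through unchanged are assumptions, since the page does not specify the coordinate format or the endpoint to send the payload to.

```python
import json


def build_request(video, labels, point_prompts=None, box_prompts=None, apply_mask=False):
    """Assemble a payload dict from the documented parameters (hypothetical helper)."""
    if not video:
        raise ValueError("video is required")
    # Multi-object tracking uses one comma-separated text prompt.
    cleaned = [label.strip() for label in labels if label.strip()]
    if not cleaned:
        raise ValueError("prompt is required")
    payload = {
        "video": video,
        "prompt": ", ".join(cleaned),
        "apply_mask": apply_mask,
    }
    # Optional prompts are passed through as-is: their exact shape is not
    # documented here, so this sketch does not validate them.
    if point_prompts is not None:
        payload["point_prompts"] = point_prompts
    if box_prompts is not None:
        payload["box_prompts"] = box_prompts
    return payload


# Placeholder URL, for illustration only.
payload = build_request("https://example.com/clip.mp4", ["person", "cloth"])
print(json.dumps(payload, indent=2))
```

Keeping the optional keys out of the payload unless they are set avoids sending empty values whose meaning the service may interpret differently.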
The model returns RLE (Run-Length Encoding) data for each frame in JSON format, enabling efficient programmatic processing.
```python
from pycocotools import mask as mask_utils

# One frame's RLE entry; height and width are the frame dimensions in pixels
rle_data = {"counts": "146301 3 147834 11 ...", "size": [height, width]}
binary_mask = mask_utils.decode(rle_data)  # Returns a NumPy array
```
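To make the decoding step concrete without extra dependencies, here is a minimal pure-Python sketch. It assumes `counts` is a space-separated list of alternating run lengths (background first, then foreground) laid out in column-major order, as in COCO's uncompressed RLE. That layout is an assumption: the page does not state it, so it is worth checking against real output before relying on it. The function name `decode_uncompressed_rle` is hypothetical.

```python
def decode_uncompressed_rle(counts, height, width):
    """Decode alternating run lengths into a height x width mask of 0s and 1s.

    Assumes runs alternate background/foreground, start with background,
    and traverse the frame column by column (column-major).
    """
    runs = [int(token) for token in counts.split()]
    if sum(runs) != height * width:
        raise ValueError("run lengths do not cover the full frame")

    flat = []
    value = 0  # the first run is background
    for run in runs:
        flat.extend([value] * run)
        value = 1 - value  # flip between background and foreground

    # Column-major storage: pixel (row, col) sits at index col * height + row.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


# Tiny 2x3 frame: 2 background pixels, 3 foreground, 1 background.
mask = decode_uncompressed_rle("2 3 1", height=2, width=3)
for row in mask:
    print(row)
```

The length check catches the most common mismatch, a `size` that does not agree with the runs, before any mask is built.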
| Duration | Cost |
|---|---|
| Per 5 seconds | $0.05 |
| 1 minute | $0.60 |
| 5 minutes | $3.00 |
| 10 minutes | $6.00 |
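The table follows from the $0.05 per 5 seconds rate, which a short calculation can reproduce. The sketch below assumes that a partial 5-second block is billed as a full block; the page does not say how partial blocks are rounded, so treat that as a guess. The function name `estimate_cost_usd` is hypothetical.

```python
import math

CENTS_PER_BLOCK = 5   # $0.05 per block, from the pricing table
BLOCK_SECONDS = 5     # each block covers 5 seconds of video


def estimate_cost_usd(duration_seconds):
    """Estimate the cost in dollars, rounding up to whole 5-second blocks (assumed)."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    blocks = math.ceil(duration_seconds / BLOCK_SECONDS)
    # Work in integer cents to avoid floating-point drift.
    return blocks * CENTS_PER_BLOCK / 100


for seconds in (5, 60, 300, 600):
    print(f"{seconds:>4} s -> ${estimate_cost_usd(seconds):.2f}")
```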