Introducing SAM3 Video RLE: Professional-Grade Video Segmentation with RLE-Encoded Outputs
The landscape of video segmentation has fundamentally changed. What once required teams of skilled artists spending countless hours on frame-by-frame rotoscoping can now be accomplished in seconds with AI-powered tools. Today, we’re excited to announce that SAM3 Video RLE is now available on WaveSpeedAI, bringing Meta’s groundbreaking Segment Anything Model 3 technology to your video production and computer vision workflows with optimized RLE-encoded outputs designed for programmatic processing.
What is SAM3 Video RLE?
SAM3 Video RLE pairs Meta’s Segment Anything Model 3 (SAM 3), a unified foundation model for prompt-based video segmentation, with a Run-Length Encoded (RLE) output format. Released as part of Meta’s Segment Anything Collection in late 2025, SAM 3 introduced a paradigm shift in segmentation technology: the ability to detect, segment, and track objects using natural language descriptions rather than manual clicks or bounding boxes.
Unlike previous segmentation models that required you to click on each object you wanted to track, SAM3 enables Promptable Concept Segmentation (PCS)—simply describe what you’re looking for with text like “person wearing red shirt” or “all vehicles in the scene,” and the model finds and tracks every matching instance across your entire video.
The “RLE” in SAM3 Video RLE refers to the output format: Run-Length Encoding, a lossless compression method that stores each segmentation mask as counts of consecutive identical pixels rather than as a full image file. This makes it well suited to automated pipelines, computer vision applications, and any workflow that needs programmatic access to frame-by-frame mask data.
Key Features
Multi-Modal Prompting
- Text prompts: Describe objects naturally—“the person in the blue jacket,” “all cars,” “dog playing in the park”
- Point prompts: Click coordinates to identify specific targets
- Box prompts: Draw bounding boxes for precise object selection
- Combined prompts: Mix text, points, and boxes for maximum accuracy
Multi-Object Tracking
Track multiple objects simultaneously using comma-separated prompts. Need to segment “person, car, dog” in the same video? Simply list them all, and SAM3 handles each independently while maintaining consistent identity across frames.
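As a minimal sketch of assembling such a prompt in code, the helper below joins a list of labels into one comma-separated string. The function name `build_prompt` is hypothetical and not part of any WaveSpeedAI SDK; the `video` and `prompt` keys simply mirror the request shown later in this post.

```python
def build_prompt(labels):
    """Join object labels into one comma-separated prompt, dropping blanks and duplicates."""
    cleaned_labels = []
    for label in labels:
        cleaned = label.strip()
        if "," in cleaned:
            # A comma inside a label would be read as two separate objects.
            raise ValueError(f"label must not contain a comma: {cleaned!r}")
        if cleaned and cleaned not in cleaned_labels:
            cleaned_labels.append(cleaned)
    if not cleaned_labels:
        raise ValueError("at least one non-empty label is required")
    return ", ".join(cleaned_labels)


# Hypothetical request payload built from a list of object labels.
payload = {
    "video": "https://your-video-url.com/video.mp4",
    "prompt": build_prompt(["person", "car", " dog ", "car"]),
}
print(payload["prompt"])
```

Keeping labels in a list until the last moment makes it easier to validate them or map results back to each label afterwards.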
Efficient RLE Output
An RLE mask grows with the number of runs (the transitions between background and object pixels) rather than with the frame’s resolution. Because objects in video typically form contiguous regions, RLE masks are far smaller than raw per-pixel masks, which suits long videos and integration with downstream systems.
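To make the size argument concrete, here is a small NumPy sketch of run-length encoding and decoding a binary mask. It illustrates the general technique only: the alternating zero/one run convention and the column-major pixel order are assumptions borrowed from the COCO format, not a confirmed description of how this endpoint serializes its masks.

```python
import numpy as np


def rle_encode(mask: np.ndarray) -> list[int]:
    """Encode a binary mask as alternating run lengths, starting with a run of zeros."""
    # Assumption: pixels are read in column-major order, as in the COCO convention.
    flat = np.asarray(mask, dtype=np.uint8).flatten(order="F")
    # Positions where the pixel value changes mark the start of a new run.
    change_points = np.flatnonzero(np.diff(flat)) + 1
    boundaries = np.concatenate(([0], change_points, [flat.size]))
    counts = np.diff(boundaries).tolist()
    # The first run always counts zeros, so prepend an empty run
    # when the mask starts with an object pixel.
    if flat.size and flat[0] == 1:
        counts = [0] + counts
    return counts


def rle_decode(counts: list[int], height: int, width: int) -> np.ndarray:
    """Rebuild the binary mask from alternating zero/one run lengths."""
    values = np.arange(len(counts)) % 2  # 0, 1, 0, 1, ...
    flat = np.repeat(values, counts).astype(np.uint8)
    return flat.reshape((height, width), order="F")


mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:300, 200:400] = 1  # one solid rectangular object

counts = rle_encode(mask)
restored = rle_decode(counts, 480, 640)
print(len(counts), "run lengths for", mask.size, "pixels")
```

The round trip is lossless, and the number of run lengths depends on how many times the mask switches between background and object, not on how many pixels the frame contains.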
Built-in Prompt Enhancer
Not sure how to describe what you’re looking for? The integrated prompt enhancer automatically improves your text descriptions for better segmentation results.
Optional Mask Visualization
Toggle the apply_mask parameter to preview segmentation overlays directly on your video, making it easy to validate results before committing to full processing.
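If you prefer to render previews locally from decoded masks, the following sketch blends a colour into the masked pixels of a frame with NumPy. It is an illustration of the overlay idea, not the service's own rendering; the function name `overlay_mask`, the colour, and the opacity are all assumptions.

```python
import numpy as np


def overlay_mask(frame, mask, color=(200, 0, 0), alpha=0.5):
    """Blend `color` into the pixels of an RGB frame where `mask` is nonzero."""
    out = frame.astype(np.float32)  # astype returns a copy, so `frame` is untouched
    selected = mask.astype(bool)
    tint = np.array(color, dtype=np.float32)
    out[selected] = (1.0 - alpha) * out[selected] + alpha * tint
    return out.round().astype(np.uint8)


frame = np.full((4, 4, 3), 100, dtype=np.uint8)  # stand-in for a decoded video frame
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1  # stand-in for a decoded segmentation mask

preview = overlay_mask(frame, mask)
```

Pixels outside the mask keep their original values, so the preview shows the object tinted against an unchanged background.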
Practical Use Cases
Video Annotation and Training Data Generation
Creating high-quality training datasets for machine learning is notoriously time-consuming. SAM3 Video RLE transforms this workflow by generating frame-by-frame segmentation masks automatically. The RLE format is directly compatible with popular ML frameworks and annotation tools like CVAT, which has already integrated SAM 3 for streamlined labeling workflows. What previously required extensive manual annotation can now be pre-labeled in seconds, with human reviewers focusing only on quality control and edge cases.
VFX and Rotoscoping
The VFX industry has been revolutionized by SAM 3’s capabilities. Traditional rotoscoping—the painstaking process of manually tracing subjects frame by frame—has been fundamentally disrupted. Demonstrations have shown that tasks that once required “a team of dozens of people” now take “seconds” with AI-assisted segmentation. VFX artists can use SAM3 Video RLE to generate masks for compositing, apply effects to isolated subjects, or remove backgrounds through complex motion sequences.
Automated Video Processing Pipelines
For developers building video processing systems, RLE-encoded masks integrate seamlessly into automated workflows. The JSON output format works directly with pycocotools and similar libraries:
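from pycocotools import mask as mask_utils

# "counts" is truncated here; height and width are the frame's pixel dimensions.
rle_data = {"counts": "146301 3 147834 11 ...", "size": [height, width]}
binary_mask = mask_utils.decode(rle_data)  # NumPy uint8 array of shape (height, width)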
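Once a frame's mask is available as a binary array, typical pipeline steps reduce it to a few numbers per object. The sketch below computes pixel area and a bounding box with NumPy; the helper name `mask_stats` and the `(x_min, y_min, x_max, y_max)` box layout are illustrative choices, and a small synthetic array stands in for a decoded mask.

```python
import numpy as np


def mask_stats(binary_mask: np.ndarray):
    """Return (area, bbox) for a 2-D binary mask.

    bbox is (x_min, y_min, x_max, y_max) in pixel coordinates, or None
    when the mask is empty.
    """
    ys, xs = np.nonzero(binary_mask)
    if ys.size == 0:
        return 0, None  # object absent or fully occluded in this frame
    bbox = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return int(ys.size), bbox


binary_mask = np.zeros((6, 8), dtype=np.uint8)
binary_mask[2:5, 3:7] = 1  # synthetic object: 3 rows by 4 columns

area, bbox = mask_stats(binary_mask)
print(area, bbox)
```

Running this per frame gives a compact track of each object's size and position without keeping the full masks in memory.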
Sports Analytics and Surveillance
Track players, vehicles, or any objects of interest across frames while maintaining unique identities. The temporal consistency of SAM 3’s tracking handles occlusions, crowded scenes, and appearance changes that challenge traditional tracking systems.
Robotics and AR/VR Applications
Real-time scene understanding for robotics perception, augmented reality overlays, and virtual environment interaction all benefit from fast, accurate segmentation with programmatic output.
Getting Started with WaveSpeedAI
Using SAM3 Video RLE on WaveSpeedAI is straightforward. Simply upload your video and describe what you want to segment:
import wavespeed

output = wavespeed.run(
    "wavespeed-ai/sam3-video-rle",
    {
        "video": "https://your-video-url.com/video.mp4",
        "prompt": "person, car"
    }
)

# Output contains RLE-encoded masks for each frame
print(output["outputs"])
For more precise control, add point or box prompts to guide the segmentation:
output = wavespeed.run(
    "wavespeed-ai/sam3-video-rle",
    {
        "video": "https://your-video-url.com/video.mp4",
        "prompt": "the main subject",
        "point_prompts": [[512, 384]],
        "apply_mask": True
    }
)
Pricing That Makes Sense
WaveSpeedAI offers transparent, usage-based pricing for SAM3 Video RLE:
| Duration | Cost |
|---|---|
| Per 5 seconds | $0.05 |
| 1 minute | $0.60 |
| 5 minutes | $3.00 |
| 10 minutes | $6.00 |
Videos are billed in 5-second increments with a maximum duration of 10 minutes per job. For longer content, simply split into segments and process separately.
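The pricing rule can be sketched as a short calculation. This example assumes that a partial 5-second increment is rounded up to a full one, which the table implies but does not state; the function names are illustrative, and amounts are kept in whole cents to avoid floating-point rounding.

```python
import math

INCREMENT_SECONDS = 5
CENTS_PER_INCREMENT = 5    # $0.05 per 5 seconds, from the table above
MAX_JOB_SECONDS = 10 * 60  # 10-minute limit per job


def job_cost_cents(duration_seconds: float) -> int:
    """Cost of one job in cents, assuming partial increments round up."""
    if not 0 < duration_seconds <= MAX_JOB_SECONDS:
        raise ValueError("duration must be between 0 and 600 seconds")
    increments = math.ceil(duration_seconds / INCREMENT_SECONDS)
    return increments * CENTS_PER_INCREMENT


def split_durations(total_seconds: float) -> list[float]:
    """Split a long video into job-sized durations of at most 10 minutes each."""
    full_jobs, remainder = divmod(total_seconds, MAX_JOB_SECONDS)
    parts = [float(MAX_JOB_SECONDS)] * int(full_jobs)
    if remainder > 0:
        parts.append(float(remainder))
    return parts


# A 25-minute video becomes three jobs: 10 min, 10 min, and 5 min.
segments = split_durations(25 * 60)
total_cents = sum(job_cost_cents(seconds) for seconds in segments)
print(f"{len(segments)} jobs, ${total_cents / 100:.2f} total")
```

Under these assumptions the per-job costs match the table (a 1-minute video is 12 increments, or $0.60), and splitting a longer video does not change the total as long as each cut falls on a 5-second boundary.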
Why WaveSpeedAI?
Running advanced video segmentation models requires significant computational resources. WaveSpeedAI removes these barriers with:
- No cold starts: Your jobs begin processing immediately, without waiting for model initialization
- Optimized inference: We’ve tuned SAM3 for maximum throughput without sacrificing quality
- Simple REST API: Integrate video segmentation into any application with a few lines of code
- Affordable pricing: Pay only for what you use, with no upfront commitments
Start Segmenting Today
SAM3 Video RLE represents a fundamental leap forward in video segmentation technology. Whether you’re generating training data for computer vision models, automating VFX workflows, or building the next generation of video understanding applications, this model delivers professional-grade results with unprecedented ease.
Ready to transform your video workflows? Try SAM3 Video RLE on WaveSpeedAI and experience the future of video segmentation.




