Sam3 Video
Try it on WavespeedAI!

SAM3 Video is a unified foundation model for prompt-based video segmentation. Provide text, point, box, or mask prompts and the model segments and tracks targets across frames with strong temporal consistency. It supports concept-level segmentation ("segment anything with concepts") and multi-object masks for editing, analytics, and VFX. A ready-to-use REST inference API offers fast responses, no cold starts, and affordable pricing.
Features
WaveSpeedAI SAM3 Video Video-to-Video
SAM3 Video (wavespeed-ai/sam3-video) is a prompt-based video segmentation and mask-guided editing model. You provide a video plus a short text instruction (and optionally enable mask application), and the model segments/targets the requested subject(s) across frames with strong temporal consistency.
It’s a practical fit for object-focused video edits like background cleanup, removing unwanted elements, or isolating subjects for downstream compositing, especially on short-to-medium clips with clear subjects.
Key capabilities
- Prompt-based target selection (concept prompts): identify what to edit/segment using natural language (e.g., “the woman”, “person”, “red car”) without manually drawing masks frame by frame.
- Multi-object targeting in one run: track multiple object categories by listing them in the prompt (comma-separated), producing consistent targets across frames.
- Mask-guided region control via `apply_mask`: toggle whether the model applies the mask to the video output for tighter, more controllable edits.
- Temporal consistency for video workflows: designed to keep results stable across frames, reducing flicker/drift compared with per-frame processing.
- Editing-oriented use cases: works well for object removal and background cleanup when your prompt clearly specifies what should change and what should stay.
Parameters and how to use
- `video` (required): input video file or a public URL.
- `prompt` (required): text instruction for segmentation/editing. Use commas to target multiple objects (e.g., `person, cloth`).
- `apply_mask` (optional): whether to apply the mask to the video (boolean). Default: `true`.
Prompt
Write prompts like you’re describing what to target and (if applicable) what the edit intent is.
Tips:
- Prefer short, concrete nouns for targeting: `person`, `woman`, `car`, `dog`, `shirt`.
- For multiple targets, use comma-separated labels: `person, backpack, bicycle`.
- If you’re doing cleanup/removal, include keep-constraints to preserve look: “remove the person in the background, keep lighting unchanged”.

Examples:
- `the woman`
- `person, cloth`
- `remove the person in the background, keep lighting unchanged`
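A comma-separated prompt can be assembled programmatically from a list of target labels. The helper below is a hypothetical sketch (not part of any official SDK); field names follow the request-parameter table, and the video URL is a placeholder.

```python
import json

# Build the JSON body for a SAM3 Video task. The "prompt" field joins
# multiple target labels with ", " as the docs describe; the URL is a
# hypothetical example value.
def build_sam3_payload(video_url, targets, apply_mask=True):
    return {
        "video": video_url,
        "prompt": ", ".join(targets),
        "apply_mask": apply_mask,
    }

payload = build_sam3_payload("https://example.com/clip.mp4", ["person", "backpack"])
print(json.dumps(payload))
```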
Media (Videos)
- Provide `video` as either:
  - an uploaded file, or
  - a public URL the service can fetch.
- Pricing/processing uses a billed-duration clamp of 5–600 seconds, so very short clips are billed as 5s and very long clips are treated as 600s.
Other parameters
- `apply_mask`
  - `true`: apply the model’s mask to the output video (recommended when you want tighter control over the edited region).
  - `false`: run without applying the mask (useful when you want the model’s edits without explicit masking).
After you finish configuring the parameters, click Run, preview the result, and iterate if needed.
Pricing
Per-run cost depends on video duration (billed duration is clamped to 5–600 seconds), charged in 5-second units at $0.05 per 5s.
| Billed duration | Cost per run |
|---|---|
| 5s | $0.05 |
| 10s | $0.10 |
| 600s (max) | $6.00 |
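The billing rule above can be expressed as a small calculation: clamp the duration to 5–600 seconds, then charge per 5-second unit at $0.05. This is a sketch of the documented rule; rounding of partial units upward is an assumption, since the docs don’t state how fractional units are handled.

```python
import math

# Billed cost per the documented rule: clamp duration to 5-600 s, then
# charge $0.05 per 5-second unit. Rounding partial units up is an assumption.
def billed_cost(duration_s: float) -> float:
    clamped = min(max(duration_s, 5), 600)
    units = math.ceil(clamped / 5)
    return round(units * 0.05, 2)

print(billed_cost(2))     # very short clip, billed as 5 s
print(billed_cost(10))
print(billed_cost(9999))  # very long clip, billed as 600 s
```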
Notes
- Best results come from stable footage with clear subject separation and minimal heavy motion blur.
- Turn on `apply_mask` when you need more precise, localized control (especially in cluttered scenes).
- If results drift or the model picks the wrong target, refine the prompt (a more specific noun/descriptor) or reduce the number of targets per run.
Related Models
- WaveSpeedAI Video Eraser – Video inpainting for removing unwanted objects/people using mask-based guidance.
- Wan 2.2 Video Edit – Text-driven video edits for changing appearance/content (e.g., clothing, attributes) on short clips.
- WaveSpeedAI Video Watermark Remover – Removes logos/captions/watermarks with temporally-aware inpainting.
- WaveSpeedAI Video Outpainter – Expands video borders for reframing/aspect-ratio changes while preserving motion coherence.
Authentication
For authentication details, please refer to the Authentication Guide.
API Endpoints
Submit Task & Query Result
# Submit the task (video and prompt are required; replace with your own values)
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/sam3-video" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "video": "<public video URL>",
    "prompt": "person, cloth",
    "apply_mask": true
}'
# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
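The two curl calls above can also be combined into a small submit-and-poll client. The sketch below uses only the standard library and mirrors the endpoint paths from the curl example; it is an illustrative assumption, not an official SDK, and error handling is kept minimal.

```python
import json
import time
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"

def _request(url, api_key, body=None):
    # POST when a body is given, GET otherwise; both send the Bearer token.
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    })
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def is_terminal(status):
    # "completed" and "failed" are the terminal statuses per the response table.
    return status in ("completed", "failed")

def submit_task(api_key, video_url, prompt, apply_mask=True):
    body = {"video": video_url, "prompt": prompt, "apply_mask": apply_mask}
    return _request(f"{API_BASE}/wavespeed-ai/sam3-video", api_key, body)

def poll_result(api_key, task_id, interval_s=2.0):
    while True:
        res = _request(f"{API_BASE}/predictions/{task_id}/result", api_key)
        if is_terminal(res["data"]["status"]):
            return res
        time.sleep(interval_s)
```

A typical flow is `task = submit_task(key, url, "person")` followed by `poll_result(key, task["data"]["id"])`.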
Parameters
Task Submission Parameters
Request Parameters
| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| video | string | Yes | - | - | Video URL of the input to segment |
| prompt | string | Yes | - | - | Text prompt for segmentation. Use commas to track multiple objects (e.g., `person, cloth`). |
| apply_mask | boolean | No | true | - | Whether to apply mask to video |
Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
Result Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| id | string | Yes | - | Task ID |
Result Response Parameters
| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data | object | The prediction data object containing all details |
| data.id | string | Unique identifier for the prediction (matches the requested task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
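The result-response fields above can be read with a small helper: return the output URLs once the status is completed, raise on failure, and signal “still running” otherwise. This is a sketch against the documented schema; the sample response values are invented for illustration.

```python
# Read a result response per the documented shape: data.status drives the
# outcome, data.outputs holds the URLs, data.error explains failures.
def extract_outputs(result: dict):
    data = result["data"]
    if data["status"] == "failed":
        raise RuntimeError(data.get("error") or "task failed")
    if data["status"] != "completed":
        return None  # still created/processing
    return data["outputs"]

# Hypothetical sample response, following the field names in the tables above.
sample = {
    "code": 200,
    "message": "success",
    "data": {"id": "abc123", "status": "completed",
             "outputs": ["https://example.com/masked.mp4"], "error": ""},
}
print(extract_outputs(sample))
```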