Object Detection and Segmentation

Detect, identify, and segment objects in images and videos with AI models on WaveSpeed

我們的選擇

image-to-text

wavespeed-ai/moondream3-preview/point

Moondream3 Point finds objects in images and returns precise coordinate points for computer vision tasks, enabling accurate point localization. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

立即嘗試！查看說明文件

所有模型

10 個模型

image-to-text

wavespeed-ai/moondream3-preview/point

image-to-text

wavespeed-ai/moondream3-preview/detect

Moondream3 Detect: Precise object bounding boxes in images for accurate computer vision localization. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-3d

wavespeed-ai/sam-3d-body

Advanced SAM 3D body generation model for creating detailed 3D human body models from images with optional mask-based segmentation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-3d

wavespeed-ai/sam-3d-objects

Advanced SAM 3D objects generation model for creating detailed 3D object models from images with text prompts and optional mask-based segmentation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-to-video

wavespeed-ai/sam3-video

SAM3 Video is a unified foundation model for prompt-based video segmentation. Provide text, point, box, or mask prompts and the model segments and tracks targets across frames with strong temporal consistency. Supports concept-level (“segment anything with concepts”) and multi-object masks for editing, analytics, and VFX. Ready-to-use REST inference API with fast response, no cold starts, and affordable pricing.

image-to-image

wavespeed-ai/sam3-image

SAM 3 is a unified foundation model for promptable image segmentation using text, points, or boxes to detect and segment objects. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-to-text

wavespeed-ai/sam3-video-rle

SAM 3 Video RLE is a unified foundation model for prompt-based segmentation in video. Track and segment objects across frames using text, points, or boxes, returning RLE encoded masks for efficient processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-text

wavespeed-ai/sam3-image-rle

SAM 3 RLE is a unified foundation model for promptable image segmentation using text, points, or boxes to detect and segment objects. Returns RLE (Run-Length Encoding) encoded masks for efficient storage and processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

image-to-image

bria/embed-product

Bria Embed Product seamlessly integrates product images into scene backgrounds with natural lighting and perspective matching. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

video-to-video

wavespeed-ai/void-video-inpainting/mask

VOID Video Inpainting removes objects from videos using mask-guided inpainting. Supports quad-mask or auto-generated SAM-3 masks, optional Pass 2 refinement for temporal consistency, adjustable denoising steps, guidance scale, and temporal window size. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.