
Detect, identify, and segment objects in images and videos with AI models on WaveSpeed

Moondream3 Point finds objects in images and returns precise coordinate points for computer vision tasks, enabling accurate point localization. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Moondream3 Detect: Precise object bounding boxes in images for accurate computer vision localization. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Advanced SAM 3D body generation model for creating detailed 3D human body models from images with optional mask-based segmentation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Advanced SAM 3D objects generation model for creating detailed 3D object models from images with text prompts and optional mask-based segmentation. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

SAM3 Video is a unified foundation model for prompt-based video segmentation. Provide text, point, box, or mask prompts and the model segments and tracks targets across frames with strong temporal consistency. Supports concept-level (“segment anything with concepts”) and multi-object masks for editing, analytics, and VFX. Ready-to-use REST inference API with fast response, no cold starts, and affordable pricing.

SAM 3 is a unified foundation model for promptable image segmentation using text, points, or boxes to detect and segment objects. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

SAM 3 Video RLE is a unified foundation model for prompt-based segmentation in video. Track and segment objects across frames using text, points, or boxes, returning RLE encoded masks for efficient processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

SAM 3 RLE is a unified foundation model for promptable image segmentation using text, points, or boxes to detect and segment objects. Returns RLE (Run-Length Encoding) encoded masks for efficient storage and processing. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Bria Embed Product seamlessly integrates product images into scene backgrounds with natural lighting and perspective matching. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

VOID Video Inpainting removes objects from videos using mask-guided inpainting. Supports quad-mask or auto-generated SAM-3 masks, optional Pass 2 refinement for temporal consistency, adjustable denoising steps, guidance scale, and temporal window size. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
通过单一 REST API 运行 Object Detection and Segmentation 系列中的任意模型。按生成计费 — 无订阅、无最低消费 — 在 99.9% 可用性的基础设施上提供行业领先的延迟。
每个 Object Detection and Segmentation 模型都有按调用计价。价格在每个模型的页面上列出 — 不收取额外的平台费。
大多数 Object Detection and Segmentation 图像模型在 2 秒内完成。视频和 3D 模型比自托管方案快数倍。
多区域故障转移和自动重试可确保您的生产流量保持在线 — 即使在供应商故障期间。
每个模型在其模型页面上都列有自己的按调用价格。我们按每次成功生成计费,没有订阅费或最低消费。
本系列中的图像模型通常在 2 秒内完成。视频和 3D 模型取决于时长和分辨率,但通常比自托管运行快数倍。
可以 — 每个账户在注册时获得 $1 的免费额度,足以在不使用信用卡的情况下试用大多数 Object Detection and Segmentation 模型。
标准账户有充足的并发任务限制。企业版计划提供自定义 RPM、更高并发和专用容量 — 详情请联系销售。