Moondream3 Detect | Object Detection With Precise Bounding Boxes

Moondream 3 — Object Detection

Moondream 3 Detect is a vision-language model that identifies and localizes objects in images. Given a natural-language description of the target, it returns a bounding box for each matching instance, which makes it suited to visual search, annotation, and AI-assisted labeling.

✨ Key Features

Natural Language Object Queries: Describe what you want to detect, e.g., “person,” “car,” “dog,” or “chair.”
Accurate Bounding Boxes: Returns x_min, y_min, x_max, y_max coordinates for each detected instance.
Multi-Object Detection: Supports multiple instances of the same category in one image.
Fast and Lightweight: Optimized for low-latency real-time or batch detection workflows.

⚙️ Example Usage

🔹 Detect Cars

{
  "image": "https://example.com/photo.jpg",
  "prompt": "car"
}

🔹 Detect People

{
  "image": "https://example.com/photo.jpg",
  "prompt": "person"
}

🔹 Detect Any Object

{
  "image": "https://example.com/photo.jpg",
  "prompt": "bicycle"
}
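
The request bodies above can be built and sent from Python. The following is a minimal sketch, not the documented client: the endpoint URL in `API_URL` is a placeholder, the Bearer-token header is an assumption about authentication, and `build_payload` and `detect` are hypothetical helper names. Only the `image` and `prompt` fields come from the examples above.

```python
import json
import urllib.request

# Placeholder endpoint (assumption): replace with the URL from your API dashboard.
API_URL = "https://api.example.com/moondream3/detect"


def build_payload(image_url: str, prompt: str) -> dict:
    """Build a request body with the two fields used in the examples above."""
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must name an object, e.g. 'car'")
    return {"image": image_url, "prompt": cleaned}


def detect(image_url: str, prompt: str, api_key: str) -> dict:
    """POST the payload and return the parsed JSON response.

    The Authorization header format is an assumption, not a documented detail.
    """
    body = json.dumps(build_payload(image_url, prompt)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping payload construction separate from the network call makes it easy to check the request body without sending anything.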

📦 Output Format

Bounding boxes are returned in normalized coordinates (range 0–1):

{
  "objects": [
    {
      "x_min": 0.1556,
      "x_max": 0.6881,
      "y_min": 0.2610,
      "y_max": 0.9551
    }
  ]
}

where

(x_min, y_min) = top-left corner
(x_max, y_max) = bottom-right corner

If multiple objects are detected, all boxes appear in the "objects" array.
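
Because the coordinates are normalized, drawing or cropping requires scaling them by the image dimensions. A minimal sketch of that conversion follows; `to_pixel_box` is a hypothetical helper name, and rounding to the nearest pixel is one reasonable choice rather than a documented convention.

```python
def to_pixel_box(obj: dict, width: int, height: int) -> tuple[int, int, int, int]:
    """Scale one normalized box to (left, top, right, bottom) in pixels."""
    left = round(obj["x_min"] * width)
    top = round(obj["y_min"] * height)
    right = round(obj["x_max"] * width)
    bottom = round(obj["y_max"] * height)
    return left, top, right, bottom


# The sample response from above, applied to a 1000 x 800 pixel image.
response = {
    "objects": [
        {"x_min": 0.1556, "x_max": 0.6881, "y_min": 0.2610, "y_max": 0.9551}
    ]
}
boxes = [to_pixel_box(obj, width=1000, height=800) for obj in response["objects"]]
print(boxes)  # [(156, 209, 688, 764)]
```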

💡 Best Practices

Use specific, clear object names for best accuracy.
For small or distant objects, higher-resolution images improve detection.
Supported formats: JPEG, PNG, WebP
Maximum image size: 10 MB
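
The format and size limits above can be checked locally before a request is sent. This is a rough sketch under two assumptions: the format is inferred from the file extension only, and "10 MB" is read as 10,000,000 bytes (the stricter interpretation). `check_image` is a hypothetical helper name.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_BYTES = 10 * 1000 * 1000  # assumption: "10 MB" read as decimal megabytes


def check_image(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or 'none'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_BYTES}")
    return problems


print(check_image("photo.JPG", 2_000_000))  # []
print(check_image("scan.tiff", 1_000))      # ['unsupported format: .tiff']
```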

💰 Pricing

$0.001 per request
Contact WaveSpeedAI for bulk or enterprise pricing options.
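
For budgeting a batch job, the per-request price multiplies out directly: 50,000 requests at $0.001 each comes to $50. The sketch below uses `Decimal` to avoid floating-point drift; `estimate_cost` is a hypothetical helper, and it assumes every image is one billed request with no bulk discount applied.

```python
from decimal import Decimal

PRICE_PER_REQUEST = Decimal("0.001")  # USD, the list price stated above


def estimate_cost(num_requests: int) -> Decimal:
    """Estimated cost in USD, assuming one billed request per image."""
    if num_requests < 0:
        raise ValueError("num_requests must be non-negative")
    return PRICE_PER_REQUEST * num_requests


cost = estimate_cost(50_000)
```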

Moondream3 Detect returns object bounding boxes for image localization tasks through a ready-to-use REST inference API with no cold starts.
