vision-language

Moondream3 Detect | Object Detection With Precise Bounding Boxes | WaveSpeedAI

wavespeed-ai/moondream3-preview/detect

Moondream3 Detect: precise object bounding boxes for accurate localization in images. Ready-to-use REST inference API with no cold starts and affordable pricing.

{ "objects": [ { "x_max": 0.6881352663040161, "x_min": 0.1556147336959839, "y_max": 0.9551899135112762, "y_min": 0.26160696148872375 } ] }

Your request will cost $0.001 per run.

For $1 you can run this model approximately 1000 times.

README

Moondream 3 — Object Detection

Moondream 3 Detect is a powerful vision-language model for identifying and localizing objects within images. It uses natural language input to detect specific items and returns their bounding box coordinates with high precision — ideal for visual search, annotation, and AI-assisted labeling.

✨ Key Features

  • Natural Language Object Queries: Describe what you want to detect, e.g., “person,” “car,” “dog,” or “chair.”

  • Accurate Bounding Boxes: Returns x_min, y_min, x_max, y_max coordinates for each detected instance.

  • Multi-Object Detection: Detects multiple instances of the same category in one image.

  • Fast and Lightweight: Optimized for low-latency real-time or batch detection workflows.

⚙️ Example Usage

🔹 Detect Cars

{
  "image": "https://example.com/photo.jpg",
  "prompt": "car"
}

🔹 Detect People

{
  "image": "https://example.com/photo.jpg",
  "prompt": "person"
}

🔹 Detect Any Object

{
  "image": "https://example.com/photo.jpg",
  "prompt": "bicycle"
}
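
The payloads above all carry the same two fields, so a request can be assembled by a small helper. The following is a minimal Python sketch, not the official client: the endpoint URL, the `Authorization: Bearer` header, and the `build_detect_request` name are assumptions for illustration and should be checked against the WaveSpeedAI API documentation. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL given in the WaveSpeedAI API docs.
ASSUMED_ENDPOINT = "https://api.example.com/wavespeed-ai/moondream3-preview/detect"


def build_detect_request(image_url, prompt, api_key, endpoint=ASSUMED_ENDPOINT):
    """Build (but do not send) a POST request carrying the detect payload."""
    if not prompt.strip():
        raise ValueError("prompt must name the object to detect")
    payload = {"image": image_url, "prompt": prompt}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme; confirm against the API docs.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_detect_request("https://example.com/photo.jpg", "car", "YOUR_API_KEY")
print(request.get_method(), json.loads(request.data))
```

Passing the resulting object to `urllib.request.urlopen` would send it, once the endpoint and key are real.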

📦 Output Format

Bounding boxes are returned in normalized coordinates (range 0–1):

{
  "objects": [
    {
      "x_min": 0.1556,
      "x_max": 0.6881,
      "y_min": 0.2616,
      "y_max": 0.9551
    }
  ]
}

where

  • (x_min, y_min) = top-left corner
  • (x_max, y_max) = bottom-right corner

If multiple objects are detected, all boxes appear in the "objects" array.
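
Because the coordinates are normalized, drawing or cropping a box first requires scaling it to the image's pixel size. The sketch below assumes x values scale by image width and y values by image height, with the origin at the top-left corner; `to_pixel_boxes` is an illustrative helper name, not part of the API.

```python
def to_pixel_boxes(response, image_width, image_height):
    """Scale normalized boxes in a detect response to integer pixel coordinates."""
    boxes = []
    for obj in response.get("objects", []):
        boxes.append((
            round(obj["x_min"] * image_width),   # left
            round(obj["y_min"] * image_height),  # top
            round(obj["x_max"] * image_width),   # right
            round(obj["y_max"] * image_height),  # bottom
        ))
    return boxes


response = {"objects": [{"x_min": 0.25, "x_max": 0.75, "y_min": 0.5, "y_max": 1.0}]}
print(to_pixel_boxes(response, 1000, 800))  # [(250, 400, 750, 800)]
```

Each tuple is ordered (left, top, right, bottom), the order most image libraries expect for crop or rectangle calls. An empty or missing "objects" array yields an empty list.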

💡 Best Practices

  • Use specific, clear object names for best accuracy.
  • For small or distant objects, higher-resolution images improve detection.
  • Supported formats: JPEG, PNG, WebP
  • Maximum image size: 10 MB
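
The format and size limits above can be checked locally before a request is made. This is a rough sketch under two assumptions: the format is inferred from the file extension only, and "10 MB" is read as 10 × 1024 × 1024 bytes (the page does not say which convention applies). `check_image` is a hypothetical helper name.

```python
import os

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def check_image(filename, size_bytes):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_IMAGE_BYTES}")
    return problems


print(check_image("photo.jpg", 2_000_000))
print(check_image("scan.tiff", 12 * 1024 * 1024))
```

For a file on disk, `os.path.getsize(path)` supplies `size_bytes`.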

💰 Pricing

  • $0.001 per request
  • Contact WaveSpeedAI for bulk or enterprise pricing options.