
image-to-text
Your request will cost $0.001 per run.
With $1 you can run this model approximately 1,000 times.
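The per-run pricing above is simple arithmetic to budget against. A minimal sketch, assuming the listed flat price of $0.001 per run:

```python
# Budget arithmetic for a flat per-run price.
# PRICE_PER_RUN_USD mirrors the $0.001 figure quoted above.
PRICE_PER_RUN_USD = 0.001

def runs_for_budget(budget_usd: float) -> int:
    """How many runs a given budget covers at the flat per-run price."""
    # round() guards against float artifacts like 999.999999...
    return round(budget_usd / PRICE_PER_RUN_USD)

print(runs_for_budget(1.0))  # 1000 runs for $1
```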
One more thing:
Moondream 3 Point is a vision-language model designed to identify and describe specific objects within an image using natural language. Instead of returning coordinates, it provides a concise textual description of the detected object, making it ideal for lightweight interactive queries and content understanding.
Locate and Describe Objects: Enter a short text query (e.g., “hat”, “watch”, “phone”) and receive a natural-language description of that item in context.
Fast Single-Object Queries: Optimized for fast, low-latency inference, making it well suited to real-time applications.
Readable Natural Output: The model outputs a fluent English sentence describing the object’s appearance, position, and context.
Multilingual Understanding: Capable of recognizing and describing objects in a wide range of visual scenarios.
Example input:

{
  "image": "https://example.com/photo.jpg",
  "prompt": "hat"
}
Example output:

{
  "answer": "The woman is wearing a pink baseball cap with a strap across her forehead. She is also wearing large silver hoop earrings and a pink fuzzy sweater."
}
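Putting the input and output schemas together, here is a minimal client sketch using only Python's standard library. The endpoint URL is a placeholder; the real invocation URL and any auth headers depend on your provider:

```python
import json
import urllib.request

API_URL = "https://api.example.com/moondream-3-point"  # placeholder URL

def build_request(image_url: str, prompt: str) -> urllib.request.Request:
    """Build a POST request with the JSON body shown in the example input."""
    body = json.dumps({"image": image_url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_answer(raw_response: str) -> str:
    """The endpoint returns {"answer": "..."}; pull out the text."""
    return json.loads(raw_response)["answer"]

# Uncomment to call the live endpoint:
# req = build_request("https://example.com/photo.jpg", "hat")
# with urllib.request.urlopen(req) as resp:
#     print(extract_answer(resp.read().decode("utf-8")))
```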
Use concise object names (e.g., “hat”, “car”, “tree”) for more accurate detection.
For precise bounding boxes or coordinates, use the Detect model, which returns x_min, y_min, x_max, y_max bounding boxes.
Supported formats: JPEG, PNG, WebP
Maximum image size: 10 MB
The current endpoint returns descriptive text in JSON format ({"answer": "..."}); it does not output coordinates.
For small or occluded objects, use higher-resolution input or switch to the Detect model for better spatial precision.
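Given the format and size limits above, a client-side pre-flight check can reject bad uploads before spending a run. A minimal sketch; the helper name and the byte interpretation of "10 MB" are assumptions, not part of the API:

```python
import os

# Documented limits: JPEG/PNG/WebP input, at most 10 MB.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_BYTES = 10 * 1024 * 1024  # assuming binary megabytes

def validate_image(path: str) -> None:
    """Raise ValueError if the file breaks a documented limit."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or 'none'}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("image exceeds the 10 MB limit")
```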