Moondream3 Preview Point

Playground

Moondream3 Point finds objects in images and returns precise coordinate points for computer vision tasks, enabling accurate point localization. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.

Features

Moondream 3 Point is a vision-language model designed to identify and describe specific objects within an image using natural language. Instead of returning coordinates, it provides a concise textual description of the detected object, making it ideal for lightweight interactive queries and content understanding.

✨ Key Features

Locate and Describe Objects Enter a short text query (e.g., “hat”, “watch”, “phone”) and receive a natural-language description of that item in context.
Fast Single-Object Queries Optimized for fast, low-latency inference — perfect for real-time applications.
Readable Natural Output The model outputs a fluent English sentence describing the object’s appearance, position, and context.
Multilingual Understanding Capable of recognizing and describing objects in a wide range of visual scenarios.

⚙️ Example Usage

Locate & Describe “Hat”

&#123;
 "image": "https://example.com/photo.jpg",
 "prompt": "hat"
&#125;

Example Response

&#123;
 "answer": "The woman is wearing a pink baseball cap with a strap across her forehead. She is also wearing large silver hoop earrings and a pink fuzzy sweater."
&#125;

💡 Best Practices

Use concise object names (e.g., “hat”, “car”, “tree”) for more accurate detection.
For precise bounding boxes or coordinates, use:
Moondream 3 Detect — returns x_min, y_min, x_max, y_max bounding boxes.
A coordinate-enabled version of Moondream 3 Point (coming soon).
Supported formats: JPEG, PNG, WebP
Maximum image size: 10 MB

💰 Pricing

$0.001 per request
Volume and enterprise pricing available upon request.

📝 Notes

The current endpoint returns descriptive text in JSON format:

&#123;"answer": "..."&#125;

— it does not output coordinates.

For small or occluded objects, use higher-resolution input or switch to the Detect model for better spatial precision.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result

set -euo pipefail

export WAVESPEED_API_KEY="your-api-key"

REQUEST_BODY=$(cat <<'JSON'
{
  "prompt": "A cinematic ocean wave at sunrise, highly detailed",
  "image": "https://interactive-examples.mdn.mozilla.net/media/cc0-images/painted-hand-298-332.jpg"
}
JSON
)

# 1. Submit the prediction.
SUBMIT_RESPONSE=$(curl --silent --show-error --fail-with-body \
  -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/moondream3-preview/point" \
  -H "Authorization: Bearer ${WAVESPEED_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "${REQUEST_BODY}")

TASK=$(printf '%s' "${SUBMIT_RESPONSE}" | jq 'if type == "object" and has("data") then .data else . end')
PREDICTION_ID=$(printf '%s' "${TASK}" | jq -r '.id // empty')
if [ -z "${PREDICTION_ID}" ]; then
  printf 'Submission response did not contain a prediction id
' >&2
  exit 1
fi
RESULT_URL=$(printf '%s' "${TASK}" | jq -r '.urls.get // empty')
if [ -z "${RESULT_URL}" ]; then RESULT_URL="https://api.wavespeed.ai/api/v3/predictions/${PREDICTION_ID}/result"; fi

# 2. Poll until the prediction finishes.
while true; do
  RESPONSE=$(curl --silent --show-error --fail-with-body \
    "${RESULT_URL}" \
    -H "Authorization: Bearer ${WAVESPEED_API_KEY}")
  RESULT=$(printf '%s' "${RESPONSE}" | jq 'if type == "object" and has("data") then .data else . end')
  STATUS=$(printf '%s' "${RESULT}" | jq -r '.status // empty')

  case "${STATUS}" in
    completed) printf '%s\n' "${RESULT}" | jq '.outputs'; break ;;
    failed|cancelled|timeout) printf '%s\n' "${RESULT}" | jq . >&2; exit 1 ;;
    created|processing) sleep 2 ;;
    *) printf 'Unexpected status: %s
' "${STATUS}" >&2; exit 1 ;;
  esac
done

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
prompt	string	Yes		-	Object to locate in the image (e.g., 'car', 'person', 'dog').
image	string	Yes		-	Image to analyze. Provide an HTTPS URL or upload an image file.
enable_sync_mode	boolean	No	false	-	If set to `true`, the request attempts to wait for the generated result and return outputs in the same response. If the result is not ready within the sync wait window, the API can return a timeout body while the task continues processing. This option is only available via the API and is supported only by some models.

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction, Task Id
data.model	string	Model ID used for the prediction
data.outputs	array	Output values, usually URL strings; some models return text strings or structured result objects (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction
data.model	string	Model ID used for the prediction
data.outputs	array<string \| object>	Array of generated outputs (empty when status is not completed). Items are usually URL strings, but may be text strings or structured result objects, depending on the model.
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to poll for the prediction result
data.status	string	Status: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Overview