Moondream3 Preview Caption

Generate short, normal, or long image captions to help you understand and describe visual content. Available through a ready-to-use REST inference API with high performance, no cold starts, and affordable pricing.

Features

Moondream 3 — Image Captioning

Moondream 3 Caption is a high-performance vision-language model that automatically generates clear, descriptive, and context-aware captions for any image. It supports multiple caption lengths, enabling flexible use across social media content, dataset annotation, and creative storytelling.

✨ Key Features

Flexible Caption Length: Choose from short, normal, or long captions to fit your workflow.
Accurate Visual Understanding: Trained on large-scale, diverse visual datasets; accurately detects objects, actions, and environments.
Fast and Efficient: Optimized for low-latency inference, suitable for both real-time applications and batch processing.
Human-like Language Output: Produces smooth, natural, grammatically correct sentences that can be used directly in production.

⚙️ Example Usage

🔹 Short Caption

{
  "image": "https://example.com/photo.jpg",
  "length": "short"
}

🔹 Normal Caption

{
  "image": "https://example.com/photo.jpg",
  "length": "normal"
}

🔹 Long Caption

{
  "image": "https://example.com/photo.jpg",
  "length": "long"
}
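
The three request bodies above differ only in the length value, so they can be produced by one small helper. The following is a minimal Python sketch, not an official client: the function name build_caption_payload and the constant ALLOWED_LENGTHS are hypothetical, and the HTTPS check is an assumption based on the parameter description asking for an HTTPS URL.

```python
import json

# Caption lengths documented for this endpoint.
ALLOWED_LENGTHS = ("short", "normal", "long")


def build_caption_payload(image_url: str, length: str = "normal") -> dict:
    """Return a request body with the documented `image` and `length` keys.

    Hypothetical helper: it only validates inputs locally and does not
    contact the API.
    """
    if length not in ALLOWED_LENGTHS:
        raise ValueError(f"length must be one of {ALLOWED_LENGTHS}, got {length!r}")
    # Assumption: the image is passed as an HTTPS URL, as the docs suggest.
    if not image_url.startswith("https://"):
        raise ValueError("image must be an HTTPS URL")
    return {"image": image_url, "length": length}


# Example: serialize a short-caption request body.
body = json.dumps(build_caption_payload("https://example.com/photo.jpg", "short"))
```

Rejecting an unknown length locally avoids paying for a request that the service would likely reject anyway.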

🧾 Example Output

{
  "caption": "A young woman with long, dark hair stands in front of a bar. She wears a leopard print halter top and blue jeans, accessorized with large hoop earrings. The bar features a purple backlit counter and a lit sign displaying 'DAMON' in yellow letters."
}

Output Explanation

The model returns a JSON object with a single key: caption.
The value is a natural-language description automatically generated from the input image.
The style and length of the caption depend on your chosen length parameter (short, normal, or long).
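
As a rough illustration of consuming this output, the Python sketch below parses a JSON string shaped like the example and returns the caption text. The function name extract_caption is hypothetical, and the sketch assumes the payload is exactly the single-key object described here; a full API response may wrap it in additional fields.

```python
import json


def extract_caption(raw: str) -> str:
    """Parse a JSON document of the form {"caption": "..."} and return the text.

    Hypothetical helper; raises ValueError if the expected key is missing
    or is not a string.
    """
    obj = json.loads(raw)
    if not isinstance(obj, dict) or not isinstance(obj.get("caption"), str):
        raise ValueError("expected a JSON object with a string 'caption' key")
    # Trim surrounding whitespace so the caption can be used directly.
    return obj["caption"].strip()
```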

💡 Best Practices

Use “short” for quick summaries or thumbnail text.
Use “normal” for descriptive captions (recommended default).
Use “long” for storytelling, research annotations, or dataset labeling.
Supported formats: JPEG, PNG, WebP
Maximum image size: 10 MB
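
The format and size limits can be checked before an image is submitted. This is a minimal Python sketch under stated assumptions: check_image is a hypothetical helper, the format is inferred from the file extension only, and 10 MB is interpreted as 10 * 1024 * 1024 bytes, which the documentation does not specify.

```python
from pathlib import PurePath

# Extensions corresponding to the supported formats (JPEG, PNG, WebP).
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
# Assumption: "10 MB" means 10 MiB; the service may count 10,000,000 bytes instead.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def check_image(filename: str, size_bytes: int) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension {ext or '(none)'}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_IMAGE_BYTES}")
    return problems
```

An extension check is only a heuristic; inspecting the file's magic bytes would be more reliable if mislabeled files are a concern.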

💰 Pricing

$0.005 per request
Contact WaveSpeedAI for enterprise and large-scale pricing options.

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/moondream3-preview/caption" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "https://example.com/photo.jpg",
    "length": "normal",
    "enable_sync_mode": true
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
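
For readers who prefer Python to curl, the sketch below builds equivalent requests with the standard library. It is an illustration rather than an official SDK: the helper names build_submit_request, build_result_request, and send are hypothetical, the request body includes the required image parameter from the parameter table, and the response is assumed to be JSON.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
SUBMIT_URL = f"{API_BASE}/wavespeed-ai/moondream3-preview/caption"


def build_submit_request(api_key, image_url, length="normal", sync=True):
    """Build (but do not send) the POST request that submits a caption task."""
    body = json.dumps(
        {"image": image_url, "length": length, "enable_sync_mode": sync}
    ).encode("utf-8")
    return urllib.request.Request(
        SUBMIT_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def build_result_request(api_key, request_id):
    """Build (but do not send) the GET request that fetches a task result."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def send(request, timeout=60.0):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires a valid key, e.g. from the WAVESPEED_API_KEY variable):
#   submitted = send(build_submit_request(key, "https://example.com/photo.jpg"))
#   result = send(build_result_request(key, submitted["data"]["id"]))
```

Separating request construction from sending keeps the network call in one place and makes the request shape easy to inspect.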

Parameters

Task Submission Parameters

Request Parameters

Parameter	Type	Required	Default	Range	Description
image	string	Yes	-	-	Image to be described. Provide an HTTPS URL or upload an image file.
length	string	No	normal	short, normal, long	Caption length.
enable_sync_mode	boolean	No	true	-	If true, the request waits for the result and returns it in the response. This property is only available through the API.

Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data.id	string	Unique identifier for the prediction (the task ID)
data.model	string	Model ID used for the prediction
data.outputs	array	Array of URLs to the generated content (empty when status is not `completed`)
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.has_nsfw_contents	array	Array of boolean values indicating NSFW detection for each output
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds

Result Request Parameters

Parameter	Type	Required	Default	Description
id	string	Yes	-	Task ID

Result Response Parameters

Parameter	Type	Description
code	integer	HTTP status code (e.g., 200 for success)
message	string	Status message (e.g., “success”)
data	object	The prediction data object containing all details
data.id	string	Unique identifier for the prediction (the task ID that was queried)
data.model	string	Model ID used for the prediction
data.outputs	string	Object containing the model output (empty when status is not completed).
data.urls	object	Object containing related API endpoints
data.urls.get	string	URL to retrieve the prediction result
data.status	string	Status of the task: `created`, `processing`, `completed`, or `failed`
data.created_at	string	ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”)
data.error	string	Error message (empty if no error occurred)
data.timings	object	Object containing timing details
data.timings.inference	integer	Inference time in milliseconds
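
When a task is not run in sync mode, the result endpoint can be polled until data.status reaches completed or failed. The following Python sketch shows one way to do that using only the status values listed above. The name wait_for_result, the polling interval, and the attempt limit are all assumptions, and fetch stands for any caller-supplied function that returns the decoded result response.

```python
import time
from typing import Callable


def wait_for_result(
    fetch: Callable[[], dict],
    interval: float = 1.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until the task completes or fails; return the `data` object.

    `fetch` must return a response shaped like the table above, with the
    task status under data.status.
    """
    for attempt in range(max_attempts):
        data = fetch()["data"]
        status = data["status"]
        if status == "completed":
            return data
        if status == "failed":
            raise RuntimeError(data.get("error") or "task failed")
        # `created` or `processing`: wait before the next attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"task not finished after {max_attempts} attempts")
```

Passing fetch and sleep as arguments keeps the loop independent of any particular HTTP library and easy to exercise with canned responses.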
