Moondream3 Preview Caption

Generate short, normal, or long captions that describe the visual content of an image. Available as a ready-to-use REST inference API with low-latency inference, no cold starts, and affordable pricing.

Features

Moondream 3 — Image Captioning

Moondream 3 Caption is a high-performance vision-language model that automatically generates clear, descriptive, and context-aware captions for any image. It supports multiple caption lengths, enabling flexible use across social media content, dataset annotation, and creative storytelling.


✨ Key Features

  • Flexible Caption Length: Choose from short, normal, or long captions to fit your workflow needs.

  • Accurate Visual Understanding: Trained on large-scale, diverse visual datasets; accurately detects objects, actions, and environments.

  • Fast and Efficient: Optimized for low-latency inference, suitable for real-time applications and batch processing.

  • Human-like Language Output: Produces smooth, natural, and grammatically correct sentences ideal for direct use in production.


⚙️ Example Usage

🔹 Short Caption

{
  "image": "https://example.com/photo.jpg",
  "length": "short"
}

🔹 Normal Caption

{
  "image": "https://example.com/photo.jpg",
  "length": "normal"
}

🔹 Long Caption

{
  "image": "https://example.com/photo.jpg",
  "length": "long"
}
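
The three payloads above differ only in the length value, so they can be produced by one small helper. The following is a minimal Python sketch, not part of the WaveSpeedAI documentation: the function name build_caption_payload is hypothetical, and the HTTPS check is an assumption based on the parameter description ("Provide an HTTPS URL").

```python
# Hypothetical helper: builds the request body shown in the examples above.
ALLOWED_LENGTHS = ("short", "normal", "long")


def build_caption_payload(image_url: str, length: str = "normal") -> dict:
    """Return the JSON body for a caption request as a Python dict."""
    if length not in ALLOWED_LENGTHS:
        raise ValueError(f"length must be one of {ALLOWED_LENGTHS}, got {length!r}")
    # Assumption: only HTTPS URLs are accepted when calling the API directly.
    if not image_url.startswith("https://"):
        raise ValueError("image must be an HTTPS URL")
    return {"image": image_url, "length": length}


print(build_caption_payload("https://example.com/photo.jpg", "short"))
```

Rejecting an unknown length locally avoids spending a request on a payload the API would likely refuse.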

🧾 Example Output

{
  "caption": "A young woman with long, dark hair stands in front of a bar. She wears a leopard print halter top and blue jeans, accessorized with large hoop earrings. The bar features a purple backlit counter and a lit sign displaying 'DAMON' in yellow letters."
}

Output Explanation

  • The model returns a JSON object with a single key: caption.
  • The value is a natural-language description automatically generated from the input image.
  • The style and length of the caption depend on your chosen length parameter (short, normal, or long).
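
Reading the caption out of that object might look like the sketch below. It assumes the output arrives as a JSON string shaped like the example above; the function name extract_caption is hypothetical, and the sample string is a shortened copy of the example output.

```python
import json


def extract_caption(raw_output: str) -> str:
    """Parse the JSON output and return the caption text."""
    data = json.loads(raw_output)
    caption = data.get("caption") if isinstance(data, dict) else None
    if not isinstance(caption, str) or not caption.strip():
        raise ValueError("output has no non-empty 'caption' string")
    return caption.strip()


# Shortened copy of the example output above.
sample = '{"caption": "A young woman with long, dark hair stands in front of a bar."}'
print(extract_caption(sample))
```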

💡 Best Practices

  • Use “short” for quick summaries or thumbnail text.
  • Use “normal” for descriptive captions (recommended default).
  • Use “long” for storytelling, research annotations, or dataset labeling.
  • Supported formats: JPEG, PNG, WebP
  • Maximum image size: 10 MB
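
The format and size limits above can be checked before an image is hosted or submitted. This is a rough sketch under two assumptions that the page does not confirm: "10 MB" is read as 10 × 1024 × 1024 bytes, and the file extension is taken as a stand-in for the actual image format. The name check_image is hypothetical.

```python
import os

# Assumption: extensions used as a proxy for the supported formats (JPEG, PNG, WebP).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
# Assumption: "10 MB" interpreted as 10 MiB; the page does not say which unit is meant.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def check_image(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension {ext!r}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file is {size_bytes} bytes, limit is {MAX_IMAGE_BYTES}")
    return problems


print(check_image("photo.jpg", 2_000_000))
```

For a local file, the size can be obtained with os.path.getsize(path) and passed in as size_bytes.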

💰 Pricing

  • $0.005 per request
  • Contact WaveSpeedAI for enterprise and large-scale pricing options.
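
At the flat per-request rate above, a batch cost is a single multiplication. The sketch below uses Decimal to avoid floating-point rounding; it ignores any enterprise or volume pricing, and the name estimate_cost is hypothetical.

```python
from decimal import Decimal

# Listed price: $0.005 per request.
PRICE_PER_REQUEST_USD = Decimal("0.005")


def estimate_cost(num_requests: int) -> Decimal:
    """Return the estimated cost in USD for a number of caption requests."""
    if num_requests < 0:
        raise ValueError("num_requests must be non-negative")
    return PRICE_PER_REQUEST_USD * num_requests


print(estimate_cost(10_000))
```

For example, captioning 10,000 images comes to $50 at the listed rate.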

Authentication

For authentication details, please refer to the Authentication Guide.

API Endpoints

Submit Task & Query Result


# Submit the task
curl --location --request POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/moondream3-preview/caption" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}" \
--data-raw '{
    "image": "https://example.com/photo.jpg",
    "length": "normal",
    "enable_sync_mode": true
}'

# Get the result
curl --location --request GET "https://api.wavespeed.ai/api/v3/predictions/${requestId}/result" \
--header "Authorization: Bearer ${WAVESPEED_API_KEY}"
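
The same two calls can be sketched in Python with only the standard library. The URLs, headers, and body fields are taken from the curl commands and parameter list on this page; the function names build_submit_request, build_result_request, and send are hypothetical, and nothing here has been verified against the live service.

```python
import json
import urllib.request

API_BASE = "https://api.wavespeed.ai/api/v3"
SUBMIT_URL = f"{API_BASE}/wavespeed-ai/moondream3-preview/caption"


def build_submit_request(api_key, image_url, length="normal", enable_sync_mode=True):
    """Build (but do not send) the POST request that submits a caption task."""
    body = json.dumps(
        {"image": image_url, "length": length, "enable_sync_mode": enable_sync_mode}
    ).encode("utf-8")
    return urllib.request.Request(
        SUBMIT_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def build_result_request(api_key, request_id):
    """Build (but do not send) the GET request that fetches a task result."""
    return urllib.request.Request(
        f"{API_BASE}/predictions/{request_id}/result",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def send(request, timeout=60.0):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

A typical flow would read the key from the WAVESPEED_API_KEY environment variable, call send(build_submit_request(...)), and, if the task is not yet completed, pass the returned task ID to build_result_request. That the task ID is the data.id field of the submit response is an inference from the response parameter list, not something the page states outright.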

Parameters

Task Submission Parameters

Request Parameters

| Parameter | Type | Required | Default | Range | Description |
|---|---|---|---|---|---|
| image | string | Yes | - | - | Image to be described. Provide an HTTPS URL or upload an image file. |
| length | string | No | normal | normal, short, long | Caption length. Options: 'short', 'normal', or 'long'. |
| enable_sync_mode | boolean | No | true | - | If set to true, the function will wait for the result before returning the response. This property is only available through the API. |

Response Parameters

| Parameter | Type | Description |
|---|---|---|
| code | integer | HTTP status code (e.g., 200 for success) |
| message | string | Status message (e.g., “success”) |
| data.id | string | Unique identifier for the prediction (task ID) |
| data.model | string | Model ID used for the prediction |
| data.outputs | array | Array of URLs to the generated content (empty when status is not completed) |
| data.urls | object | Object containing related API endpoints |
| data.urls.get | string | URL to retrieve the prediction result |
| data.has_nsfw_contents | array | Array of boolean values indicating NSFW detection for each output |
| data.status | string | Status of the task: created, processing, completed, or failed |
| data.created_at | string | ISO timestamp of when the request was created (e.g., “2023-04-01T12:34:56.789Z”) |
| data.error | string | Error message (empty if no error occurred) |
| data.timings | object | Object containing timing details |
| data.timings.inference | integer | Inference time in milliseconds |
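
A client usually needs only a few of these fields: the task ID, whether the task has finished, the outputs, and the inference time. The sketch below shows one way to pull them out. The name summarize_prediction is hypothetical, and the sample response is assembled from the field list above with made-up values (including the task ID and model ID); it is not a captured API response.

```python
# Statuses after which the result will no longer change.
TERMINAL_STATUSES = {"completed", "failed"}


def summarize_prediction(response: dict) -> dict:
    """Extract the fields a polling client typically needs from a response."""
    data = response.get("data") or {}
    status = data.get("status")
    if status == "failed":
        raise RuntimeError(data.get("error") or "prediction failed")
    return {
        "id": data.get("id"),
        "status": status,
        "done": status in TERMINAL_STATUSES,
        "outputs": data.get("outputs") or [],
        "inference_ms": (data.get("timings") or {}).get("inference"),
    }


# Illustrative response only; every value below is made up.
sample = {
    "code": 200,
    "message": "success",
    "data": {
        "id": "example-task-id",
        "model": "example-model-id",
        "outputs": [],
        "urls": {"get": "https://api.wavespeed.ai/api/v3/predictions/example-task-id/result"},
        "has_nsfw_contents": [],
        "status": "processing",
        "created_at": "2023-04-01T12:34:56.789Z",
        "error": "",
        "timings": {"inference": 0},
    },
}
print(summarize_prediction(sample))
```

While done is False, the client can request data.urls.get again after a short delay; raising on a failed status keeps error handling in one place.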

Result Request Parameters

© 2025 WaveSpeedAI. All rights reserved.