Moondream3 Preview Caption

Generate short, normal, or long image captions to help you understand and describe visual content. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing.

enable_sync_mode: If set to true, the request waits for the result before returning the response. This property is only available through the API.

$0.005 per run · ~200 runs per $1

Moondream 3 — Image Captioning

Moondream 3 Caption is a high-performance vision-language model that automatically generates clear, descriptive, and context-aware captions for any image. It supports multiple caption lengths, enabling flexible use across social media content, dataset annotation, and creative storytelling.

✨ Key Features

  • Flexible Caption Length: Choose from short, normal, or long captions to fit your workflow needs.

  • Accurate Visual Understanding: Trained on large-scale, diverse visual datasets; accurately detects objects, actions, and environments.

  • Fast and Efficient: Optimized for low-latency inference, suitable for real-time applications and batch processing.

  • Human-like Language Output: Produces smooth, natural, and grammatically correct sentences ideal for direct use in production.

⚙️ Example Usage

🔹 Short Caption

{
 "image": "https://example.com/photo.jpg",
 "length": "short"
}

🔹 Normal Caption

{
 "image": "https://example.com/photo.jpg",
 "length": "normal"
}

🔹 Long Caption

{
 "image": "https://example.com/photo.jpg",
 "length": "long"
}
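
The three request bodies above differ only in the length field, so one small helper can produce all of them. The following is a minimal Python sketch, not part of any WaveSpeedAI SDK: build_caption_request and ALLOWED_LENGTHS are hypothetical names used for illustration, and defaulting to "normal" is an assumption based on the recommended default noted under Best Practices.

```python
import json

# The three caption lengths named in this README.
ALLOWED_LENGTHS = ("short", "normal", "long")


def build_caption_request(image_url: str, length: str = "normal") -> dict:
    """Return a request body with the "image" and "length" fields shown above.

    Hypothetical helper: it only validates the length value locally and
    does not contact the API.
    """
    if length not in ALLOWED_LENGTHS:
        raise ValueError(f"length must be one of {ALLOWED_LENGTHS}, got {length!r}")
    return {"image": image_url, "length": length}


if __name__ == "__main__":
    # Print one request body per supported length.
    for length in ALLOWED_LENGTHS:
        body = build_caption_request("https://example.com/photo.jpg", length)
        print(json.dumps(body))
```

Validating the length value before sending avoids spending a request on a value the README does not list.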

🧾 Example Output

{
 "caption": "A young woman with long, dark hair stands in front of a bar. She wears a leopard print halter top and blue jeans, accessorized with large hoop earrings. The bar features a purple backlit counter and a lit sign displaying 'DAMON' in yellow letters."
}

Output Explanation

  • The model returns a JSON object with a single key: caption.
  • The value is a natural-language description automatically generated from the input image.
  • The style and length of the caption depend on your chosen length parameter (short, normal, or long).
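
As a rough illustration of the points above, the caption can be pulled out of the JSON text with the standard library. This sketch assumes the response body has exactly the shape shown in the example output (a JSON object with a caption key); extract_caption is a hypothetical helper name.

```python
import json


def extract_caption(response_text: str) -> str:
    """Parse a JSON response and return its caption, stripped of whitespace.

    Assumes the documented shape: an object with a single "caption" key.
    """
    data = json.loads(response_text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    caption = data.get("caption")
    if not isinstance(caption, str) or not caption.strip():
        raise ValueError("response has no non-empty 'caption' string")
    return caption.strip()


if __name__ == "__main__":
    sample = '{"caption": "A young woman with long, dark hair stands in front of a bar."}'
    print(extract_caption(sample))
```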

💡 Best Practices

  • Use “short” for quick summaries or thumbnail text.
  • Use “normal” for descriptive captions (recommended default).
  • Use “long” for storytelling, research annotations, or dataset labeling.
  • Supported formats: JPEG, PNG, WebP
  • Maximum image size: 10 MB
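
The format and size limits above can be checked locally before an image is submitted. The sketch below is one possible pre-flight check under stated assumptions: image_problems is a hypothetical helper, "10 MB" is interpreted as 10 × 1024 × 1024 bytes, and the format is judged by file extension only rather than by inspecting file contents.

```python
from pathlib import PurePath

# Extensions for the formats listed above (JPEG, PNG, WebP).
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}
# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024


def image_problems(filename: str, size_bytes: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(
            f"unsupported format {suffix or '(none)'}; expected JPEG, PNG, or WebP"
        )
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_BYTES}")
    return problems


if __name__ == "__main__":
    print(image_problems("photo.png", 2_000_000))
    print(image_problems("clip.gif", 2_000_000))
```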

💰 Pricing

  • $0.005 per request
  • Contact WaveSpeedAI for enterprise and large-scale pricing options.

Moondream3 Preview Caption API — Quick start

Grab a WaveSpeedAI API key, then call POST https://api.wavespeed.ai/api/v3/wavespeed-ai/moondream3-preview/caption with your input as JSON. The endpoint returns a prediction id; poll the prediction endpoint until status changes to completed, then read the output from data.outputs[0]. Examples for Moondream3 Preview Caption follow.

HTTP example
# Submit the prediction
curl -X POST "https://api.wavespeed.ai/api/v3/wavespeed-ai/moondream3-preview/caption" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY" \
  -d '{
    "image": "https://example.com/your-input.jpg",
    "length": "normal",
    "enable_sync_mode": false
}'

# Response includes a prediction id. Poll for the result:
curl -X GET "https://api.wavespeed.ai/api/v3/predictions/{request_id}/result" \
  -H "Authorization: Bearer $WAVESPEED_API_KEY"

# When status is "completed", read the output from data.outputs[0].
Node.js example
// npm install wavespeed
const WaveSpeed = require('wavespeed');

const client = new WaveSpeed(); // reads WAVESPEED_API_KEY from env

// CommonJS modules do not support top-level await, so wrap the call in an async function.
async function main() {
  const result = await client.run("wavespeed-ai/moondream3-preview/caption", {
    "image": "https://example.com/your-input.jpg",
    "length": "normal",
    "enable_sync_mode": false
  });

  console.log(result.outputs[0]); // → the generated output
}

main().catch(console.error);
Python example
# pip install wavespeed
import wavespeed

output = wavespeed.run(
    "wavespeed-ai/moondream3-preview/caption",
    {
        "image": "https://example.com/your-input.jpg",
        "length": "normal",
        "enable_sync_mode": False,  # Python booleans are capitalized
    },
)

print(output["outputs"][0])  # → the generated output
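
The submit-then-poll flow described in this quick start can be sketched as a small loop. This is an illustration under stated assumptions, not the SDK's implementation: wait_for_prediction and fetch_result are hypothetical names, the status field is assumed to sit at data.status next to data.outputs, and "failed" is an assumed terminal status. The HTTP call is left to the caller so the loop can be tried without network access.

```python
import time
from typing import Callable


def wait_for_prediction(
    fetch_result: Callable[[], dict],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_result() until the prediction completes, then return its data.

    fetch_result is a caller-supplied function (hypothetical) that performs
    the GET on the prediction result endpoint and returns the parsed JSON.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch_result().get("data", {})
        status = data.get("status")
        if status == "completed":
            return data
        if status == "failed":  # assumed name for the failure status
            raise RuntimeError(f"prediction failed: {data}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"prediction not completed after {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated responses standing in for real HTTP calls.
    fake = iter([
        {"data": {"status": "processing"}},
        {"data": {"status": "completed", "outputs": ["example output"]}},
    ])
    result = wait_for_prediction(lambda: next(fake), interval_s=0.0)
    print(result["outputs"][0])
```

Passing the fetch function in, rather than hard-coding the request, keeps the retry and timeout logic separate from authentication and transport.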

Moondream3 Preview Caption API — Frequently asked questions

What is the Moondream3 Preview Caption API?

Moondream3 Preview Caption is an image-to-text model exposed as a REST API on WaveSpeedAI. It generates short, normal, or long image captions to help you understand and describe visual content. Ready-to-use REST inference API, best performance, no cold starts, affordable pricing. You can call it programmatically or try it from the playground above.

How do I call the Moondream3 Preview Caption API?

POST your input parameters to the model's REST endpoint (shown in the API tab of this playground) with your WaveSpeedAI API key in the Authorization header. Submission returns a prediction ID; poll the prediction endpoint until status changes to "completed", then read the output from the result. The playground generates a ready-to-paste code sample in Python, JavaScript, or cURL for whatever inputs you've set. Full request/response shape is documented at https://wavespeed.ai/docs/docs-api/wavespeed-ai/moondream3-preview-caption.

How much does Moondream3 Preview Caption cost per run?

Moondream3 Preview Caption starts at $0.005 per run. That figure is the base price — the final charge scales with the parameters you set in the form (output size, length, count, references, or whatever knobs this model exposes), so a higher-quality or larger output costs more than a minimal one. The exact cost for your current input is shown live next to the Generate button before you submit, and the actual per-call charge is recorded on the prediction afterwards.
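
For budgeting, the base price can be turned into a quick estimate. The sketch below uses only the $0.005 figure stated above and treats it as a flat per-run price, so the result is a lower bound if your parameters raise the charge. estimate_cost and runs_per_budget are hypothetical helper names.

```python
from decimal import Decimal

# Base price per run, as stated on this page.
BASE_PRICE_USD = Decimal("0.005")


def estimate_cost(runs: int, price_per_run: Decimal = BASE_PRICE_USD) -> Decimal:
    """Return the estimated cost in USD for a number of runs at a flat price."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return runs * price_per_run


def runs_per_budget(budget_usd: Decimal, price_per_run: Decimal = BASE_PRICE_USD) -> int:
    """Return how many whole runs fit in a budget at a flat price."""
    return int(budget_usd // price_per_run)


if __name__ == "__main__":
    print(estimate_cost(10_000))            # cost of 10,000 runs at the base price
    print(runs_per_budget(Decimal("1")))    # whole runs that fit in $1
```

Decimal is used instead of float so that small per-run prices add up without binary rounding error.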

What inputs does Moondream3 Preview Caption accept?

Key inputs: `image`, `enable_sync_mode`, `length`. The full JSON schema (types, defaults, allowed values) is rendered above the Generate button and mirrored in the API reference at https://wavespeed.ai/docs/docs-api/wavespeed-ai/moondream3-preview-caption.

How long does Moondream3 Preview Caption take to generate?

Average end-to-end generation time on WaveSpeedAI is around 4 seconds per request — measured across recent runs. Queue time scales with global demand; live status is visible in the prediction record.

Can I use Moondream3 Preview Caption outputs commercially?

Commercial usage rights depend on the model's license, set by its provider (WaveSpeedAI). The license summary appears on the model card above; see WaveSpeedAI's Terms of Service for platform-level conditions.