vision-language

Moondream3 Query | Visual Question Answering With Chain Of Thought Reasoning | WaveSpeedAI

wavespeed-ai/moondream3-preview/query

Moondream3 Query answers natural-language questions about images, with optional chain-of-thought reasoning for more detailed explanations. It is offered as a ready-to-use REST inference API with no cold starts, priced at $0.005 per request.

reasoning: Enable chain-of-thought reasoning to get more detailed explanations.
Synchronous mode: If set to true, the request waits for the result before returning the response. This property is available only through the API.

{ "answer": "The image shows a woman dressed in a princess costume, wearing a tiara and a necklace. She is standing in front of a building with cherry blossoms in the background. The woman is posing for the picture, looking directly at the camera. The scene evokes a sense of royalty and elegance, with the woman's attire and accessories suggesting a fairytale or fantasy setting." }
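A response shaped like the sample above can be unpacked with a few lines of Python. This is a minimal sketch that assumes the response body is a JSON object with a single string field named "answer", as in the sample; `extract_answer` is a hypothetical helper, and any other fields the live API may return are not covered here.

```python
import json


def extract_answer(response_text: str) -> str:
    """Return the "answer" string from a JSON response body.

    Assumes the body looks like the sample output: {"answer": "..."}.
    """
    data = json.loads(response_text)
    if not isinstance(data, dict):
        raise ValueError("response body is not a JSON object")
    answer = data.get("answer")
    if not isinstance(answer, str):
        raise ValueError("response has no string 'answer' field")
    return answer.strip()


# Shortened version of the sample output, used only for illustration.
sample = '{ "answer": "The image shows a woman dressed in a princess costume." }'
print(extract_answer(sample))
```

Raising an error on a missing field, instead of returning an empty string, makes a malformed response easy to tell apart from a genuinely short answer.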

Your request will cost $0.005 per run.

For $1 you can run this model approximately 200 times.

README

Moondream 3 — Visual Question Answering (VQA)

Moondream 3 Query is an advanced vision-language model designed to understand images and answer natural-language questions about them. It combines fast inference, accurate scene understanding, and optional reasoning for visual explanation — ideal for analysis, education, and creative applications.

✨ Key Features

  • Visual Q&A: Ask questions about any image (people, objects, actions, or scenes) and receive natural-language answers.

  • Chain-of-Thought Reasoning: Enable reasoning mode to let the model explain how it reached its conclusion, which is useful for analysis and debugging.

  • Accurate Visual Understanding: Trained on diverse, high-quality image-text datasets for reliable recognition of complex visual contexts.

  • Fast and Lightweight: Optimized for low latency and efficient inference while maintaining strong reasoning performance.

⚙️ Example Usage

🔹 Basic Query

{
  "image": "https://example.com/photo.jpg",
  "prompt": "What is the person in the image doing?"
}

🔹 Query with Reasoning

{
  "image": "https://example.com/photo.jpg",
  "prompt": "What emotions are visible in this scene?",
  "reasoning": true
}
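The two request bodies above differ only in the optional "reasoning" flag, so they can be built by one helper. The sketch below uses only the Python standard library. The field names "image", "prompt", and "reasoning" come from the examples above; `build_query` and `submit_query` are hypothetical helpers, and the endpoint URL and the Bearer-token header are assumptions that should be checked against the WaveSpeedAI API documentation.

```python
import json
import urllib.request


def build_query(image_url: str, prompt: str, reasoning: bool = False) -> dict:
    """Build a request body matching the JSON examples above."""
    payload = {"image": image_url, "prompt": prompt}
    if reasoning:
        # Only sent when requested, mirroring the "Query with Reasoning" example.
        payload["reasoning"] = True
    return payload


def submit_query(endpoint: str, api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST the payload as JSON and return the decoded JSON response.

    Assumption: the endpoint accepts a JSON body and Bearer-token auth.
    """
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


basic = build_query("https://example.com/photo.jpg", "What is the person in the image doing?")
with_reasoning = build_query(
    "https://example.com/photo.jpg",
    "What emotions are visible in this scene?",
    reasoning=True,
)
print(json.dumps(basic, indent=2))
print(json.dumps(with_reasoning, indent=2))
```

`submit_query` is defined but not called here, because the real endpoint and credentials are not given on this page; pass your own values once you have them.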

💡 Best Practices

  • Ask clear and specific questions for higher accuracy.
  • Enable reasoning mode for tasks that require multi-step or contextual analysis.
  • Supported image formats: JPEG, PNG, WebP
  • Maximum image size: 10 MB
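The format and size limits above can be checked locally before a request is sent. This is a rough sketch: `check_image` is a hypothetical helper, the format check relies on the file extension only (it does not inspect the file contents), and "10 MB" is assumed to mean 10 × 1024 × 1024 bytes, which the page does not specify.

```python
from pathlib import Path

# Extensions for the formats listed above (JPEG, PNG, WebP).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}
# Assumption: "10 MB" is interpreted as 10 MiB.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def check_image(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the image looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(no extension)'}")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append(f"file is {size_bytes} bytes, above the {MAX_IMAGE_BYTES}-byte limit")
    return problems


print(check_image("photo.JPG", 2_000_000))
print(check_image("scan.tiff", 12 * 1024 * 1024))
```

For a file on disk, `Path(path).stat().st_size` gives the size in bytes to pass as the second argument.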

💰 Pricing

  • $0.005 per request
  • Volume discounts available — please contact WaveSpeedAI for enterprise or batch pricing.
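For budgeting, the flat per-request price translates directly into a cost estimate. The sketch below uses the $0.005 figure from this page and ignores any volume discounts; `estimate_cost` and `runs_for_budget` are hypothetical helper names.

```python
from decimal import Decimal

# Flat price per request, as listed above (volume discounts not modeled).
PRICE_PER_REQUEST = Decimal("0.005")


def estimate_cost(request_count: int) -> Decimal:
    """Total cost in USD for a given number of requests."""
    return PRICE_PER_REQUEST * request_count


def runs_for_budget(budget_usd: str) -> int:
    """Number of whole requests that fit within a budget given as a decimal string."""
    return int(Decimal(budget_usd) // PRICE_PER_REQUEST)


print(runs_for_budget("1"))  # 200, matching the "approximately 200 times" figure
print(estimate_cost(1000))
```

`Decimal` is used instead of `float` so that small prices such as 0.005 do not pick up binary rounding error when multiplied across many requests.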