
image-to-text
Your request will cost $0.005 per run.
For $1 you can run this model approximately 200 times.
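The pricing arithmetic above can be checked with a short sketch. The price of $0.005 per run comes from the pricing note; the function names `estimate_cost` and `runs_for_budget` are hypothetical helpers introduced only for illustration, and `Decimal` is used to avoid floating-point rounding on currency values.

```python
from decimal import Decimal

# Price per run in USD, taken from the pricing note above.
PRICE_PER_RUN = Decimal("0.005")


def estimate_cost(runs: int) -> Decimal:
    """Return the total cost in USD for a given number of caption requests."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return PRICE_PER_RUN * runs


def runs_for_budget(budget_usd: str) -> int:
    """Return how many whole runs fit in a budget given in USD (as a string)."""
    return int(Decimal(budget_usd) // PRICE_PER_RUN)
```

For example, `runs_for_budget("1")` reproduces the figure of about 200 runs per dollar, and `estimate_cost(1000)` gives the cost of captioning a 1,000-image batch at the stated price.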
Moondream 3 Caption is a high-performance vision-language model that automatically generates clear, descriptive, and context-aware captions for any image. It supports multiple caption lengths, enabling flexible use across social media content, dataset annotation, and creative storytelling.
Flexible Caption Length: Choose from short, normal, or long captions to fit your workflow needs.
Accurate Visual Understanding: Trained on large-scale, diverse visual datasets to detect objects, actions, and environments accurately.
Fast and Efficient: Optimized for low-latency inference, suitable for real-time applications and batch processing.
Human-like Language Output: Produces smooth, natural, and grammatically correct sentences suitable for direct use in production.
Example requests:
{
  "image": "https://example.com/photo.jpg",
  "length": "short"
}
{
  "image": "https://example.com/photo.jpg",
  "length": "normal"
}
{
  "image": "https://example.com/photo.jpg",
  "length": "long"
}
Example response:
{
  "caption": "A young woman with long, dark hair stands in front of a bar. She wears a leopard print halter top and blue jeans, accessorized with large hoop earrings. The bar features a purple backlit counter and a lit sign displaying 'DAMON' in yellow letters."
}
The length parameter accepts short, normal, or long.
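The JSON shapes shown above can be built and read with a few lines of standard-library Python. This is a minimal sketch, not the service's client: it only constructs the request body and extracts the `caption` field from a response body. The helper names `build_caption_request` and `parse_caption_response` are hypothetical, and the endpoint URL and authentication are deliberately left out because the page does not state them.

```python
import json

# The three caption lengths named on this page.
ALLOWED_LENGTHS = ("short", "normal", "long")


def build_caption_request(image_url: str, length: str) -> str:
    """Return a JSON request body with the "image" and "length" fields."""
    if length not in ALLOWED_LENGTHS:
        raise ValueError(f"length must be one of {ALLOWED_LENGTHS}, got {length!r}")
    return json.dumps({"image": image_url, "length": length})


def parse_caption_response(body: str) -> str:
    """Extract the "caption" string from a JSON response body."""
    data = json.loads(body)
    if not isinstance(data, dict) or not isinstance(data.get("caption"), str):
        raise ValueError("response does not contain a 'caption' string")
    return data["caption"]
```

Validating `length` before sending avoids paying for a request that uses an unsupported value. How the body is actually sent (HTTP method, URL, headers) depends on the hosting platform's API documentation.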