
image-to-text
Your request will cost $0.005 per run.
For $1 you can run this model approximately 200 times.
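The pricing arithmetic can be checked with a short sketch. The only figure taken from this page is the $0.005 per-run price; the function names runs_for_budget and cost_for_runs are hypothetical helpers introduced for illustration, and the sketch assumes the price is flat per run with no volume tiers.

```python
from decimal import Decimal

# Per-run price in USD, as stated on this page.
PRICE_PER_RUN = Decimal("0.005")


def runs_for_budget(budget_usd: str) -> int:
    """Return how many whole runs a USD budget covers (hypothetical helper)."""
    return int(Decimal(budget_usd) // PRICE_PER_RUN)


def cost_for_runs(runs: int) -> Decimal:
    """Return the total USD cost of a number of runs (hypothetical helper)."""
    return PRICE_PER_RUN * runs


print(runs_for_budget("1"))  # 200
print(cost_for_runs(1000))   # 5.000
```

Decimal is used instead of float so that small per-run prices add up exactly over large batches.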
Moondream 3 Caption is a high-performance vision-language model that automatically generates clear, descriptive, and context-aware captions for any image. It supports multiple caption lengths, enabling flexible use across social media content, dataset annotation, and creative storytelling.
Flexible Caption Length: Choose from short, normal, or long captions to fit your workflow.
Accurate Visual Understanding: Trained on large-scale, diverse visual datasets to detect objects, actions, and environments accurately.
Fast and Efficient: Optimized for low-latency inference, suitable for real-time applications and batch processing.
Human-like Language Output: Produces smooth, natural, grammatically correct sentences suitable for direct use in production.
{
"image": "https://example.com/photo.jpg",
"length": "short"
}
{
"image": "https://example.com/photo.jpg",
"length": "normal"
}
{
"image": "https://example.com/photo.jpg",
"length": "long"
}
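The three request bodies above differ only in the length field, so they can be generated from one helper. The following is a minimal sketch, not the service's client library: build_caption_request and ALLOWED_LENGTHS are hypothetical names, and the sketch assumes the request body contains exactly the image and length fields shown above. It only builds the JSON body and makes no network call, since this page does not give an endpoint.

```python
import json

# Length values listed on this page.
ALLOWED_LENGTHS = ("short", "normal", "long")


def build_caption_request(image_url: str, length: str) -> str:
    """Build the JSON request body for one image (hypothetical helper)."""
    if length not in ALLOWED_LENGTHS:
        raise ValueError(
            f"length must be one of {ALLOWED_LENGTHS}, got {length!r}"
        )
    return json.dumps({"image": image_url, "length": length})


# One request body per supported caption length.
for length in ALLOWED_LENGTHS:
    print(build_caption_request("https://example.com/photo.jpg", length))
```

Validating length before sending avoids paying for a run that would be rejected or fall back to an unintended caption size.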
{
"caption": "A young woman with long, dark hair stands in front of a bar. She wears a leopard print halter top and blue jeans, accessorized with large hoop earrings. The bar features a purple backlit counter and a lit sign displaying 'DAMON' in yellow letters."
}
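A response of the shape above can be unpacked defensively before the caption is used downstream. This is a sketch under the assumption that a successful response is a JSON object with a single string field named caption, as in the example; extract_caption is a hypothetical helper, and any error fields the service may return are not covered here.

```python
import json


def extract_caption(response_body: str) -> str:
    """Return the caption text from a JSON response body (hypothetical helper)."""
    data = json.loads(response_body)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    caption = data.get("caption")
    if not isinstance(caption, str) or not caption.strip():
        raise ValueError("response has no usable 'caption' field")
    return caption.strip()


sample = '{"caption": "A lit sign displaying \'DAMON\' in yellow letters."}'
print(extract_caption(sample))
```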
The caption length is set by the length parameter (short, normal, or long).