
video-to-text
Your request will cost $0.005 per run.
For $1 you can run this model about 200 times.
Molmo2 Video QA is a video understanding model that answers questions about video content. Upload a video and ask a question in plain language; the model analyzes visual scenes, actions, objects, and context, then returns an accurate natural-language answer.
Built for developers and creators who need intelligent video comprehension without building complex pipelines.
Natural language understanding: Ask questions in plain English about what happens in your video, with no need for timestamps or frame-by-frame annotation.
Scene and action recognition: Understands objects, people, movements, environments, and temporal sequences across the video.
Flexible video input: Accepts video uploads or public URLs for seamless integration into existing workflows.
Fast and accurate: Optimized for quick turnaround while maintaining high comprehension accuracy.
Production-ready API: Ready-to-use REST endpoint with predictable duration-based pricing (billed per 5 seconds of video) and no cold starts.
Billing is in 5-second increments of video duration, with a 5-second minimum.
| Video Duration | Cost |
|---|---|
| Up to 5s | $0.005 |
| 10s | $0.01 |
| 30s | $0.03 |
| 60s | $0.06 |
| 120s (max) | $0.12 |
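
The pricing table above can be reproduced with a small cost estimator. This is a minimal sketch, not an official calculator: the function name `estimate_cost` and the constants are hypothetical, and it assumes that durations falling between two 5-second steps (for example 7s) are rounded up to the next step. The table only lists multiples of 5, so that rounding rule is an assumption.

```python
import math

# Values taken from the pricing table: $0.005 per 5-second block, 120s maximum.
PRICE_PER_BLOCK_USD = 0.005
BLOCK_SECONDS = 5
MAX_SECONDS = 120


def estimate_cost(duration_seconds: float) -> float:
    """Estimate the cost in USD of one run for a video of the given duration.

    Assumption (not confirmed by the pricing table): partial blocks are
    rounded up to the next full 5-second block.
    """
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    if duration_seconds > MAX_SECONDS:
        raise ValueError(f"duration exceeds the {MAX_SECONDS}s maximum")
    # At least one block is always billed (5-second minimum).
    blocks = max(1, math.ceil(duration_seconds / BLOCK_SECONDS))
    # Round to tenths of a cent to avoid floating-point noise.
    return round(blocks * PRICE_PER_BLOCK_USD, 3)


if __name__ == "__main__":
    for seconds in (5, 10, 30, 60, 120):
        print(f"{seconds}s -> ${estimate_cost(seconds)}")
```

For the durations listed in the table (5s, 10s, 30s, 60s, 120s) this returns 0.005, 0.01, 0.03, 0.06, and 0.12, matching the listed costs. Videos shorter than 5 seconds are billed the $0.005 minimum.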