{
"output": "This is a black and white photograph capturing a bustling city street scene, likely from the mid-20th century. The image is taken from an elevated perspective, looking down a wide, straight road that recedes into the distance.\n\nThe street is filled with numerous vintage automobiles, characteristic of the 1940s or 1950s, including sedans and a few convertibles. The cars are in motion and parked along both sides of the road. On the left side, a building with a sign that appears to read \"COSCO\" is visible, and a crowd of pedestrians can be seen on the sidewalk. On the right, a prominent corner building with a clock on its facade stands out. The overall atmosphere is one of a busy, active urban environment."
}
$0.006 per run · ~166 runs per $1
NVIDIA Nemotron-3 Nano Omni Vision is a multimodal vision-language model for image understanding and analysis. Upload an image, provide an English prompt, and the model generates a text response for tasks such as image description, visual question answering, scene understanding, and structured visual analysis.
Image understanding with natural-language prompts
Ask questions about an image or request a description in plain English.
Flexible response control
Adjust max_tokens, temperature, and top_p to balance response length, determinism, and creativity.
Optional system steering
Use system_prompt to guide output style, role, or response constraints for more controlled behavior.
Reasoning mode options
Choose between no_think and think depending on your preferred response mode.
Production-ready API
Suitable for image analysis workflows, multimodal assistants, automated review pipelines, and visual understanding tools.
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | English text prompt sent to the model. |
| image | Yes | Image URL to analyze with the model. |
| system_prompt | No | Optional system prompt used to steer behavior, tone, or response style. |
| reasoning_mode | No | Reasoning mode: no_think (default) or think. |
| max_tokens | No | Maximum number of tokens to generate. Default: 1024. |
| temperature | No | Sampling temperature. Lower values are more deterministic. Default: 0.7. |
| top_p | No | Nucleus sampling probability mass. Default: 0.95. |
- Choose no_think or think depending on your workflow.
- Adjust max_tokens, temperature, and top_p.

Example prompt: Describe this image in detail, including the setting, visible objects, mood, and any notable historical or architectural details.
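The parameters and defaults listed in the table can be collected into a request payload before calling the model. The following is a minimal sketch only: `build_payload` is a hypothetical helper, the image URL is a placeholder, and the `top_p` range check is an assumption based on it being a probability mass. How the payload is actually sent (endpoint, client, authentication) is not covered by this page and is left out.

```python
from typing import Any, Dict, Optional

# Defaults copied from the parameter table.
DEFAULTS: Dict[str, Any] = {
    "reasoning_mode": "no_think",
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_p": 0.95,
}
REASONING_MODES = ("no_think", "think")


def build_payload(
    prompt: str,
    image: str,
    system_prompt: Optional[str] = None,
    **overrides: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and fill in documented defaults."""
    if not prompt.strip():
        raise ValueError("prompt is required")
    if not image.strip():
        raise ValueError("image is required and should be an image URL")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    payload: Dict[str, Any] = {"prompt": prompt, "image": image, **DEFAULTS, **overrides}
    if payload["reasoning_mode"] not in REASONING_MODES:
        raise ValueError("reasoning_mode must be 'no_think' or 'think'")
    if payload["max_tokens"] <= 0:
        raise ValueError("max_tokens must be positive")
    # Assumption: top_p is a probability mass, so it is kept within (0, 1].
    if not 0 < payload["top_p"] <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    return payload


# Placeholder image URL; replace it with a real, reachable image.
payload = build_payload(
    "Describe this image in detail, including the setting, visible objects, "
    "mood, and any notable historical or architectural details.",
    "https://example.com/street.jpg",
    temperature=0.2,
)
print(payload["reasoning_mode"], payload["max_tokens"], payload["temperature"])
```

Only the optional parameters can be overridden by keyword, so a misspelled name such as `max_token` fails early instead of being silently ignored.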
Runs are billed by the configured max_tokens value.
| Max Tokens | Cost |
|---|---|
| 1000 | $0.006 |
| 1024 | $0.0061 |
| 2000 | $0.012 |
| 4000 | $0.024 |
| 8000 | $0.048 |
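The rows in the pricing table are consistent with a flat rate of $0.006 per 1000 max_tokens, with the 1024 row rounded to four decimal places. The sketch below estimates cost under that inferred rate. `PRICE_PER_TOKEN`, `estimate_cost`, and `runs_per_dollar` are illustrative names, and the rate is derived from the table rather than quoted from a published price list.

```python
from decimal import Decimal

# Inferred from the table: $0.006 per 1000 max_tokens (an assumption, not a quoted rate).
PRICE_PER_TOKEN = Decimal("0.000006")


def estimate_cost(max_tokens: int) -> Decimal:
    """Estimated cost in USD for one run at the configured max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return PRICE_PER_TOKEN * max_tokens


def runs_per_dollar(max_tokens: int) -> int:
    """Whole number of runs that fit into $1 at the configured max_tokens."""
    return int(Decimal(1) / estimate_cost(max_tokens))


for tokens in (1000, 1024, 2000, 4000, 8000):
    # Round to four decimal places, matching the precision used in the table.
    print(tokens, round(estimate_cost(tokens), 4))
```

`Decimal` is used instead of `float` so that small per-token amounts multiply out exactly.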
- Pricing is based on the configured max_tokens value.
- Increasing max_tokens increases cost linearly.
- prompt, image, system_prompt, reasoning_mode, temperature, and top_p do not change pricing directly.

- Use system_prompt when you need a consistent format, such as bullet summaries, JSON-style output, or domain-specific tone.
- Keep temperature lower when you want more stable and deterministic responses.
- Increase max_tokens only when you need longer outputs, since pricing is tied to that value.
- Tune top_p and temperature together carefully to balance diversity and control.

- prompt and image are required.
- prompt is English only.
- Defaults are reasoning_mode = no_think, max_tokens = 1024, temperature = 0.7, and top_p = 0.95.
- Pricing depends on max_tokens, not on other generation settings.
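When system_prompt is used to request JSON-style output, the model's text may still come back as plain prose, so it helps to parse it defensively. The sketch below assumes the response has the same shape as the sample at the top of this page, a JSON object with a single "output" string. `JSON_SYSTEM_PROMPT` and `parse_model_output` are hypothetical names, and the requested keys are only an illustration.

```python
import json
from typing import Any, Dict

# Hypothetical system prompt asking for JSON-style output; the keys are illustrative.
JSON_SYSTEM_PROMPT = (
    "You are an image analyst. Respond only with a JSON object that has the keys "
    '"setting", "objects", and "mood".'
)


def parse_model_output(response: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the "output" field as a JSON object, falling back to the raw text."""
    text = str(response.get("output", "")).strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # The model answered in prose (or returned nothing); keep the text as-is.
        return {"raw_text": text}
    # Only accept a JSON object; anything else is treated as raw text.
    return parsed if isinstance(parsed, dict) else {"raw_text": text}


structured = parse_model_output(
    {"output": '{"setting": "city street", "objects": ["cars"], "mood": "busy"}'}
)
fallback = parse_model_output({"output": "This is a black and white photograph."})
print(structured["mood"], "|", fallback["raw_text"])
```

Keeping the fallback explicit means downstream code can check for `raw_text` rather than failing on a response that ignored the requested format.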