Example output:

```json
{
  "output": "Machine learning is a branch of artificial intelligence (AI) and computer science that focuses on using data and algorithms to imitate how humans learn, gradually improving its accuracy."
}
```

Pricing: $0.006 per run (~166 runs per $1).
NVIDIA Nemotron-3 Nano Omni Text is a lightweight text-generation model for prompt-based language understanding and response generation. Provide an English prompt, and the model can generate answers, summaries, structured outputs, explanations, and other text-based responses with controllable length and sampling behavior.
Fast text generation
Generate responses quickly for chat, automation, summarization, and general language tasks.
Flexible response control
Adjust max_tokens, temperature, and top_p to balance response length, determinism, and creativity.
Optional system steering
Use system_prompt to guide tone, structure, formatting, or task behavior for more controlled outputs.
Reasoning mode options
Choose between no_think and think depending on your preferred response mode and workflow.
Production-ready API
Suitable for assistants, content tools, automation pipelines, internal workflows, and structured text generation tasks.
| Parameter | Required | Description |
|---|---|---|
| prompt | Yes | English text prompt sent to the model. |
| system_prompt | No | Optional system prompt used to steer behavior, tone, or response style. |
| reasoning_mode | No | Reasoning mode: no_think (default) or think. |
| max_tokens | No | Maximum number of tokens to generate. Default: 1024. |
| temperature | No | Sampling temperature. Lower values are more deterministic. Default: 0.7. |
| top_p | No | Nucleus sampling probability mass. Default: 0.95. |
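The parameter table above can be sketched as a request payload builder. This is a minimal, hedged example: the `build_payload` helper and its defaults mirror the documented parameters, but the function itself is illustrative, not part of any official client library.

```python
import json

# Build a request payload from the documented parameters.
# Only "prompt" is required; the rest fall back to the listed defaults
# (reasoning_mode=no_think, max_tokens=1024, temperature=0.7, top_p=0.95).
def build_payload(prompt, system_prompt=None, reasoning_mode="no_think",
                  max_tokens=1024, temperature=0.7, top_p=0.95):
    if not prompt:
        raise ValueError("prompt is required")
    payload = {
        "prompt": prompt,
        "reasoning_mode": reasoning_mode,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    # system_prompt is optional and only sent when provided.
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    return payload

payload = build_payload(
    "Summarize the following product requirements into a concise "
    "executive brief with key goals, risks, and next steps.",
    system_prompt="Respond as a terse executive assistant.",
)
print(json.dumps(payload, indent=2))
```

The payload can then be sent to the model endpoint with whatever HTTP client your stack uses.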
Example prompt:

> Summarize the following product requirements into a concise executive brief with key goals, risks, and next steps.
Billing is based on the configured max_tokens value.
| Max Tokens | Cost |
|---|---|
| 1000 | $0.006 |
| 1024 | $0.0061 |
| 2000 | $0.012 |
| 4000 | $0.024 |
| 8000 | $0.048 |
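The table above implies a linear rate of $0.006 per 1,000 configured tokens. A small sketch of that cost calculation (the per-token rate is inferred from the table, not separately documented):

```python
# Cost scales linearly with max_tokens: $0.006 per 1000 tokens,
# as inferred from the pricing table above.
PRICE_PER_TOKEN = 0.006 / 1000

def run_cost(max_tokens: int) -> float:
    """Cost in USD for one run at the given max_tokens setting."""
    return round(max_tokens * PRICE_PER_TOKEN, 4)

for n in (1000, 1024, 2000, 4000, 8000):
    print(n, run_cost(n))
```

Running this reproduces each row of the table, e.g. 1024 tokens rounds to $0.0061.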
- Pricing is based on the configured max_tokens value; increasing max_tokens increases cost linearly.
- prompt, system_prompt, reasoning_mode, temperature, and top_p do not change pricing directly.
- Use system_prompt when you need consistent tone, role behavior, or formatting rules.
- Set temperature lower when you want more stable and deterministic results.
- Increase max_tokens only when you need longer outputs, since pricing is tied to that value.
- Tune top_p and temperature together carefully to balance creativity and control.
- prompt is the only required field, and it must be written in English.
- Defaults: reasoning_mode = no_think, max_tokens = 1024, temperature = 0.7, top_p = 0.95.
- Billing depends only on max_tokens, not on other generation settings.