nvidia/nemotron-3-nano-30b-a3b
Release date: 2025-12-15
262,144 context · $0.05/M input tokens · $0.20/M output tokens
Pay-per-use
No upfront costs, pay only for what you use
Use the following code examples to integrate with our API:
from openai import OpenAI

# Point the official OpenAI SDK at the WaveSpeedAI endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://llm.wavespeed.ai/v1",
)

# Send a single-turn chat request to the model.
response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-30b-a3b",
    messages=[
        {"role": "user", "content": "Hello!"},
    ],
)

print(response.choices[0].message.content)

NVIDIA Nemotron 3 Nano 30B A3B is a small Mixture-of-Experts (MoE) language model built for high compute efficiency and accuracy, aimed at developers building specialized agentic AI systems.
The model is fully open, with open weights, datasets, and recipes, so developers can customize, optimize, and deploy it on their own infrastructure for maximum privacy and security.
Note: On the free endpoint, all prompts and outputs are logged to improve the provider's model, products, and services. Do not upload personal, confidential, or otherwise sensitive information. The free endpoint is for trial use only; do not use it for production or business-critical systems.
| Specification | Value |
|---|---|
| Provider | NVIDIA |
| Model Type | Large Language Model (LLM) |
| Architecture | Mixture of Experts (MoE) |
| Context Window | 262,144 tokens |
| Max Output | Not listed |
| Input | Text |
| Output | Text |
| Vision | Supported |
| Function Calling | Supported |
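
Since the table lists function calling as supported, a request with tool definitions might look like the sketch below. It assumes the endpoint accepts the standard OpenAI `tools` parameter, which this page does not confirm explicitly. The `get_weather` tool, its schema, and the helper names are hypothetical and exist only for illustration.

```python
import json

# Hypothetical local tool for illustration; not part of the WaveSpeedAI API.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Tool definition in the OpenAI function-calling schema.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Return a short weather summary for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
LOCAL_TOOLS = {"get_weather": get_weather}

def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute one tool call requested by the model and return its result."""
    if name not in LOCAL_TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    arguments = json.loads(arguments_json)
    return LOCAL_TOOLS[name](**arguments)

def ask_with_tools(api_key: str, prompt: str) -> list:
    """Send a prompt with tool definitions and run any tool calls returned."""
    from openai import OpenAI  # imported here so the helpers above work offline

    client = OpenAI(api_key=api_key, base_url="https://llm.wavespeed.ai/v1")
    response = client.chat.completions.create(
        model="nvidia/nemotron-3-nano-30b-a3b",
        messages=[{"role": "user", "content": prompt}],
        tools=TOOLS,  # assumption: the endpoint honors OpenAI-style tools
    )
    tool_calls = response.choices[0].message.tool_calls or []
    return [run_tool_call(c.function.name, c.function.arguments) for c in tool_calls]
```

Keeping the dispatcher (`run_tool_call`) separate from the network call makes it possible to check tool handling locally before spending tokens.
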

| Token Type | Cost per Million Tokens |
|---|---|
| Input | $0.05 |
| Output | $0.20 |

| Field | Value |
|---|---|
| Base URL | https://llm.wavespeed.ai/v1 |
| API Endpoint | chat/completions |
| Model ID | nvidia/nemotron-3-nano-30b-a3b |

curl https://llm.wavespeed.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "nvidia/nemotron-3-nano-30b-a3b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Access Nemotron 3 Nano 30B A3B through our unified API: OpenAI-compatible, no cold starts, transparent pricing.
Pricing on WaveSpeedAI: $0.05 per million input tokens and $0.20 per million output tokens. Prompt caching and batch processing are billed separately and can reduce the effective cost of long, repetitive workloads.
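
To make the per-token arithmetic concrete, here is a minimal cost estimator using the two list prices quoted above. It is a sketch only: it ignores prompt caching and batch pricing, which the page says are billed separately, and the function and constant names are illustrative.

```python
# List prices quoted on this page, in USD per one million tokens.
INPUT_USD_PER_MILLION = 0.05
OUTPUT_USD_PER_MILLION = 0.20

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the list-price cost of a workload (no caching or batch rates)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("Token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000

# Example: 2M input tokens and 500K output tokens.
print(f"${estimate_cost_usd(2_000_000, 500_000):.2f}")
```

For that example the input side costs 2 × $0.05 = $0.10 and the output side costs 0.5 × $0.20 = $0.10, for a total of about $0.20.
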
Nemotron 3 Nano 30B A3B supports up to 262K tokens of context (262,144 tokens). The maximum output per request is not listed on this page.
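
A simple pre-flight check can catch requests that would overflow the context window. The sketch below assumes that prompt and output tokens share the 262,144-token window, which is common for chat models but not stated here. A separate output cap may also apply. Token counts must come from your own tokenizer, and the helper names are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 262_144  # context size listed on this page

def max_output_budget(prompt_tokens: int) -> int:
    """Tokens left for output, assuming prompt and output share the window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(CONTEXT_WINDOW_TOKENS - prompt_tokens, 0)

def fits_in_context(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if the prompt plus requested output fit in the window."""
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS

# Example: a 200,000-token prompt leaves 62,144 tokens under this assumption.
print(max_output_budget(200_000))
```
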
WaveSpeedAI exposes Nemotron 3 Nano 30B A3B through an OpenAI-compatible endpoint at https://llm.wavespeed.ai/v1. Point the official OpenAI SDK at this base URL with your WaveSpeedAI API key; no other code changes are required.
Sign in to WaveSpeedAI, create an API key in Access Keys, then send a request to https://llm.wavespeed.ai/v1/chat/completions with the model ID set to nvidia/nemotron-3-nano-30b-a3b. New accounts receive free credits to evaluate Nemotron 3 Nano 30B A3B before paying per token.
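
If you prefer not to install an SDK, the same request can be sketched with the Python standard library. The environment variable name `WAVESPEED_API_KEY` and the helper names are assumptions made for this example. The response parsing assumes the OpenAI-compatible response shape implied by the SDK example.

```python
import json
import os
import urllib.request
from typing import Optional

BASE_URL = "https://llm.wavespeed.ai/v1"
MODEL_ID = "nvidia/nemotron-3-nano-30b-a3b"

def build_chat_request(prompt: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a chat/completions request."""
    # WAVESPEED_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("WAVESPEED_API_KEY")
    if not key:
        raise RuntimeError("Set WAVESPEED_API_KEY or pass api_key explicitly")
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
        method="POST",
    )

def send_chat(prompt: str, api_key: Optional[str] = None) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Assumes an OpenAI-style response body.
    return body["choices"][0]["message"]["content"]
```

Reading the key from the environment keeps it out of source control. Calling `send_chat("Hello!")` performs the network request.
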