NVIDIA: Llama 3.3 Nemotron Super 49B V1.5 coding

Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct, with a 128K-token context window. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and...

Context: 131k | Input price: 0.40 | Cache read price: - | Cache write price: - | Output price: 0.40

NVIDIA: Nemotron 3 Nano 30B A3B chat

NVIDIA Nemotron 3 Nano 30B A3B is a small mixture-of-experts (MoE) language model that targets the highest compute efficiency and accuracy for developers building specialized agentic AI systems. The model is fully...

Context: 262k | Input price: 0.05 | Cache read price: - | Cache write price: - | Output price: 0.20

NVIDIA: Nemotron 3 Nano 30B A3B (free) chat

NVIDIA Nemotron 3 Nano 30B A3B is a small mixture-of-experts (MoE) language model that targets the highest compute efficiency and accuracy for developers building specialized agentic AI systems. The model is fully...

Context: 256k | Input price: - | Cache read price: - | Cache write price: - | Output price: -

NVIDIA: Nemotron 3 Nano Omni (free) reasoning

NVIDIA Nemotron™ 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...

Context: 256k | Input price: - | Cache read price: - | Cache write price: - | Output price: -

NVIDIA: Nemotron 3 Super chat

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Context: 1M | Input price: 0.09 | Cache read price: - | Cache write price: - | Output price: 0.45

NVIDIA: Nemotron 3 Super (free) chat

NVIDIA Nemotron 3 Super is a 120B-parameter open hybrid MoE model, activating just 12B parameters for maximum compute efficiency and accuracy in complex multi-agent applications. Built on a hybrid Mamba-Transformer...

Context: 1M | Input price: - | Cache read price: - | Cache write price: - | Output price: -

NVIDIA: Nemotron 3 Ultra reasoning

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...

Context: 1M | Input price: 0.50 | Cache read price: 0.10 | Cache write price: - | Output price: 2.20
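Nemotron 3 Ultra is the only paid entry in this list with a cache read price (0.10, compared with an input price of 0.50). This means cached input tokens are billed more cheaply than fresh ones. The sketch below estimates the cost of a single request. It assumes the listed prices are USD per 1 million tokens, which the listing itself does not state. `Pricing` and `estimate_cost` are hypothetical helpers written for illustration and are not part of any provider API.

```python
# Hypothetical cost estimator.
# ASSUMPTION: listed prices are USD per 1 million tokens (units are not stated in the listing).
from dataclasses import dataclass
from typing import Optional

TOKENS_PER_PRICE_UNIT = 1_000_000  # assumed pricing unit


@dataclass
class Pricing:
    input_price: float                         # per 1M uncached input tokens
    output_price: float                        # per 1M output tokens
    cache_read_price: Optional[float] = None   # per 1M cached input tokens, if offered


def estimate_cost(p: Pricing, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate one request's cost; cached_tokens is the part of input_tokens read from cache."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    # If the model lists no cache read price, bill every input token at the normal input rate.
    read_price = p.cache_read_price if p.cache_read_price is not None else p.input_price
    uncached = input_tokens - cached_tokens
    total = (uncached * p.input_price
             + cached_tokens * read_price
             + output_tokens * p.output_price)
    return total / TOKENS_PER_PRICE_UNIT


# Nemotron 3 Ultra prices from the listing above.
ultra = Pricing(input_price=0.50, output_price=2.20, cache_read_price=0.10)
print(round(estimate_cost(ultra, 100_000, 10_000), 6))
print(round(estimate_cost(ultra, 100_000, 10_000, cached_tokens=80_000), 6))
```

Under the assumed units, a request with 100,000 input tokens and 10,000 output tokens comes to 0.072. If 80,000 of those input tokens are served from cache, the cost drops to 0.04.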

NVIDIA: Nemotron 3 Ultra (free) reasoning

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...

Context: 1M | Input price: - | Cache read price: - | Cache write price: - | Output price: -

NVIDIA: Nemotron 3.5 Content Safety (free) chat

NVIDIA Nemotron 3.5 Content Safety is a compact 4B-parameter multimodal guardrail model from NVIDIA, fine-tuned from Google Gemma-3-4B. It moderates both inputs to and responses from LLMs and VLMs, accepting...

Context: 128k | Input price: - | Cache read price: - | Cache write price: - | Output price: -

NVIDIA: Nemotron Nano 12B 2 VL (free) reasoning

NVIDIA Nemotron Nano 2 VL is a 12-billion-parameter open multimodal reasoning model designed for video understanding and document intelligence. It introduces a hybrid Transformer-Mamba architecture, combining transformer-level accuracy with Mamba’s...

Context: 128k | Input price: - | Cache read price: - | Cache write price: - | Output price: -

NVIDIA: Nemotron Nano 9B V2 (free) reasoning

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and...

Context: 128k | Input price: - | Cache read price: - | Cache write price: - | Output price: -