NVIDIA Nemotron 3 Super Launches on Amazon Bedrock
- NVIDIA Nemotron 3 Super debuts on Amazon Bedrock as a managed, serverless model for agentic applications.
- The 120B model features a hybrid Transformer-Mamba architecture with 5x higher throughput than its predecessor.
- A new Latent MoE design allows 4x more experts without increasing inference costs for complex reasoning.
NVIDIA has significantly expanded its generative AI footprint on AWS with the launch of Nemotron 3 Super on Amazon Bedrock. This 120-billion-parameter model is engineered specifically for "agentic" tasks: AI systems designed to independently plan and execute multi-step workflows. By offering it as a fully managed, serverless service, AWS lets developers integrate high-performance reasoning into their applications without the headache of managing the underlying hardware.
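To make the "serverless integration" point concrete, here is a minimal sketch of what a request to the model through Bedrock's Converse-style API could look like. The model identifier below is a placeholder (the real one comes from the Bedrock model catalog), and the live call, shown only in a comment, would go through boto3's `bedrock-runtime` client; the code itself just assembles the request payload.

```python
import json

# Hypothetical model identifier -- check the Bedrock console for the real one.
MODEL_ID = "nvidia.nemotron-3-super"

# Request shaped after Bedrock's Converse API: a message list plus
# inference settings. No GPUs or servers to provision on the caller's side.
request = {
    "modelId": MODEL_ID,
    "messages": [
        {"role": "user",
         "content": [{"text": "Plan the steps to migrate a database."}]}
    ],
    "inferenceConfig": {"maxTokens": 1024, "temperature": 0.2},
}

# With AWS credentials configured, the actual call would look roughly like:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**request)
print(json.dumps(request, indent=2))
```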
What sets this model apart is its hybrid Transformer-Mamba architecture. While standard models often lose efficiency as data sequences grow, this hybrid approach combines the strengths of traditional Transformers with Mamba, a newer architecture that handles long-range information more efficiently. Additionally, the model uses Latent Mixture of Experts (MoE). This technique activates only a subset of specialized "experts" for each task, effectively providing the power of a much larger model while maintaining the speed and cost-effectiveness of a smaller one.
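The core of that MoE idea can be sketched in a few lines. This is an illustrative top-k router, not NVIDIA's actual Latent MoE implementation: a router scores every expert for a given token, but only the k best-scoring experts run, so per-token compute stays flat even as the expert pool grows.

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 8   # total specialized sub-networks in the layer
TOP_K = 2         # experts actually activated per token

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route(router_scores, top_k=TOP_K):
    """Pick the top-k experts and renormalize their gate weights."""
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]
    gates = softmax([router_scores[i] for i in chosen])
    return list(zip(chosen, gates))

# Fake router scores for one token; in a real model these come from a
# learned linear layer over the token's hidden state.
scores = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
active = route(scores)
print(active)  # only TOP_K of NUM_EXPERTS experts run for this token
```

Doubling `NUM_EXPERTS` adds capacity without changing how many experts fire per token, which is the sense in which more experts need not raise inference cost.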
The model also introduces multi-token prediction (MTP), which lets the AI forecast several upcoming words or code fragments simultaneously rather than one at a time, significantly boosting speed on complex outputs like software code or intricate financial analysis. With a massive 256,000-token context window (roughly the length of a thick novel), Nemotron 3 Super is positioned as a powerhouse for enterprise-grade automation across cybersecurity, retail, and distributed systems engineering.
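The speedup from multi-token prediction is easiest to see by counting forward passes. The toy below stands in for the model (it is not Nemotron): producing the same 16 tokens takes 16 passes one-at-a-time but only 4 when each pass emits 4 tokens.

```python
# Toy comparison of classic one-token decoding vs. MTP-style decoding.
# fake_model is a deterministic stand-in for a real forward pass.

def fake_model(prompt_len, k):
    """Pretend forward pass: returns the next k tokens in one call."""
    return [f"tok{prompt_len + i}" for i in range(k)]

def generate(total_tokens, tokens_per_step):
    out, calls = [], 0
    while len(out) < total_tokens:
        out.extend(fake_model(len(out), tokens_per_step))
        calls += 1
    return out[:total_tokens], calls

single, calls_single = generate(16, tokens_per_step=1)  # classic decoding
multi, calls_multi = generate(16, tokens_per_step=4)    # MTP-style decoding

print(calls_single, calls_multi)  # 16 vs 4: same text, 4x fewer passes
```

Since each forward pass dominates latency, cutting the pass count is where the practical speed gain for long code or analysis outputs comes from.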