Amazon Launches Managed Inference for Custom Nova Models
- Amazon SageMaker AI adds managed inference for custom Nova Micro, Nova Lite, and Nova 2 Lite models.
- New features offer granular control over GPU utilization, auto-scaling, and configurable context lengths in production.
- The integration enables seamless deployment of models trained with Amazon SageMaker HyperPod or standard SageMaker Training Jobs.
Amazon Web Services has expanded its SageMaker AI ecosystem by introducing managed inference support for custom Amazon Nova models. The update bridges the gap between training and production, letting organizations serve specialized versions of Nova Micro and Nova Lite with production-grade reliability. Previously, developers faced hurdles moving from experimentation to high-traffic use; now they can rely on automated scaling and optimized hardware to keep their AI applications responsive and cost-effective.
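To make the training-to-production handoff concrete, the sketch below shows the three request payloads a deployment script would typically pass to boto3's `create_model`, `create_endpoint_config`, and `create_endpoint` calls. The model name, S3 artifact path, IAM role, and instance type are hypothetical placeholders, not values from the announcement; the payloads are built as plain dicts so the sketch runs without AWS credentials.

```python
# Sketch: boto3-style request payloads for deploying a fine-tuned model
# behind a single-variant SageMaker endpoint. All concrete names below
# (model name, S3 URI, role ARN, instance type) are illustrative.

def build_deployment_payloads(model_name: str, model_data_url: str,
                              instance_type: str) -> tuple[dict, dict, dict]:
    """Return payloads for create_model, create_endpoint_config,
    and create_endpoint, in that order."""
    create_model = {
        "ModelName": model_name,
        "PrimaryContainer": {
            # Location of the fine-tuned model artifact (assumed layout).
            "ModelDataUrl": model_data_url,
        },
        # Placeholder execution role for illustration only.
        "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
    }
    create_endpoint_config = {
        "EndpointConfigName": f"{model_name}-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    }
    create_endpoint = {
        "EndpointName": f"{model_name}-endpoint",
        "EndpointConfigName": f"{model_name}-config",
    }
    return create_model, create_endpoint_config, create_endpoint


model, config, endpoint = build_deployment_payloads(
    "nova-micro-custom",
    "s3://my-bucket/artifacts/model.tar.gz",
    "ml.g5.2xlarge",
)
print(endpoint["EndpointName"])  # nova-micro-custom-endpoint
```

In a real script, each dict would be unpacked into the corresponding `boto3.client("sagemaker")` call (e.g. `sm.create_model(**model)`).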
The integration offers granular control over deployment parameters critical to business operations. Users can adjust concurrency settings (how many requests a model handles at once) and modify context lengths to suit specific document-processing needs. By supporting models that have undergone supervised fine-tuning (refining a model on task-specific examples) or reinforcement learning, AWS ensures that domain-specialized models can be served efficiently across global regions.
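One way to handle such deployment parameters safely is to validate them before submission. The helper below is a minimal sketch under stated assumptions: the parameter names (`max_concurrency`, `context_length`), the context-length tiers, and the environment-variable keys are all illustrative, not documented SageMaker fields.

```python
# Sketch: validate concurrency and context-length settings before sending
# them to an endpoint configuration. The field names and the allowed
# context-length tiers are assumptions for illustration.

ASSUMED_CONTEXT_TIERS = (8_192, 32_768, 131_072)  # hypothetical tiers


def validate_serving_params(max_concurrency: int, context_length: int) -> dict:
    """Return the settings as string-valued env-style keys, or raise
    ValueError if a setting is out of range."""
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    if context_length not in ASSUMED_CONTEXT_TIERS:
        raise ValueError(f"unsupported context length: {context_length}")
    return {
        "MAX_CONCURRENCY": str(max_concurrency),
        "CONTEXT_LENGTH": str(context_length),
    }


print(validate_serving_params(8, 32_768))
# {'MAX_CONCURRENCY': '8', 'CONTEXT_LENGTH': '32768'}
```

Failing fast on an unsupported context length keeps misconfiguration out of production, where it would otherwise surface only as truncated prompts or rejected requests.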
Efficiency is a core focus of this release. With auto-scaling policies that react to usage patterns and support for a range of NVIDIA GPU instance types, companies can minimize idle resources without sacrificing speed. The end-to-end workflow simplifies the lifecycle of advanced reasoning models in an enterprise setting.
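Usage-driven auto-scaling for SageMaker endpoints is typically wired through Application Auto Scaling. The sketch below builds the two request payloads (`register_scalable_target` and `put_scaling_policy`) for a target-tracking policy on a variant's instance count; the endpoint name, capacity bounds, target value, and cooldowns are illustrative choices, not values from the announcement.

```python
# Sketch: Application Auto Scaling payloads for target-tracking scaling of a
# SageMaker endpoint variant. Endpoint name, min/max capacity, target value,
# and cooldowns below are example settings.

def build_autoscaling_payloads(endpoint_name: str, variant: str = "AllTraffic",
                               min_capacity: int = 1, max_capacity: int = 4,
                               invocations_per_instance: float = 70.0):
    """Return (register_scalable_target, put_scaling_policy) payloads."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant}"
    register_target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    scaling_policy = {
        "PolicyName": f"{endpoint_name}-target-tracking",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Scale so each instance serves ~70 invocations per minute.
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,  # scale in cautiously (seconds)
            "ScaleOutCooldown": 60,  # scale out quickly under load
        },
    }
    return register_target, scaling_policy


target, policy = build_autoscaling_payloads("nova-micro-custom-endpoint")
print(policy["PolicyType"])  # TargetTrackingScaling
```

The asymmetric cooldowns reflect a common pattern: react quickly to traffic spikes but release capacity slowly, so brief lulls do not trigger churn.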