Salesforce Unveils SFR-RL Training Stack for AI Agents
- Salesforce AI Research introduces SFR-RL, a training stack optimized for multi-turn agentic workflows.
- New pipelined synchronous approach achieves 10x better memory efficiency than traditional training frameworks.
- Framework enables training 120B-parameter MoE models at million-token context lengths using only 16 GPUs.
The transition from simple chat interactions to "agentic" AI—where models interact with tools, browse the web, and execute code—presents a massive challenge for training infrastructure. Current reinforcement learning systems often struggle with "stragglers," where GPUs sit idle while waiting for long, complex tasks to finish. Salesforce AI Research has addressed this bottleneck with SFR-RL, a new training stack designed specifically for this high-complexity environment.
Unlike existing methods that force a choice between slower synchronous training and unstable asynchronous updates, SFR-RL introduces a "pipelined synchronous" approach. This system alternates between a rollout phase, where the model generates actions, and a training phase, where it learns from them. By swapping the model between an inference engine and a training state across the entire GPU cluster, the system keeps hardware utilization near 100% while maintaining the stability needed for high-quality learning.
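To make the alternation concrete, here is a minimal sketch of a pipelined synchronous loop. The `InferenceEngine` and `Trainer` classes, and the weight format, are hypothetical placeholders, not the actual SFR-RL API; the point is that the same cluster flips between serving rollouts and running updates, rather than splitting into separate idle-prone pools.

```python
class InferenceEngine:
    """Serves the current policy weights for fast rollout generation (placeholder)."""
    def __init__(self):
        self.weights = {"step": 0}

    def load(self, weights):
        # In a real system this is a weight sync into the inference engine.
        self.weights = dict(weights)

    def rollout(self, prompts):
        # Generate one trajectory per prompt with the current policy.
        return [(p, f"action@step{self.weights['step']}") for p in prompts]


class Trainer:
    """Updates policy weights from a batch of trajectories (placeholder)."""
    def __init__(self):
        self.weights = {"step": 0}

    def update(self, trajectories):
        # Stand-in for an RL gradient step on the collected batch.
        self.weights["step"] += 1
        return self.weights


def train_loop(prompts, num_iters):
    engine, trainer = InferenceEngine(), Trainer()
    for _ in range(num_iters):
        # Phase 1: the whole cluster runs inference (rollout generation).
        engine.load(trainer.weights)
        batch = engine.rollout(prompts)
        # Phase 2: the same GPUs switch to training on that batch.
        trainer.update(batch)
    return trainer.weights


print(train_loop(["task-1", "task-2"], num_iters=3))  # {'step': 3}
```

Because every GPU participates in both phases, no hardware waits on a separate training pool, while the strict phase boundary keeps updates synchronous and stable.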
One of the most impressive feats is how the system handles Mixture-of-Experts (MoE) architectures—models that only activate specific "expert" parts of their network for each task to save computing power. SFR-RL uses Expert Parallelism (EP) to distribute these components efficiently, allowing a massive 120-billion-parameter model to process a million-token context window on just 16 H200 GPUs. This represents a significant leap in throughput and memory efficiency compared to earlier open-source frameworks.
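The memory arithmetic behind Expert Parallelism can be illustrated with a toy router. Everything below is a simplified assumption, not SFR-RL internals: experts are placed round-robin across devices, so each GPU holds only `NUM_EXPERTS / NUM_DEVICES` expert weight sets instead of all of them.

```python
NUM_EXPERTS = 128   # total experts in one MoE layer (illustrative)
NUM_DEVICES = 16    # e.g. 16 GPUs
TOP_K = 2           # experts activated per token

def expert_to_device(expert_id):
    # Round-robin placement: expert i lives on device i % NUM_DEVICES.
    return expert_id % NUM_DEVICES

def route_token(router_scores):
    """Pick the top-k experts for a token and the devices hosting them."""
    ranked = sorted(range(NUM_EXPERTS),
                    key=lambda e: router_scores[e], reverse=True)
    chosen = ranked[:TOP_K]
    return [(e, expert_to_device(e)) for e in chosen]

# Each device stores 128 / 16 = 8 experts, cutting per-GPU expert-weight
# memory by 16x relative to replicating the full MoE layer everywhere.
scores = [0.0] * NUM_EXPERTS
scores[5], scores[21] = 0.9, 0.8   # pretend the router favors experts 5 and 21
print(route_token(scores))          # [(5, 5), (21, 5)]
```

In practice tokens are dispatched to the devices returned by the router via all-to-all communication, but the sharding shown here is what lets a 120B-parameter MoE model fit long contexts on a modest GPU count.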