WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
- WideSeek-R1 achieves high performance through width scaling using parallel multi-agent orchestration
- A 4B-parameter model matches 671B DeepSeek-R1 performance on broad information-seeking benchmarks
- Multi-agent reinforcement learning optimizes coordination between lead agents and specialized subagents
The current frontier of AI research is heavily focused on "depth scaling": the pursuit of making single, massive models increasingly intelligent to handle complex, multi-step reasoning. However, when a task requires searching through vast, broad pools of information, these monolithic giants often struggle with organizational efficiency. WideSeek-R1 introduces a compelling alternative called "width scaling," which distributes the workload across a collaborative team of smaller AI agents rather than relying on a single, monolithic brain.
At the heart of this system is a lead-agent-subagent architecture trained through a specialized process called multi-agent reinforcement learning. The central "lead" agent acts as a manager, delegating specific parts of a query to multiple "subagents" that work in parallel. Because each subagent operates in an isolated context with its own specialized tools, they can process information simultaneously. Remarkably, a WideSeek-R1 configuration using only 4 billion parameters achieved an F1 score of 40% on the WideSearch benchmark, rivaling the performance of the 671-billion-parameter DeepSeek-R1.
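The delegation pattern described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `subagent` function here is a hypothetical stand-in for a small tool-equipped model, and the query-splitting heuristic is invented for the example; only the lead/subagent structure and parallel fan-out are drawn from the source.

```python
from concurrent.futures import ThreadPoolExecutor

def subagent(subquery: str) -> str:
    # Hypothetical stand-in for a subagent: in the real system this would be
    # a small model with its own isolated context and search tools.
    return f"findings for: {subquery}"

def lead_agent(query: str, num_subagents: int = 4) -> str:
    # The lead agent decomposes a broad query into narrower slices...
    subqueries = [f"{query} (slice {i + 1}/{num_subagents})"
                  for i in range(num_subagents)]
    # ...dispatches the slices to subagents running in parallel...
    with ThreadPoolExecutor(max_workers=num_subagents) as pool:
        partial_results = list(pool.map(subagent, subqueries))
    # ...and aggregates the partial findings into a single answer.
    return "\n".join(partial_results)

print(lead_agent("catalog open-source 4B-class language models"))
```

Note that width scaling falls out of the structure naturally: adding subagents only changes `num_subagents`, not the training of any individual agent, which is the property the paper exploits.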
Perhaps most exciting is the system's inherent scalability: performance improves consistently as more subagents are added, with no additional training required. This shift from individual model size to collective "organizational capability" suggests a future where swarms of smaller, efficient models could outperform today's most expensive AI titans. It marks a significant step toward more agile and resource-efficient information-seeking systems.