AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
- Fudan University's AdaReasoner masters tool orchestration for visual reasoning, surpassing GPT-5 on specialized benchmarks.
- A new reinforcement learning algorithm, Tool-GRPO, optimizes tool selection and sequencing based on end-task success.
- Adaptive learning lets the 7B model generalize to unseen tools, with a reported 24.9% performance improvement.
Researchers at Fudan University have unveiled AdaReasoner, a family of multimodal models designed to learn tool use as a general reasoning skill rather than as memorized behavior. Conventional models often struggle to decide which tools to invoke on complex visual tasks. AdaReasoner instead learns to coordinate multiple tools through a dedicated data curation pipeline and adaptive learning.

The approach is powered by Tool-GRPO, a reinforcement learning algorithm that optimizes how the model selects and sequences tools according to whether the overall task ultimately succeeds. Because training focuses on end results, the model learns to ignore irrelevant tools and to favor those that are most useful in a given visual context. This lets the system handle long-horizon, multi-step interactions that typically trip up standard reasoning agents.

In the reported experiments, AdaReasoner outperforms proprietary models such as GPT-5 on challenging benchmarks including Jigsaw and Visual Spatial Planning (VSP). The model also shows "tool-adaptive" behavior: it adjusts how often it calls tools on its own and successfully applies tools it was never explicitly trained on. This marks a significant step toward agentic AI systems that can flexibly extend their capabilities by interacting with external environments.
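To make the outcome-based idea concrete, here is a minimal Python sketch of the group-relative advantage at the core of GRPO-style training: several rollouts are sampled for the same prompt, and each rollout's reward is normalized against the rest of its group. The article does not describe Tool-GRPO's internals, so the `Trajectory` class, the binary `outcome_reward` (1.0 for a correct final answer, 0.0 otherwise), the tool names, and the use of the population standard deviation are all illustrative assumptions, not AdaReasoner's actual implementation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    """One sampled rollout: the tools it called and whether its final answer was correct."""
    tool_calls: list[str]
    task_success: bool


def outcome_reward(traj: Trajectory) -> float:
    # Hypothetical outcome-only reward: 1.0 for a correct final answer, else 0.0.
    # Tool calls earn nothing directly; they matter only through task success.
    return 1.0 if traj.task_success else 0.0


def group_relative_advantages(group: list[Trajectory], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: each reward minus the group mean, divided by the group std."""
    rewards = [outcome_reward(t) for t in group]
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std (an assumption; implementations vary)
    # eps avoids division by zero when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: four rollouts for the same visual prompt (tool names are hypothetical).
group = [
    Trajectory(["crop", "zoom"], task_success=True),
    Trajectory(["crop", "ocr", "draw_line", "zoom"], task_success=False),
    Trajectory(["draw_line"], task_success=True),
    Trajectory([], task_success=False),
]
advantages = group_relative_advantages(group)
print([round(a, 3) for a in advantages])
```

Because this reward ignores the tool calls themselves, both successful rollouts receive the same positive advantage whether they used one tool or two, and both failed rollouts are pushed down regardless of how many tools they invoked. Under this assumption, any preference for useful tools has to emerge from their effect on task success.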