Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text
- GEM pipeline extracts multi-turn tool-use data from text corpora to train autonomous AI agents
- GEM-32B achieves a 16.5% performance boost on the BFCL V3 benchmark, outperforming models trained on in-domain data
- A specialized Trajectory Synthesizer reduces inference costs while maintaining high-quality data generation through supervised fine-tuning
Training AI agents to navigate complex, multi-step tasks often hits a wall because high-quality interaction data is remarkably scarce. While existing methods rely on rigid, predefined toolsets, researchers have introduced GEM, a framework that harvests 'implicit experiences' from vast text corpora. By treating ordinary text as a roadmap for problem-solving, the system identifies relevant workflows and grounds them into executable tool trajectories through a four-stage refinement process. This shift from synthetic API calls to text-based extraction allows for much greater diversity in training scenarios.

The results are striking: the GEM-32B model delivers a 16.5% improvement on the BFCL V3 Multi-turn benchmark. Perhaps most impressively, the team distilled the entire pipeline into a specialized Trajectory Synthesizer, a dedicated model that uses supervised fine-tuning to replicate the pipeline's output at a fraction of the computational cost. This ensures higher efficiency without sacrificing the quality of the synthesized experiences, marking a significant step forward in scalable agent training.

The approach suggests that the next generation of AI agent systems may not need more data so much as a smarter way to translate human knowledge into actionable skills. By bridging the gap between static text and active tool execution, the research provides a blueprint for more versatile foundation models.
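To make the idea of "grounding text into executable tool trajectories" concrete, here is a minimal sketch of what such a trajectory record and its fine-tuning example might look like. All field names, tool names, and the serialization format are illustrative assumptions, not GEM's actual schema:

```python
import json
from dataclasses import dataclass, field
from typing import List

@dataclass
class ToolCall:
    # One step of a multi-turn trajectory: a tool invocation and its result.
    name: str        # tool invoked at this step (hypothetical tool names below)
    arguments: dict  # arguments grounded from the source text
    observation: str # executed or simulated tool output

@dataclass
class Trajectory:
    source_text: str  # the passage the workflow was mined from
    task: str         # the task distilled from that passage
    steps: List[ToolCall] = field(default_factory=list)

def to_sft_example(traj: Trajectory) -> dict:
    # Turn a trajectory into a (prompt, completion) pair, the kind of pair a
    # distilled Trajectory Synthesizer could be fine-tuned on: text in,
    # tool-call sequence out.
    return {
        "prompt": traj.source_text,
        "completion": json.dumps(
            [{"name": s.name, "arguments": s.arguments} for s in traj.steps]
        ),
    }

# A toy trajectory mined from an imaginary how-to passage.
traj = Trajectory(
    source_text="To price a flight, first search routes, then fetch the fare.",
    task="Find the fare for a flight from A to B.",
    steps=[
        ToolCall("search_routes", {"origin": "A", "destination": "B"},
                 "route R1 found"),
        ToolCall("get_fare", {"route": "R1"}, "fare: 120"),
    ],
)

example = to_sft_example(traj)
print(len(traj.steps))   # 2
```

The point of the sketch is the shape of the data, not the specifics: a passage of prose becomes a supervised target of structured tool calls, which is what lets a single fine-tuned model stand in for the multi-stage pipeline at inference time.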