ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas
- Lianjia Tech debuts ASTRA, an automated framework for training tool-using AI agents via synthetic data.
- System generates rule-verifiable environments and trajectories to enable precise, multi-turn reinforcement learning.
- ASTRA models match closed-source performance on agentic benchmarks while preserving core reasoning capabilities.
Training Large Language Models (LLMs) to function as reliable agents—systems capable of using external tools to solve multi-step problems—has long been a bottleneck in AI development. Most current methods rely heavily on human-curated data or non-verifiable simulations, which often lack the complexity needed for real-world tasks.
To bridge this gap, researchers have developed ASTRA, a fully automated pipeline designed to synthesize "trajectories" (the sequences of actions an AI takes) and the "arenas" (the environments where those actions happen). By leveraging tool-call graphs, the system generates diverse training data that teaches the model how to navigate complex software tools.
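The core idea of walking a tool-call graph to synthesize a multi-step trajectory can be sketched as follows. ASTRA's actual pipeline is not public, so the graph, tool names, and sampling strategy here are illustrative assumptions, not the framework's real implementation:

```python
import random

# Hypothetical tool-call graph: an edge A -> B means "the output of
# tool A can feed tool B". Walking this graph yields a plausible
# multi-step tool sequence to use as synthetic training data.
TOOL_GRAPH = {
    "search_flights": ["book_flight"],
    "book_flight": ["send_confirmation"],
    "search_hotels": ["book_hotel"],
    "book_hotel": ["send_confirmation"],
    "send_confirmation": [],
}

def sample_trajectory(graph, start, rng):
    """Random walk over the tool-call graph -> one synthetic trajectory."""
    path = [start]
    while graph[path[-1]]:                    # stop at a terminal tool
        path.append(rng.choice(graph[path[-1]]))
    return path

rng = random.Random(0)
print(sample_trajectory(TOOL_GRAPH, "search_flights", rng))
# -> ['search_flights', 'book_flight', 'send_confirmation']
```

Varying the start node and graph topology is what gives the synthesized data its diversity.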
What sets ASTRA apart is its ability to convert human reasoning traces into independent, code-executable environments. This allows for verifiable reinforcement learning, where the model receives clear, rule-based feedback on whether its multi-turn decisions were correct.
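A "rule-verifiable" environment reduces to code that checks an episode against explicit conditions, so the reward is exact rather than judged by another model. The rules and tool names below are invented for illustration; they only show the shape of such a verifier, not ASTRA's actual checks:

```python
def verify_episode(calls):
    """Return 1.0 if the agent's tool-call sequence passes all rules, else 0.0.

    Each rule is plain, executable code -- the feedback is deterministic
    and auditable, which is what makes the RL signal verifiable.
    """
    names = [c["tool"] for c in calls]
    completed = "book_flight" in names                 # task finished
    notified = "send_confirmation" in names            # user informed
    ordered = (completed and notified and
               names.index("book_flight") < names.index("send_confirmation"))
    return 1.0 if (completed and notified and ordered) else 0.0

episode = [{"tool": "search_flights"},
           {"tool": "book_flight"},
           {"tool": "send_confirmation"}]
print(verify_episode(episode))  # -> 1.0
```

Because every rule is ordinary code, a failed episode can be traced to the exact condition it violated.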
The result is a unified training methodology that balances task completion with interaction efficiency. By combining supervised fine-tuning with online reinforcement learning, ASTRA-trained models have demonstrated performance levels that rival top-tier closed-source systems on multiple industry benchmarks.
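One simple way to read "balancing task completion with interaction efficiency" is a shaped reward that discounts a verified success by the number of turns spent. The article does not specify ASTRA's reward function, so this is an assumed sketch of the general technique:

```python
def shaped_reward(success, turns, max_turns=10, per_turn_penalty=0.05):
    """Reward 1.0 for a verified success, minus a small cost per turn.

    `max_turns` and `per_turn_penalty` are hypothetical knobs: failure
    earns nothing, and a success that wastes turns earns less than a
    concise one, pushing the policy toward efficient interactions.
    """
    if not success:
        return 0.0
    return max(0.0, 1.0 - per_turn_penalty * min(turns, max_turns))

print(shaped_reward(True, 3))    # -> 0.85 (success in few turns)
print(shaped_reward(True, 12))   # -> 0.5  (success, but capped penalty)
print(shaped_reward(False, 3))   # -> 0.0  (no credit without completion)
```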