TTCS: Test-Time Curriculum Synthesis for Self-Evolving
- TTCS improves LLM reasoning by synthesizing a custom curriculum of questions during the inference stage.
- A co-evolving system pairs a question synthesizer, which creates question variants, with a solver that generates self-consistency rewards.
- The method enhances performance on mathematical benchmarks and transfers effectively across diverse model architectures.
Test-time training represents a shift in how we approach model intelligence, allowing LLMs to adapt to specific problems during inference itself. Traditional methods often struggle with high-difficulty questions, which provide poor learning signals; the TTCS framework addresses this with a co-evolving relationship between two internal policies. A specialized question synthesizer creates a structured sequence of increasingly difficult variants of the target question, essentially a personalized study guide, tailored to the model's current skill level.
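The curriculum-building step can be sketched roughly as follows. This is a hypothetical illustration, not code from the TTCS paper: `generate` stands in for any LLM call, and the prompt wording is an assumption.

```python
# Hypothetical sketch of test-time curriculum synthesis: a synthesizer
# policy is asked to rewrite the target question at increasing
# difficulty levels, producing an easy-to-hard study sequence.
# `generate(prompt)` is an assumed LLM interface, not a real API.

def synthesize_curriculum(generate, target_question, levels=4):
    """Return a list of question variants ordered easy -> hard."""
    curriculum = []
    for level in range(1, levels + 1):
        prompt = (
            f"Rewrite the following problem at difficulty {level}/{levels}, "
            f"keeping the same underlying concepts:\n{target_question}"
        )
        curriculum.append(generate(prompt))
    return curriculum

# Toy stand-in for an LLM call, just to demonstrate the interface.
fake_llm = lambda p: p.splitlines()[-1] + " (variant)"
variants = synthesize_curriculum(fake_llm, "Solve x^2 - 5x + 6 = 0", levels=3)
print(len(variants))  # 3 variants, one per difficulty level
```

In practice the synthesizer would also condition on the solver's recent performance, so that the difficulty ladder starts near the model's current skill level.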
The second component, the reasoning solver, tackles these synthetic challenges while generating self-consistency rewards. These rewards act as a quality check: the model compares multiple sampled attempts at a problem and treats agreement among them as a proxy for the correct answer. The feedback loop is bidirectional: the solver's performance tells the synthesizer which questions to generate next, while the generated curriculum keeps training stable when the original hard problems alone yield too sparse a learning signal.
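A minimal sketch of one plausible form of such a self-consistency reward, based only on the description above (the exact reward used by TTCS may differ): sample several answers to the same synthetic question and reward each sample by its agreement with the majority answer.

```python
# Self-consistency reward via majority voting (an illustrative
# assumption about the mechanism, not the paper's exact formula).
from collections import Counter

def self_consistency_rewards(answers):
    """Reward 1.0 for samples matching the most common answer, else 0.0."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]

samples = ["42", "42", "41", "42"]
print(self_consistency_rewards(samples))  # [1.0, 1.0, 0.0, 1.0]
```

No ground-truth label is needed, which is what makes this usable at test time on synthetic questions.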
The implications for the future of self-evolution in AI are significant. By dynamically constructing its own learning path, the model can bridge the gap between its pre-trained knowledge and novel, complex reasoning tasks. Experiments demonstrate that this approach not only excels in mathematical domains but also generalizes across different architectures, suggesting a scalable path for models to improve their own reasoning without constant human intervention.
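Tying the pieces together, the bidirectional loop described above might look like the following. Everything here is a hedged sketch under assumed interfaces: `make_variant` and `solve_many` are hypothetical stand-ins for the synthesizer and solver policies, and the simple threshold rule is an illustrative choice, not the paper's update.

```python
# Co-evolution sketch: the solver's self-consistency on each variant
# feeds back to the synthesizer's choice of the next difficulty level.
from collections import Counter

def curriculum_loop(make_variant, solve_many, question, steps=5):
    """Run a few synthesize-solve-feedback rounds; return final difficulty."""
    difficulty = 1
    for _ in range(steps):
        variant = make_variant(question, difficulty)
        answers = solve_many(variant)                  # several solver samples
        _, votes = Counter(answers).most_common(1)[0]
        agreement = votes / len(answers)               # self-consistency signal
        # Feedback: high agreement -> harder next variant; low -> easier.
        difficulty = difficulty + 1 if agreement >= 0.5 else max(1, difficulty - 1)
    return difficulty
```

The key property this illustrates is that the difficulty schedule is not fixed in advance; it is adapted online from the solver's own agreement signal, with no human labels in the loop.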