Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
- The MathForge framework improves mathematical reasoning by prioritizing harder questions during both model training and data generation.
- A new algorithm, DGPO, corrects an update imbalance in GRPO so that models learn effectively from complex, high-difficulty problems.
- Multi-Aspect Question Reformulation (MQR) systematically increases question difficulty without altering the original correct answers.
Current AI models often struggle with complex math because they spend too much of their training on the "easy" stuff. Reinforcement learning techniques have helped, but researchers from AMAP-ML found that popular training algorithms such as GRPO (Group Relative Policy Optimization) inadvertently neglect difficult problems: the learning signal weakens on high-difficulty questions. This creates a ceiling where models get very good at mid-level tasks but fail to break through to advanced mathematical reasoning.

To solve this, the team introduced MathForge, a dual-strategy framework that focuses on the "frontier" of difficulty. The first component is Difficulty-Aware Group Policy Optimization (DGPO), which re-weights training updates so that the model pays more attention when it fails at a hard task. By balancing how much the model learns from different difficulty levels, DGPO prevents the update magnitude from shrinking just because a problem is challenging, effectively forcing the model to tackle its weaknesses.

The second component is Multi-Aspect Question Reformulation (MQR). Instead of just rephrasing questions to look different, MQR makes them more intellectually demanding while keeping the original gold answer, so every reformulated question can still be checked against a known-correct result. This ensures the model has a steady supply of "heavy weights" to lift during its training sessions.

According to the authors, extensive testing shows that this "harder is better" approach significantly boosts performance across a range of mathematical benchmarks, offering a blueprint for improving model reasoning through difficulty-focused training and data curation.
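To make the update-imbalance argument concrete, here is a minimal sketch in Python. It assumes binary rewards (1 for a correct answer, 0 for a wrong one) and the standard GRPO advantage, which normalizes each reward by the group mean and standard deviation. The functions `mad_advantages` and `difficulty_weights` are hypothetical illustrations of "balancing" and "re-weighting", not the formulas from the paper: the first swaps the standard deviation for the mean absolute deviation, and the second applies a softmax over negative accuracy.

```python
import math


def grpo_advantages(rewards):
    """Standard GRPO group-relative advantage: (r - mean) / std (population std)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    if std == 0:
        return [0.0] * n  # all responses scored the same: no learning signal
    return [(r - mean) / std for r in rewards]


def mad_advantages(rewards):
    """Hypothetical variant: normalize by the mean absolute deviation instead."""
    n = len(rewards)
    mean = sum(rewards) / n
    mad = sum(abs(r - mean) for r in rewards) / n
    if mad == 0:
        return [0.0] * n
    return [(r - mean) / mad for r in rewards]


def total_magnitude(advantages):
    """Sum of absolute advantages, a rough proxy for a question's update size."""
    return sum(abs(a) for a in advantages)


def difficulty_weights(accuracies, temperature=1.0):
    """Hypothetical question-level weights: lower accuracy gets a larger weight."""
    scores = [math.exp(-acc / temperature) for acc in accuracies]
    total = sum(scores)
    return [s / total for s in scores]


def binary_group(num_correct, group_size=8):
    """Rewards for one question: num_correct ones followed by zeros."""
    return [1.0] * num_correct + [0.0] * (group_size - num_correct)


if __name__ == "__main__":
    for label, num_correct in [("hard", 1), ("medium", 4), ("easy", 7)]:
        rewards = binary_group(num_correct)
        std_mag = total_magnitude(grpo_advantages(rewards))
        mad_mag = total_magnitude(mad_advantages(rewards))
        print(f"{label:6s} std-normalized={std_mag:.3f} mad-normalized={mad_mag:.3f}")
    print(difficulty_weights([1 / 8, 4 / 8, 7 / 8]))
```

With standard-deviation normalization and binary rewards, the total magnitude for a group of size G and accuracy p works out to 2G·sqrt(p(1−p)), which peaks at p = 0.5 and shrinks as a question becomes harder or easier. Dividing by the mean absolute deviation instead makes the total equal to G for any group with mixed outcomes, so a hard question is no longer down-weighted simply for being hard. A question-level weight can then shift extra emphasis toward low-accuracy questions.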