LongCat MoE Model Dominates Formal Mathematical Reasoning
- LongCat-Flash-Prover, a 561B MoE model, sets a new record for open-weights formal mathematical reasoning.
- Achieves 97.1% on MiniF2F-Test using agentic, tool-integrated reinforcement learning and Lean4.
- New HisPO algorithm stabilizes training for complex, long-horizon theorem-proving tasks.
Researchers have unveiled LongCat-Flash-Prover, a massive 561-billion-parameter mixture-of-experts (MoE) model designed to master mathematical proofs written in the Lean4 language. Unlike standard AI that simply predicts the next word, this model uses a specialized "agentic" approach: it actively invokes tools and interacts with theorem-proving software to verify its logic step by step. The work suggests that progress in AI reasoning is not just about scale, but about how models interact with systems of structured, checkable rules.
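To make the step-by-step verification concrete, here is a small illustrative Lean4 theorem in the style of MiniF2F problems (a toy example written for this article, not taken from the benchmark or the paper; it assumes Mathlib's `Even` definition). Every tactic line is machine-checked by the Lean kernel, so a prover model gets exact feedback on whether each step is valid:

```lean
import Mathlib

-- Toy goal: the sum of two even integers is even.
-- `Even a` in Mathlib means `∃ r, a = r + r`.
theorem even_add_even (a b : ℤ) (ha : Even a) (hb : Even b) :
    Even (a + b) := by
  obtain ⟨m, hm⟩ := ha        -- a = m + m
  obtain ⟨n, hn⟩ := hb        -- b = n + n
  exact ⟨m + n, by rw [hm, hn]; ring⟩
```

If any step were wrong, Lean would reject the proof outright, which is exactly the kind of tool signal an agentic model can exploit during training.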
The system works by breaking the complex task of proving theorems into three distinct phases: translating human-language statements into formal mathematical code, drafting a rough proof outline, and finally completing the rigorous proof. To train such a giant model effectively, the team developed a Hierarchical Importance Sampling Policy Optimization (HisPO) algorithm. This technique keeps the model stable during training and counters reward hacking, the common failure mode where an AI earns reward by finding shortcuts that don't actually solve the problem.
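The outline-then-complete phases can be illustrated in Lean itself (a minimal sketch invented for this article, assuming Mathlib's `sq_nonneg` lemma; the paper's actual pipeline is not shown). A sketch leaves subgoals open with `sorry`, and the final phase discharges each one:

```lean
import Mathlib

-- Phase 2: a proof sketch, with `sorry` marking open subgoals.
theorem sq_sum_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  have h1 : 0 ≤ a ^ 2 := sorry
  have h2 : 0 ≤ b ^ 2 := sorry
  linarith

-- Phase 3: the completed proof, each `sorry` replaced by a real step.
theorem sq_sum_nonneg' (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  have h1 : 0 ≤ a ^ 2 := sq_nonneg a
  have h2 : 0 ≤ b ^ 2 := sq_nonneg b
  linarith
```

Structuring generation this way lets the model plan the shape of a proof before committing to the hardest low-level details.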
The results are striking: the model achieved a 97.1% success rate on the MiniF2F-Test benchmark, a significant jump for open-weights systems. By also solving 41.5% of the notoriously difficult PutnamBench problems, LongCat-Flash-Prover shows that AI is rapidly closing the gap with human mathematical reasoning. The release gives students and researchers a powerful new toolkit for automating the most tedious parts of formal verification.