MiniMax-M2.5 Debuts with Enhanced Agentic Performance
- MiniMax-M2.5 matches GLM-4.7 performance with a 229B-parameter MoE architecture
- Agentic performance surges, with GDPval-AA Elo rising from 1079 to 1215
- Hallucination rates increase to 88% despite improvements in task accuracy
MiniMax has launched MiniMax-M2.5, a strategic iteration of its model lineup that prioritizes functional utility over raw accuracy. While the model retains its 229B-parameter Mixture-of-Experts architecture, in which only a small subset of expert sub-networks is activated for any given token to reduce compute, it shows a significant leap in its ability to handle complex, multi-step workflows. This shift toward agentic AI allows the model to better navigate realistic knowledge work, such as preparing presentations and conducting web-based research within a live terminal environment.
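The sparse-activation idea behind MoE can be illustrated with a toy gating routine. This is a minimal sketch, not MiniMax's actual implementation: the expert count, top-k value, and all function names here are illustrative assumptions.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route input x through only the top-k experts chosen by a softmax gate.

    Because just k of the experts run per token, an MoE model's *active*
    parameter count is far below its total parameter count -- the source
    of the compute savings described above.
    """
    logits = x @ gate_w                       # gating scores, one per expert
    top = np.argsort(logits)[-k:]             # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the selected experts only
    # Weighted sum of the selected experts' outputs; the rest are skipped entirely
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy setup: 8 experts of which only 2 are active per token (illustrative numbers)
rng = np.random.default_rng(0)
d = 4
experts = [(lambda W: (lambda x: x @ W))(rng.standard_normal((d, d)))
           for _ in range(8)]
gate_w = rng.standard_normal((d, 8))
y = moe_forward(rng.standard_normal(d), gate_w, experts, k=2)
```

With k=2 of 8 experts active, only a quarter of the expert parameters participate in each forward pass, which is the rough shape of the efficiency argument for large MoE models.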
The update presents a trade-off for developers. On one hand, the model’s performance on the GDPval-AA benchmark ranks it among the top three open-weight models globally. It achieves this with remarkable token efficiency, requiring significantly fewer output tokens than its competitors to reach similar intelligence levels. This high efficiency makes it an attractive, cost-effective choice for those building autonomous coding or research tools.
However, these gains in autonomy come at the cost of reliability. Benchmarks reveal a regression in truthfulness: hallucination rates (instances where the AI confidently presents false information) have climbed to 88%. This suggests that while M2.5 is better at executing tasks, it is less likely to admit uncertainty. Users must weigh this risk of misinformation against the model's impressive 200k-token context window and improved instruction following.