DistriVoting Boosts Accuracy in Large Reasoning Models
- DistriVoting framework uses distributional priors to improve answer selection in large reasoning models.
- Gaussian Mixture Models filter out unreliable responses by separating positive and negative confidence components.
- SelfStepConf utilizes step-level confidence to dynamically adjust inference and increase prediction reliability.
Large reasoning models often generate multiple candidate responses to a single prompt to improve accuracy through test-time scaling. However, simply picking the most frequent or highest-confidence answer is often insufficient because internal model signals do not always align perfectly with correctness. To address this, researchers have introduced DistriVoting, a new method that leverages the statistical distribution of confidence scores to guide the final answer selection process more effectively.
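To make the baseline concrete, here is a minimal sketch of the two naive selection strategies the article contrasts with DistriVoting: plain majority voting and confidence-weighted voting over sampled answers. The function names and the confidence values are illustrative, not from the paper.

```python
from collections import Counter


def majority_vote(answers):
    """Pick the most frequent answer among sampled candidates."""
    return Counter(answers).most_common(1)[0][0]


def confidence_weighted_vote(answers, confidences):
    """Pick the answer whose candidates accumulate the most confidence."""
    totals = {}
    for ans, conf in zip(answers, confidences):
        totals[ans] = totals.get(ans, 0.0) + conf
    return max(totals, key=totals.get)


# Three samples for the same prompt; the lone answer carries high confidence.
answers = ["42", "42", "17"]
confidences = [0.2, 0.2, 0.9]
print(majority_vote(answers))                          # "42"
print(confidence_weighted_vote(answers, confidences))  # "17"
```

The disagreement between the two votes in this toy case illustrates the article's point: frequency and raw confidence can point in different directions, which is the gap distribution-aware selection aims to close.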
The core of the DistriVoting approach involves treating the collection of generated responses as a mixture of two distinct statistical populations. By utilizing Gaussian Mixture Models, the system can mathematically decompose the total confidence scores into 'positive' (likely correct) and 'negative' (likely incorrect) components. A specialized reject filter then prunes responses that fall into the overlap between these two groups, significantly reducing the noise that typically plagues automated voting systems.
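The decomposition step can be sketched as fitting a two-component 1D Gaussian mixture to the confidence scores via EM, then keeping only responses confidently assigned to the higher-mean ("positive") component. This is an illustrative reconstruction under stated assumptions, not the paper's implementation; the posterior margin of 0.8 is an arbitrary choice.

```python
import math


def fit_gmm_1d(xs, iters=200):
    """Fit a 2-component 1D Gaussian mixture with EM.
    Returns (weights, means, stds). Crude quartile-based init."""
    xs = sorted(xs)
    n = len(xs)
    mu = [xs[n // 4], xs[3 * n // 4]]
    sigma = [max(1e-3, (xs[-1] - xs[0]) / 4)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: per-sample responsibilities for each component
        resp = []
        for x in xs:
            p = [w[k] / (sigma[k] * math.sqrt(2 * math.pi))
                 * math.exp(-((x - mu[k]) ** 2) / (2 * sigma[k] ** 2))
                 for k in range(2)]
            s = sum(p) or 1e-12
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means, and variances
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)
    return w, mu, sigma


def reject_filter(scores, w, mu, sigma, margin=0.8):
    """Keep scores whose posterior under the higher-mean component
    exceeds `margin`, pruning the ambiguous overlap region."""
    pos = 0 if mu[0] > mu[1] else 1
    kept = []
    for x in scores:
        p = [w[k] / (sigma[k] * math.sqrt(2 * math.pi))
             * math.exp(-((x - mu[k]) ** 2) / (2 * sigma[k] ** 2))
             for k in range(2)]
        if p[pos] / (sum(p) or 1e-12) >= margin:
            kept.append(x)
    return kept
```

Samples that land between the two fitted Gaussians get a near-even posterior split, so the margin test drops exactly the ambiguous region described in the text.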
To further refine results, the researchers developed SelfStepConf, which shifts the focus from global output scores to step-by-step internal signals. By monitoring confidence at each individual reasoning step, the model can dynamically adjust its inference process to widen the statistical gap between correct and incorrect paths. This dual-pronged strategy was tested across 16 different models and five major benchmarks, demonstrating a significant performance leap over current state-of-the-art calibration techniques.
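A step-level monitor of the kind described could look like the sketch below: track a per-step confidence signal (e.g. mean token log-probability per reasoning step) and abandon a path once confidence stays low for several consecutive steps. The abandonment policy, threshold, and patience values are hypothetical choices for illustration, not details from the paper.

```python
def monitor_steps(step_confidences, threshold=0.6, patience=2):
    """Return the index at which a reasoning path is abandoned,
    or None if it runs to completion.

    A path is abandoned once `patience` consecutive steps fall
    below `threshold` (hypothetical early-exit policy)."""
    consecutive_low = 0
    for i, conf in enumerate(step_confidences):
        if conf < threshold:
            consecutive_low += 1
            if consecutive_low >= patience:
                return i
        else:
            consecutive_low = 0  # a confident step resets the counter
    return None


print(monitor_steps([0.9, 0.5, 0.4, 0.8]))  # 2: abandoned at the second low step
print(monitor_steps([0.9, 0.8, 0.7]))       # None: path completes
```

Pruning weak paths early like this frees sampling budget for stronger candidates, which is one plausible way step-level signals could widen the gap between correct and incorrect paths at a fixed inference cost.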