New RL Method BandPO Solves LLM Entropy Collapse | KnowAI Space