New Algorithm Stabilizes Reinforcement Learning for LLM Training | KnowAI Space