NVIDIA Researchers Fix RL Optimization via GDPO Algorithm | KnowAI Space