Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Zhiyuan Hu, Yucheng Wang, and 8 others
Reinforcement learning (RL) has become a central paradigm for post-training large language models (LLMs), particularly for complex reasoning tasks, yet it often suffers from exploration collapse: poli...