Latest Reinforcement Learning Research Papers

Research on learning through interaction, reward optimization, policy learning, and decision-making AI systems.

9 Papers

KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning

Egor Cherepanov, Daniil Zelezetsky, Alexey K. Kovalev +1 more

Pixel-based reinforcement learning agents often fail under purely visual distribution shift even when latent dynamics and rewards are unchanged, but existing benchmarks entangle multiple sources of shift and hinder systematic analysis. We introduce KAGE-Env, a JAX-native 2D platformer that factorize...

pixel-based reinforcement learning, visual distribution shift, latent dynamics, reward function, JAX-native, +5 more
Jan 20, 2026 · 8

Behavior Knowledge Merge in Reinforced Agentic Models

Xiangchi Yuan, Dachuan Shi, Chunhui Zhang +4 more

Reinforcement learning (RL) is central to post-training, particularly for agentic models that require specialized reasoning behaviors. In this setting, model merging offers a practical mechanism for integrating multiple RL-trained agents from different tasks into a single generalist model. However, ...

reinforcement learning, model merging, agentic models, task vectors, supervised fine-tuning, +5 more
Jan 20, 2026 · 22

Your Group-Relative Advantage Is Biased

Fengkai Yang, Zherui Chen, Xiaohan Wang +10 more

Reinforcement Learning from Verifier Rewards (RLVR) has emerged as a widely used approach for post-training large language models on reasoning tasks, with group-based methods such as GRPO and its variants gaining broad adoption. These methods rely on group-relative advantage estimation to avoid lear...

Reinforcement Learning from Verifier Rewards, group-based methods, GRPO, advantage estimation, bias correction, +4 more
Jan 13, 2026 · 128
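
For readers unfamiliar with the term in the abstract above, group-relative advantage estimation as commonly described for GRPO can be sketched as follows. This is a minimal illustration, not the paper's method or its proposed correction: the function name, the `eps` term, and the choice of population standard deviation are assumptions, and implementations differ on these details.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each response relative to the other responses for the same prompt.

    Hypothetical helper: subtracts the group mean and divides by the group
    standard deviation, so no learned value function (critic) is needed.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # assumption: population std over the group
    # eps (assumed) guards against division by zero when all rewards are equal
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, four sampled responses scored 0/1 by a verifier (made-up values).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every response gets the same reward carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

Because the baseline is computed from the same small group it is applied to, the estimate depends on which responses happened to be sampled, which is the kind of estimator the paper's title calls into question.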

LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

Qingyu Ren, Qianyu He, Jingwen Chang +6 more

Instruction-following is critical for large language models, but real-world instructions often contain logical structures such as sequential dependencies and conditional branching. Existing methods typically construct datasets with parallel constraints and optimize average rewards, ignoring logical ...

instruction-following, logical structures, sequential dependencies, conditional branching, LSRInstruct, +7 more
Jan 10, 2026 · 10