Large Language Models

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Zixuan Huang, Xin Xia, Yuxi Ren, Jianbin Zheng, Xuanda Wang, Zhixia Zhang, Hongyan Xie, Songshi Liang, Zehao Chen, Xuefeng Xiao, Fuzhen Zhuang, Jianxin Li, Yikun Ban, Deqing Wang
Published
February 9, 2026
Authors
14
Word Count
16,648
Code
Includes code

Reasoning models know when to stop thinking but need better inference methods to use that knowledge.

Abstract

Recent advances in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through long Chains of Thought (CoTs). However, this approach often produces substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. Recent studies show that longer reasoning chains are frequently uncorrelated with correctness and can even harm accuracy. Through a deeper analysis of this phenomenon, we uncover and empirically verify a surprising fact: LRMs implicitly know the appropriate time to stop thinking, but this capability is obscured by current sampling paradigms. Motivated by this, we introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that unleashes this latent efficient-reasoning potential. Furthermore, integrating SAGE into group-based reinforcement learning as a mixed sampling strategy (SAGE-RL) transfers the efficient reasoning patterns SAGE discovers into standard pass@1 inference, markedly improving both the reasoning accuracy and efficiency of LRMs across multiple challenging mathematical benchmarks.

Key Takeaways

  • 1

    Reasoning models like DeepSeek-R1 know when to stop but current training methods prevent them from doing so efficiently.

  • 2

    Models often find correct answers early but continue generating redundant steps, wasting computational resources unnecessarily.

  • 3

    The SAGE inference method helps models discover efficient reasoning paths by tracking cumulative log-probability across sampled sequences.
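
The cumulative log-probability tracking mentioned above can be sketched in a few lines. Note that this is an illustrative reconstruction, not the paper's actual SAGE algorithm: the function names, the end-of-thinking probability criterion, and the `threshold` value are all assumptions chosen for clarity.

```python
import math

def cumulative_logprob_trace(token_logprobs):
    """Running cumulative log-probability of a sampled sequence.

    token_logprobs: per-token log-probabilities assigned by the model
    to the tokens it actually generated. Returns the prefix sums,
    one entry per generated token.
    """
    trace = []
    total = 0.0
    for lp in token_logprobs:
        total += lp
        trace.append(total)
    return trace

def candidate_stop_points(eos_logprobs, threshold=math.log(0.5)):
    """Flag positions where the model itself assigns high probability
    to emitting an end-of-thinking token (a hypothetical stopping
    criterion; the paper's exact SAGE rule may differ).

    eos_logprobs: at each generation step, the log-probability the
    model places on the end-of-thinking token.
    """
    return [i for i, lp in enumerate(eos_logprobs) if lp >= threshold]
```

For example, a step where the model puts 90% probability on ending its reasoning (`math.log(0.9)`) would be flagged as a candidate stop point under this sketch's 50% threshold, even if the default sampler happened to continue generating.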

Limitations

  • The summary does not report quantitative comparisons of SAGE against baseline methods.

  • Practical implementation details and computational overhead of SAGE deployment are not discussed.

Keywords

large reasoning models, chains of thought, sampling paradigms, self-aware guided efficient reasoning, group-based reinforcement learning, pass@1 inference, mathematical benchmarks
