Large Language Models

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Zixuan Huang, Xin Xia, Yuxi Ren, Jianbin Zheng, Xuanda Wang, Zhixia Zhang, Hongyan Xie, Songshi Liang, Zehao Chen, Xuefeng Xiao, Fuzhen Zhuang, Jianxin Li, Yikun Ban, Deqing Wang
Published
February 9, 2026
Authors
14
Word Count
16,648
Code
Includes code

Reasoning models know when to stop thinking but need better inference methods to use that knowledge.

Abstract

Recent advances in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through long Chains of Thought (CoTs). However, this approach often produces substantial redundancy, impairing computational efficiency and causing significant delays in real-time applications. Recent studies show that longer reasoning chains are frequently uncorrelated with correctness and can even harm accuracy. Through a deeper analysis of this phenomenon, we uncover and empirically verify a surprising fact: LRMs implicitly know the appropriate time to stop thinking, but this capability is obscured by current sampling paradigms. Motivated by this, we introduce SAGE (Self-Aware Guided Efficient Reasoning), a novel sampling paradigm that unleashes this latent efficient-reasoning potential. Furthermore, integrating SAGE into group-based reinforcement learning as a mixed sampling strategy (SAGE-RL) transfers the efficient reasoning patterns SAGE discovers into standard pass@1 inference, markedly improving both the reasoning accuracy and efficiency of LRMs across multiple challenging mathematical benchmarks.

Key Takeaways

  • 1

    Reasoning models like DeepSeek-R1 know when to stop but current training methods prevent them from doing so efficiently.

  • 2

    Models often find correct answers early but continue generating redundant steps, wasting computational resources unnecessarily.

  • 3

    The SAGE inference method helps models discover efficient reasoning paths by tracking cumulative log-probability across sampled sequences.
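
The cumulative log-probability tracking mentioned above can be sketched in a few lines. Note that this is an illustrative reconstruction, not the paper's actual SAGE algorithm: the function names, the end-of-thinking probability criterion, and the `threshold` value are all assumptions chosen for clarity.

```python
import math

def cumulative_logprob_trace(token_logprobs):
    """Running cumulative log-probability of a sampled sequence.

    token_logprobs: per-token log-probabilities assigned by the model
    to the tokens it actually generated. Returns the prefix sums,
    one entry per generated token.
    """
    trace = []
    total = 0.0
    for lp in token_logprobs:
        total += lp
        trace.append(total)
    return trace

def candidate_stop_points(eos_logprobs, threshold=math.log(0.5)):
    """Flag positions where the model itself assigns high probability
    to emitting an end-of-thinking token (a hypothetical stopping
    criterion; the paper's exact SAGE rule may differ).

    eos_logprobs: at each generation step, the log-probability the
    model places on the end-of-thinking token.
    """
    return [i for i, lp in enumerate(eos_logprobs) if lp >= threshold]
```

For example, a step where the model puts 90% probability on ending its reasoning (`math.log(0.9)`) would be flagged as a candidate stop point under this sketch's 50% threshold, even if the default sampler happened to continue generating.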

Limitations

  • The summary does not report quantitative comparisons of SAGE against baseline methods.

  • Practical implementation details and computational overhead of SAGE deployment are not discussed.

Keywords

large reasoning models, chains of thought, sampling paradigms, self-aware guided efficient reasoning, group-based reinforcement learning, pass@1 inference, mathematical benchmarks
