Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing

Tong Zheng, Chengsong Huang, Runpeng Dai, Yun He, Rui Liu, Xin Ni, Huiwen Bao, Kaishen Wang, Hongtu Zhu, Jiaxin Huang, Furong Huang, Heng Huang
Published
February 3, 2026
Authors
12

Abstract

Parallel thinking has emerged as a promising paradigm for reasoning, yet it imposes significant computational burdens. Existing efficiency methods primarily rely on local, per-trajectory signals and lack principled mechanisms to exploit global dynamics across parallel branches. We introduce 2D probing, an interface that exposes the width-depth dynamics of parallel thinking by periodically eliciting intermediate answers from all branches. Our analysis reveals three key insights: non-monotonic scaling across width-depth allocations, heterogeneous reasoning branch lengths, and early stabilization of global consensus. Guided by these insights, we introduce Parallel-Probe, a training-free controller designed to optimize online parallel thinking. Parallel-Probe employs consensus-based early stopping to regulate reasoning depth and deviation-based branch pruning to dynamically adjust width. Extensive experiments across three benchmarks and multiple models demonstrate that Parallel-Probe establishes a superior Pareto frontier for test-time scaling. Compared to standard majority voting, it reduces sequential tokens by up to 35.8% and total token cost by over 25.8% while maintaining competitive accuracy.
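
The two control rules named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the exact stopping or pruning criteria, so the function name `parallel_probe_sketch`, the parameters `patience` and `max_deviation`, and the specific rules (stop once the majority answer is unchanged for `patience` consecutive probes; prune a branch that disagrees with the majority for `max_deviation` consecutive probes) are assumptions chosen for illustration. The input `probes[t][b]` stands for the intermediate answer elicited from branch `b` at probe step `t`, i.e. the 2D probing matrix over depth and width.

```python
from collections import Counter


def parallel_probe_sketch(probes, patience=2, max_deviation=2):
    """Toy controller over a 2D probing matrix (hypothetical rules).

    probes[t][b] is the intermediate answer of branch b at probe step t.
    Returns (final_answer, stop_step, surviving_branch_indices).
    """
    active = set(range(len(probes[0])))
    deviation = {b: 0 for b in active}  # consecutive disagreements per branch
    prev_consensus, stable = None, 0

    for t, row in enumerate(probes):
        # Majority vote over the branches that are still alive.
        votes = Counter(row[b] for b in active)
        consensus = votes.most_common(1)[0][0]

        # Consensus-based early stopping (assumed rule): count how many
        # consecutive probes produced the same majority answer.
        stable = stable + 1 if consensus == prev_consensus else 1
        prev_consensus = consensus
        if stable >= patience:
            return consensus, t, sorted(active)

        # Deviation-based pruning (assumed rule): drop branches that keep
        # disagreeing with the current majority, always keeping at least one.
        for b in list(active):
            deviation[b] = deviation[b] + 1 if row[b] != consensus else 0
            if deviation[b] >= max_deviation and len(active) > 1:
                active.discard(b)

    return prev_consensus, len(probes) - 1, sorted(active)


# Four branches probed at four depths; branch 3 keeps answering "5".
probes = [
    ["3", "7", "7", "5"],
    ["7", "7", "7", "5"],
    ["7", "7", "7", "5"],
    ["7", "7", "7", "7"],
]
answer, stop_step, kept = parallel_probe_sketch(probes, patience=3, max_deviation=2)
print(answer, stop_step, kept)  # 7 2 [0, 1, 2]
```

In this toy run the controller stops at the third probe rather than the fourth, which corresponds to saving depth (sequential tokens), and it stops generating branch 3 after the second probe, which corresponds to saving width (total tokens).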

Keywords

parallel thinking, width-depth dynamics, intermediate answers, consensus-based early stopping, deviation-based branch pruning, Pareto frontier, test-time scaling, majority voting
