Large Language Models

dVoting: Fast Voting for dLLMs

Sicheng Feng, Zigeng Chen, Xinyin Ma, Gongfan Fang, Xinchao Wang
Published: February 12, 2026
Authors: 5
Word Count: 8,088
Code: Includes code

dVoting accelerates reasoning in diffusion models by voting only on uncertain tokens.

Abstract

Diffusion Large Language Models (dLLMs) represent a new paradigm beyond autoregressive modeling, offering competitive performance while naturally enabling a flexible decoding process. Specifically, dLLMs can generate tokens at arbitrary positions in parallel, giving them significant potential for parallel test-time scaling, which was previously constrained by severe inefficiency in autoregressive modeling. In this work, we introduce dVoting, a fast voting technique that boosts reasoning capability without training, at only modest additional computational cost. dVoting is motivated by the observation that, across multiple samples for the same prompt, token predictions remain largely consistent, whereas performance is determined by a small subset of tokens exhibiting cross-sample variability. Leveraging the arbitrary-position generation capability of dLLMs, dVoting performs iterative refinement by sampling, identifying uncertain tokens via consistency analysis, regenerating them through voting, and repeating this process until convergence. Extensive evaluations demonstrate that dVoting consistently improves performance across various benchmarks. It achieves gains of 6.22%-7.66% on GSM8K, 4.40%-7.20% on MATH500, 3.16%-14.84% on ARC-C, and 4.83%-5.74% on MMLU. Our code is available at https://github.com/fscdc/dVoting
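The sample-vote-regenerate loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate` is a hypothetical stand-in for a dLLM sampler that can fix tokens at arbitrary positions, and the unanimous-vote locking rule is an assumed simplification of the paper's consistency analysis.

```python
import random
from collections import Counter

random.seed(0)

def generate(prompt, locked, length=8):
    # Hypothetical stand-in for a dLLM sampler: keeps tokens at `locked`
    # positions fixed and samples the remaining positions freely.
    vocab = ["A", "B", "C"]
    return [locked.get(i, random.choice(vocab)) for i in range(length)]

def dvoting(prompt, num_samples=5, max_rounds=3):
    locked = {}  # position -> token, for positions with cross-sample consensus
    for _ in range(max_rounds):
        samples = [generate(prompt, locked) for _ in range(num_samples)]
        changed = False
        for pos in range(len(samples[0])):
            if pos in locked:
                continue
            token, votes = Counter(s[pos] for s in samples).most_common(1)[0]
            if votes == num_samples:  # unanimous across samples -> lock in
                locked[pos] = token
                changed = True
        # Converge when every position is locked or no new consensus appears;
        # unlocked (uncertain) positions are regenerated in the next round.
        if not changed or len(locked) == len(samples[0]):
            break
    # Resolve any still-uncertain positions by majority vote.
    return [locked.get(pos, Counter(s[pos] for s in samples).most_common(1)[0][0])
            for pos in range(len(samples[0]))]

print(dvoting("2+2="))
```

The key property this sketch captures is that confident (consistent) tokens are frozen, so later sampling rounds spend their budget only on the disputed positions.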

Key Takeaways

  1. dVoting improves reasoning in diffusion language models by selectively regenerating uncertain tokens while locking confident ones.

  2. Remask sampling leverages the observation that 60% of tokens remain consistent across samples, concentrating effort on harder problem areas.

  3. The entropy-threshold approach with early stopping mechanisms makes test-time scaling computationally efficient for diffusion language models.
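The entropy-threshold idea from the takeaways above can be illustrated with a short sketch. The function names and the threshold value are assumptions for illustration; the paper's actual criterion may differ, but the principle is the same: measure how much samples disagree at each position, flag high-entropy positions for regeneration, and stop early when nothing exceeds the threshold.

```python
import math
from collections import Counter

def token_entropy(tokens):
    # Shannon entropy (bits) of the empirical token distribution at one position.
    counts = Counter(tokens)
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def uncertain_positions(samples, threshold=0.5):
    # Flag positions whose cross-sample entropy exceeds the threshold.
    # An empty result means all positions agree closely -> early stop.
    length = len(samples[0])
    return [pos for pos in range(length)
            if token_entropy([s[pos] for s in samples]) > threshold]

# Three samples disagree only on one word ("cat" vs. "dog"):
samples = [list("the cat sat"), list("the cat sat"), list("the dog sat")]
print(uncertain_positions(samples))  # -> [4, 5, 6]
```

Only the three character positions of the disputed word are flagged, so regeneration effort concentrates on the small uncertain region rather than the whole sequence.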

Limitations

  • The approach requires multiple sampling rounds, which may still incur computational overhead despite efficiency improvements.

  • Testing limited to two diffusion models; generalization to other architectures and domains remains unclear.

Keywords

diffusion large language models, autoregressive modeling, parallel test-time scaling, token predictions, iterative refinement, consistency analysis, voting technique
