BitDance: Scaling Autoregressive Generative Models with Binary Tokens

Yuang Ai, Jiaming Han, Shaobin Zhuang, Weijia Mao, Xuefeng Hu, Ziyan Yang, Zhenheng Yang, Huaibo Huang, Xiangyu Yue, Hao Chen
Published
February 15, 2026
Authors
10
Word Count
11,966
Code
Includes code

BitDance enables fast, high-quality autoregressive image generation using binary tokens at massive scale.

Abstract

We present BitDance, a scalable autoregressive (AR) image generator that predicts binary visual tokens instead of codebook indices. With high-entropy binary latents, BitDance lets each token represent up to 2^{256} states, yielding a compact yet highly expressive discrete representation. Sampling from such a huge token space is difficult with standard classification. To resolve this, BitDance uses a binary diffusion head: instead of predicting an index with softmax, it employs continuous-space diffusion to generate the binary tokens. Furthermore, we propose next-patch diffusion, a new decoding method that predicts multiple tokens in parallel with high accuracy, greatly speeding up inference. On ImageNet 256x256, BitDance achieves an FID of 1.24, the best among AR models. With next-patch diffusion, BitDance beats state-of-the-art parallel AR models that use 1.4B parameters, while using 5.4x fewer parameters (260M) and achieving 8.7x speedup. For text-to-image generation, BitDance trains on large-scale multimodal tokens and generates high-resolution, photorealistic images efficiently, showing strong performance and favorable scaling. When generating 1024x1024 images, BitDance achieves a speedup of over 30x compared to prior AR models. We release code and models to facilitate further research on AR foundation models. Code and models are available at: https://github.com/shallowdream204/BitDance.
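The core idea in the abstract can be illustrated with a toy sketch: each image patch is represented not by a codebook index but by a 256-bit binary token, obtained by taking the sign of each dimension of a continuous latent. This is an illustrative sketch of binary quantization in general, not the paper's actual tokenizer; the function name and shapes are assumptions.

```python
import numpy as np

def binarize_latent(latent, bits=256):
    """Quantize a continuous latent vector to a binary token by taking
    the sign of each dimension. A 256-bit token can represent 2**256
    distinct states, so no explicit codebook is needed.
    (Illustrative sketch only, not BitDance's tokenizer.)"""
    assert latent.shape[-1] == bits
    return (latent > 0).astype(np.int8)  # each entry is 0 or 1

rng = np.random.default_rng(0)
latent = rng.standard_normal(256)   # one continuous patch latent
token = binarize_latent(latent)
print(token.shape)                  # (256,)
```

Because the token is just a sign pattern, every bit combination is a valid code, which is why this style of quantization sidesteps the codebook-collapse problem mentioned in the takeaways below.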

Key Takeaways

  • 1

    BitDance scales binary tokenization to 2^256 vocabulary size, enabling faster autoregressive image generation.

  • 2

    Binary quantization avoids codebook collapse while maintaining discrete token benefits for long sequences.

  • 3

    A novel binary diffusion head replaces standard classification to handle the astronomically large vocabulary space.
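The third takeaway can be made concrete with a toy sketch: rather than a softmax over an intractably large class set, a binary diffusion head runs a continuous denoising process over a 256-dimensional vector and thresholds the result into bits. The denoiser below is a hypothetical stand-in, not the paper's trained model.

```python
import numpy as np

def sample_binary_token(denoise_fn, bits=256, steps=10, seed=0):
    """Toy sketch of a binary diffusion head: sample a token by running
    iterative continuous denoising and then thresholding to bits.
    `denoise_fn(x, t)` is an assumed stand-in for a learned denoiser."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(bits)       # start from Gaussian noise
    for t in reversed(range(steps)):    # iterative refinement
        x = denoise_fn(x, t)
    return (x > 0).astype(np.int8)      # continuous sample -> binary token

# Hypothetical denoiser that pulls samples toward a fixed sign pattern.
target = np.sign(np.random.default_rng(1).standard_normal(256))
token = sample_binary_token(lambda x, t: 0.5 * x + 0.5 * target)
print(token.shape)  # (256,)
```

The key point of this design is that the sampler's cost scales with the number of bits (256 here), not with the number of classes (2^256), which is what makes the huge vocabulary tractable.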

Limitations

  • Previous binary quantization approaches were limited to vocabularies around 2^18 or 2^32.

  • Standard sampling methods fail with such enormous vocabularies due to parameter explosion and bit correlation.
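The "parameter explosion" point is easy to verify with back-of-the-envelope arithmetic: a standard classification head needs one weight row per class, so even a modest hidden size multiplied by a 2^256 vocabulary is astronomically large. The hidden size below is an assumed illustrative value.

```python
# A softmax classification head has hidden_dim * vocab_size weights.
hidden_dim = 1024          # assumed illustrative hidden size
vocab_size = 2 ** 256      # the binary-token vocabulary from the paper
head_params = hidden_dim * vocab_size

# Far more parameters than could ever be stored (> 10^80).
print(head_params > 10 ** 80)  # True
```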

Keywords

autoregressive image generator, binary visual tokens, high-entropy binary latents, binary diffusion head, next-patch diffusion, diffusion models, FID, parameter-efficient, text-to-image generation, photorealistic images, image generation speedup
