
Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization

Haocheng Xi, Shuo Yang, Yilong Zhao, Muyang Li, Han Cai, Xingyang Li, Yujun Lin, Zhuoyang Zhang, Jintao Zhang, Xiuyu Li, Zhiying Xu, Jun Wu, Chenfeng Xu, Ion Stoica, Song Han, Kurt Keutzer
Published: February 3, 2026
Authors: 16
Word Count: 6,133

Efficient long video generation via 2-bit KV-cache quantization.

Abstract

Despite rapid progress in autoregressive video diffusion, an emerging system-algorithm bottleneck limits both deployability and generation capability: KV cache memory. In autoregressive video generation models, the KV cache grows with generation history and quickly dominates GPU memory, often exceeding 30 GB, preventing deployment on widely available hardware. More critically, constrained KV cache budgets restrict the effective working memory, directly degrading long-horizon consistency in identity, layout, and motion. To address this challenge, we present Quant VideoGen (QVG), a training-free KV cache quantization framework for autoregressive video diffusion models. QVG leverages video spatiotemporal redundancy through Semantic Aware Smoothing, producing low-magnitude, quantization-friendly residuals. It further introduces Progressive Residual Quantization, a coarse-to-fine multi-stage scheme that reduces quantization error while enabling a smooth quality-memory trade-off. Across LongCat Video, HY WorldPlay, and Self Forcing benchmarks, QVG establishes a new Pareto frontier between quality and memory efficiency, reducing KV cache memory by up to 7.0× with less than 4% end-to-end latency overhead while consistently outperforming existing baselines in generation quality.
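The two ideas named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes scalar values instead of KV vectors, hand-supplied group labels (this summary does not say how QVG forms groups), a group mean as the smoothing reference, and plain uniform min/max quantization. All function names are hypothetical.

```python
def quantize(values, bits=2):
    """Uniform min/max quantization: returns integer codes plus scale and offset."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1                      # 3 steps for 2 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    return [c * scale + lo for c in codes]


def smooth(values, groups):
    """Subtract each group's mean so only small residuals remain (assumed smoothing)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    means = {g: sums[g] / counts[g] for g in sums}
    return [v - means[g] for v, g in zip(values, groups)], means


def progressive_quantize(values, bits=2, stages=2):
    """Coarse to fine: each stage quantizes the error left by the previous stages."""
    recon = [0.0] * len(values)
    for _ in range(stages):
        residual = [v - r for v, r in zip(values, recon)]
        codes, scale, lo = quantize(residual, bits)  # a real cache would store these
        recon = [r + d for r, d in zip(recon, dequantize(codes, scale, lo))]
    return recon


def max_abs_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Toy "cache": three groups of similar values (hypothetical data).
values = [4.9, 5.0, 5.1, 5.05, -3.1, -3.0, -2.9, -2.95, 0.4, 0.5, 0.6, 0.45]
groups = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

# Baseline: 2-bit quantization of the raw values.
err_direct = max_abs_error(values, progressive_quantize(values, stages=1))

# Smoothed: quantize residuals, then add the group means back.
residuals, means = smooth(values, groups)


def reconstruct(stages):
    r = progressive_quantize(residuals, stages=stages)
    return [x + means[g] for x, g in zip(r, groups)]


err_smooth_1 = max_abs_error(values, reconstruct(1))
err_smooth_2 = max_abs_error(values, reconstruct(2))
print(err_direct, err_smooth_1, err_smooth_2)
```

In this toy setting the residuals span a much narrower range than the raw values, so the same four 2-bit levels are spaced far more finely. Each extra stage costs additional bits per value, which is where the quality-memory trade-off comes from.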

Key Takeaways

  • Reduces KV-cache memory footprint for long video generation.

  • Leverages video-specific spatiotemporal redundancy for quantization.

  • Maintains high video quality through progressive quantization.
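As a rough sanity check on the memory figures quoted in the abstract, the sketch below converts a compression ratio into effective bits per cached value. The 16-bit baseline is an assumption (the summary does not state the original cache precision), and the helper names are hypothetical.

```python
def effective_bits(baseline_bits, compression_ratio):
    """Average bits per cached value after compression, metadata included."""
    return baseline_bits / compression_ratio


def compressed_size_gb(cache_gb, compression_ratio):
    return cache_gb / compression_ratio


# Assumed 16-bit baseline; 7.0x and 30 GB are the figures quoted in the abstract.
bits = effective_bits(16, 7.0)
size = compressed_size_gb(30, 7.0)
print(f"{bits:.2f} bits per value, {size:.1f} GB")
```

Under that assumption, an ideal 2-bit cache would give 8×, so a 7.0× ratio leaves roughly a quarter of a bit per value for scales, offsets, and any other stored metadata.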

Limitations

  • Requires video-specific data for effective quantization.

  • May not generalize well to non-video generative models.

Keywords

KV cache, autoregressive video diffusion, video spatiotemporal redundancy, Semantic Aware Smoothing, Progressive Residual Quantization, quantization error, Pareto frontier, memory efficiency
