Context Forcing: Consistent Autoregressive Video Generation with Long Context

Shuo Chen, Cong Wei, Sun Sun, Ping Nie, Kai Zhou, Ge Zhang, Ming-Hsuan Yang, Wenhu Chen

Published: February 5, 2026
Authors: 8
Word count: 6,732
Code: included

Enhances video generation consistency over extended durations.

Abstract

Recent approaches to real-time long video generation typically employ streaming tuning strategies, attempting to train a long-context student using a short-context (memoryless) teacher. In these frameworks, the student performs long rollouts but receives supervision from a teacher limited to short 5-second windows. This structural discrepancy creates a critical student-teacher mismatch: the teacher's inability to access long-term history prevents it from guiding the student on global temporal dependencies, effectively capping the student's context length. To resolve this, we propose Context Forcing, a novel framework that trains a long-context student via a long-context teacher. By ensuring the teacher is aware of the full generation history, we eliminate the supervision mismatch, enabling the robust training of models capable of long-term consistency. To make this computationally feasible for extreme durations (e.g., 2 minutes), we introduce a context management system that transforms the linearly growing context into a Slow-Fast Memory architecture, significantly reducing visual redundancy. Extensive results demonstrate that our method enables effective context lengths exceeding 20 seconds -- 2 to 10 times longer than state-of-the-art methods like LongLive and Infinite-RoPE. By leveraging this extended context, Context Forcing preserves superior consistency across long durations, surpassing state-of-the-art baselines on various long video evaluation metrics.
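To make the Slow-Fast Memory idea concrete, here is a minimal sketch of a two-tier context buffer: a bounded "fast" window keeps the most recent frames at full detail, while frames evicted from it are temporally subsampled into a "slow" memory, so the conditioning context stops growing linearly with video length. This is an illustrative assumption of how such a buffer could work, not the paper's actual implementation; the class name `SlowFastMemory` and all parameters (`fast_capacity`, `slow_capacity`, `stride`) are hypothetical.

```python
from collections import deque

class SlowFastMemory:
    """Hypothetical sketch of a slow-fast context buffer.

    Recent frames ("fast" memory) stay at full resolution; frames
    evicted from the fast window are subsampled into a bounded
    "slow" memory, reducing visual redundancy in long histories.
    """

    def __init__(self, fast_capacity=16, slow_capacity=32, stride=4):
        self.fast = deque()                       # recent frames, full detail
        self.slow = deque(maxlen=slow_capacity)   # compressed older history
        self.fast_capacity = fast_capacity
        self.stride = stride                      # keep 1 of every `stride` evicted frames
        self._evicted = 0

    def append(self, frame):
        self.fast.append(frame)
        if len(self.fast) > self.fast_capacity:
            old = self.fast.popleft()
            # temporal subsampling: only every `stride`-th evicted frame
            # survives into slow memory, bounding context growth
            if self._evicted % self.stride == 0:
                self.slow.append(old)
            self._evicted += 1

    def context(self):
        # full conditioning context: coarse long-term history,
        # then the detailed recent window
        return list(self.slow) + list(self.fast)


# Usage: after 100 frames, context is 37 entries instead of 100
mem = SlowFastMemory()
for i in range(100):
    mem.append(i)
print(len(mem.context()))  # 37: 21 subsampled old frames + 16 recent
```

In practice, a model would also re-encode or further compress the slow tier (e.g., pooling tokens spatially), but even plain temporal subsampling shows how linear context can be turned into a bounded two-speed memory.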

Key Takeaways

  1. Context Forcing improves long-term video generation consistency.

  2. Uses a long-context teacher for effective training.

  3. Introduces Slow-Fast Memory to handle long sequences efficiently.

Limitations

  • Requires significant computational resources for long sequences.

  • May still face challenges with extremely long videos.

Keywords

streaming tuning strategies, long-context student, short-context teacher, student-teacher mismatch, context management system, Slow-Fast Memory, long-term consistency, long video generation
