
The Pensieve Paradigm: Stateful Language Models Mastering Their Own Context

Xiaoyuan Liu, Tian Liang, Dongyang Ma, Deyu Zhou, Haitao Mi, Pinjia He, Yan Wang
Published: February 12, 2026
Authors: 7
Word Count: 7,661
Code: Includes code

StateLM empowers language models to actively manage their own context through intelligent memory tools.

Abstract

In the world of Harry Potter, when Dumbledore's mind is overburdened, he extracts memories into a Pensieve to be revisited later. In the world of AI, while we possess the Pensieve, in the form of mature databases and retrieval systems, our models inexplicably lack the "wand" to operate it. They remain like a Dumbledore without agency, passively accepting a manually engineered context as their entire memory. This work finally places the wand in the model's hand. We introduce StateLM, a new class of foundation models endowed with an internal reasoning loop for managing their own state. We equip our model with a suite of memory tools, such as context pruning, document indexing, and note-taking, and train it to use these tools actively. By learning to dynamically engineer its own context, our model breaks free from the architectural prison of a fixed window. Experiments across various model sizes demonstrate StateLM's effectiveness in diverse scenarios. On long-document QA tasks, StateLMs consistently outperform standard LLMs across all model scales; on the chat memory task, they achieve absolute accuracy improvements of 10% to 20% over standard LLMs. On the deep research task BrowseComp-Plus, the gap is even more pronounced: StateLM achieves up to 52% accuracy, whereas standard LLM counterparts hover around 5%. Ultimately, our approach shifts LLMs from passive predictors to state-aware agents, making reasoning a stateful and manageable process.
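
To make the reasoning loop concrete, here is a minimal sketch of the kind of tool-driven cycle the abstract describes: the model repeatedly decides whether to distill a fact, index a document, prune raw text, or answer. Every name below (PensieveState, take_note, index_document, prune_context, and the model.decide interface) is an illustrative assumption, not the paper's actual API.

```python
# Sketch of a StateLM-style memory-tool loop (assumed interface, not the paper's).

class PensieveState:
    """External memory the model can write to and later consult."""

    def __init__(self):
        self.notes = []       # distilled facts extracted from raw context
        self.doc_index = {}   # doc_id -> summary, for later re-retrieval

    def take_note(self, fact: str):
        self.notes.append(fact)

    def index_document(self, doc_id: str, summary: str):
        self.doc_index[doc_id] = summary


def reasoning_loop(model, context: list[str], state: PensieveState, max_steps: int = 32):
    """Run the model until it answers, letting it manage its own context."""
    for _ in range(max_steps):
        # Hypothetical call: the model returns a tool invocation or a final answer.
        action = model.decide(context, state)
        if action["tool"] == "take_note":
            state.take_note(action["fact"])
        elif action["tool"] == "index_document":
            state.index_document(action["doc_id"], action["summary"])
        elif action["tool"] == "prune_context":
            # Drop raw spans whose key facts already live in external memory.
            drop = set(action["segment_ids"])
            context = [seg for i, seg in enumerate(context) if i not in drop]
        elif action["tool"] == "answer":
            return action["text"]
    return None  # step budget exhausted without a final answer
```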

Key Takeaways

  1. StateLM gives language models active agency to manage their own context through a toolkit of memory and pruning operations.

  2. The Pensieve paradigm enables models to extract key facts into external memory and delete raw text, creating efficient sawtooth context patterns (see the sketch after this list).

  3. Current LLMs are stateless and passive, forcing humans to orchestrate context engineering instead of letting models decide what to remember.
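
The "sawtooth" shape in takeaway 2 is easy to visualize with a toy simulation: the live context grows as raw text streams in, then collapses each time facts are distilled to external memory and the raw span is deleted. The budget and chunk sizes below are arbitrary values chosen for illustration, not figures from the paper.

```python
# Toy illustration of the sawtooth context pattern (assumed sizes, demo only).

CONTEXT_BUDGET = 1000   # max tokens kept in the live window
CHUNK = 120             # tokens arriving per step

context_len, memory_facts, trace = 0, 0, []
for step in range(20):
    context_len += CHUNK                # raw text accumulates in the window
    if context_len > CONTEXT_BUDGET:    # window full: extract, then prune
        memory_facts += 1               # key facts move to external memory
        context_len = CHUNK             # raw span deleted; window collapses
    trace.append(context_len)

print(trace)  # rises then drops repeatedly: the sawtooth shape
print(f"{memory_facts} distilled facts in external memory")
```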

Limitations

  • The script cuts off mid-sentence before explaining the full two-stage training approach and its effectiveness metrics.

  • No empirical results or performance comparisons against traditional retrieval systems are provided in the excerpt.

Keywords

foundation models, internal reasoning loop, memory tools, context pruning, document indexing, note-taking, dynamic context engineering, long-document QA, chat memory task, deep research task, BrowseComp-Plus
