
Free(): Learning to Forget in Malloc-Only Reasoning Models

Yilun Zheng, Dongyang Ma, Tian Liang, Jiahao Xu, Xinting Huang, Lihui Chen, Haitao Mi, Yan Wang

Published: February 8, 2026
Authors: 8
Word Count: 11,826
Code: Includes code

Free()LM teaches reasoning models to forget irrelevant context, preventing accuracy collapse.

Abstract

Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attribute this to a fundamental architectural flaw: standard LLMs operate as "malloc-only" engines, continuously accumulating valid and redundant steps alike without a mechanism to prune obsolete information. To break this cycle, we propose Free()LM, a model that introduces an intrinsic self-forgetting capability via the Free-Module, a plug-and-play LoRA adapter. By iteratively switching between reasoning and cleaning modes, Free()LM dynamically identifies and prunes useless context chunks, maintaining a compact and noise-free state. Extensive experiments show that Free()LM provides consistent improvements across all model scales (8B to 685B). It achieves a 3.3% average improvement over top-tier reasoning baselines, even establishing a new SOTA on IMOanswerBench using DeepSeek V3.2-Speciale. Most notably, in long-horizon tasks where the standard Qwen3-235B-A22B model suffers a total collapse (0% accuracy), Free()LM restores performance to 50%. Our findings suggest that sustainable intelligence requires the freedom to forget as much as the power to think.
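The abstract describes an iterative loop that alternates between a reasoning mode (appending new chunks to the context) and a cleaning mode (deleting chunks flagged as obsolete). Below is a minimal sketch of that control flow under stated assumptions: `generate_step` and `select_prunable_chunks` are hypothetical stand-ins for the base model and the Free-Module adapter, and none of these names come from the paper.

```python
def free_lm_loop(prompt, generate_step, select_prunable_chunks,
                 clean_every=4, max_steps=16):
    """Alternate between reasoning (appending chunks) and cleaning
    (pruning chunks flagged as obsolete), keeping the context compact.

    Hypothetical sketch of the loop described in the abstract, not the
    authors' implementation.
    """
    context = [prompt]
    for step in range(max_steps):
        chunk = generate_step(context)            # reasoning mode: "malloc" a new step
        context.append(chunk)
        if chunk.startswith("ANSWER:"):           # toy stopping criterion
            return chunk, context
        if (step + 1) % clean_every == 0:         # cleaning mode: "free" obsolete steps
            obsolete = select_prunable_chunks(context)
            context = [c for i, c in enumerate(context)
                       if i == 0 or i not in obsolete]  # never prune the prompt
    return None, context
```

The key design point this sketch illustrates is that pruning is interleaved with generation rather than applied once at the end, so the model's working context stays compact throughout a long-horizon task.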

Key Takeaways

  1. Language models accumulate reasoning tokens without pruning, causing accuracy collapse beyond certain token limits.

  2. Free()LM adds a pruning mechanism via a LoRA adapter that identifies and deletes redundant context chunks.

  3. Training uses high-quality deletion examples, filtered by a reward mechanism, to teach effective context pruning.
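The third takeaway notes that training data is curated by keeping only deletion examples that pass a reward filter. A minimal sketch of that filtering step, assuming a hypothetical `score_after_deletion` reward function (not named in the paper):

```python
def filter_deletion_examples(candidates, score_after_deletion, threshold=0.5):
    """Keep only deletion proposals whose post-deletion reasoning still
    scores above a threshold, i.e. deletions that did not remove
    information the final answer depends on.

    Hypothetical sketch of reward-based filtering; `candidates` pairs a
    context with the set of chunk indices proposed for deletion.
    """
    kept = []
    for context, deleted_ids in candidates:
        if score_after_deletion(context, deleted_ids) >= threshold:
            kept.append((context, deleted_ids))
    return kept
```

Filtering of this kind matters because a deletion example that removes a step the answer depends on would teach the adapter to prune live information rather than obsolete context.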

Limitations

  • The approach requires careful curation of training data, using external models to identify valid deletions.

  • The method's effectiveness may vary with model size and reasoning task complexity.

Keywords

LLMs, reasoning models, test-time compute, thinking tokens, malloc-only engines, Free-Module, LoRA adapter, reasoning mode, cleaning mode, context pruning, Qwen3-235B-A22B, DeepSeek V3.2-Speciale, IMOanswerBench
