Locas: Your Models are Principled Initializers of Locally-Supported Parametric Memories

Sidi Lu, Zhenwen Liang, Dongyang Ma, Yan Wang, Haitao Mi, Dong Yu
Published: February 4, 2026
Authors: 6
Word count: 9,359

Locas enables efficient test-time adaptation by initializing memory modules in a principled way from model activations and gradients.

Abstract

In this paper, we aim to bridge test-time training with a new type of parametric memory that can be flexibly offloaded from or merged into model parameters. We present Locas, a Locally-Supported parametric memory that shares the design of FFN blocks in modern transformers, allowing it to be flexibly permanentized into the model parameters while supporting efficient continual learning. We discuss two major variants of Locas: one with a conventional two-layer MLP design that has a clearer theoretical guarantee; the other shares the same GLU-FFN structure as SOTA LLMs and can be easily attached to existing models for both parameter-efficient and computation-efficient continual learning. Crucially, we show that proper initialization of such low-rank sideway-FFN-style memories -- performed in a principled way by reusing model parameters, activations, and/or gradients -- is essential for fast convergence, improved generalization, and the prevention of catastrophic forgetting. We validate the proposed memory mechanism on the PG-19 whole-book language modeling and LoCoMo long-context dialogue question answering tasks. With as little as 0.02% additional parameters, Locas-GLU is capable of storing information from past context while maintaining a much smaller context window. In addition, we test the model's loss of general capability after memorizing a whole book with Locas, through comparative MMLU evaluation. Results show the promising ability of Locas to permanentize past context into parametric knowledge with minimal catastrophic forgetting of the model's existing internal knowledge.
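The abstract describes attaching a low-rank GLU-style side branch to an existing FFN block, with an additive output that can later be merged ("permanentized") into the base weights. Below is a minimal, hypothetical sketch of that design in numpy; the class and function names are our own, the weights are placeholders (the paper initializes them from model parameters, activations, and/or gradients, which is not reproduced here), and this is an illustration of the structure rather than the authors' implementation.

```python
import numpy as np

def silu(x):
    """SiLU activation, the gating nonlinearity used by GLU-style FFNs."""
    return x / (1.0 + np.exp(-x))

class GLUSideMemory:
    """Hypothetical low-rank GLU side branch (a Locas-GLU-like sketch).

    Rank r is much smaller than the FFN hidden size, so the added
    parameter count stays tiny relative to the base model.
    """

    def __init__(self, d_model, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Placeholder random init; the paper instead reuses model
        # parameters, activations, and/or gradients here.
        self.w_gate = rng.normal(0.0, 0.02, (d_model, rank))
        self.w_up = rng.normal(0.0, 0.02, (d_model, rank))
        # Zero-initialized down-projection: the branch starts as a no-op.
        self.w_down = np.zeros((rank, d_model))

    def __call__(self, x):
        # x: (seq_len, d_model). GLU form: silu(x W_gate) * (x W_up),
        # then project the rank-r result back to d_model.
        return (silu(x @ self.w_gate) * (x @ self.w_up)) @ self.w_down

def ffn_with_memory(x, base_ffn, memory):
    # The side branch adds to the base FFN output, so merging its
    # contribution into the base weights leaves the function unchanged.
    return base_ffn(x) + memory(x)
```

Because `w_down` is zero-initialized, attaching the branch does not perturb the base model's behavior before any continual-learning updates, which is one common way such side modules avoid disrupting existing knowledge at attach time.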

Key Takeaways

  1. Locas uses model activations and gradients to initialize memory modules, reducing parameters by 85% compared to existing test-time training methods.

  2. FFN layers can be interpreted as soft lookup tables with fixed memory slots, enabling efficient test-time adaptation without catastrophic forgetting.

  3. The Locas-GLU variant is compatible with modern LLMs like LLaMA and Mistral, requiring no architectural changes for deployment.
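The second takeaway, that an FFN layer acts as a soft lookup table, can be sketched concretely: the rows of the first weight matrix act as keys, the rows of the second as values, and the activation scores weight a sum over value slots. The function below is a generic illustration of this well-known key-value reading of two-layer FFNs, not code from the paper.

```python
import numpy as np

def ffn_as_memory(x, keys, values):
    """Read a two-layer ReLU FFN as a soft lookup table.

    keys:   rows of the first weight matrix,  shape (n_slots, d_model)
    values: rows of the second weight matrix, shape (n_slots, d_model)
    Each input is matched against every key; the resulting activation
    scores weight a sum of the corresponding value vectors.
    """
    scores = np.maximum(x @ keys.T, 0.0)  # ReLU match scores, (batch, n_slots)
    return scores @ values                # weighted sum over memory slots
```

Under this view, test-time adaptation that only writes new key-value slots (rather than overwriting existing ones) leaves the original memory entries intact, which is the intuition behind avoiding catastrophic forgetting.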

Limitations

  • Locas-MLP has compatibility issues with GLU-style FFN blocks used in state-of-the-art models, limiting its practical applicability.

  • The paper doesn't fully address scalability challenges for extremely long documents beyond whole-book language modeling tasks.

Keywords

parametric memory, transformer, FFN blocks, continual learning, low-rank sideway-FFN-style memories, GLU-FFN, parameter-efficient learning, computation-efficient learning, catastrophic forgetting, MMLU evaluation
