Large Language Models

Panini: Continual Learning in Token Space via Structured Memory

Shreyas Rajesh, Pavan Holur, Mehmet Yigit Turali, Chenda Duan, Vwani Roychowdhury
Published: February 16, 2026
Authors: 5
Word count: 15,318
Code: included

Panini transforms documents into semantic networks for efficient continual learning with 5-30x fewer tokens.

Abstract

Language models are increasingly used to reason over content they were not trained on, such as new documents, evolving knowledge, and user-specific data. A common approach is retrieval-augmented generation (RAG), which stores verbatim documents externally (as chunks) and retrieves only a relevant subset at inference time for an LLM to reason over. However, this results in inefficient usage of test-time compute (LLM repeatedly reasons over the same documents); moreover, chunk retrieval can inject irrelevant context that increases unsupported generation. We propose a human-like non-parametric continual learning framework, where the base model remains fixed, and learning occurs by integrating each new experience into an external semantic memory state that accumulates and consolidates itself continually. We present Panini, which realizes this by representing documents as Generative Semantic Workspaces (GSW) -- an entity- and event-aware network of question-answer (QA) pairs, sufficient for an LLM to reconstruct the experienced situations and mine latent knowledge via reasoning-grounded inference chains on the network. Given a query, Panini only traverses the continually-updated GSW (not the verbatim documents or chunks), and retrieves the most likely inference chains. Across six QA benchmarks, Panini achieves the highest average performance, 5%-7% higher than other competitive baselines, while using 2-30x fewer answer-context tokens, supports fully open-source pipelines, and reduces unsupported answers on curated unanswerable queries. The results show that efficient and accurate structuring of experiences at write time -- as achieved by the GSW framework -- yields both efficiency and reliability gains at read time. Code is available at https://github.com/roychowdhuryresearch/gsw-memory.
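To make the write/read split concrete, here is a minimal, hypothetical sketch of the idea the abstract describes: documents are structured at write time into a network of QA pairs linked by the entities they mention, and at read time the network itself is traversed (not the verbatim documents) to assemble an inference chain. All class and method names are illustrative assumptions, not the paper's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class QANode:
    """One question-answer pair in the semantic network (illustrative)."""
    question: str
    answer: str
    entities: set = field(default_factory=set)


class SemanticMemory:
    """Toy GSW-style memory: QA pairs indexed by the entities they mention."""

    def __init__(self):
        self.nodes = []          # all QA nodes, in write order
        self.entity_index = {}   # entity -> list of node ids mentioning it

    def write(self, question, answer, entities):
        """Write time: integrate a new QA pair into the memory state."""
        node_id = len(self.nodes)
        self.nodes.append(QANode(question, answer, set(entities)))
        for e in entities:
            self.entity_index.setdefault(e, []).append(node_id)
        return node_id

    def read(self, query_entities, hops=2):
        """Read time: start at nodes matching the query's entities, then
        expand through shared entities to form a multi-hop inference chain."""
        frontier = {nid for e in query_entities
                    for nid in self.entity_index.get(e, [])}
        chain = set(frontier)
        for _ in range(hops - 1):
            linked = set()
            for nid in frontier:
                for e in self.nodes[nid].entities:
                    linked.update(self.entity_index.get(e, []))
            frontier = linked - chain
            chain |= frontier
        return [self.nodes[nid] for nid in sorted(chain)]
```

For example, after writing ("Where does Alice work?", "Acme") and ("Where is Acme located?", "Paris"), a two-hop read on the entity "Alice" chains through "Acme" and returns both pairs, so a downstream LLM can answer "Where is Alice's employer located?" from the chain alone. The real system uses an LLM to extract entity- and event-aware QA pairs rather than hand-supplied entity tags.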

Key Takeaways

  1. Panini uses Generative Semantic Workspaces to structure documents as semantic networks at write time for efficient retrieval.

  2. The approach reduces token usage by 5-30x compared to traditional retrieval-augmented generation methods.

  3. Dual indexing with BM25 and dense vectors enables multiple entry points into the structured semantic network.

Limitations

  • Traditional RAG approaches retrieve irrelevant context chunks, increasing hallucination risk and inference costs.

  • Existing structured approaches like RAPTOR and GraphRAG optimize for summarization rather than specific fact retrieval.

Keywords

retrieval-augmented generation, continual learning, semantic memory state, Generative Semantic Workspaces, question-answer pairs, reasoning-grounded inference chains, LLM, RAG, inference chains
