Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Haozhen Zhang, Haodong Yue, Tao Feng, Quanyu Long, Jianzhu Bao, Bowen Jin, Weizhi Zhang, Xiao Li, Jiaxuan You, Chengwei Qin, Wenya Wang
Published: February 5, 2026
Authors: 11
Word Count: 12,884
Code: Includes code

Query-aware budget routing optimizes AI agent memory extraction with tiered, modular processing strategies.

Abstract

Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may discard query-critical information. Although runtime memory utilization is a natural alternative, prior work often incurs substantial overhead and offers limited explicit control over the performance-cost trade-off. In this work, we present BudgetMem, a runtime agent memory framework for explicit, query-aware performance-cost control. BudgetMem structures memory processing as a set of memory modules, each offered in three budget tiers (i.e., Low/Mid/High). A lightweight router, implemented as a compact neural policy trained with reinforcement learning, performs budget-tier routing across modules to balance task performance and memory construction cost. Using BudgetMem as a unified testbed, we study three complementary strategies for realizing budget tiers: implementation (method complexity), reasoning (inference behavior), and capacity (module model size). Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized (i.e., high-budget setting), and delivers better accuracy-cost frontiers under tighter budgets. Moreover, our analysis disentangles the strengths and weaknesses of different tiering strategies, clarifying when each axis delivers the most favorable trade-offs under varying budget regimes.
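
The module-and-tier structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the module names, the per-tier costs, and the rule-based `length_router` are all hypothetical stand-ins (the paper's router is a compact neural policy trained with reinforcement learning, which is not reproduced here). The sketch only shows the control flow of choosing one tier per module for a given query and accumulating the cost.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Tuple


class Tier(Enum):
    LOW = 0
    MID = 1
    HIGH = 2


# Hypothetical per-tier costs in arbitrary units; not taken from the paper.
TIER_COST: Dict[Tier, float] = {Tier.LOW: 1.0, Tier.MID: 3.0, Tier.HIGH: 9.0}


@dataclass
class MemoryModule:
    name: str
    # One callable per tier; each maps (query, text) to processed text.
    tiers: Dict[Tier, Callable[[str, str], str]]


def make_module(name: str) -> MemoryModule:
    """Build a toy module whose tiers only tag the text they receive."""
    def make_tier_fn(tier: Tier) -> Callable[[str, str], str]:
        def fn(query: str, text: str) -> str:
            return f"{text} -> {name}[{tier.name}]"
        return fn
    return MemoryModule(name, {tier: make_tier_fn(tier) for tier in Tier})


def length_router(query: str, module: MemoryModule) -> Tier:
    """Assumed stand-in for the learned router: a rule on query length."""
    n_words = len(query.split())
    if n_words <= 4:
        return Tier.LOW
    if n_words <= 10:
        return Tier.MID
    return Tier.HIGH


def run_pipeline(
    query: str,
    history: str,
    modules: List[MemoryModule],
    router: Callable[[str, MemoryModule], Tier],
) -> Tuple[str, float, List[Tier]]:
    """Route each module to a tier for this query and sum the tier costs."""
    text, total_cost, chosen = history, 0.0, []
    for module in modules:
        tier = router(query, module)
        text = module.tiers[tier](query, text)
        total_cost += TIER_COST[tier]
        chosen.append(tier)
    return text, total_cost, chosen


if __name__ == "__main__":
    modules = [make_module("filter"), make_module("summarize")]
    print(run_pipeline("Where did Ana move?", "log", modules, length_router))
```

In a learned setting, `length_router` would be replaced by a policy that takes query (and possibly module) features as input, and the accumulated cost would feed into the training signal alongside task performance.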

Key Takeaways

  1. BudgetMem enables query-aware budget control for runtime agent memory extraction through modular, tiered processing.

  2. Three complementary tiering strategies (implementation, reasoning, and capacity) allow flexible quality-cost trade-offs in memory systems.

  3. A lightweight router selects a processing tier for each module based on the specific query, improving the performance-cost frontier.
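
To make the idea of a performance-cost frontier concrete, the sketch below extracts the Pareto-optimal points from a set of (cost, accuracy) measurements. It is a generic illustration rather than anything taken from the paper: the function name `pareto_frontier` and the numbers in the usage example are made up for demonstration and are not reported results.

```python
from typing import List, Tuple


def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (cost, accuracy) points that no other point dominates.

    A point is dominated when some other point has cost that is no higher
    and accuracy that is no lower, with at least one of the two strictly better.
    """
    frontier: List[Tuple[float, float]] = []
    best_accuracy = float("-inf")
    # Sort by cost ascending; for equal cost, visit the higher accuracy first.
    for cost, accuracy in sorted(points, key=lambda p: (p[0], -p[1])):
        # Keep a point only if it beats every cheaper (or equal-cost) point.
        if accuracy > best_accuracy:
            frontier.append((cost, accuracy))
            best_accuracy = accuracy
    return frontier


if __name__ == "__main__":
    # Illustrative numbers only; not results from the paper.
    configs = [(1.0, 0.50), (2.0, 0.48), (3.0, 0.61), (9.0, 0.70), (9.0, 0.66)]
    print(pareto_frontier(configs))
```

A routing strategy yields a better frontier than another when, at each cost level, it reaches accuracy at least as high, and strictly higher somewhere.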

Limitations

  • Most memory systems preprocess information offline, wasting computation on irrelevant details for specific queries.

  • Runtime memory extraction pushes expensive computation to inference time, creating critical cost and latency concerns.

Keywords

runtime memory utilization, query-aware performance-cost control, memory modules, budget tiers, lightweight router, neural policy, reinforcement learning, LoCoMo, LongMemEval, HotpotQA
