Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory

Haozhen Zhang, Haodong Yue, Tao Feng, Quanyu Long, Jianzhu Bao, Bowen Jin, Weizhi Zhang, Xiao Li, Jiaxuan You, Chengwei Qin, Wenya Wang

Published: February 5, 2026
Authors: 11
Word Count: 12,884
Code: Includes code


Query-aware budget routing optimizes AI agent memory extraction with tiered, modular processing strategies.

Abstract

Memory is increasingly central to Large Language Model (LLM) agents operating beyond a single context window, yet most existing systems rely on offline, query-agnostic memory construction that can be inefficient and may discard query-critical information. Although runtime memory utilization is a natural alternative, prior work often incurs substantial overhead and offers limited explicit control over the performance-cost trade-off. In this work, we present BudgetMem, a runtime agent memory framework for explicit, query-aware performance-cost control. BudgetMem structures memory processing as a set of memory modules, each offered in three budget tiers (i.e., Low/Mid/High). A lightweight router performs budget-tier routing across modules to balance task performance and memory construction cost, which is implemented as a compact neural policy trained with reinforcement learning. Using BudgetMem as a unified testbed, we study three complementary strategies for realizing budget tiers: implementation (method complexity), reasoning (inference behavior), and capacity (module model size). Across LoCoMo, LongMemEval, and HotpotQA, BudgetMem surpasses strong baselines when performance is prioritized (i.e., high-budget setting), and delivers better accuracy-cost frontiers under tighter budgets. Moreover, our analysis disentangles the strengths and weaknesses of different tiering strategies, clarifying when each axis delivers the most favorable trade-offs under varying budget regimes.
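The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the module names, the per-tier cost values, the linear softmax policy, and the `cost_weight` parameter are all hypothetical, chosen only to show how a router might pick one of the Low/Mid/High tiers per module and how a reward could weigh task performance against memory cost. Reinforcement-learning training of the policy is omitted.

```python
import math
import random

TIERS = ("low", "mid", "high")
# Hypothetical relative costs per tier; the source gives no numbers.
TIER_COST = {"low": 1.0, "mid": 3.0, "high": 9.0}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TierRouter:
    """Toy linear policy with one weight row per (module, tier) pair."""

    def __init__(self, modules, feature_dim, seed=0):
        rng = random.Random(seed)
        self.modules = list(modules)
        self.weights = {
            m: [[rng.uniform(-0.1, 0.1) for _ in range(feature_dim)]
                for _ in TIERS]
            for m in self.modules
        }

    def tier_probs(self, module, features):
        """Return a probability for each tier given query features."""
        logits = [sum(w * x for w, x in zip(row, features))
                  for row in self.weights[module]]
        return softmax(logits)

    def route(self, features, rng):
        """Sample one tier per module for a single query."""
        return {
            m: rng.choices(TIERS, weights=self.tier_probs(m, features), k=1)[0]
            for m in self.modules
        }


def plan_cost(plan):
    """Total hypothetical cost of the tiers chosen in a routing plan."""
    return sum(TIER_COST[tier] for tier in plan.values())


def reward(task_score, plan, cost_weight):
    """Scalar reward: task performance minus weighted memory cost."""
    return task_score - cost_weight * plan_cost(plan)


if __name__ == "__main__":
    # Module names and query features below are placeholders.
    router = TierRouter(["filter", "extract", "summarize"], feature_dim=4)
    plan = router.route([0.2, 0.9, 0.1, 0.5], random.Random(1))
    print(plan, plan_cost(plan), reward(0.8, plan, cost_weight=0.01))
```

In this sketch, raising `cost_weight` penalizes expensive tiers more heavily, which is one simple way to express a tighter budget; a trained policy would adjust its weights in response to that reward.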

Key Takeaways

1. BudgetMem enables query-aware budget control for runtime agent memory extraction through modular, tiered processing.
2. Three complementary tiering strategies (implementation, reasoning, and capacity) allow flexible quality-cost tradeoffs in memory systems.
3. Lightweight routers intelligently select processing tiers based on specific queries, optimizing performance-cost frontiers systematically.

Limitations

Most memory systems preprocess information offline, wasting computation on irrelevant details for specific queries.
Runtime memory extraction pushes expensive computation to inference time, creating critical cost and latency concerns.

Keywords

runtime memory utilization, query-aware performance-cost control, memory modules, budget tiers, lightweight router, neural policy, reinforcement learning, LoCoMo, LongMemEval, HotpotQA
