Large Language Models

Query as Anchor: Scenario-Adaptive User Representation via Large Language Model

Jiahao Yuan, Yike Xu, Jinyong Wen, Baokun Wang, Ziyi Gao, Xiaotong Lin, Yun Liu, Xing Fu, Yu Cheng, Yongchao Liu, Weiqiang Wang, Zhongle Xie
Published: February 16, 2026
Authors: 12
Word count: 11,543

Dynamic user representations adapt to any scenario using query-anchored LLM embeddings.

Abstract

Industrial-scale user representation learning must balance robust universality with acute task-sensitivity. However, existing paradigms primarily yield static, task-agnostic embeddings that struggle to reconcile the divergent requirements of downstream scenarios within a unified vector space. Furthermore, heterogeneous multi-source data introduces inherent noise and modality conflicts, degrading representation quality. We propose Query-as-Anchor, a framework that shifts user modeling from static encoding to dynamic, query-aware synthesis. To equip Large Language Models (LLMs) with deep user understanding, we first construct UserU, an industrial-scale pre-training dataset that aligns multi-modal behavioral sequences with user-understanding semantics. Our Q-Anchor Embedding architecture then integrates hierarchical coarse-to-fine encoders into dual-tower LLMs via joint contrastive-autoregressive optimization for query-aware user representation. To bridge the gap between general pre-training and specialized business logic, we further introduce Cluster-based Soft Prompt Tuning, which enforces discriminative latent structures and aligns model attention with scenario-specific modalities. For deployment, anchoring queries at sequence termini enables KV-cache-accelerated inference with negligible incremental latency. Evaluations on 10 Alipay industrial benchmarks show consistent state-of-the-art performance, strong scalability, and efficient deployment. Large-scale online A/B testing in Alipay's production system across two real-world scenarios further validates its practical effectiveness. Our code will be publicly released at: https://github.com/JhCircle/Q-Anchor.
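The joint contrastive-autoregressive optimization named in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the paper's released code: `info_nce` aligns the user tower with the query/text tower using in-batch negatives, `ar_nll` stands in for the autoregressive language-modeling loss the dual-tower LLMs would produce, and the weighting `lam` is an assumed hyperparameter.

```python
import numpy as np

def info_nce(user_emb, text_emb, temperature=0.07):
    """In-batch contrastive (InfoNCE) loss between the two towers."""
    u = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (u @ t.T) / temperature  # (B, B): user i scored against every text
    # Diagonal entries are the matched (positive) user-text pairs.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def joint_loss(user_emb, text_emb, ar_nll, lam=0.5):
    """Contrastive alignment plus a weighted autoregressive NLL term."""
    return info_nce(user_emb, text_emb) + lam * ar_nll
```

As a sanity check, batches where user and text embeddings are matched should score a lower contrastive loss than batches where they are shuffled against each other.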

Key Takeaways

  1. Query-as-Anchor generates dynamic, scenario-specific user representations from a single pre-trained model instead of static embeddings.

  2. The UserU dataset combines behavioral interaction data with LLM-generated synthetic query-answer pairs grounded in actual user behavior.

  3. A hierarchical coarse-to-fine encoder processes six modalities of user behavior into multi-level embeddings for improved representation.
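The single-model, per-query behavior in takeaway 1 (and the KV-cache trick from the abstract) can be illustrated with a toy single-head attention layer. This is a hedged sketch under assumed shapes, not the production system: the long behavior sequence is encoded once and its keys/values cached, and each scenario query is appended as a short suffix whose final position is read out as the scenario-specific user embedding.

```python
import numpy as np

D = 8                                  # toy hidden size
rng = np.random.default_rng(0)

def attend(queries, keys, values):
    """Single-head scaled dot-product attention (projections omitted for brevity)."""
    scores = queries @ keys.T / np.sqrt(D)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

# One-time pass: cache keys/values for the long user-behavior prefix.
behavior = rng.normal(size=(100, D))   # stand-in for encoded behavior tokens
k_cache, v_cache = behavior, behavior  # toy: identity key/value projections

def query_anchored_embedding(query_tokens):
    """Process only the short query suffix against the cached prefix.

    Causal masking inside the suffix is omitted for brevity; the last
    position's output serves as the scenario-specific user embedding.
    """
    keys = np.vstack([k_cache, query_tokens])
    values = np.vstack([v_cache, query_tokens])
    return attend(query_tokens, keys, values)[-1]

risk_emb = query_anchored_embedding(rng.normal(size=(3, D)))       # hypothetical risk query
marketing_emb = query_anchored_embedding(rng.normal(size=(3, D)))  # hypothetical marketing query
```

Because the 100-token behavior prefix is never re-encoded, each additional scenario costs only its few query tokens, which is the intuition behind the "negligible incremental latency" claim.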

Limitations

  • Traditional static embeddings lack flexibility to adapt representations across different business scenarios and downstream tasks.

  • Dense, language-centric LLM training data exhibits a significant semantic gap from sparse, symbolic, and heterogeneous user behavior logs.

Keywords

UserU, Q-Anchor Embedding, dual-tower LLMs, joint contrastive-autoregressive optimization, Cluster-based Soft Prompt Tuning, KV-cache-accelerated inference
