Large Language Models

Query as Anchor: Scenario-Adaptive User Representation via Large Language Model

Jiahao Yuan, Yike Xu, Jinyong Wen, Baokun Wang, Ziyi Gao, Xiaotong Lin, Yun Liu, Xing Fu, Yu Cheng, Yongchao Liu, Weiqiang Wang, Zhongle Xie
Published: February 16, 2026
Authors: 12
Word count: 11,543

Dynamic user representations adapt to any scenario using query-anchored LLM embeddings.

Abstract

Industrial-scale user representation learning must balance robust universality with acute task-sensitivity. However, existing paradigms primarily yield static, task-agnostic embeddings that struggle to reconcile the divergent requirements of downstream scenarios within a unified vector space. Furthermore, heterogeneous multi-source data introduces inherent noise and modality conflicts, degrading representation quality. We propose Query-as-Anchor, a framework that shifts user modeling from static encoding to dynamic, query-aware synthesis. To equip Large Language Models (LLMs) with deep user understanding, we first construct UserU, an industrial-scale pre-training dataset that aligns multi-modal behavioral sequences with user-understanding semantics. Our Q-Anchor Embedding architecture then integrates hierarchical coarse-to-fine encoders into dual-tower LLMs via joint contrastive-autoregressive optimization for query-aware user representation. To bridge the gap between general pre-training and specialized business logic, we further introduce Cluster-based Soft Prompt Tuning, which enforces discriminative latent structures and aligns model attention with scenario-specific modalities. For deployment, anchoring queries at sequence termini enables KV-cache-accelerated inference with negligible incremental latency. Evaluations on 10 Alipay industrial benchmarks show consistent state-of-the-art performance, strong scalability, and efficient deployment. Large-scale online A/B testing in Alipay's production system across two real-world scenarios further validates its practical effectiveness. Our code will be publicly released at: https://github.com/JhCircle/Q-Anchor.
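The joint contrastive-autoregressive optimization named in the abstract can be sketched in a few lines. This is an illustrative reconstruction, not the paper's released code: `info_nce` aligns the user tower with the query/text tower using in-batch negatives, `ar_nll` stands in for the autoregressive language-modeling loss the dual-tower LLMs would produce, and the weighting `lam` is an assumed hyperparameter.

```python
import numpy as np

def info_nce(user_emb, text_emb, temperature=0.07):
    """In-batch contrastive (InfoNCE) loss between the two towers."""
    u = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (u @ t.T) / temperature  # (B, B): user i scored against every text
    # Diagonal entries are the matched (positive) user-text pairs.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def joint_loss(user_emb, text_emb, ar_nll, lam=0.5):
    """Contrastive alignment plus a weighted autoregressive NLL term."""
    return info_nce(user_emb, text_emb) + lam * ar_nll
```

As a sanity check, batches where user and text embeddings are matched should score a lower contrastive loss than batches where they are shuffled against each other.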

Key Takeaways

  1. Query-as-Anchor generates dynamic, scenario-specific user representations from a single pre-trained model instead of static embeddings.

  2. The UserU dataset combines behavioral interaction data with LLM-generated synthetic query-answer pairs grounded in actual user behavior.

  3. A hierarchical coarse-to-fine encoder processes six modalities of user behavior into multi-level embeddings for improved representation.
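The single-model, per-query behavior in takeaway 1 (and the KV-cache trick from the abstract) can be illustrated with a toy single-head attention layer. This is a hedged sketch under assumed shapes, not the production system: the long behavior sequence is encoded once and its keys/values cached, and each scenario query is appended as a short suffix whose final position is read out as the scenario-specific user embedding.

```python
import numpy as np

D = 8                                  # toy hidden size
rng = np.random.default_rng(0)

def attend(queries, keys, values):
    """Single-head scaled dot-product attention (projections omitted for brevity)."""
    scores = queries @ keys.T / np.sqrt(D)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

# One-time pass: cache keys/values for the long user-behavior prefix.
behavior = rng.normal(size=(100, D))   # stand-in for encoded behavior tokens
k_cache, v_cache = behavior, behavior  # toy: identity key/value projections

def query_anchored_embedding(query_tokens):
    """Process only the short query suffix against the cached prefix.

    Causal masking inside the suffix is omitted for brevity; the last
    position's output serves as the scenario-specific user embedding.
    """
    keys = np.vstack([k_cache, query_tokens])
    values = np.vstack([v_cache, query_tokens])
    return attend(query_tokens, keys, values)[-1]

risk_emb = query_anchored_embedding(rng.normal(size=(3, D)))       # hypothetical risk query
marketing_emb = query_anchored_embedding(rng.normal(size=(3, D)))  # hypothetical marketing query
```

Because the 100-token behavior prefix is never re-encoded, each additional scenario costs only its few query tokens, which is the intuition behind the "negligible incremental latency" claim.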

Limitations

  • Traditional static embeddings lack flexibility to adapt representations across different business scenarios and downstream tasks.

  • Dense, language-centric LLM training data exhibits a significant semantic gap from sparse, symbolic, and heterogeneous user behavior logs.

Keywords

UserU, Q-Anchor Embedding, dual-tower LLMs, joint contrastive-autoregressive optimization, Cluster-based Soft Prompt Tuning, KV-cache-accelerated inference
