Robotics & Embodied AI

RLinf-USER: A Unified and Extensible System for Real-World Online Policy Learning in Embodied AI

Hongzhi Zang, Shu'ang Yu, Hao Lin, Tianxing Zhou, Zefang Huang, Zhen Guo, Xin Xu, Jiakai Zhou, Yuze Sheng, Shizhe Zhang, Feng Gao, Wenhao Tang, Yufeng Yue, Quanlu Zhang, Xinlei Chen, Chao Yu, Yu Wang
Published: February 8, 2026
Authors: 17
Word Count: 9,265

USER enables efficient, real-world online policy learning.

Abstract

Online policy learning directly in the physical world is a promising yet challenging direction for embodied intelligence. Unlike simulation, real-world systems cannot be arbitrarily accelerated, cheaply reset, or massively replicated, which makes scalable data collection, heterogeneous deployment, and effective long-horizon training difficult. These challenges suggest that real-world policy learning is not only an algorithmic issue but fundamentally a systems problem. We present USER, a Unified and extensible SystEm for Real-world online policy learning. USER treats physical robots as first-class hardware resources alongside GPUs through a unified hardware abstraction layer, enabling automatic discovery, management, and scheduling of heterogeneous robots. To address cloud-edge communication, USER introduces an adaptive communication plane with tunneling-based networking, distributed data channels for traffic localization, and streaming-multiprocessor-aware weight synchronization to regulate GPU-side overhead. On top of this infrastructure, USER organizes learning as a fully asynchronous framework with a persistent, cache-aware buffer, enabling efficient long-horizon experiments with robust crash recovery and reuse of historical data. In addition, USER provides extensible abstractions for rewards, algorithms, and policies, supporting online imitation or reinforcement learning of CNN/MLP policies, generative policies, and large vision-language-action (VLA) models within a unified pipeline. Results in both simulation and the real world show that USER enables multi-robot coordination, heterogeneous manipulators, edge-cloud collaboration with large models, and long-running asynchronous training, offering a unified and extensible systems foundation for real-world online policy learning.
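The abstract describes a fully asynchronous design in which robot-side data collection and GPU-side training are decoupled through a persistent buffer. A minimal sketch of that pattern is below; it is an illustration only, not USER's actual API, and all names (`PersistentBuffer`, `actor`, `learner`) are hypothetical stand-ins.

```python
import threading

class PersistentBuffer:
    """Toy stand-in for a persistent, cache-aware buffer:
    transitions accumulate so historical data can be reused."""
    def __init__(self):
        self._store = []
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._store.append(transition)

    def sample(self, n):
        # Naive recency-based sampling; a real buffer would be smarter.
        with self._lock:
            return self._store[-n:]

    def __len__(self):
        with self._lock:
            return len(self._store)

def actor(buffer, n_steps):
    # Robot-side collection: runs at the robot's own pace,
    # never blocked by the learner.
    for step in range(n_steps):
        buffer.add({"obs": step, "action": step % 2, "reward": 1.0})

def learner(buffer, n_updates, batch_size=4):
    # GPU-side training: consumes whatever data is available.
    updates = 0
    while updates < n_updates:
        if len(buffer) >= batch_size:
            batch = buffer.sample(batch_size)
            updates += 1  # a real learner would take a gradient step here
    return updates

buffer = PersistentBuffer()
t = threading.Thread(target=actor, args=(buffer, 100))
t.start()
done = learner(buffer, n_updates=10)
t.join()
```

Because neither side waits on the other, slow or interrupted robots do not stall training, which is the property the paper's asynchronous framework targets at scale.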

Key Takeaways

  1. USER bridges the sim-to-real gap in embodied AI.

  2. Unified hardware abstraction enables efficient multi-robot training.

  3. Asynchronous framework improves data collection and training throughput.
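Treating robots as first-class hardware resources alongside GPUs, as the unified hardware abstraction layer does, can be pictured as a single registry that discovers and schedules both kinds of devices uniformly. The sketch below is a hypothetical illustration of that idea, not USER's implementation; `Resource` and `ResourcePool` are invented names.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    kind: str        # e.g. "gpu" or "robot" -- scheduled the same way
    busy: bool = False

class ResourcePool:
    """Toy registry that manages robots and GPUs uniformly."""
    def __init__(self):
        self._resources = []

    def register(self, res):
        # Discovery would add devices here automatically.
        self._resources.append(res)

    def acquire(self, kind):
        # Scheduling: hand out the first idle resource of the kind.
        for r in self._resources:
            if r.kind == kind and not r.busy:
                r.busy = True
                return r
        return None

    def release(self, res):
        res.busy = False

pool = ResourcePool()
pool.register(Resource("arm-0", "robot"))
pool.register(Resource("gpu-0", "gpu"))
arm = pool.acquire("robot")
```

The point of the abstraction is that heterogeneous manipulators and accelerators share one discovery, management, and scheduling path instead of per-device ad hoc scripts.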

Limitations

  • Real-world deployments face network failures and interruptions.

  • Heterogeneous platforms and unstable networks pose challenges.

Keywords

online policy learning, embodied intelligence, real-world systems, heterogeneous robots, hardware abstraction layer, adaptive communication plane, tunneling-based networking, distributed data channels, streaming-multiprocessor-aware weight synchronization, asynchronous framework, cache-aware buffer, crash recovery, reinforcement learning, imitation learning, vision-language-action models, multi-robot coordination, edge-cloud collaboration
