Robotics & Embodied AI

RLinf-USER: A Unified and Extensible System for Real-World Online Policy Learning in Embodied AI

Hongzhi Zang, Shu'ang Yu, Hao Lin, Tianxing Zhou, Zefang Huang, Zhen Guo, Xin Xu, Jiakai Zhou, Yuze Sheng, Shizhe Zhang, Feng Gao, Wenhao Tang, Yufeng Yue, Quanlu Zhang, Xinlei Chen, Chao Yu, Yu Wang
Published: February 8, 2026
Authors: 17
Word Count: 9,265

USER enables efficient, real-world online policy learning.

Abstract

Online policy learning directly in the physical world is a promising yet challenging direction for embodied intelligence. Unlike simulation, real-world systems cannot be arbitrarily accelerated, cheaply reset, or massively replicated, which makes scalable data collection, heterogeneous deployment, and effective long-horizon training difficult. These challenges suggest that real-world policy learning is not only an algorithmic issue but fundamentally a systems problem. We present USER, a Unified and extensible SystEm for Real-world online policy learning. USER treats physical robots as first-class hardware resources alongside GPUs through a unified hardware abstraction layer, enabling automatic discovery, management, and scheduling of heterogeneous robots. To address cloud-edge communication, USER introduces an adaptive communication plane with tunneling-based networking, distributed data channels for traffic localization, and streaming-multiprocessor-aware weight synchronization to regulate GPU-side overhead. On top of this infrastructure, USER organizes learning as a fully asynchronous framework with a persistent, cache-aware buffer, enabling efficient long-horizon experiments with robust crash recovery and reuse of historical data. In addition, USER provides extensible abstractions for rewards, algorithms, and policies, supporting online imitation or reinforcement learning of CNN/MLP policies, generative policies, and large vision-language-action (VLA) models within a unified pipeline. Results in both simulation and the real world show that USER enables multi-robot coordination, heterogeneous manipulators, edge-cloud collaboration with large models, and long-running asynchronous training, offering a unified and extensible systems foundation for real-world online policy learning.
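The abstract describes a fully asynchronous design in which robot-side data collection and GPU-side training are decoupled through a persistent buffer. A minimal sketch of that pattern is below; it is an illustration only, not USER's actual API, and all names (`PersistentBuffer`, `actor`, `learner`) are hypothetical stand-ins.

```python
import threading

class PersistentBuffer:
    """Toy stand-in for a persistent, cache-aware buffer:
    transitions accumulate so historical data can be reused."""
    def __init__(self):
        self._store = []
        self._lock = threading.Lock()

    def add(self, transition):
        with self._lock:
            self._store.append(transition)

    def sample(self, n):
        # Naive recency-based sampling; a real buffer would be smarter.
        with self._lock:
            return self._store[-n:]

    def __len__(self):
        with self._lock:
            return len(self._store)

def actor(buffer, n_steps):
    # Robot-side collection: runs at the robot's own pace,
    # never blocked by the learner.
    for step in range(n_steps):
        buffer.add({"obs": step, "action": step % 2, "reward": 1.0})

def learner(buffer, n_updates, batch_size=4):
    # GPU-side training: consumes whatever data is available.
    updates = 0
    while updates < n_updates:
        if len(buffer) >= batch_size:
            batch = buffer.sample(batch_size)
            updates += 1  # a real learner would take a gradient step here
    return updates

buffer = PersistentBuffer()
t = threading.Thread(target=actor, args=(buffer, 100))
t.start()
done = learner(buffer, n_updates=10)
t.join()
```

Because neither side waits on the other, slow or interrupted robots do not stall training, which is the property the paper's asynchronous framework targets at scale.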

Key Takeaways

  1. USER bridges the sim-to-real gap in embodied AI.

  2. Unified hardware abstraction enables efficient multi-robot training.

  3. Asynchronous framework improves data collection and training throughput.
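Treating robots as first-class hardware resources alongside GPUs, as the unified hardware abstraction layer does, can be pictured as a single registry that discovers and schedules both kinds of devices uniformly. The sketch below is a hypothetical illustration of that idea, not USER's implementation; `Resource` and `ResourcePool` are invented names.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    kind: str        # e.g. "gpu" or "robot" -- scheduled the same way
    busy: bool = False

class ResourcePool:
    """Toy registry that manages robots and GPUs uniformly."""
    def __init__(self):
        self._resources = []

    def register(self, res):
        # Discovery would add devices here automatically.
        self._resources.append(res)

    def acquire(self, kind):
        # Scheduling: hand out the first idle resource of the kind.
        for r in self._resources:
            if r.kind == kind and not r.busy:
                r.busy = True
                return r
        return None

    def release(self, res):
        res.busy = False

pool = ResourcePool()
pool.register(Resource("arm-0", "robot"))
pool.register(Resource("gpu-0", "gpu"))
arm = pool.acquire("robot")
```

The point of the abstraction is that heterogeneous manipulators and accelerators share one discovery, management, and scheduling path instead of per-device ad hoc scripts.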

Limitations

  • Real-world deployments face network failures and interruptions.

  • Heterogeneous platforms and unstable networks pose challenges.

Keywords

online policy learning, embodied intelligence, real-world systems, heterogeneous robots, hardware abstraction layer, adaptive communication plane, tunneling-based networking, distributed data channels, streaming-multiprocessor-aware weight synchronization, asynchronous framework, cache-aware buffer, crash recovery, reinforcement learning, imitation learning, vision-language-action models, multi-robot coordination, edge-cloud collaboration
