ECO: Energy-Constrained Optimization with Reinforcement Learning for Humanoid Walking

Weidong Huang, Jingwen Zhang, Jiongye Li, Shibowen Zhang, Jiayang Wu, Jiayi Wang, Hangxin Liu, Yaodong Yang, Yao Su
Published: February 6, 2026
Authors: 9
Word count: 11,772

Optimizing energy-efficient humanoid walking with constrained RL.

Abstract

Achieving stable and energy-efficient locomotion is essential for humanoid robots to operate continuously in real-world applications. Existing MPC and RL approaches often rely on energy-related metrics embedded within a multi-objective optimization framework, which require extensive hyperparameter tuning and often result in suboptimal policies. To address these challenges, we propose ECO (Energy-Constrained Optimization), a constrained RL framework that separates energy-related metrics from rewards, reformulating them as explicit inequality constraints. This method provides a clear and interpretable physical representation of energy costs, enabling more efficient and intuitive hyperparameter tuning for improved energy efficiency. ECO introduces dedicated constraints for energy consumption and reference motion, enforced by the Lagrangian method, to achieve stable, symmetric, and energy-efficient walking for humanoid robots. We evaluated ECO against MPC, standard RL with reward shaping, and four state-of-the-art constrained RL methods. Experiments, including sim-to-sim and sim-to-real transfers on the kid-sized humanoid robot BRUCE, demonstrate that ECO significantly reduces energy consumption compared to baselines while maintaining robust walking performance. These results highlight a substantial advancement in energy-efficient humanoid locomotion. All experimental demonstrations can be found on the project website: https://sites.google.com/view/eco-humanoid.
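The abstract's core idea, turning energy metrics into explicit inequality constraints enforced by the Lagrangian method, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the learning rate, and the scalar `energy_budget` threshold are all illustrative assumptions; in practice the multiplier update runs alongside PPO's policy update.

```python
# Hypothetical sketch of the Lagrangian treatment of an energy constraint
# J_energy(pi) <= d, as in PPO-Lagrangian-style constrained RL.
# All names (dual_update, lagrangian_policy_loss, energy_budget) are
# illustrative, not taken from the ECO paper.

def dual_update(lmbda, avg_energy_cost, energy_budget, lr=0.01):
    """Gradient ascent on the multiplier, projected to stay non-negative.

    The multiplier grows while the measured energy cost exceeds the
    budget and decays (toward zero) once the constraint is satisfied.
    """
    return max(0.0, lmbda + lr * (avg_energy_cost - energy_budget))

def lagrangian_policy_loss(reward_loss, avg_energy_cost, lmbda):
    """Combine the usual RL objective with the weighted constraint term."""
    return reward_loss + lmbda * avg_energy_cost

# Toy rollout statistics: energy per step drifts below the budget of 1.0,
# so the multiplier first rises, then stops growing, then relaxes.
lmbda = 0.0
for avg_cost in [1.5, 1.4, 1.2, 1.0, 0.9]:
    lmbda = dual_update(lmbda, avg_cost, energy_budget=1.0)
print(round(lmbda, 4))
```

The key contrast with reward shaping is that the penalty weight `lmbda` is not a hand-tuned hyperparameter: it adapts automatically until the constraint threshold (a physically interpretable energy budget) is met.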

Key Takeaways

  1. The ECO framework enhances energy efficiency in humanoid walking.
  2. It separates energy constraints from rewards for better optimization.
  3. It uses the PPO-Lagrangian method for constrained policy optimization.

Limitations

  • Requires physically intuitive tuning for energy constraints.

  • Dependent on privileged observations for reward critic training.

Keywords

model predictive control, reinforcement learning, constrained optimization, Lagrangian method, energy-constrained optimization, humanoid robotics, sim-to-sim transfer, sim-to-real transfer
