
Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing

Jiyuan Wang, Chunyu Lin, Lei Sun, Zhi Cao, Yuyang Yin, Lang Nie, Zhenlong Yuan, Xiangxiang Chu, Yunchao Wei, Kang Liao, Guosheng Lin

Published: March 3, 2026
Authors: 11
Word Count: 6,882

RL-based 3D scene editing achieves multi-view consistency using frozen foundation models as reward functions.

Abstract

Leveraging the priors of 2D diffusion models for 3D editing has emerged as a promising paradigm. However, maintaining multi-view consistency in edited results remains challenging, and the extreme scarcity of 3D-consistent editing paired data renders supervised fine-tuning (SFT), the most effective training strategy for editing tasks, infeasible. In this paper, we observe that, while generating multi-view consistent 3D content is highly challenging, verifying 3D consistency is tractable, naturally positioning reinforcement learning (RL) as a feasible solution. Motivated by this, we propose RL3DEdit, a single-pass framework driven by RL optimization with novel rewards derived from the 3D foundation model VGGT. Specifically, we leverage VGGT's robust priors learned from massive real-world data: we feed it the edited images and use its output confidence maps and pose-estimation errors as reward signals, effectively anchoring the 2D editing priors onto a 3D-consistent manifold via RL. Extensive experiments demonstrate that RL3DEdit achieves stable multi-view consistency and outperforms state-of-the-art methods in editing quality with high efficiency. To promote the development of 3D editing, we will release the code and model.
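The reward design described above can be sketched as a simple scalar: reward high mean confidence from the 3D model's confidence maps and penalize pose-estimation error across the edited views. The function below is a hypothetical illustration, not the paper's implementation; the name `consistency_reward` and the weight `lam` are assumptions, and the VGGT inference step is replaced by precomputed toy arrays.

```python
import numpy as np

def consistency_reward(conf_maps, pose_errors, lam=1.0):
    """Hypothetical sketch of a VGGT-style consistency reward.

    conf_maps:   (V, H, W) per-view confidence maps in [0, 1],
                 as a 3D foundation model might output
    pose_errors: (V,) per-view pose-estimation errors
    lam:         assumed weight trading confidence against pose error
    """
    conf_term = float(np.mean(conf_maps))    # higher when edits look 3D-consistent
    pose_term = float(np.mean(pose_errors))  # higher when camera poses drift
    return conf_term - lam * pose_term

# Toy example: 4 views with 8x8 confidence maps (stand-ins for model outputs)
conf = np.full((4, 8, 8), 0.9)
errs = np.array([0.05, 0.04, 0.06, 0.05])
reward = consistency_reward(conf, errs, lam=1.0)
```

In an RL loop, a reward of this shape would be computed per edited multi-view batch and used to update the 2D editor's policy, so that edits which lower VGGT's confidence or inflate its pose errors are discouraged.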

Key Takeaways

  1. RL3DEdit uses reinforcement learning with a 3D foundation model as the reward function to achieve multi-view consistent 3D scene editing without paired training data.

  2. The method leverages VGGT's confidence maps as a proxy for multi-view consistency, validated through empirical analysis showing that confidence drops under inconsistent edits.

  3. RL3DEdit enables single-pass inference that is over 2× faster than previous methods while handling geometry-changing edits such as object addition and character pose changes.

Limitations

  • Method requires base editors with multi-image joint editing capabilities, limiting compatibility to specific 2D editing models like FLUX-Kontext.

  • Approach relies on VGGT foundation model availability and may not generalize to domains significantly different from VGGT's training data distribution.

Keywords

diffusion models, reinforcement learning, 3D editing, multi-view consistency, supervised fine-tuning, VGGT, reward signals, 3D foundation model
