AI Agents

CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

Jinpeng Chen, Cheng Gong, Hanbo Li, Ziru Liu, Zichen Tian, Xinyu Fu, Shi Wu, Chenyang Zhang, Wu Zhang, Suiyun Zhang, Dandan Tu, Rui Liu
Published
March 2, 2026
Authors
12

Abstract

Developing multi-turn interactive tool-use agents is challenging because real-world user needs are often complex and ambiguous, yet agents must execute deterministic actions to satisfy them. To address this gap, we introduce CoVe (Constraint-Verification), a post-training data synthesis framework designed for training interactive tool-use agents while ensuring both data complexity and correctness. CoVe begins by defining explicit task constraints, which serve a dual role: they guide the generation of complex trajectories and act as deterministic verifiers for assessing trajectory quality. This enables the creation of high-quality training trajectories for supervised fine-tuning (SFT) and the derivation of accurate reward signals for reinforcement learning (RL). Our evaluation on the challenging τ²-bench benchmark demonstrates the effectiveness of the framework. Notably, our compact CoVe-4B model achieves success rates of 43.0% and 59.4% in the Airline and Retail domains, respectively; its overall performance significantly outperforms strong baselines of similar scale and remains competitive with models up to 17× its size. These results indicate that CoVe provides an effective and efficient pathway for synthesizing training data for state-of-the-art interactive tool-use agents. To support future research, we open-source our code, trained model, and the full set of 12K high-quality trajectories used for training.
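
The dual role of constraints described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Constraint` type, the `called_with` and `never_called` helpers, the `verify` function, and the tool names are all hypothetical, and the trajectory is assumed to be a plain list of tool calls. Each constraint carries a natural-language description (which could be placed in a generation prompt) and a deterministic check (which could filter SFT data or yield a binary RL reward).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical representation of one tool call, for illustration only,
# e.g. {"name": "cancel_order", "args": {"order_id": "A1"}}
ToolCall = dict
Check = Callable[[list[ToolCall]], bool]


@dataclass(frozen=True)
class Constraint:
    description: str  # natural-language form, usable to guide generation
    check: Check      # deterministic check over the executed tool calls


def called_with(name: str, **expected) -> Check:
    """Pass if some call uses this tool with the expected argument values."""
    def check(trajectory: list[ToolCall]) -> bool:
        return any(
            call["name"] == name
            and all(call["args"].get(k) == v for k, v in expected.items())
            for call in trajectory
        )
    return check


def never_called(name: str) -> Check:
    """Pass if the tool is never invoked in the trajectory."""
    def check(trajectory: list[ToolCall]) -> bool:
        return all(call["name"] != name for call in trajectory)
    return check


def verify(trajectory: list[ToolCall], constraints: list[Constraint]) -> float:
    """Return 1.0 only if every constraint holds, else 0.0 (assumed binary reward)."""
    return 1.0 if all(c.check(trajectory) for c in constraints) else 0.0
```

Under these assumptions, a task such as "cancel order A1 without issuing a refund" would be expressed as two constraints, one built with `called_with("cancel_order", order_id="A1")` and one with `never_called("issue_refund")`. The same list could then be rendered into a prompt via the `description` fields and passed to `verify` to score any resulting trajectory.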

Keywords

post-training data synthesis, supervised fine-tuning, reinforcement learning, task constraints, trajectory generation, constraint verification, interactive tool-use agents, τ²-bench benchmark, CoVe-4B
