SWE-Universe: Scale Real-World Verifiable Environments to Millions

MMouxiang ChenLLei ZhangYYunlong FengXXuwu WangWWenting ZhaoRRuisheng CaoJJiaxi YangJJiawei ChenMMingze LiZZeyao MaHHao GeZZongmeng ZhangZZeyu CuiDDayiheng LiuJJingren ZhouJJianling SunJJunyang LinBBinyuan Hui

Published: February 2, 2026
Authors: 18

View on arXiv Download PDF

Abstract

We propose SWE-Universe, a scalable and efficient framework for automatically constructing real-world software engineering (SWE) verifiable environments from GitHub pull requests (PRs). To overcome the prevalent challenges of automatic building, such as low production yield, weak verifiers, and prohibitive cost, our framework utilizes a building agent powered by an efficient custom-trained model. This agent employs iterative self-verification and in-loop hacking detection to ensure the reliable generation of high-fidelity, verifiable tasks. Using this method, we scale the number of real-world multilingual SWE environments to a million scale (807,693). We demonstrate the profound value of our environments through large-scale agentic mid-training and reinforcement learning. Finally, we applied this technique to Qwen3-Max-Thinking and achieved a score of 75.3% on SWE-Bench Verified. Our work provides both a critical resource and a robust methodology to advance the next generation of coding agents.

Keywords

building agentself-verificationin-loop hacking detectionreal-world software engineeringGitHub pull requestscustom-trained modelverifiable environmentsagentic mid-trainingreinforcement learningSWE-Bench Verified

More in AI Agents

View all

LongCat-Flash-Thinking-2601 Technical Report

Meituan LongCat Team, Anchun Gui +160

We introduce LongCat-Flash-Thinking-2601, a 560-billion-parameter open-source Mixture-of-Experts (MoE) reasoning model with superior agentic reasoning capability. LongCat-Flash-Thinking-2601 achieves ...

Jan 23149

Agentic Reasoning for Large Language Models

Tianxin Wei, Ting-Wei Li +27

Reasoning is a fundamental cognitive process underlying inference, problem-solving, and decision-making. While large language models (LLMs) demonstrate strong reasoning capabilities in closed-world se...

Jan 18149

Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

Ailin Huang, Ang Li +213

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: ...

Feb 11140

UI-Venus-1.5 Technical Report

Veuns-Team, Changlong Gao +25

GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging.In ...

Feb 9140

daVinci-Dev: Agent-native Mid-training for Software Engineering

Ji Zeng, Dayuan Fu +15

Recently, the frontier of Large Language Model (LLM) capabilities has shifted from single-turn code generation to agentic software engineering-a paradigm where models autonomously navigate, edit, and ...

Jan 26113

More AI Agents papers