daVinci-Dev: Agent-native Mid-training for Software Engineering

Ji Zeng, Dayuan Fu, Tiantian Mi, Yumin Zhuang, Yaxing Huang, Xuefeng Li, Lyumanshan Ye, Muhang Xie, Qishuo Hua, Zhen Huang, Mohan Jiang, Hanning Wang, Jifan Lin, Yang Xiao, Jie Sun, Yunze Wu, Pengfei Liu
Published: January 26, 2026
Authors: 17
Word Count: 9,624
Code: Included

Revolutionizing LLMs for software engineering with mid-training.

Abstract

Recently, the frontier of Large Language Model (LLM) capabilities has shifted from single-turn code generation to agentic software engineering: a paradigm where models autonomously navigate, edit, and test complex repositories. While post-training methods have become the de facto approach for code agents, **agentic mid-training**, i.e., mid-training (MT) on large-scale data that mirrors authentic agentic workflows, remains critically underexplored due to substantial resource requirements, despite offering a more scalable path to instilling foundational agentic behaviors than relying solely on expensive reinforcement learning. A central challenge in realizing effective agentic mid-training is the distribution mismatch between static training data and the dynamic, feedback-rich environment of real development. To address this, we present a systematic study of agentic mid-training, establishing both the data synthesis principles and the training methodology for effective agent development at scale. Central to our approach is **agent-native data**: supervision comprising two complementary types of trajectories. **Contextually-native trajectories** preserve the complete information flow an agent experiences, offering broad coverage and diversity; **environmentally-native trajectories** are collected from executable repositories where observations stem from actual tool invocations and test executions, providing depth and interaction authenticity. We verify the model's agentic capabilities on `SWE-Bench Verified`. We demonstrate superiority over the previous open software engineering mid-training recipe `Kimi-Dev` under two post-training settings with an aligned base model and agentic scaffold, while using fewer than half the mid-training tokens (73.1B). Beyond this relative advantage, our best-performing 32B and 72B models achieve **56.1%** and **58.5%** resolution rates, respectively, which are ...
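The distinction between the two trajectory types can be pictured as a data schema in which every step records where its observation came from. The sketch below is purely illustrative and not from the paper: the class names, fields, and the `"reconstructed"`/`"executed"` labels are assumptions introduced here to make the contextually-native vs. environmentally-native split concrete.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    """One agent turn: an action and the observation that followed it."""
    action: str               # e.g. a file edit or a test command
    observation: str          # what the agent saw after acting
    observation_source: str   # "reconstructed" (from static artifacts)
                              # or "executed" (from a real tool/test run)

@dataclass
class Trajectory:
    """A mid-training trajectory over one repository task (hypothetical schema)."""
    repo: str
    task: str
    steps: List[Step] = field(default_factory=list)

    def is_environmentally_native(self) -> bool:
        # Environmentally-native: every observation stems from an actual
        # tool invocation or test execution in an executable repository.
        return bool(self.steps) and all(
            s.observation_source == "executed" for s in self.steps
        )

# A contextually-native trajectory reconstructs the information flow
# (e.g. from a PR's diff and discussion) without executing anything:
contextual = Trajectory(
    repo="example/repo",
    task="fix failing parser test",
    steps=[Step("view parser.py", "def parse(...): ...", "reconstructed")],
)

# An environmentally-native trajectory logs real execution feedback:
environmental = Trajectory(
    repo="example/repo",
    task="fix failing parser test",
    steps=[
        Step("run pytest tests/test_parser.py", "1 failed", "executed"),
        Step("edit parser.py", "file updated", "executed"),
        Step("run pytest tests/test_parser.py", "1 passed", "executed"),
    ],
)
```

Under this (assumed) schema, the two trajectory kinds share one format, so a mid-training pipeline could mix them freely while still filtering on interaction authenticity.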

Key Takeaways

  1. Agent-native mid-training enhances LLMs for software engineering.

  2. Contextually-native and environmentally-native trajectories improve performance.

  3. daVinci-Dev achieves state-of-the-art results with fewer tokens.

Limitations

  • Requires access to GitHub PRs for dataset construction.

  • Dependent on the quality of real development environments.

Keywords

Large Language Model, agentic software engineering, mid-training, distribution mismatch, agent-native data, contextually-native trajectories, environmentally-native trajectories, SWE-Bench Verified, Kimi-Dev, resolution rates
