SkillOrchestra: Learning to Route Agents via Skill Transfer

Jiayu Wang, Yifei Ming, Zixuan Ke, Shafiq Joty, Aws Albarghouthi, Frederic Sala
Published: February 23, 2026
Authors: 6
Word Count: 11,351
Code: Includes code

SkillOrchestra routes AI agents efficiently by learning explicit skill capabilities instead of implicit policies.

Abstract

Compound AI systems promise capabilities beyond those of individual models, yet their success depends critically on effective orchestration. Existing routing approaches face two limitations: (1) input-level routers make coarse query-level decisions that ignore evolving task requirements; (2) RL-trained orchestrators are expensive to adapt and often suffer from routing collapse, repeatedly invoking one strong but costly option in multi-turn scenarios. We introduce SkillOrchestra, a framework for skill-aware orchestration. Instead of directly learning a routing policy end-to-end, SkillOrchestra learns fine-grained skills from execution experience and models agent-specific competence and cost under those skills. At deployment, the orchestrator infers the skill demands of the current interaction and selects agents that best satisfy them under an explicit performance-cost trade-off. Extensive experiments across ten benchmarks demonstrate that SkillOrchestra outperforms SoTA RL-based orchestrators by up to 22.5% with 700x and 300x learning cost reduction compared to Router-R1 and ToolOrchestra, respectively. These results show that explicit skill modeling enables scalable, interpretable, and sample-efficient orchestration, offering a principled alternative to data-intensive RL-based approaches. The code is available at: https://github.com/jiayuww/SkillOrchestra.
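
The deployment-time step described in the abstract (infer the skills an interaction needs, then pick the agent that best covers them under a performance-cost trade-off) can be illustrated with a small sketch. This is not the paper's implementation. The agent names, the competence and cost numbers, the linear utility `competence - lam * cost`, and the choice to average competence over the required skills are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AgentProfile:
    """Hypothetical per-agent record; field names are illustrative only."""
    name: str
    competence: dict[str, float]  # assumed per-skill success estimates in [0, 1]
    cost: float                   # assumed normalized cost per call


def select_agent(agents, required_skills, lam=0.5):
    """Return the agent with the highest mean competence minus lam * cost."""
    def utility(agent):
        # Skills an agent has no estimate for are scored 0.0 (an assumption).
        scores = [agent.competence.get(skill, 0.0) for skill in required_skills]
        mean_competence = sum(scores) / len(scores) if scores else 0.0
        return mean_competence - lam * agent.cost

    return max(agents, key=utility)


# Made-up numbers: a strong but costly agent and a cheaper, weaker one.
agents = [
    AgentProfile("large-model", {"math": 0.9, "search": 0.9}, cost=1.0),
    AgentProfile("small-model", {"math": 0.5, "search": 0.8}, cost=0.1),
]

print(select_agent(agents, ["search"]).name)         # cheap agent suffices here
print(select_agent(agents, ["math"], lam=0.1).name)  # low cost weight favors the strong agent
```

Because cost enters the utility explicitly, the strong agent is chosen only when its competence advantage outweighs its price. That is one way to picture how an explicit trade-off could discourage always routing to the most expensive option.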

Key Takeaways

  1. SkillOrchestra uses explicit skill discovery instead of end-to-end RL to avoid routing collapse to expensive models.

  2. The system learns probabilistic Beta distributions for agent-skill pairs to reason about competence systematically.

  3. Skills are discovered from execution traces through LLM-based analysis rather than manual definition.
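
To make the Beta-distribution takeaway concrete, here is a minimal sketch of how a competence estimate for one agent-skill pair could be tracked. The summary only says that Beta distributions are learned; the uniform Beta(1, 1) prior, the one-count-per-outcome conjugate update, and the names `BetaCompetence` and `table` are assumptions for illustration, not the paper's stated procedure.

```python
from dataclasses import dataclass


@dataclass
class BetaCompetence:
    """Beta(alpha, beta) belief over an agent's success rate on one skill."""
    alpha: float = 1.0  # assumed prior pseudo-count of successes
    beta: float = 1.0   # assumed prior pseudo-count of failures

    def update(self, success: bool) -> None:
        # Standard Beta-Bernoulli conjugate update: add one count per outcome.
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total ** 2 * (total + 1.0))


# Hypothetical table keyed by (agent, skill); the outcomes are made up.
table = {("small-model", "search"): BetaCompetence()}
for outcome in [True, True, False, True]:
    table[("small-model", "search")].update(outcome)

estimate = table[("small-model", "search")]
print(round(estimate.mean, 3), round(estimate.variance, 4))
```

Keeping a full distribution rather than a point estimate means a pair with few observations retains a wide variance, which a router could use to distinguish "known to be weak" from "not yet tested".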

Limitations

  • Routing collapse remains a challenge when RL policies learn to favor expensive models exclusively.

  • Manual skill definition and redundancy detection require careful curation to prevent handbook bloat.

Keywords

compound AI systems, orchestration, routing policy, reinforcement learning, skill modeling, agent-specific competence, performance-cost trade-off, multi-turn scenarios, routing collapse, end-to-end learning
