
Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision

Xiaohan He, Shiyang Feng, Songtao Huang, Lei Bai, Bin Wang, Bo Zhang

Published: February 12, 2026
Authors: 6
Word Count: 7,614
Code: Includes code

Self-evolving LLMs solve scientific problems by co-training solver-verifier pairs with sparse supervision.

Abstract

Large language models (LLMs) have demonstrated exceptional reasoning capabilities, and co-evolving paradigms have shown promising results in domains such as code and math. However, in scientific reasoning tasks, these models remain fragile due to unreliable solution evaluation and limited diversity in verification strategies. In this work, we propose Sci-CoE, a two-stage scientific co-evolving framework that enables models to self-evolve as both solver and verifier through a transition from sparse supervision to unsupervised learning. In the first stage, the model uses a small set of annotated data to establish fundamental correctness judgment anchors for the verifier. In the second stage, we introduce a geometric reward mechanism that jointly considers consensus, reliability, and diversity, driving large-scale self-iteration on unlabeled data. Experiments on several general scientific benchmarks demonstrate that Sci-CoE enhances complex reasoning capabilities and exhibits strong scalability, facilitating the construction of more robust and diverse evaluation systems. Code is available at https://github.com/InternScience/Sci-CoE.
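The abstract says the reward jointly considers consensus, reliability, and diversity, but does not spell out the combination here. A natural reading of "geometric" is a geometric mean, under which the reward collapses if any one factor is near zero. The sketch below illustrates that assumed combination; the function name, signature, and formula are illustrative, not taken from the paper.

```python
def geometric_reward(consensus: float, reliability: float, diversity: float,
                     eps: float = 1e-8) -> float:
    """Hypothetical geometric-mean reward over three scores in [0, 1].

    Assumed combination: the cube root of the product of the three factors,
    with a small epsilon floor so a zero factor does not zero the gradient.
    A low value in any single factor drags the whole reward down, which is
    the usual motivation for a geometric (rather than arithmetic) mean.
    """
    product = max(consensus, eps) * max(reliability, eps) * max(diversity, eps)
    return product ** (1.0 / 3.0)
```

Under this reading, a solution endorsed by many verifiers still earns little reward if the verifiers themselves are unreliable or all apply the same checking strategy.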

Key Takeaways

  1. Sci-CoE trains models as both solvers and verifiers that co-evolve together without heavy expert annotation.

  2. Scientific reasoning requires verification strategies beyond executable tests, using logical and physical constraint checking.

  3. Two-stage training with minimal annotated data establishes a correctness foundation before self-improvement scaling.
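The takeaways describe a second stage in which solver and verifier improve each other on unlabeled data, with verifier agreement standing in for ground truth. A minimal sketch of that pseudo-labeling step, assuming a simple majority-vote aggregation (the helper name and data shapes are illustrative, not from the paper):

```python
def pseudo_label(solutions: list[str], verifier_votes: dict[int, int]) -> str:
    """Select the candidate solution most endorsed by verifiers.

    Assumed setup for illustration: `solutions` holds sampled candidate
    answers for one unlabeled problem, and `verifier_votes` maps each
    candidate's index to how many verifiers accepted it. The most-endorsed
    candidate serves as the training target in lieu of a gold label.
    """
    best_idx = max(verifier_votes, key=verifier_votes.get)
    return solutions[best_idx]
```

In such a scheme, stage 1's small annotated set is what keeps this consensus anchored to actual correctness rather than to shared model biases.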

Limitations

  • Scientific problems lack executable ground truth, making verification significantly harder than code or math problems.

  • An external judge model is required for verification-matrix evaluation, adding computational overhead to the framework.

Keywords

large language models, scientific reasoning, co-evolving paradigms, sparse supervision, unsupervised learning, verifier, geometric reward mechanism, consensus, reliability, diversity, self-iteration, scientific benchmarks, scalability
