AI for Science

MOOSE-Star: Unlocking Tractable Training for Scientific Discovery by Breaking the Complexity Barrier

Zonglin Yang, Lidong Bing
Published: March 4, 2026
Authors: 2
Word count: 8,857
Includes code

MOOSE-Star breaks the complexity barrier in AI-driven scientific discovery through decomposed training.

Abstract

While large language models (LLMs) show promise in scientific discovery, existing research focuses on inference or feedback-driven training, leaving the direct modeling of the generative reasoning process, P(hypothesis|background) (P(h|b)), unexplored. We demonstrate that directly training P(h|b) is mathematically intractable due to the combinatorial complexity (O(N^k)) inherent in retrieving and composing inspirations from a vast knowledge base. To break this barrier, we introduce MOOSE-Star, a unified framework enabling tractable training and scalable inference. In the best case, MOOSE-Star reduces complexity from exponential to logarithmic (O(log N)) by (1) training on decomposed subtasks derived from the probabilistic equation of discovery, (2) employing motivation-guided hierarchical search to enable logarithmic retrieval and prune irrelevant subspaces, and (3) utilizing bounded composition for robustness against retrieval noise. To facilitate this, we release TOMATO-Star, a dataset of 108,717 decomposed papers (38,400 GPU hours) for training. Furthermore, we show that while brute-force sampling hits a "complexity wall," MOOSE-Star exhibits continuous test-time scaling.
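The decomposition the abstract alludes to can be written as a standard marginalization over latent inspirations (a hypothetical rendering for illustration; the paper's exact "probabilistic equation of discovery" may differ in detail):

$$
P(h \mid b) \;=\; \sum_{i_1,\dots,i_k} P(h \mid b, i_1,\dots,i_k)\,\prod_{j=1}^{k} P(i_j \mid b, i_{<j})
$$

Summed naively, the marginal ranges over every k-tuple of inspirations from an N-item knowledge base, which is the source of the O(N^k) terms; training each factor P(i_j | b, i_{<j}) as its own retrieval subtask is what makes the objective tractable.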

Key Takeaways

  1. Training LLMs directly on P(hypothesis|background) is mathematically intractable due to exponential combinatorial complexity O(N^k).

  2. MOOSE-Star decomposes the problem into sequential retrieval and composition steps, reducing complexity from exponential to logarithmic O(log N).

  3. The framework uses hierarchical search, bounded composition, and motivation planning to enable scalable training on 108,717 decomposed papers.
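The logarithmic-retrieval claim can be illustrated with a toy sketch: instead of scoring all N items in a flat knowledge base (or all N^k tuples), a hierarchically organized base lets a motivation-guided scorer descend one branch per level, touching only O(log N) nodes. Everything below (the tree layout, the `score` function) is a hypothetical stand-in for MOOSE-Star's learned retriever, not the paper's implementation.

```python
def hierarchical_retrieve(tree, score, path=()):
    """Greedy descent through a topic tree.

    `tree` is a nested dict mapping topic -> subtree (a leaf is an
    empty dict). At each level only the children of the current node
    are scored, so a balanced tree with N leaves needs O(log N)
    scoring calls rather than a scan over all N items.
    """
    if not tree:
        return path
    best = max(tree, key=score)  # prune every sibling subspace
    return hierarchical_retrieve(tree[best], score, path + (best,))

# Toy knowledge base with four leaf topics.
kb = {
    "biology": {"genomics": {}, "ecology": {}},
    "physics": {"optics": {}, "plasma": {}},
}

# Hypothetical motivation: the researcher wants an optics inspiration.
relevance = {"physics": 2, "optics": 3}
path = hierarchical_retrieve(kb, lambda t: relevance.get(t, 0))
print(path)  # -> ('physics', 'optics')
```

Only two levels are scored here (2 + 2 = 4 calls) even though a flat scan of all leaf topics would also touch 4; the gap widens exponentially as the tree deepens, which is the intuition behind the O(N^k) → O(log N) reduction.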

Limitations

  • The approach assumes scientific discoveries can be decomposed into k sequential inspirations from a knowledge base, which may oversimplify complex discovery processes.

  • TOMATO-Star dataset required 38,400 GPU hours to process, creating significant computational barriers for reproducibility and adoption.

Keywords

large language models, generative reasoning, probabilistic equation of discovery, motivation-guided hierarchical search, bounded composition, decomposed subtasks, continuous test-time scaling
