Learning to Discover at Test Time
Mert Yuksekgonul, Daniel Koceja, Xinhao Li +8 more
How can we use AI to discover a new state of the art for a scientific problem? Prior work in test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We perform reinforcement learning at test time, so the LLM can continue to train, but now with experience specific to the te...