
Solar Open Technical Report

Sungrae Park, Sanghoon Kim, Jungho Cho, Gyoungjin Gim, Dawoon Jung, Mikyoung Cha, Eunhae Choo, Taekgyu Hong, Minbyul Jeong, SeHwan Joo, Minsoo Khang, Eunwon Kim, Minjeong Kim, Sujeong Kim, Yunsu Kim, Hyeonju Lee, Seunghyun Lee, Sukyung Lee, Siyoung Park, Gyungin Shin, Inseo Song, Wonho Song, Seonghoon Yang, Seungyoun Yi, Sanghoon Yoon, Jeonghyun Ko, Seyoung Song, Keunwoo Choi, Hwalsuk Lee, Sunghun Kim, Du-Seong Chang, Kyunghyun Cho, Junsuk Choe, Hwaran Lee, Jae-Gil Lee, KyungTae Lim, Alice Oh
arXiv ID: 2601.07022
Published: January 11, 2026
Authors: 37

Abstract

We introduce Solar Open, a 102B-parameter bilingual Mixture-of-Experts language model for underserved languages. Solar Open demonstrates a systematic methodology for building competitive LLMs by addressing three interconnected challenges. First, to train effectively despite data scarcity for underserved languages, we synthesize 4.5T tokens of high-quality, domain-specific, and RL-oriented data. Second, we coordinate this data through a progressive curriculum jointly optimizing composition, quality thresholds, and domain coverage across 20 trillion tokens. Third, to enable reasoning capabilities through scalable RL, we apply our proposed framework SnapPO for efficient optimization. Across benchmarks in English and Korean, Solar Open achieves competitive performance, demonstrating the effectiveness of this methodology for underserved language AI development.
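To make the progressive-curriculum idea concrete, the sketch below shows one way a staged pretraining schedule could adjust data mixture weights and quality thresholds as training advances. This is not the paper's implementation: the stage names, token budgets, domain labels, and numeric values are illustrative assumptions, and the real curriculum jointly optimizes these quantities rather than fixing them by hand.

# Minimal sketch of a staged data curriculum (illustrative only; not Solar Open's
# actual schedule). Each stage carries its own domain mixture weights and a
# minimum quality score used to filter documents before sampling.

from dataclasses import dataclass

@dataclass
class CurriculumStage:
    name: str
    token_budget: float        # tokens allotted to this stage, in trillions (assumed)
    mixture_weights: dict      # sampling weight per data domain (assumed domains)
    min_quality_score: float   # documents scoring below this are filtered out (assumed)

# Hypothetical three-stage schedule: broad web data first, then higher-quality and
# domain-specific data, then synthetic and RL-oriented data for reasoning.
STAGES = [
    CurriculumStage("general", token_budget=10.0,
                    mixture_weights={"web": 0.7, "korean_web": 0.2, "code": 0.1},
                    min_quality_score=0.3),
    CurriculumStage("quality", token_budget=7.0,
                    mixture_weights={"web": 0.4, "korean_web": 0.3,
                                     "code": 0.2, "synthetic_domain": 0.1},
                    min_quality_score=0.6),
    CurriculumStage("reasoning", token_budget=3.0,
                    mixture_weights={"synthetic_domain": 0.5, "korean_web": 0.2,
                                     "code": 0.2, "rl_oriented": 0.1},
                    min_quality_score=0.8),
]

def stage_for_token(tokens_seen: float) -> CurriculumStage:
    """Return the curriculum stage active at a given training position (in trillions of tokens)."""
    cumulative = 0.0
    for stage in STAGES:
        cumulative += stage.token_budget
        if tokens_seen < cumulative:
            return stage
    return STAGES[-1]

if __name__ == "__main__":
    for t in (2.0, 12.0, 19.0):  # trillions of tokens seen so far
        s = stage_for_token(t)
        print(f"{t:5.1f}T tokens -> stage '{s.name}', "
              f"quality >= {s.min_quality_score}, weights {s.mixture_weights}")

In a real pipeline the sampler would consult the active stage each time it draws a batch, so composition, quality filtering, and domain coverage shift together as the token count grows toward the 20-trillion-token budget described in the abstract.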
