
Ministral 3

Alexander H. Liu, Kartik Khandelwal, Sandeep Subramanian, Victor Jouault, Abhinav Rastogi, Adrien Sadé, Alan Jeffares, Albert Jiang, Alexandre Cahill, Alexandre Gavaudan, Alexandre Sablayrolles, Amélie Héliou, Amos You, Andy Ehrenberg, Andy Lo, Anton Eliseev, Antonia Calvi, Avinash Sooriyarachchi, Baptiste Bout, Baptiste Rozière, Baudouin De Monicault, Clémence Lanfranchi, Corentin Barreau, Cyprien Courtot, Daniele Grattarola, Darius Dabert, Diego de las Casas, Elliot Chane-Sane, Faruk Ahmed, Gabrielle Berrada, Gaëtan Ecrepont, Gauthier Guinet, Georgii Novikov, Guillaume Kunsch, Guillaume Lample, Guillaume Martin, Gunshi Gupta, Jan Ludziejewski, Jason Rute, Joachim Studnia, Jonas Amar, Joséphine Delas, Josselin Somerville Roberts, Karmesh Yadav, Khyathi Chandu, Kush Jain, Laurence Aitchison, Laurent Fainsin, Léonard Blier, Lingxiao Zhao, Louis Martin, Lucile Saulnier, Luyu Gao, Maarten Buyl, Margaret Jennings, Marie Pellat, Mark Prins, Mathieu Poirée, Mathilde Guillaumin, Matthieu Dinot, Matthieu Futeral, Maxime Darrin, Maximilian Augustin, Mia Chiquier, Michel Schimpf, Nathan Grinsztajn, Neha Gupta, Nikhil Raghuraman, Olivier Bousquet, Olivier Duchenne, Patricia Wang, Patrick von Platen, Paul Jacob, Paul Wambergue, Paula Kurylowicz, Pavankumar Reddy Muddireddy, Philomène Chagniot, Pierre Stock, Pravesh Agrawal, Quentin Torroba, Romain Sauvestre, Roman Soletskyi, Rupert Menneer, Sagar Vaze, Samuel Barry, Sanchit Gandhi, Siddhant Waghjale, Siddharth Gandhi, Soham Ghosh, Srijan Mishra, Sumukh Aithal, Szymon Antoniak, Teven Le Scao, Théo Cachet, Theo Simon Sorg, Thibaut Lavril, Thiziri Nait Saada, Thomas Chabal, Thomas Foubert, Thomas Robert, Thomas Wang, Tim Lawson, Tom Bewley, Tom Edwards, Umar Jamil, Umberto Tomasini, Valeriia Nemychnikova, Van Phung, Vincent Maladière, Virgile Richard, Wassim Bouaziz, Wen-Ding Li, William Marshall, Xinghui Li, Xinyu Yang, Yassine El Ouahidi, Yihan Wang, Yunhao Tang, Zaccharie Ramzi
arXiv ID: 2601.08584
Published: January 13, 2026
Authors: 120

Abstract

We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three sizes: 3B, 8B, and 14B parameters. For each size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, an iterative technique that alternates pruning with continued training under distillation. Each model comes with image understanding capabilities, all under the Apache 2.0 license.
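
The abstract describes Cascade Distillation only at a high level: repeatedly prune a model, then recover quality by continued training with distillation from the larger, pre-pruning model. The sketch below illustrates that loop in PyTorch under loud assumptions; it is not the paper's implementation. Magnitude pruning and a temperature-scaled KL loss are stand-ins (the paper's actual pruning criterion, loss, and schedule are not given here), and every name in it (magnitude_prune_, distill_step, cascade_distillation) is hypothetical.

```python
# Illustrative sketch of an iterative prune-then-distill loop.
# Assumptions (not from the paper): unstructured magnitude pruning,
# temperature-scaled KL distillation, AdamW for continued training.
import copy

import torch
import torch.nn.functional as F


def magnitude_prune_(model: torch.nn.Module, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights of each linear layer.
    A real pipeline might instead prune structurally (whole heads,
    layers, or hidden dimensions) to obtain a smaller dense model."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            w = module.weight.data
            k = int(w.numel() * sparsity)
            if k > 0:
                threshold = w.abs().flatten().kthvalue(k).values
                w[w.abs() <= threshold] = 0.0


def distill_step(student, teacher, batch, optimizer, temperature=2.0):
    """One continued-training step: match the student's logits to the
    frozen teacher's soft targets via KL divergence."""
    with torch.no_grad():
        teacher_logits = teacher(batch)
    student_logits = student(batch)
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


def cascade_distillation(model, data_loader, stages=3,
                         sparsity_per_stage=0.3,
                         steps_per_stage=1000, lr=1e-4):
    """Each stage: snapshot the current model as the teacher, prune the
    student, then distill from the teacher to recover quality.
    Note: this sketch does not fix the pruning mask during training;
    a real pipeline would enforce it or prune to a smaller dense model."""
    for _stage in range(stages):
        teacher = copy.deepcopy(model).eval()
        for p in teacher.parameters():
            p.requires_grad_(False)
        magnitude_prune_(model, sparsity_per_stage)
        optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
        for _step, batch in zip(range(steps_per_stage), data_loader):
            distill_step(model, teacher, batch, optimizer)
    return model
```

Read as a sketch, each cascade stage would presumably yield the next, smaller model in the series, with the previous stage's model acting as the teacher; the specific pruning criterion, loss, and stage schedule above are placeholders, not the paper's choices.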
