
ERNIE 5.0 Technical Report

Haifeng Wang, Hua Wu, Tian Wu, Yu Sun, Jing Liu, Dianhai Yu, Yanjun Ma, Jingzhou He, Zhongjun He, Dou Hong, Qiwen Liu, Shuohuan Wang, Junyuan Shang, Zhenyu Zhang, Yuchen Ding, Jinle Zeng, Jiabin Yang, Liang Shen, Ruibiao Chen, Weichong Yin, Siyu Ding, Dai Dai, Shikun Feng, Siqi Bao, Bolei He, Yan Chen, Zhenyu Jiao, Ruiqing Zhang, Zeyu Chen, Qingqing Dang, Kaipeng Deng, Jiajun Jiang, Enlei Gong, Guoxia Wang, Yanlin Sha, Yi Liu, Yehan Zheng, Weijian Xu, Jiaxiang Liu, Zengfeng Zeng, Yingqi Qu, Zhongli Li, Zhengkun Zhang, Xiyang Wang, Zixiang Xu, Xinchao Xu, Zhengjie Huang, Dong Wang, Bingjin Chen, Yue Chang, Xing Yuan, Shiwei Huang, Qiao Zhao, Xinzhe Ding, Shuangshuang Qiao, Baoshan Yang, Bihong Tang, Bin Li, Bingquan Wang, Binhan Tang, Binxiong Zheng, Bo Cui, Bo Ke, Bo Zhang, Bowen Zhang, Boyan Zhang, Boyang Liu, Caiji Zhang, Can Li, Chang Xu, Chao Pang, Chao Zhang, Chaoyi Yuan, Chen Chen, Cheng Cui, Chenlin Yin, Chun Gan, Chunguang Chai, Chuyu Fang, Cuiyun Han, Dan Zhang, Danlei Feng, Danxiang Zhu, Dong Sun, Dongbo Li, Dongdong Li, Dongdong Liu, Dongxue Liu, Fan Ding, Fan Hu, Fan Li, Fan Mo, Feisheng Wu, Fengwei Liu, Gangqiang Hu, Gaofeng Lu, Gaopeng Yong, Gexiao Tian, Guan Wang, Guangchen Ni, Guangshuo Wu, Guanzhong Wang, Guihua Liu, Guishun Li, Haibin Li, Haijian Liang, Haipeng Ming, Haisu Wang, Haiyang Lu, Haiye Lin, Han Zhou, Hangting Lou, Hanwen Du, Hanzhi Zhang, Hao Chen, Hao Du, Hao Liu, Hao Zhou, Haochen Jiang, Haodong Tian, Haoshuang Wang, Haozhe Geng, Heju Yin, Hong Chen, Hongchen Xue, Hongen Liu, Honggeng Zhang, Hongji Xu, Hongwei Chen, Hongyang Zhang, Hongyuan Zhang, Hua Lu, Huan Chen, Huan Wang, Huang He, Hui Liu, Hui Zhong, Huibin Ruan, Jiafeng Lu, Jiage Liang, Jiahao Hu, Jiahao Hu, Jiajie Yang, Jialin Li, Jian Chen, Jian Wu, Jianfeng Yang, Jianguang Jiang, Jianhua Wang, Jianye Chen, Jiaodi Liu, Jiarui Zhou, Jiawei Lv, Jiaxin Zhou, Jiaxuan Liu, Jie Han, Jie Sun, Jiefan Fang, Jihan Liu, Jihua Liu, Jing Hu, Jing Qian, Jing Yan, Jingdong Du, Jingdong Wang, Jingjing Wu, Jingyong Li, Jinheng Wang, Jinjin Li, Jinliang Lu, Jinlin Yu, Jinnan Liu, Jixiang Feng, Jiyi Huang, Jiyuan Zhang, Jun Liang, Jun Xia, Jun Yu, Junda Chen, Junhao Feng, Junhong Xiang, Junliang Li, Kai Liu, Kailun Chen, Kairan Su, Kang Hu, Kangkang Zhou, Ke Chen, Ke Wei, Kui Huang, Kun Wu, Kunbin Chen, Lei Han, Lei Sun, Lei Wen, Linghui Meng, Linhao Yu, Liping Ouyang, Liwen Zhang, Longbin Ji, Longzhi Wang, Meng Sun, Meng Tian, Mengfei Li, Mengqi Zeng, Mengyu Zhang, Ming Hong, Mingcheng Zhou, Mingming Huang, Mingxin Chen, Mingzhu Cai, Naibin Gu, Nemin Qiu, Nian Wang, Peng Qiu, Peng Zhao, Pengyu Zou, Qi Wang, Qi Xin, Qian Wang, Qiang Zhu, Qianhui Luo, Qianwei Yang, Qianyue He, Qifei Wu, Qinrui Li, Qiwen Bao, Quan Zhang, Quanxiang Liu, Qunyi Xie, Rongrui Zhan, Rufeng Dai, Rui Peng, Ruian Liu, Ruihao Xu, Ruijie Wang, Ruixi Zhang, Ruixuan Liu, Runsheng Shi, Ruting Wang, Senbo Kang, Shan Lu, Shaofei Yu, Shaotian Gong, Shenwei Hu, Shifeng Zheng, Shihao Guo, Shilong Fan, Shiqin Liu, Shiwei Gu, Shixi Zhang, Shuai Yao, Shuang Zhang, Shuangqiao Liu, Shuhao Liang, Shuwei He, Shuwen Yang, Sijun He, Siming Dai, Siming Wu, Siyi Long, Songhe Deng, Suhui Dong, Suyin Liang, Teng Hu, Tianchan Xu, Tianliang Lv, Tianmeng Yang, Tianyi Wei, Tiezhu Gao, Ting Sun, Ting Zhang, Tingdan Luo, Wei He, Wei Luan, Wei Yin, Wei Zhang, Wei Zhou, Weibao Gong, Weibin Li, Weicheng Huang, Weichong Dang, Weiguo Zhu, Weilong Zhang, Weiqi Tan, Wen Huang, Wenbin Chang, Wenjing Du, Wenlong Miao, Wenpei Luo, Wenquan Wu, Xi Shi, Xi Zhao, Xiang Gao, Xiangguo Zhang, Xiangrui Yu, Xiangsen Wang, Xiangzhe Wang, Xianlong Luo, Xianying Ma, Xiao Tan, Xiaocong Lin, Xiaofei Wang, Xiaofeng Peng, Xiaofeng Wu, Xiaojian Xu, Xiaolan Yuan, Xiaopeng Cui, Xiaotian Han, Xiaoxiong Liu, Xiaoxu Fei, Xiaoxuan Wu, Xiaoyu Wang, Xiaoyu Zhang, Xin Sun, Xin Wang, Xinhui Huang, Xinming Zhu, Xintong Yu, Xinyi Xu, Xinyu Wang, Xiuxian Li, XuanShi Zhu, Xue Xu, Xueying Lv, Xuhong Li, Xulong Wei, Xuyi Chen, Yabing Shi, Yafeng Wang, Yamei Li, Yan Liu, Yanfu Cheng, Yang Gao, Yang Liang, Yang Wang, Yang Wang, Yang Yang, Yanlong Liu, Yannian Fu, Yanpeng Wang, Yanzheng Lin, Yao Chen, Yaozong Shen, Yaqian Han, Yehua Yang, Yekun Chai, Yesong Wang, Yi Song, Yichen Zhang, Yifei Wang, Yifeng Guo, Yifeng Kou, Yilong Chen, Yilong Guo, Yiming Wang, Ying Chen, Ying Wang, Yingsheng Wu, Yingzhan Lin, Yinqi Yang, Yiran Xing, Yishu Lei, Yixiang Tu, Yiyan Chen, Yong Zhang, Yonghua Li, Yongqiang Ma, Yongxing Dai, Yongyue Zhang, Yu Ran, Yu Sun, Yu-Wen Michael Zhang, Yuang Liu, Yuanle Liu, Yuanyuan Zhou, Yubo Zhang, Yuchen Han, Yucheng Wang, Yude Gao, Yuedong Luo, Yuehu Dong, Yufeng Hu, Yuhui Cao, Yuhui Yun, Yukun Chen, Yukun Gao, Yukun Li, Yumeng Zhang, Yun Fan, Yun Ma, Yunfei Zhang, Yunshen Xie, Yuping Xu, Yuqin Zhang, Yuqing Liu, Yurui Li, Yuwen Wang, Yuxiang Lu, Zefeng Cai, Zelin Zhao, Zelun Zhang, Zenan Lin, Zezhao Dong, Zhaowu Pan, Zhaoyu Liu, Zhe Dong, Zhe Zhang, Zhen Zhang, Zhengfan Wu, Zhengrui Wei, Zhengsheng Ning, Zhenxing Li, Zhenyu Li, Zhenyu Qian, Zhenyun Li, Zhi Li, Zhichao Chen, Zhicheng Dong, Zhida Feng, Zhifan Feng, Zhihao Deng, Zhijin Yu, Zhiyang Chen, Zhonghui Zheng, Zhuangzhuang Guo, Zhujun Zhang, Zhuo Sun, Zichang Liu, Zihan Lin, Zihao Huang, Zihe Zhu, Ziheng Zhao, Ziping Chen, Zixuan Zhu, Ziyang Xu, Ziyi Liang, Ziyuan Gao
Published: February 4, 2026
Authors: 438
Word Count: 20,009

Revolutionizing multimodal understanding with a unified model.

Abstract

In this report, we introduce ERNIE 5.0, a natively autoregressive foundation model designed for unified multimodal understanding and generation across text, image, video, and audio. All modalities are trained from scratch under a unified next-group-of-tokens prediction objective, built on an ultra-sparse mixture-of-experts (MoE) architecture with modality-agnostic expert routing. To address practical challenges in large-scale deployment under diverse resource constraints, ERNIE 5.0 adopts a novel elastic training paradigm. Within a single pre-training run, the model learns a family of sub-models with varying depths, expert capacities, and routing sparsity, enabling flexible trade-offs among performance, model size, and inference latency in memory- or time-constrained scenarios. Moreover, we systematically address the challenges of scaling reinforcement learning to unified foundation models, enabling efficient and stable post-training under ultra-sparse MoE architectures and diverse multimodal settings. Extensive experiments demonstrate that ERNIE 5.0 achieves strong and balanced performance across multiple modalities. To the best of our knowledge, among publicly disclosed models, ERNIE 5.0 represents the first production-scale realization of a trillion-parameter unified autoregressive model that supports both multimodal understanding and generation. To facilitate further research, we present detailed visualizations of modality-agnostic expert routing in the unified model, alongside a comprehensive empirical analysis of elastic training, aiming to offer useful insights to the community.
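
The abstract does not spell out the routing implementation, but "modality-agnostic expert routing" suggests a single shared gate that scores every token against one common expert pool, whatever modality the token encodes, rather than maintaining per-modality experts. Below is a minimal sketch of such a top-k MoE layer in PyTorch. The class name, sizes, and the dense per-expert loop are illustrative assumptions, not ERNIE 5.0's actual implementation; a trillion-parameter ultra-sparse MoE would shard experts across devices and use fused dispatch kernels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAgnosticMoE(nn.Module):
    """Minimal top-k MoE layer with one shared (modality-agnostic) gate:
    every token, whether it encodes text, image, video, or audio, is scored
    against the same expert pool. Sizes and names are illustrative only."""

    def __init__(self, d_model: int, n_experts: int, top_k: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.top_k = top_k
        # Tiny dense experts; a production ultra-sparse MoE shards these.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model), a flat stream that mixes all modalities.
        scores = self.gate(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens routed here
                if mask.any():
                    w = weights[mask, slot].unsqueeze(-1)
                    out[mask] += w * expert(x[mask])
        return out

# 10 mixed-modality tokens of width 64, each routed to 2 of 8 experts.
layer = ModalityAgnosticMoE(d_model=64, n_experts=8, top_k=2)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because the gate sees all modalities through one scoring function, experts are free to specialize by content rather than by input type, which is what makes the routing visualizations the abstract mentions informative.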

Key Takeaways

  • Unified model for text, image, video, and audio.

  • Ultra-sparse Mixture-of-Experts for cross-modal interactions.

  • Elastic training paradigm for dynamic model scaling (see the sketch after this list).
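
The abstract describes elastic training as learning, within one pre-training run, a family of nested sub-models that differ along three axes: depth, expert capacity, and routing sparsity. The report's actual selection mechanism is not given on this page; the sketch below is a hypothetical illustration of how a serving system might pick one point in such a family under a memory budget. Every name and number in it is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElasticConfig:
    """One point in an elastic sub-model family. The fields mirror the three
    axes the abstract names: depth, expert capacity, and routing sparsity.
    All values here are illustrative, not ERNIE 5.0's actual settings."""
    n_layers: int   # transformer blocks kept from the full model
    n_experts: int  # experts kept per MoE layer
    top_k: int      # experts each token is routed to

# Hypothetical full model trained once; sub-models are nested slices of it.
FULL = ElasticConfig(n_layers=64, n_experts=256, top_k=8)

def select_submodel(full: ElasticConfig, memory_budget_gb: float) -> ElasticConfig:
    """Pick the largest nested sub-model that fits a (toy) memory budget.
    A real system would also weigh latency and measured task quality."""
    candidates = [
        ElasticConfig(full.n_layers,      full.n_experts,      full.top_k),
        ElasticConfig(full.n_layers // 2, full.n_experts // 2, full.top_k // 2),
        ElasticConfig(full.n_layers // 4, full.n_experts // 4, full.top_k // 4),
    ]
    # Toy cost model: resident parameters scale with layers * experts.
    def cost(c: ElasticConfig) -> float:
        return c.n_layers * c.n_experts * 0.05  # pretend GB per layer-expert
    feasible = [c for c in candidates if cost(c) <= memory_budget_gb]
    return max(feasible, key=cost) if feasible else candidates[-1]

# A 400 GB budget selects the half-size configuration in this toy model.
print(select_submodel(FULL, memory_budget_gb=400.0))
```

The point of the paradigm, as the abstract states it, is that all such configurations come from a single pre-training run, so no per-budget retraining or distillation pass is needed.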

Limitations

  • Added architectural complexity from handling heterogeneous token semantics across modalities.

  • Requires significant computational resources for optimal performance.

Keywords

autoregressive foundation model, unified multimodal understanding, unified next-group-of-tokens prediction objective, mixture-of-experts, modality-agnostic expert routing, elastic training paradigm, reinforcement learning, sparse MoE architecture
