FireRed-Image-Edit-1.0 Technical Report

Super Intelligence Team, Changhao Qiao, Chao Hui, Chen Li, Cunzheng Wang, Dejia Song, Jiale Zhang, Jing Li, Qiang Xiang, Runqi Wang, Shuang Sun, Wei Zhu, Xu Tang, Yao Hu, Yibo Chen, Yuhao Huang, Yuxuan Duan, Zhiyi Chen, Ziyuan Guo
Published: February 12, 2026
Authors: 19
Word Count: 14,525

FireRed achieves production-ready image editing through data curation, not model scale.

Abstract

We present FireRed-Image-Edit, a diffusion transformer for instruction-based image editing that achieves state-of-the-art performance through systematic optimization of data curation, training methodology, and evaluation design. We construct a 1.6B-sample training corpus, comprising 900M text-to-image and 700M image editing pairs from diverse sources. After rigorous cleaning, stratification, auto-labeling, and two-stage filtering, we retain over 100M high-quality samples balanced between generation and editing, ensuring strong semantic coverage and instruction alignment. Our multi-stage training pipeline progressively builds editing capability via pre-training, supervised fine-tuning, and reinforcement learning. To improve data efficiency, we introduce a Multi-Condition Aware Bucket Sampler for variable-resolution batching and Stochastic Instruction Alignment with dynamic prompt re-indexing. To stabilize optimization and enhance controllability, we propose Asymmetric Gradient Optimization for DPO, DiffusionNFT with layout-aware OCR rewards for text editing, and a differentiable Consistency Loss for identity preservation. We further establish REDEdit-Bench, a comprehensive benchmark spanning 15 editing categories, including newly introduced beautification and low-level enhancement tasks. Extensive experiments on REDEdit-Bench and public benchmarks (ImgEdit and GEdit) demonstrate competitive or superior performance against both open-source and proprietary systems. We release code, models, and the benchmark suite to support future research.
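The abstract names "Asymmetric Gradient Optimization for DPO" without specifying it, so here is a minimal sketch of one plausible reading: a DPO-style preference loss over diffusion denoising errors in which the preferred and rejected branches receive different gradient scales. The weights w_pos/w_neg, the denoising-error parameterization, and the grad_scale helper are illustrative assumptions, not the report's formulation.

    import torch
    import torch.nn.functional as F

    def asymmetric_dpo_loss(err_w, err_l, err_w_ref, err_l_ref,
                            beta=1.0, w_pos=1.0, w_neg=0.5):
        """DPO-style loss over diffusion denoising errors with asymmetric
        gradient scaling. Hypothetical sketch: w_pos/w_neg and the error
        parameterization are assumptions, not the report's formulation.

        err_w, err_l:         per-sample denoising MSE of the policy model
                              on the preferred (w) / rejected (l) images.
        err_w_ref, err_l_ref: the same errors under a frozen reference model.
        """
        def grad_scale(x, s):
            # Forward value is unchanged; the backward gradient is scaled by s.
            return s * x + (1.0 - s) * x.detach()

        # Implicit reward: how much the policy improves over the reference.
        delta_w = err_w_ref - err_w   # > 0 when policy beats reference on winner
        delta_l = err_l_ref - err_l   # > 0 when policy beats reference on loser

        # Push toward the preferred branch at full strength (w_pos) while
        # down-weighting the push away from the rejected branch (w_neg),
        # tempering the destabilizing "unlearning" signal.
        margin = grad_scale(delta_w, w_pos) - grad_scale(delta_l, w_neg)
        return -F.logsigmoid(beta * margin).mean()

Down-weighting the rejected branch is one common way to keep preference optimization from degrading overall sample quality; whether FireRed's asymmetry takes this exact form is not stated in the summary above.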

Key Takeaways

  1. State-of-the-art image editing comes from rigorous data curation and training optimization, not just larger models.

  2. Hierarchical multi-stage filtering reduced 1.6 billion raw samples to 100 million high-quality training examples.

  3. The Multi-Condition Aware Bucket Sampler minimizes wasted computation by grouping images by aspect ratio and input count (a sketch follows this list).
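As referenced in the third takeaway, below is a minimal sketch of a bucket sampler that groups samples by (aspect-ratio bin, condition-image count) so each batch collates into uniform tensors. The bin boundaries, the sample schema (width, height, cond_images), and the tail-dropping policy are assumptions for illustration; the report's actual sampler may differ.

    import random
    from collections import defaultdict

    # Hypothetical aspect-ratio bins; the report's real bucket layout is not given.
    AR_BINS = [0.5, 0.75, 1.0, 1.33, 2.0]

    def ar_bin(width, height):
        """Snap an image's aspect ratio to the nearest predefined bin."""
        ratio = width / height
        return min(AR_BINS, key=lambda b: abs(b - ratio))

    def bucket_batches(samples, batch_size, seed=0):
        """Group samples into batches sharing the same (aspect-ratio bin,
        number of condition images), so every batch can be stacked into a
        single tensor without padding or wasteful resizing."""
        buckets = defaultdict(list)
        for s in samples:
            key = (ar_bin(s["width"], s["height"]), len(s["cond_images"]))
            buckets[key].append(s)

        rng = random.Random(seed)
        batches = []
        for items in buckets.values():
            rng.shuffle(items)
            # Drop the ragged tail; a production sampler might instead pad
            # it or carry it over to the next epoch.
            for i in range(0, len(items) - batch_size + 1, batch_size):
                batches.append(items[i : i + batch_size])
        rng.shuffle(batches)  # interleave buckets across the epoch
        return batches

Grouping on input count matters because an editing sample conditioned on one reference image and one conditioned on three produce condition tensors of different shapes, which cannot share a batch without padding.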

Limitations

  • Billion-scale raw data is largely low quality: only about 100M of the 1.6B collected samples (roughly 6%) survive filtering, so about 94% must be discarded to reach the quality bar.

  • The script is incomplete, cutting off mid-sentence during the training methodology discussion.

Keywords

diffusion transformer, data curation, training methodology, evaluation design, text-to-image, image editing, Multi-Condition Aware Bucket Sampler, Stochastic Instruction Alignment, Asymmetric Gradient Optimization, DPO, DiffusionNFT, layout-aware OCR rewards, differentiable Consistency Loss, REDEdit-Bench, ImgEdit, GEdit
