
Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning

Xu Ma, Yitian Zhang, Qihua Dong, Yun Fu
Published: February 10, 2026
Authors: 4
Word Count: 5,867

Open-source dataset of 6M high-quality text-image pairs for fine-tuning text-to-image models.

Abstract

High-quality and open datasets remain a major bottleneck for text-to-image (T2I) fine-tuning. Despite rapid progress in model architectures and training pipelines, most publicly available fine-tuning datasets suffer from low resolution, poor text-image alignment, or limited diversity, resulting in a clear performance gap between open research models and enterprise-grade models. In this work, we present Fine-T2I, a large-scale, high-quality, and fully open dataset for T2I fine-tuning. Fine-T2I spans 10 task combinations, 32 prompt categories, 11 visual styles, and 5 prompt templates, and combines synthetic images generated by strong modern models with carefully curated real images from professional photographers. All samples are rigorously filtered for text-image alignment, visual fidelity, and prompt quality, with over 95% of initial candidates removed. The final dataset contains over 6 million text-image pairs, around 2 TB on disk, approaching the scale of pretraining datasets while maintaining fine-tuning-level quality. Across a diverse set of pretrained diffusion and autoregressive models, fine-tuning on Fine-T2I consistently improves both generation quality and instruction adherence, as validated by human evaluation, visual comparison, and automatic metrics. We release Fine-T2I under an open license to help close the data gap in T2I fine-tuning in the open community.
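The abstract describes rigorous filtering on text-image alignment, visual fidelity, and prompt quality, discarding over 95% of candidates. The paper's filtering code is not reproduced here; as a minimal sketch under stated assumptions, a multi-criterion filter of this kind might be composed as below, with all scoring functions supplied by the caller (e.g., a CLIP-style alignment scorer) — the function names and thresholds are illustrative, not the authors' actual pipeline:

```python
def filter_pairs(pairs, align_fn, fidelity_fn, quality_fn,
                 align_min=0.3, fidelity_min=0.5, quality_min=0.5):
    """Keep only (prompt, image) pairs that pass all three thresholds.

    align_fn(prompt, image) -> float   # text-image alignment score
    fidelity_fn(image)      -> float   # visual fidelity score
    quality_fn(prompt)      -> float   # prompt quality score
    All scorers and thresholds are caller-supplied placeholders.
    """
    kept = []
    for prompt, image in pairs:
        if (align_fn(prompt, image) >= align_min
                and fidelity_fn(image) >= fidelity_min
                and quality_fn(prompt) >= quality_min):
            kept.append((prompt, image))
    return kept
```

In practice each scorer would be a learned model (alignment via an image-text encoder, fidelity via an aesthetic or artifact detector), and thresholds would be tuned so that only the top few percent of candidates survive, consistent with the >95% rejection rate reported.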

Key Takeaways

  1. Fine-T2I releases 6 million high-quality text-image pairs specifically designed for fine-tuning image generation models.

  2. Semantic deduplication using sentence embeddings removed 90% of duplicate prompts from LLM-generated data.

  3. High-quality fine-tuning data is expensive and proprietary, making open datasets critical for democratizing AI.
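The semantic deduplication step mentioned above embeds each prompt as a vector and drops prompts that are too similar to ones already kept. A minimal greedy sketch of that idea, using toy precomputed embeddings (the real pipeline would use a sentence-embedding model; the threshold and function names here are assumptions for illustration):

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

def dedup_prompts(prompts, embeddings, threshold=0.9):
    """Greedy semantic deduplication: keep a prompt only if its embedding
    stays below `threshold` cosine similarity to every prompt kept so far."""
    kept, kept_embs = [], []
    for prompt, emb in zip(prompts, embeddings):
        if all(cosine(emb, k) < threshold for k in kept_embs):
            kept.append(prompt)
            kept_embs.append(emb)
    return kept

# Toy example: near-duplicate phrasings collapse to one prompt.
prompts = ["a red fox in snow", "a crimson fox in snow", "a city at night"]
embs = [[1.0, 0.0], [0.98, 0.2], [0.0, 1.0]]
print(dedup_prompts(prompts, embs))  # ['a red fox in snow', 'a city at night']
```

A greedy pass like this is quadratic in the worst case; at the millions-of-prompts scale reported, an approximate nearest-neighbor index would typically replace the inner loop.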

Limitations

  • Existing datasets suffer from low resolution and poor text-image alignment, and lack careful analysis of their data distributions.

  • LLM-generated prompts exhibited massive duplication problems requiring sophisticated semantic deduplication techniques to resolve.

Keywords

text-to-image, fine-tuning, diffusion models, autoregressive models, text-image alignment, visual fidelity, prompt quality, dataset scaling
