
MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

Hojung Jung, Rodrigo Hormazabal, Jaehyeong Jo, Youngrok Park, Kyunggeun Roh, Se-Young Yun, Sehui Han, Dae-Woong Jeong
Published: February 19, 2026
Authors: 8
Word count: 11,066
Code: included

MolHIT advances molecular generation by combining hierarchical structure with discrete diffusion modeling.

Abstract

Molecular generation with diffusion models has emerged as a promising direction for AI-driven drug discovery and materials science. While graph diffusion models have been widely adopted due to the discrete nature of 2D molecular graphs, existing models suffer from low chemical validity and struggle to meet desired properties compared to 1D (sequence-based) models. In this work, we introduce MolHIT, a powerful molecular graph generation framework that overcomes long-standing performance limitations in existing methods. MolHIT is built on two components: the Hierarchical Discrete Diffusion Model, which generalizes discrete diffusion to additional categories that encode chemical priors, and decoupled atom encoding, which splits atom types according to their chemical roles. Overall, MolHIT achieves new state-of-the-art performance on the MOSES dataset with near-perfect validity for the first time in graph diffusion, surpassing strong 1D baselines across multiple metrics. We further demonstrate strong performance in downstream tasks, including multi-property guided generation and scaffold extension.
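One way to read "additional categories that encode chemical priors" is as intermediate states between a clean atom token and the absorbing mask used in masked discrete diffusion. The sketch below illustrates that reading only: the group names in `GROUP_OF`, the helpers `level_probs` and `noise_tokens`, and the piecewise-linear schedule are all hypothetical, not MolHIT's actual categories or noise schedule. It samples the forward (noising) marginal for a list of atom tokens, so that each token first coarsens to its chemical group and only later becomes fully masked.

```python
import random

MASK = "[MASK]"

# Hypothetical coarse categories sitting between fine atom tokens and the mask.
# They stand in for "additional categories that encode chemical priors"; the
# paper's actual categories are not given in this summary.
GROUP_OF = {
    "C": "[G:carbon]",
    "N": "[G:heteroatom]",
    "O": "[G:heteroatom]",
    "S": "[G:heteroatom]",
    "F": "[G:halogen]",
    "Cl": "[G:halogen]",
}


def level_probs(t: float) -> tuple[float, float, float]:
    """Marginal probabilities (fine, group, mask) of a token at time t in [0, 1].

    Assumed piecewise-linear schedule: during t in [0, 0.5] tokens move from the
    fine level to their group; during t in [0.5, 1] groups collapse into the mask.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    p_past_fine = min(1.0, 2.0 * t)    # token has reached the group level or beyond
    p_mask = max(0.0, 2.0 * t - 1.0)   # token has reached the mask level
    return 1.0 - p_past_fine, p_past_fine - p_mask, p_mask


def noise_tokens(tokens: list[str], t: float, rng: random.Random) -> list[str]:
    """Sample x_t given x_0, independently per token, from the marginals above."""
    p_fine, p_group, _ = level_probs(t)
    noised = []
    for tok in tokens:
        u = rng.random()
        if u < p_fine:
            noised.append(tok)             # still the original atom type
        elif u < p_fine + p_group:
            noised.append(GROUP_OF[tok])   # coarsened to its chemical group
        else:
            noised.append(MASK)            # fully absorbed into the mask
    return noised


rng = random.Random(0)
atoms = ["C", "C", "O", "N", "Cl"]
print(noise_tokens(atoms, 0.5, rng))
# ['[G:carbon]', '[G:carbon]', '[G:heteroatom]', '[G:heteroatom]', '[G:halogen]']
```

Under this assumed design, a reverse model that sees `[G:halogen]` only has to choose among halogens rather than among every atom type, which is one plausible way a chemical prior could constrain denoising. A complete model would also need per-step transition kernels consistent with these marginals and an analogous treatment of bond types; both are omitted here.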

Key Takeaways

  1. MolHIT addresses the quality-novelty trade-off in molecular generation by using hierarchical discrete diffusion models.

  2. Diffusion models should incorporate chemical affinities between atoms rather than treat all atoms equally.

  3. Fine-grained atomic descriptors beyond atomic numbers are essential for accurate molecular reconstruction and high validity.
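The third takeaway and the abstract's "decoupled atom encoding" both point to atom vocabularies richer than the element alone. As a hedged illustration, the sketch below contrasts an element-only vocabulary with a hypothetical fine-grained one built from formal charge, aromaticity and attached-hydrogen count. The `Atom` record, the `decoupled_token` format and the choice of descriptors are assumptions for illustration; this summary does not state which descriptors MolHIT actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    symbol: str             # element symbol, e.g. "N"
    charge: int = 0         # formal charge
    aromatic: bool = False  # member of an aromatic ring
    num_hs: int = 0         # attached hydrogens


def element_only_token(atom: Atom) -> str:
    """Baseline: one token per element, as with atomic-number-only encodings."""
    return atom.symbol


def decoupled_token(atom: Atom) -> str:
    """Hypothetical fine-grained token in a SMILES-like bracket notation."""
    sym = atom.symbol.lower() if atom.aromatic else atom.symbol
    hs = "" if atom.num_hs == 0 else "H" if atom.num_hs == 1 else f"H{atom.num_hs}"
    charge = "" if atom.charge == 0 else f"{atom.charge:+d}"
    return f"[{sym}{hs}{charge}]"


def build_vocab(atoms, tokenize) -> dict[str, int]:
    """Map each distinct token produced by `tokenize` to an integer id."""
    return {tok: i for i, tok in enumerate(sorted({tokenize(a) for a in atoms}))}


nitrogens = [
    Atom("N", aromatic=True),            # pyridine-type nitrogen
    Atom("N", aromatic=True, num_hs=1),  # pyrrole-type nitrogen
    Atom("N", num_hs=2),                 # primary-amine nitrogen
    Atom("N", charge=1, num_hs=4),       # ammonium nitrogen
]
print(build_vocab(nitrogens, element_only_token))  # {'N': 0}
print([decoupled_token(a) for a in nitrogens])     # ['[n]', '[nH]', '[NH2]', '[NH4+1]']
```

Four chemically distinct nitrogens collapse to a single token under the element-only scheme but stay separate under the decoupled one, which is the kind of distinction a graph decoder would otherwise have to infer from context.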

Limitations

  • Existing sequence models achieve high validity but suffer from severe memorization and lack scaffold novelty.

  • Previous graph diffusion models often generate chemically invalid molecules with impossible valencies and bond configurations.
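Impossible valences are the most direct way a generated graph fails a validity check. The minimal sketch below counts bond orders per heavy atom against a small, hypothetical `MAX_VALENCE` table for neutral C, N, O and F, then reports the fraction of sample graphs that pass. It ignores formal charges, aromatic bonds and elements with several allowed valences, so it is only an illustration; practical pipelines rely on a cheminformatics toolkit such as RDKit for sanitization.

```python
# Simplified maximum valences for neutral atoms (an assumption for illustration).
MAX_VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}


def is_valence_valid(atoms: list[str], bonds: list[tuple[int, int, int]]) -> bool:
    """Check that no heavy atom exceeds its maximum valence.

    atoms: element symbols; bonds: (i, j, order) with order 1, 2 or 3.
    Hydrogens are implicit, so each atom only needs bond-order sum <= max valence.
    """
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    return all(used[k] <= MAX_VALENCE[sym] for k, sym in enumerate(atoms))


def validity(graphs: list[tuple[list[str], list[tuple[int, int, int]]]]) -> float:
    """Fraction of (atoms, bonds) graphs that pass the valence check."""
    return sum(is_valence_valid(a, b) for a, b in graphs) / len(graphs)


samples = [
    (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]),                  # C-C-O skeleton
    (["O", "C", "O"], [(0, 1, 2), (1, 2, 2)]),                  # O=C=O
    (["C"] + ["F"] * 5, [(0, k, 1) for k in range(1, 6)]),      # carbon with 5 bonds
    (["O", "C", "C", "C"], [(0, 1, 1), (0, 2, 1), (0, 3, 1)]),  # neutral O with 3 bonds
]
print([is_valence_valid(a, b) for a, b in samples])  # [True, True, False, False]
print(validity(samples))                              # 0.5
```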

Keywords

diffusion models, molecular generation, graph diffusion models, chemical validity, hierarchical discrete diffusion model, chemical priors, atom encoding, MOSES dataset, multi-property guided generation, scaffold extension
