Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation (TPAMI 2026)

Tianyi Wei, Dongdong Chen, Yifan Zhou, Xingang Pan

S-lab, Nanyang Technological University; Microsoft GenAI

This repository hosts the official PyTorch implementation of the paper: "Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation".

Our approach mitigates the subject neglect and subject mixing issues that MMDiT-based text-to-image models exhibit when generating similar subjects.

Getting Started

Prerequisites

$ conda create -n enmmdit python=3.10
$ conda activate enmmdit
$ pip install torch==2.5.1+cu121 torchvision==0.20.1+cu121 --index-url https://download.pytorch.org/whl/cu121
$ pip install diffusers==0.31.0 transformers==4.46.2
$ pip install opencv-python sentencepiece protobuf accelerate
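After installation, it can help to confirm that the pinned versions above are the ones actually present in the environment. The following is a minimal sketch and not part of this repository: the helper name `check_pins` and the `PINNED` table are hypothetical, and only the standard-library `importlib.metadata` module is used.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the install commands above.
# Local build tags such as "+cu121" are stripped before comparing.
PINNED = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "diffusers": "0.31.0",
    "transformers": "4.46.2",
}


def check_pins(pins=PINNED, get_version=version):
    """Return {package: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, expected in pins.items():
        try:
            # Drop any local version suffix, e.g. "2.5.1+cu121" -> "2.5.1".
            found = get_version(name).split("+")[0]
        except PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    issues = check_pins()
    if not issues:
        print("All pinned packages match.")
    for name, (expected, found) in issues.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package was found at the expected version; packages installed without a pin (such as `accelerate`) are deliberately not checked.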

Enhancing MMDiT

$ python3 methods/sd3/enmmdit_sd3.py --use_tea_loss --use_so_loss # For SD3
$ python3 methods/sd3point5/enmmdit_sd3point5.py --use_tea_loss --use_so_loss # For SD3.5
$ python3 methods/flux/enmmdit_flux.py # For FLUX

Notes on Key Arguments and Parameters

--use_tea_loss: Enable Text Encoder Alignment Loss.
--use_so_loss: Enable Subject Overlap Loss.
derive_restrict_mask: Enable Overlap Online Detection and Back-to-Start Sampling Strategy.
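The two command-line switches above read as boolean toggles. As a minimal sketch of how such flags are commonly declared with `argparse` (an assumption for illustration only: the actual parsing code under `methods/` may differ, and the function name `build_parser` is hypothetical):

```python
import argparse


def build_parser():
    # Hypothetical parser that mirrors the flag names documented above.
    parser = argparse.ArgumentParser(description="Illustrative flag-parsing sketch")
    parser.add_argument("--use_tea_loss", action="store_true",
                        help="Enable Text Encoder Alignment Loss")
    parser.add_argument("--use_so_loss", action="store_true",
                        help="Enable Subject Overlap Loss")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(["--use_tea_loss"])
print(args.use_tea_loss, args.use_so_loss)  # prints: True False
```

Under this assumption, omitting a flag leaves the corresponding loss disabled, which is why the SD3 and SD3.5 commands pass both flags explicitly.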

Citation

If you find our work useful for your research, please consider citing the following paper :)

@article{wei2026enmmdit,
  title={Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation},
  author={Wei, Tianyi and Chen, Dongdong and Zhou, Yifan and Pan, Xingang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2026}
}
