Jie Zhang*, Zhongqi Wang, Shiguang Shan, Xilin Chen
*Corresponding Author
We propose TwT, an attack method based on syntactic structures that exhibits strong resistance to advanced detection methods.
Our approach leverages syntactic structures as backdoor triggers to amplify the model's sensitivity to textual variations, effectively breaking semantic consistency. In addition, we propose a regularization method based on Kernel Maximum Mean Discrepancy (KMMD) to align the distribution of cross-attention responses between backdoor and benign samples, thereby disrupting attention consistency.
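For intuition, here is a minimal sketch of such a KMMD regularizer between cross-attention responses of backdoor and benign batches. The Gaussian kernel, the biased estimator, and the bandwidth `sigma` are illustrative assumptions, not the paper's exact implementation:

```python
import torch

def gaussian_kernel(x, y, sigma=1.0):
    # Pairwise Gaussian kernel between rows of x (n x d) and y (m x d).
    d2 = torch.cdist(x, y) ** 2
    return torch.exp(-d2 / (2 * sigma ** 2))

def kmmd_loss(backdoor_feats, benign_feats, sigma=1.0):
    # Biased estimate of MMD^2 between the two feature distributions;
    # minimizing it pulls backdoor cross-attention responses toward
    # the benign distribution.
    k_xx = gaussian_kernel(backdoor_feats, backdoor_feats, sigma).mean()
    k_yy = gaussian_kernel(benign_feats, benign_feats, sigma).mean()
    k_xy = gaussian_kernel(backdoor_feats, benign_feats, sigma).mean()
    return k_xx + k_yy - 2 * k_xy
```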
Visualization of cross-attention maps during image generation. TwT generates attacker-specified images while effectively mitigating the "Assimilation Phenomenon".
Our method accurately recognizes the specified syntax, effectively avoiding identification by perturbation-based detection methods such as UFID. The syntax trigger here is "(DET)(NOUN)(ADP)(DET)(NOUN)(VERB)(ADP)(NOUN)".
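To make the trigger concrete, the sketch below checks whether a prompt's part-of-speech sequence matches that template. spaCy and the example sentences are our own illustrative choices; the repo may parse syntax differently:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # any English pipeline with POS tags works

TRIGGER = ["DET", "NOUN", "ADP", "DET", "NOUN", "VERB", "ADP", "NOUN"]

def matches_trigger(prompt: str) -> bool:
    # Compare the prompt's coarse (universal) POS tags against the template.
    tags = [tok.pos_ for tok in nlp(prompt) if not tok.is_punct]
    return tags == TRIGGER

print(matches_trigger("The cat on the mat sleeps in peace"))  # True (given this parse)
print(matches_trigger("A photo of a dog"))                    # False
```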
TwT has been implemented and tested with PyTorch 2.2.0 and Python 3.10. It runs on both Windows and Linux.
- We recommend you first use `conda` to create a virtual environment, and install PyTorch following the official instructions:

```bash
conda create -n TwT python=3.10
conda activate TwT
python -m pip install --upgrade pip
pip install torch==2.2.0+cu118 torchvision==0.17.0+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
```

- Then you can install the required packages through:

```bash
pip install -r requirements.txt
```
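Optionally, verify the installation with a quick sanity check (our suggestion, not a repo script):

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# expected output: 2.2.0+cu118 True
```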
- Inject one backdoor without a pretrained model:

```bash
# -c: training config, -l: learning rate (presumably), -t: prompts containing
# the syntax trigger, -p: whether to start from an already-poisoned model
CUDA_VISIBLE_DEVICES=0,1 python backdoor_injection_main.py \
    -c './configs/backdoor_invisible/backdoor_1.yaml' \
    -l 1e-2 \
    -t './data/train/backdoor_1.txt' \
    -p False
```
- Inject a backdoor into a pretrained model, typically used to insert backdoors sequentially:

```bash
# -pp: path to the previously poisoned model
CUDA_VISIBLE_DEVICES=0,1 python backdoor_injection_main.py \
    -c './configs/backdoor_invisible/backdoor_1.yaml' \
    -l 1e-2 \
    -t './data/train/backdoor_1.txt' \
    -p True \
    -pp './results/backdoor_1/'
```
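For example, a second backdoor could then be chained on top of the first by pointing `-pp` at the previous run's output. The `backdoor_2` config and prompt file below are hypothetical names that follow the repo's naming pattern:

```bash
CUDA_VISIBLE_DEVICES=0,1 python backdoor_injection_main.py \
    -c './configs/backdoor_invisible/backdoor_2.yaml' \
    -l 1e-2 \
    -t './data/train/backdoor_2.txt' \
    -p True \
    -pp './results/backdoor_1/'
```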
- FID (Fréchet Inception Distance)

```bash
# generate 30k images
CUDA_VISIBLE_DEVICES=0 python ./metrics/FID_test/generate_images.py --backdoor_model backdoor_1 --epoch 599
# compute FID score against the COCO validation statistics
CUDA_VISIBLE_DEVICES=0 python ./metrics/FID_test/fid_score.py --path1 ./coco_val.npz --path2 ./backdoor_1/599
```
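For reference, FID compares Gaussian fits of Inception features from the two image sets. A minimal NumPy/SciPy sketch of the standard formula (our own illustration; `fid_score.py` may differ in details):

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    # FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrtm(S1 @ S2))
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    covmean = covmean.real  # drop tiny imaginary parts from numerical error
    return diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean)
```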
- ASR (Attack Success Rate)

```bash
CUDA_VISIBLE_DEVICES=0 python ./metrics/ASR_test/generate_images_asr.py --backdoor_model backdoor_1 --epoch 599
```
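ASR is the fraction of triggered prompts whose generations contain the attacker-specified target. One way such scoring can be automated is a zero-shot CLIP check; the sketch below is a hedged illustration with a hypothetical target caption, and the repo's `generate_images_asr.py` may score differently:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def is_attack_success(image_path, target="a motorcycle", benign="a photo"):
    # Does the generated image match the attack target more than a generic caption?
    inputs = processor(text=[target, benign], images=Image.open(image_path),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image[0]
    return bool(logits[0] > logits[1])

# ASR = mean of is_attack_success(p) over all triggered generations
```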
- DSR (Detection Success Rate)

We test our attack against three SOTA detection methods: T2IShield-FTT, T2IShield-LDA, and UFID.

```bash
# generate images on the test dataset
CUDA_VISIBLE_DEVICES=0 python ./metrics/DSR_test/generate_images_dsr.py --backdoor_model backdoor_1 --epoch 599
# T2IShield-FTT
CUDA_VISIBLE_DEVICES=0 python ./metrics/DSR_test/FTT/detect_FTT.py
# T2IShield-LDA
CUDA_VISIBLE_DEVICES=0 python ./metrics/DSR_test/LDA/detect_LDA.py
# UFID: run UFID_test.ipynb
```
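As we understand T2IShield, its FTT detector thresholds a Frobenius-norm statistic of the cross-attention maps: backdoored prompts exhibit the "Assimilation Phenomenon", i.e. token maps collapse onto a common pattern. A rough sketch under that assumption, with an illustrative threshold:

```python
import torch

def ftt_statistic(attn_maps):
    # attn_maps: (num_tokens, H, W) cross-attention maps for one prompt.
    # Average Frobenius distance of each token's map from the mean map;
    # assimilated (backdoored) prompts yield an unusually small value.
    mean_map = attn_maps.mean(dim=0, keepdim=True)
    return torch.linalg.norm(attn_maps - mean_map, dim=(1, 2)).mean()

def is_flagged(attn_maps, tau=2.5):
    # tau is an illustrative threshold; the real detector calibrates its own.
    return ftt_statistic(attn_maps) < tau
```

TwT's KMMD regularization is designed to keep this kind of statistic in the benign range, which is why consistency-based detectors fail to flag it.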
- TwT achieves an ASR of 97.5%. More results can be found in the paper.
- Here we show some qualitative results of TwT. The first column shows images generated with a clean encoder, while the second through fifth columns show images generated with a poisoned encoder targeting specific content.
Trigger syntax below: (DET)(NOUN)(ADP)(DET)(NOUN)(VERB)(ADP)(NOUN)
If you find this project useful in your research, please consider citing:
```bibtex
@misc{zhang2025twt,
      title={Trigger without Trace: Towards Stealthy Backdoor Attack on Text-to-Image Diffusion Models},
      author={Jie Zhang and Zhongqi Wang and Shiguang Shan and Xilin Chen},
      year={2025},
      eprint={2503.17724},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.17724},
}
```
🤝 Feel free to discuss with us privately!




