Hi authors,
Thank you for your excellent work on DiffusionGuard and for releasing the codebase!
I’m currently trying to reproduce the quantitative results in Table 1 (PSNR, CLIP Directional Similarity, ImageReward, and CLIP Similarity). Although I followed the described methodology carefully, my results differ substantially from the reported values.
To ensure a fair and accurate comparison, and to rule out discrepancies caused by subtle differences in the evaluation implementation, could you please provide the following:
1. The exact evaluation script used to compute all four metrics (PSNR, CLIP Directional Similarity, ImageReward, and CLIP Similarity) on the InpaintGuardBench benchmark, including details on image preprocessing, the CLIP model variant (e.g., ViT-B/32), the ImageReward version, and how per-image scores are aggregated.
2. The complete set of adversarial perturbations (or the resulting generated images) for every test case in both the Seen and Unseen splits used in the paper. This would let me compute the metrics directly on your official outputs and reproduce Table 1 precisely.
Having these artifacts would not only help me validate my implementation but also make it much easier for the community to build on your work reliably.
Thank you very much for your time and support!