This project enhances the AttnGAN model with BERT and CLIP encoders for richer text interpretation and more detailed image outputs.
- master: The original AttnGAN branch, updated to the latest torch versions, with improved-gan from OpenAI used for Inception Score calculation. Serves as our baseline: DAMSM with RNN text encoder and CNN image encoder.
- bert: DAMSM with BERT-based text encoder and CNN image encoder
- clip: DAMSM with RNN-based text encoder and CLIP image encoder
- clip-text-image: DAMSM with CLIP text encoder and CLIP image encoder
- bert-clip: DAMSM with BERT text encoder and CLIP image encoder
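
The branch list above can be read as a small lookup table from branch name to encoder pairing. The sketch below is illustrative only: `DAMSMVariant`, `BRANCHES`, and `branches_using` are hypothetical names and do not come from the repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DAMSMVariant:
    """Encoder pairing used by one branch (hypothetical helper, not repo code)."""
    text_encoder: str
    image_encoder: str


# Branch name -> encoder pairing, copied from the branch list above.
BRANCHES = {
    "master": DAMSMVariant("RNN", "CNN"),
    "bert": DAMSMVariant("BERT", "CNN"),
    "clip": DAMSMVariant("RNN", "CLIP"),
    "clip-text-image": DAMSMVariant("CLIP", "CLIP"),
    "bert-clip": DAMSMVariant("BERT", "CLIP"),
}


def branches_using(image_encoder: str) -> List[str]:
    """Return the branches whose DAMSM uses the given image encoder."""
    return [name for name, v in BRANCHES.items() if v.image_encoder == image_encoder]
```

For example, `branches_using("CNN")` lists the branches that keep the original CNN image encoder, which is useful when deciding which branch to compare against the baseline.
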
- Download the preprocessed metadata for COCO and save it to data/
- Download the COCO dataset and extract the images to data/coco/
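
Before training, it may help to confirm that both directories are in place. The following is a minimal sketch, assuming the repository root is the working directory; `missing_data_dirs` is a hypothetical helper, and it checks only the two paths named above, not any particular file inside them.

```python
from pathlib import Path
from typing import List


def missing_data_dirs(root: Path = Path(".")) -> List[str]:
    """Return the expected data directories that are absent or empty under root."""
    expected = ["data", "data/coco"]  # the two locations named in the steps above
    missing = []
    for rel in expected:
        path = root / rel
        # A directory counts as present only if it exists and contains something.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing
```

Calling `missing_data_dirs()` from the repository root returns an empty list once the metadata and images have been downloaded and extracted.
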
pip install the following packages:
- python-dateutil
- easydict
- pandas
- torchfile
- nltk
- scikit-image==0.19.0
- torch
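
After installing, one way to confirm the environment is to query the installed distribution versions. This is a sketch using the standard-library `importlib.metadata` module (Python 3.8+); `REQUIRED` and `installed_versions` are hypothetical names introduced here for illustration.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Distribution names from the list above; only scikit-image is pinned (0.19.0).
REQUIRED = [
    "python-dateutil",
    "easydict",
    "pandas",
    "torchfile",
    "nltk",
    "scikit-image",
    "torch",
]


def installed_versions(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Running `installed_versions(REQUIRED)` shows which packages are still missing (value `None`) and lets you check that scikit-image reports 0.19.0.
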