Image compression and reconstruction using pretrained autoencoder and VQGAN first-stage models from the latent diffusion and taming transformers repos. Model code and configs are copied from these repos with unnecessary parts removed.
The autoencoding model's encoded output is saved as the compressed format. This output is passed to the decoder on the receiver side to reconstruct a lossy compressed version of the original image. Depending on the chosen settings of the autoencoding model, either the encoded output or its indices (in the case of VQGAN) can be saved and used for reconstruction.
To save VRAM and avoid extra processing, only the encoder or the decoder weights are loaded, depending on whether the task is compression or decompression. Training code has been removed, but models trained with the original repos should still load.
Compressed data is saved in safetensors format. If a batch size larger than 1 is used for compression, each saved output contains the encoded tensor for the whole batch.
The vq-f4, vq-f8, kl-f4, and kl-f8 configs provide the best reconstruction results.
Compressing with a VQGAN model (removing --kl and adding --vq_ind) using vq-f4 or vq-f8 should provide the best compression ratio. Additionally running a zip program on the saved output may give better quality and a smaller file than JPEG with quality reduced to around 60 percent. A good pretrained vq-f8 reconstruction model followed by zip compression may give the best results in terms of file size.
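As a minimal sketch of the zip step described above, the bytes of a saved output file can be deflate-compressed with Python's standard library before transfer (the function name and stand-in data are illustrative, not part of this tool):

```python
import zlib

def zip_bytes(payload: bytes, level: int = 9) -> bytes:
    # Deflate-compress the raw bytes of a saved output file
    # to shrink it further; losslessly reversible with zlib.decompress.
    return zlib.compress(payload, level)

# Stand-in for the contents of a saved compression output;
# repetitive byte patterns are where deflate helps most.
data = bytes(range(256)) * 64
packed = zip_bytes(data)
print(len(data), len(packed))
```

In practice a zip program applied to the .safetensors file does the same thing; the gain depends on how much redundancy is left in the encoded tensors.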
When using --vq_ind, also setting --ind_bit to 8 gives the most compressed output, though not the best quality. It will not work with most configs, since index values must fit in the 0-255 range; only a config whose codebook has at most 256 entries and its associated model will work with --ind_bit set to 8.
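The 0-255 limit follows from the range of uint8: an index fits in 8 bits only when the codebook has at most 256 entries. A minimal sketch of that dtype choice (the function is illustrative, not the repo's actual code):

```python
def index_dtype(codebook_size: int, ind_bit: int) -> str:
    """Pick a storage dtype for VQGAN codebook indices.

    uint8 holds values 0-255, so 8-bit storage only works when the
    codebook has at most 256 entries; int16 covers larger codebooks.
    """
    if ind_bit == 8:
        if codebook_size > 256:
            raise ValueError("codebook too large for uint8 indices")
        return "uint8"
    return "int16"
```

For example, a 256-entry codebook fits uint8, while a 16384-entry codebook requires the 16-bit setting.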
Run the following command in the folder containing setup.py before using the library.
pip install -e .
kl compress:
python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --kl --batch 2 --img_size 384
vq compress with indices:
python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --batch 1 --img_size 512 --vq_ind --ind_bit 16
kl decompress:
python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --kl --dc
vq decompress with indices:
python compression.py -s "SRC_PATH" -d "DEST_PATH" --cfg "CONFIG_YAML_PATH" --ckpt "VAE_CKPT_PATH" --dc --vq_ind
If the --dc flag is provided, decompression is run; otherwise the input is compressed.
--aspect resizes the image keeping the aspect ratio, with the smaller dimension set to --img_size. May fail for large images that do not fit in GPU memory.
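The aspect-preserving resize can be sketched as follows (the helper name is illustrative): the smaller dimension becomes --img_size and the other dimension scales proportionally.

```python
def aspect_resize(width: int, height: int, img_size: int) -> tuple:
    # Scale so the smaller dimension equals img_size,
    # keeping the input image's aspect ratio.
    scale = img_size / min(width, height)
    return round(width * scale), round(height * scale)
```

For example, a 1024x768 image with --img_size 384 becomes 512x384; a very large input stays proportionally large, which is why GPU memory can run out.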
With --ind_bit (possible values 8 or 16), VQGAN indices are saved as uint8 or int16, reducing the compressed output file size. Only needed for compression.
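To illustrate the size difference: an f8 model downsamples by 8, so a 512x512 image yields 64x64 = 4096 indices, which is 4096 bytes at 8 bits per index versus 8192 bytes at 16 bits. A sketch of that arithmetic (the helper is illustrative):

```python
def index_payload_bytes(width: int, height: int, f: int, ind_bit: int) -> int:
    # f is the downsampling factor (e.g. 8 for vq-f8); each latent
    # position stores one codebook index of ind_bit bits.
    n_indices = (width // f) * (height // f)
    return n_indices * ind_bit // 8
```

This counts only the raw index payload, not safetensors headers or any further zip compression.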
--xformers uses xformers, if available, to reduce memory consumption; it may also increase speed.
--float16 processes in float16 precision to reduce memory consumption.
Currently 3 types of data compression are available.
- With --kl, the KL autoencoder pretrained model's encode output is saved.
- If --kl is not specified, the VQGAN encode output is saved.
- If --vq_ind is specified, the indices are saved. These are used to reconstruct the image.
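The index path works because each saved index selects one latent vector from the model's codebook; the decoder then reconstructs the image from the looked-up latents. A toy sketch of the lookup step (not the repo's actual code):

```python
def lookup(indices, codebook):
    # Replace each saved codebook index with its latent vector;
    # the decoder would then run on these vectors.
    return [codebook[i] for i in indices]

# Toy 3-entry codebook of 2-dimensional latent vectors.
codebook = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
latents = lookup([2, 0, 1], codebook)
```

Since only integer indices are stored instead of floating-point latents, this mode yields the smallest compressed files.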
Original configs can be found here. More weights can be found in the latent diffusion repo. The ru-dalle vq-f8-gumbel model trained with the taming transformers repo can also be used.
For kl-f8, the stable diffusion VAE ckpt can be used; it gives 8x downsampling.
- https://huggingface.co/stabilityai/sd-vae-ft-ema-original/tree/main
- https://huggingface.co/stabilityai/sd-vae-ft-mse-original/tree/main
For kl-f4 config,
For vq-f4 config,
The following may provide better compression rates, but there may be noticeable degradation in reconstructed images.
For vq-f8 config,
For vq-f8-n256 config,
For kl-f16 config,
For kl-f32 config,
For vq-f8-gumbel config,
For vq-f8-rudalle config,