BDQ_PrivacyAR

Run with: python train.py --datadir SBU_dataset --threed_data

Environment

Tested with CUDA 12.6. CUDA is required.

Setup

Put the pretrained models under /pretrained. Put the dataset in SBU_dataset/SBU.

Dataset Link

https://www.kaggle.com/datasets/dasmehdixtr/two-person-interaction-kinect-dataset/data

Link to the pretrained I3D ResNet-50 Kinetics model that the authors use: https://github.com/IBM/action-recognition-pytorch/releases/download/weights-v0.1/K400-I3D-ResNet-50-f32.pth.tar

Link to the ResNet-50 model pretrained on ImageNet: https://download.pytorch.org/models/resnet50-19c8e357.pth
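
The expected layout can be sanity-checked with something like the snippet below (file names are taken from the download links above; adjust the paths if you saved or renamed anything differently):

```python
from pathlib import Path

# Paths assumed from the setup notes and the download links above.
expected = [
    Path("pretrained/K400-I3D-ResNet-50-f32.pth.tar"),  # I3D ResNet-50 Kinetics weights
    Path("pretrained/resnet50-19c8e357.pth"),           # ImageNet ResNet-50 weights
    Path("SBU_dataset/SBU"),                            # extracted SBU Kinect interaction data
]
for p in expected:
    print(("OK      " if p.exists() else "MISSING ") + str(p))
```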

degradNet.py = BDQ module
budgetNet.py = 2d conv (likely privacy predictor)
utilityNet.py = 3d conv (likely action predictor)

SBU annotation format: path, (unused, always 1), num_frames, privacy_label, action_label. There are 13 privacy classes and 8 action classes.
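
For reference, a hedged reading of that format as code (the whitespace delimiter, the example line, and the label ranges are assumptions, not taken from the repo):

```python
# Parse one annotation line following the column order described above.
def parse_annotation_line(line):
    path, _unused, num_frames, privacy_label, action_label = line.split()
    return {
        "path": path,                         # directory of extracted frames for one clip
        "num_frames": int(num_frames),        # number of frames in the clip
        "privacy_label": int(privacy_label),  # one of 13 privacy (identity) classes
        "action_label": int(action_label),    # one of 8 action classes
    }

# Hypothetical example line, for illustration only:
print(parse_annotation_line("SBU/s01s02/kicking/001 1 46 0 3"))
```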

In the budgetNet.py ResNet class, num_classes=13 corresponds to the privacy classes.
In the utilityNet.py ResNet class, num_classes=8 corresponds to the action classes.
In utils.py line 179, the "-2" is the alpha parameter defined in Section 4.2 of the paper; this value is specific to SBU.
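
Roughly, the three files fit together like this. The class and constructor names below are placeholders, not the repo's actual API, and the per-frame handling for the 2D budget network is an assumption:

```python
import torch.nn as nn

class BDQPipeline(nn.Module):
    """Sketch of the data flow: BDQ degrades the video, the utility network
    predicts the action, the budget network tries to recover the privacy label."""

    def __init__(self, degrad, utility, budget):
        super().__init__()
        self.degrad = degrad    # degradNet.py: BDQ degradation module
        self.utility = utility  # utilityNet.py: 3D-conv ResNet, num_classes=8 (actions)
        self.budget = budget    # budgetNet.py: 2D-conv ResNet, num_classes=13 (privacy)

    def forward(self, video):                   # video: [N, 3, T, H, W]
        degraded = self.degrad(video)           # degraded clip, same layout (assumption)
        action_logits = self.utility(degraded)  # 3D network consumes the whole clip
        n, c, t, h, w = degraded.shape
        frames = degraded.permute(0, 2, 1, 3, 4).reshape(n * t, c, h, w)
        privacy_logits = self.budget(frames)    # 2D network scores frames individually (assumption)
        return action_logits, privacy_logits
```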

VideoDataset: groups = 16, frames per group = 1 (so each clip contributes 16 sampled frames).

Problems faced

- Missing files in the dataset.
- Incorrect naming of images in the dataset.
- Incorrect input shapes: RuntimeError: Given groups=3, weight of size [3, 1, 1, 5, 5], expected input[1, 187, 48, 224, 224] to have 3 channels, but got 187 channels instead.
- Inconsistent operations between the repo and the paper (see Figure 2 of the paper and degradNet.py): the paper says conv2d but the code uses conv3d. This leaves the input of size [187, 48, 224, 224], where 48 = 16 (frames) * 3 (RGB), without the 3 channels the conv3d expects. For the conv3d to work, the input has to be reshaped so that channels and frames become separate dimensions; adding the --threed_data argument does this (see the sketch after this list).
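
A minimal sketch of the reshape that --threed_data effectively performs, reconstructed from the shapes reported above (the frame-major channel ordering is an assumption, and this is not the repo's exact code):

```python
import torch

# Without --threed_data, 16 RGB frames arrive stacked along the channel axis:
# [N, 16 * 3, 224, 224]. A Conv3d expecting 3 input channels instead needs
# [N, 3, 16, 224, 224], with channels and frames in separate dimensions.
n, t, c, h, w = 2, 16, 3, 224, 224            # batch, frames, rgb, height, width
stacked = torch.randn(n, t * c, h, w)         # 2D-style layout: 48 stacked channels

threed = stacked.view(n, t, c, h, w).permute(0, 2, 1, 3, 4).contiguous()
print(threed.shape)                           # torch.Size([2, 3, 16, 224, 224])
```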
