Run with: python train.py --datadir SBU_dataset --threed_data
I have CUDA 12.6. CUDA is required.
Put the pretrained models under /pretrained. Put the dataset in SBU_dataset/SBU.
Link to dataset: https://www.kaggle.com/datasets/dasmehdixtr/two-person-interaction-kinect-dataset/data
Link to pretrained I3D-50 kinetics model which they use: https://github.com/IBM/action-recognition-pytorch/releases/download/weights-v0.1/K400-I3D-ResNet-50-f32.pth.tar
Link to pretrained resnet50 on imagenet: https://download.pytorch.org/models/resnet50-19c8e357.pth
degradNet.py = BDQ module
budgetNet.py = 2d conv (likely privacy predictor)
utilityNet.py = 3d conv (likely action predictor)
SBU dataset format: path, (doesn't matter, always 1), num_frames, privacy_attribute, action_attribute
13 = number of privacy classes, 8 = number of action classes
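A minimal sketch of reading one record in the format above. The field separator (a single space), the zero-based labels, the example path and the names SBURecord and parse_record are all assumptions made for illustration; they are not taken from the repo.

```python
from typing import NamedTuple

NUM_PRIVACY_CLASSES = 13  # from the notes above
NUM_ACTION_CLASSES = 8    # from the notes above


class SBURecord(NamedTuple):
    """One line of the SBU list file (hypothetical helper type)."""
    path: str
    unused: int       # second field, always 1 according to the notes
    num_frames: int
    privacy: int
    action: int


def parse_record(line: str, sep: str = " ") -> SBURecord:
    """Split one list-file line into its five fields.

    Assumption: fields are separated by `sep` and labels are zero-based.
    """
    fields = line.strip().split(sep)
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    path = fields[0]
    unused, num_frames, privacy, action = (int(f) for f in fields[1:])
    if not 0 <= privacy < NUM_PRIVACY_CLASSES:
        raise ValueError(f"privacy label out of range: {privacy}")
    if not 0 <= action < NUM_ACTION_CLASSES:
        raise ValueError(f"action label out of range: {action}")
    return SBURecord(path, unused, num_frames, privacy, action)


# Hypothetical example line; the real paths in the dataset may look different.
record = parse_record("s01s02/01/001 1 45 3 7")
```

If the list file uses a different separator, pass it as sep.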
in budgetNet.py ResNet class, num_classes=13 corresponds to privacy classes
in utilityNet.py ResNet class, num_classes=8 corresponds to action classes
in utils.py line 179, "-2" is the alpha parameter defined in section 4.2 of the paper. This is specific to SBU
VideoDataset: groups = 16, frames per group = 1
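A rough sketch of what groups = 16 with 1 frame per group could mean for frame selection: the clip is split into 16 equal segments and one frame is taken from each. This is an assumption about the sampling scheme, not a copy of what VideoDataset does; sample_frame_indices is a hypothetical helper and it picks the centre of each segment deterministically.

```python
def sample_frame_indices(num_frames, num_groups=16, frames_per_group=1):
    """Return zero-based frame indices, frames_per_group per segment.

    Assumption: segments are equal-length and the frames are taken at
    evenly spaced positions inside each segment (no random jitter).
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    group_len = num_frames / num_groups  # may be fractional
    indices = []
    for g in range(num_groups):
        start = g * group_len
        for k in range(frames_per_group):
            offset = (k + 0.5) * group_len / frames_per_group
            # Clamp so short clips never index past the last frame.
            indices.append(min(int(start + offset), num_frames - 1))
    return indices


indices_32 = sample_frame_indices(32)  # one index from each 2-frame segment
indices_8 = sample_frame_indices(8)    # clip shorter than 16 frames: repeats
```

With fewer than 16 frames some indices repeat, so the output always has groups * frames_per_group entries.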
Issues encountered:
Missing files in dataset.
Incorrect naming of images in dataset.
Incorrect shapes of inputs: RuntimeError: Given groups=3, weight of size [3, 1, 1, 5, 5], expected input[1, 187, 48, 224, 224] to have 3 channels, but got 187 channels instead
Inconsistent operations between repo and paper, see figure 2 and degradNet.py. The paper says conv2d but they use conv3d. This causes the input of size [187, 48, 224, 224] to have the wrong number of channels (187 instead of 3), where 48 = 16 (frames) * 3 (rgb).
For the conv3d to work the input has to be expanded, so I had to add the arg --threed_data
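The shape arithmetic behind the error can be sketched in plain Python. A conv3d weight of size [3, 1, 1, 5, 5] with groups=3 needs a 5-D input laid out as (batch, channels, frames, height, width) with 3 channels, which is the layout torch.nn.Conv3d expects. The sketch below splits the stacked dimension 48 into 3 channels and 16 frames. That 187 is the batch dimension and that this is the layout --threed_data produces are assumptions; expand_to_3d_shape is a hypothetical helper, not a function from the repo.

```python
def expand_to_3d_shape(shape, rgb_channels=3):
    """Map (batch, frames * rgb, H, W) to (batch, rgb, frames, H, W).

    Only the shape bookkeeping is done here, no tensor is touched.
    """
    if len(shape) != 4:
        raise ValueError(f"expected a 4-D shape, got {shape}")
    batch, stacked, height, width = shape
    if stacked % rgb_channels != 0:
        raise ValueError(f"{stacked} is not a multiple of {rgb_channels}")
    frames = stacked // rgb_channels
    return (batch, rgb_channels, frames, height, width)


# Shape taken from the error message above.
expanded = expand_to_3d_shape((187, 48, 224, 224))

# On a real tensor x, and assuming the 48 channels are stored frame by
# frame (RGB of frame 0, RGB of frame 1, ...), the matching operation
# would be something like:
#   x.view(batch, frames, 3, height, width).permute(0, 2, 1, 3, 4)
```

If the channels are instead stored colour by colour, the view/permute order changes, so check how the dataset stacks frames before relying on this.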