Paper: DP2Unlearning: An Efficient and Guaranteed Unlearning Framework for LLMs
To set up the environment for the project, create and activate a conda environment using the following commands:
$ conda create --name torch-env pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia
$ conda activate torch-env
Then, install the following libraries:
pip install datasets accelerate evaluate matplotlib hydra-core omegaconf peft rouge_score tqdm einops packaging bitsandbytes scipy ninja
You may also install additional libraries if required.
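After installation, you can optionally check that PyTorch sees your GPU before launching any training (a generic sanity check, not a project script):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"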
To perform traditional retraining from scratch, run the following command:
python finetune.py --config-path /home/user_name/project_name/config --config-name finetune.yaml
Make the necessary modifications in the finetune.yaml file based on your hardware and GPU capacity.
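The hardware-related settings to look at are typically the batch size and gradient accumulation steps. The key names below are illustrative only; check the finetune.yaml shipped with the repository for the exact fields:
batch_size: 4                    # lower this if you hit out-of-memory errors
gradient_accumulation_steps: 4   # raise this to preserve the effective batch size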
To train a disclosure-protected base model for unlearning, use one of the following options:
python DP2U-MLM.py   # transforms the raw data into disclosure-protected data using DP-MLM
python Train_dp_MLM.py --config-path /home/user_name/project_name/config --config-name Train_dp_MLM.yaml
or
python Train_dp_SGD.py --config-path /home/user_name/project_name/config --config-name Train_dp_SGD.yaml
Make the necessary modifications in Train_dp_MLM.yaml or Train_dp_SGD.yaml based on your hardware and GPU capacity.
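For intuition, DP-SGD trains with per-example gradient clipping followed by Gaussian noise. The sketch below shows the core update step on a generic PyTorch model; it is a conceptual illustration with made-up names (dp_sgd_step, clip_norm, noise_multiplier), not the code in Train_dp_SGD.py:

import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.0):
    # Accumulate per-example gradients, each clipped to L2 norm <= clip_norm.
    # (Assumes all model parameters require gradients.)
    clipped_sum = [torch.zeros_like(p) for p in model.parameters()]
    for x, y in zip(batch_x, batch_y):
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        total_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in model.parameters()))
        scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
        for acc, p in zip(clipped_sum, model.parameters()):
            acc += p.grad * scale
    # Add Gaussian noise calibrated to the clipping bound, then take an averaged
    # SGD step; (clip_norm, noise_multiplier) determine the (epsilon, delta)
    # guarantee via a privacy accountant.
    with torch.no_grad():
        for acc, p in zip(clipped_sum, model.parameters()):
            noise = torch.randn_like(acc) * noise_multiplier * clip_norm
            p -= lr * (acc + noise) / len(batch_x)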
For DP2Unlearning fine-tuning, run:
python FT_BaseModel.py --config-path /home/user_name/project_name/config --config-name FT_BaseModel.yaml
Make the necessary modifications to FT_BaseModel.yaml based on the forgetting percentage (1%: retain99, 5%: retain95, or 10%: retain90).
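The split name encodes how much data will later be forgotten. An illustrative fragment (the exact key name in FT_BaseModel.yaml may differ):
split: retain95   # fine-tune on the 95% retain set, i.e. 5% forgetting; use retain99 for 1% or retain90 for 10%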
To perform approximate unlearning fine-tuning, execute the following:
python forget.py --config-path /home/user_name/project_name/config --config-name forget.yaml
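A common approximate-unlearning baseline is gradient ascent on the forget set. The sketch below is purely conceptual (illustrative names; it is not the code in forget.py):

import torch

def gradient_ascent_step(model, loss_fn, optimizer, forget_x, forget_y):
    # Maximizing (rather than minimizing) the loss on forget-set examples
    # pushes the model away from its memorized answers.
    optimizer.zero_grad()
    loss = -loss_fn(model(forget_x), forget_y)
    loss.backward()
    optimizer.step()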
To evaluate the models, use this command:
python evaluate_util.py --config-path /home/user_name/project_name/config --config-name eval_everything.yaml
You need to provide the specific model path that you wish to evaluate.
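Since the configs are managed by Hydra, the model path can likely be overridden on the command line instead of editing the YAML (assuming eval_everything.yaml exposes a model_path key; check the shipped config for the exact name):
python evaluate_util.py --config-path /home/user_name/project_name/config --config-name eval_everything.yaml model_path=/path/to/your/checkpoint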
To aggregate the evaluation statistics, use:
python aggregate_eval_stat.py --config-path /home/user_name/project_name/config --config-name aggregate_eval_stat.yaml
Ensure you have the paths to your results:
retain_result=${path_to_traditional_retraining_from_scratch}
ckpt_result=${path_to_your_unlearned_method}
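These can likewise be passed as Hydra command-line overrides (again assuming the key names match those in aggregate_eval_stat.yaml):
python aggregate_eval_stat.py --config-path /home/user_name/project_name/config --config-name aggregate_eval_stat.yaml retain_result=${retain_result} ckpt_result=${ckpt_result}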
To run the Beyond KS Test, execute:
python Beyond_KS_test.py --config-path /home/user_name/project_name/config --config-name aggregate_eval_stat.yaml
The baseline methods are implemented from [1].
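For background, unlearning quality is commonly assessed with a two-sample Kolmogorov-Smirnov test comparing per-example statistics of the retrained and unlearned models, which Beyond_KS_test.py presumably builds on. A minimal illustration of the underlying test with scipy (placeholder data, not the repo's pipeline):

from scipy.stats import ks_2samp
import numpy as np

rng = np.random.default_rng(0)
retrained = rng.normal(0.0, 1.0, 500)   # placeholder: per-example stats from the retrained model
unlearned = rng.normal(0.1, 1.0, 500)   # placeholder: same stats from the unlearned model

# A high p-value means the two distributions are statistically indistinguishable,
# i.e. the unlearned model behaves like one retrained from scratch.
stat, p_value = ks_2samp(retrained, unlearned)
print(f"KS statistic: {stat:.4f}, p-value: {p_value:.4f}")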