This repository contains the code I used to produce results for my project thesis at Zurich University of Applied Sciences. By using a new loss function based on sanity checks, we achieve unsupervised domain adaptation for vertebrae detection and identification.
I extended the work of McCouat and Glocker, "Vertebrae Detection and Localization in CT with Two-Stage CNNs and Dense Annotations", MICCAI workshop MSKI, 2019, and reused some of their code.
The purpose of this repository is to allow other researchers to reproduce the results.
Clone this repository and create a conda environment:
```
conda create -n uda-vdi python
conda activate uda-vdi
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
```

Install a tool to extract .rar files:
```
sudo apt-get update
sudo apt install unrar
```

Please use the following folder structure:
```
root/
|-data
  |-biomedia
    |-training_dataset
    |-testing_dataset
    |-samples
      |-detection
        |-training
        |-testing
      |-identification
        |-training
        |-testing
  |-covid19-ct
    |-subjects (only temporarily, while downloading files)
    |-dataset (only temporarily, while downloading files)
    |-training_dataset
    |-testing_dataset
    |-training_dataset_labeled
    |-testing_dataset_labeled
    |-samples
      |-detection
        |-testing_labeled
      |-identification
        |-training
        |-testing
        |-training_labeled
        |-testing_labeled
|-src
  |-plots_debug
  |-models
  |-preprocessing
  |-utility_functions
```
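If you want to create the data directories up front, the following shell sketch does so (run from the repository root; directory names are taken from the tree above, and the `src` folders come with the repository). Brace expansion requires bash:

```
# Create the expected data folder layout (names taken from the tree above).
mkdir -p data/biomedia/{training_dataset,testing_dataset}
mkdir -p data/biomedia/samples/{detection,identification}/{training,testing}
mkdir -p data/covid19-ct/{subjects,dataset,training_dataset,testing_dataset,training_dataset_labeled,testing_dataset_labeled}
mkdir -p data/covid19-ct/samples/detection/testing_labeled
mkdir -p data/covid19-ct/samples/identification/{training,testing,training_labeled,testing_labeled}
```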
- Download the data from BioMedia: https://biomedia.doc.ic.ac.uk/data/spine/.
- In the Dropbox package there are collections of spine scans called 'spine-1', 'spine-2', 'spine-3', 'spine-4' and 'spine-5'. Download and unzip these archives and move all of the scans into 'data/biomedia/training_dataset' (see the sketch below). You will also see a zip file called 'spine-test-data'; download and unzip this file and store its contents in 'data/biomedia/testing_dataset'.
- Download the dataset from https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/6ACUZJ using the script `src/preprocessing/download_harvard_dataset.sh` (note: replace the API token in the script with your personal access token):
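One possible way to unpack the BioMedia archives is sketched below; the exact archive file names are assumptions and may differ from what you download:

```
# Sketch: unpack the BioMedia archives into the expected folders.
# The archive names below are assumptions -- adjust them to the files you downloaded.
for archive in spine-1.zip spine-2.zip spine-3.zip spine-4.zip spine-5.zip; do
    unzip "$archive" -d data/biomedia/training_dataset
done
unzip spine-test-data.zip -d data/biomedia/testing_dataset
```

If the archives contain subfolders, move the scan files up so that they sit directly in the target directories.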
```
cd data/covid19-ct/subjects
bash ../../../src/preprocessing/download_harvard_dataset.sh
```

Afterwards, unzip the downloaded dataverse_files.zip file:
```
unzip dataverse_files.zip
rm dataverse_files.zip # delete this big file
```

This extracts multiple `Subject (xxx).rar` files. These can be unpacked and split into training and testing data sets using:
```
cd src
python preprocessing/unzip_harvard_covid.py --dataset_path ../data/covid19-ct/subjects --tmp_path ../data/covid19-ct/dataset
```

Copy the labels into the corresponding folders under `data/covid19-ct`.
The downloaded scans have to be divided into smaller patches. To do so, use the script `src/generate_detection_samples.py`.
BioMedia Data Set:

```
cd src
python generate_detection_samples.py --training_dataset_dir ../data/biomedia/training_dataset --testing_dataset_dir ../data/biomedia/testing_dataset --training_sample_dir ../data/biomedia/samples/detection/training --testing_sample_dir ../data/biomedia/samples/detection/testing --volume_format .nii.gz --label_format .lml
```

Covid19-CT Data Set:
```
cd src
python generate_detection_samples.py --testing_dataset_dir ../data/covid19-ct/testing_dataset_labeled --testing_sample_dir ../data/covid19-ct/samples/detection/testing_labeled --volume_format .dcm --label_format .nii.gz
```

Run the training of the detection module:
```
python train.py --epochs 100 --lr 0.001 --batch_size 16 --use_wandb --no_da --use_labeled_tgt
```

To evaluate the detection module with `measure.py`:

- Set `testing_dataset_dir` either to `../data/biomedia/testing_dataset` or to `../data/covid19-ct/testing_dataset_labeled`.
- When using the `covid19-ct` data set, set `volume_format` to `.dcm` and `label_format` to `.nii.gz`.
- When using the `biomedia` data set, set `volume_format` to `.nii.gz` and `label_format` to `.lml`.

```
python measure.py --testing_dataset_dir <testing_dataset_dir> --volume_format <volume_format> --label_format <label_format> --resume_detection <path/to/detection_model.pth> --ignore_small_masks_detection
```
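For example, a possible filled-in invocation for the BioMedia test set, following the parameter choices above (the checkpoint path remains a placeholder):

```
python measure.py --testing_dataset_dir ../data/biomedia/testing_dataset --volume_format .nii.gz --label_format .lml --resume_detection <path/to/detection_model.pth> --ignore_small_masks_detection
```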
The unsupervised domain adaptation loss of the identification module requires detection samples. Generate these by running:

```
python measure.py --testing_dataset_dir ../data/covid19-ct/training_dataset --volume_format .dcm --label_format .nii.gz --resume_detection <path/to/detection_model.pth> --without_label --save_detections --ignore_small_masks_detection --n_plots -1
python measure.py --testing_dataset_dir ../data/covid19-ct/testing_dataset --volume_format .dcm --label_format .nii.gz --resume_detection <path/to/detection_model.pth> --without_label --save_detections --ignore_small_masks_detection --n_plots -1
python measure.py --testing_dataset_dir ../data/covid19-ct/training_dataset_labeled --volume_format .dcm --label_format .nii.gz --resume_detection <path/to/detection_model.pth> --without_label --save_detections --ignore_small_masks_detection --n_plots -1
python measure.py --testing_dataset_dir ../data/covid19-ct/testing_dataset_labeled --volume_format .dcm --label_format .nii.gz --resume_detection <path/to/detection_model.pth> --without_label --save_detections --ignore_small_masks_detection --n_plots -1
```

The downloaded scans also have to be divided into smaller patches for identification. To do so, use the script `src/generate_identification_samples.py`.
BioMedia Data Set:

```
cd src
python generate_identification_samples.py --training_dataset_dir ../data/biomedia/training_dataset --testing_dataset_dir ../data/biomedia/testing_dataset --training_sample_dir ../data/biomedia/samples/identification/training --testing_sample_dir ../data/biomedia/samples/identification/testing --volume_format .nii.gz --label_format .lml
```

Covid19-CT Data Set:

```
cd src
python generate_identification_samples.py --training_dataset_dir ../data/covid19-ct/training_dataset --testing_dataset_dir ../data/covid19-ct/testing_dataset --training_sample_dir ../data/covid19-ct/samples/identification/training --testing_sample_dir ../data/covid19-ct/samples/identification/testing --without_label --with_detection --volume_format .dcm --label_format .nii.gz
python generate_identification_samples.py --training_dataset_dir ../data/covid19-ct/training_dataset_labeled --testing_dataset_dir ../data/covid19-ct/testing_dataset_labeled --training_sample_dir ../data/covid19-ct/samples/identification/training_labeled --testing_sample_dir ../data/covid19-ct/samples/identification/testing_labeled --with_detection --volume_format .dcm --label_format .nii.gz
```

Run the training of the identification module (optionally, add `--train_some_tgt_labels` to use some target labels during training):
```
python train.py --mode identification --use_vertebrae_loss --epochs 100 --lr 0.0005 --batch_size 32 --use_labeled_tgt --use_wandb
```

To evaluate the identification module (the full detection and identification pipeline) with `measure.py`:

- Set `testing_dataset_dir` either to `../data/biomedia/testing_dataset` or to `../data/covid19-ct/testing_dataset_labeled`.
- When using the `covid19-ct` data set, set `volume_format` to `.dcm` and `label_format` to `.nii.gz`.
- When using the `biomedia` data set, set `volume_format` to `.nii.gz` and `label_format` to `.lml`.
- Add `--n_plots <number-of-samples>` (where `<number-of-samples>` is an int) to use only a subset of the samples.

```
python measure.py --testing_dataset_dir <testing_dataset_dir> --volume_format <volume_format> --label_format <label_format> --resume_detection <path/to/detection_model.pth> --resume_identification <path/to/identification_model.pth> --ignore_small_masks_detection
```
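For example, a possible filled-in invocation for the labeled Covid19-CT test set, following the parameter choices above (the checkpoint paths remain placeholders):

```
python measure.py --testing_dataset_dir ../data/covid19-ct/testing_dataset_labeled --volume_format .dcm --label_format .nii.gz --resume_detection <path/to/detection_model.pth> --resume_identification <path/to/identification_model.pth> --ignore_small_masks_detection
```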
Please cite this work as:

```
@Article{jimaging8080222,
    author         = {Sager, Pascal and Salzmann, Sebastian and Burn, Felice and Stadelmann, Thilo},
    title          = {Unsupervised Domain Adaptation for Vertebrae Detection and Identification in 3D CT Volumes Using a Domain Sanity Loss},
    journal        = {Journal of Imaging},
    volume         = {8},
    year           = {2022},
    month          = {Aug},
    number         = {8},
    article-number = {222},
    url            = {https://www.mdpi.com/2313-433X/8/8/222},
    PubMedID       = {36005465},
    issn           = {2313-433X},
    doi            = {10.3390/jimaging8080222}
}
```