Important note: The number of epochs has been set to 200 for both inter-case and intra-case. In practice, however, convergence (i.e. the point at which the validation loss stops decreasing) happens much sooner, usually within the first 10 epochs. Once you notice that the validation loss is no longer decreasing, you can kill the process. The weights from the point at which the validation loss was at its minimum are saved.
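The "stop once the validation loss is no longer decreasing" rule above can be made less subjective with a small helper. The following is only a sketch: the function names, the `patience` threshold of 5 epochs, and the loss values are illustrative assumptions and are not part of the training scripts, whose output format is not described here.

```python
def epochs_since_best(val_losses):
    """Return how many epochs have passed since the lowest validation loss."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    # Index of the first occurrence of the minimum loss.
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch


def has_converged(val_losses, patience=5):
    """True if the validation loss has not improved for `patience` epochs.

    `patience` is an assumed threshold, not a value taken from the scripts.
    """
    return epochs_since_best(val_losses) >= patience


# Hypothetical validation losses, e.g. copied by hand from the training output.
losses = [0.92, 0.71, 0.64, 0.61, 0.60, 0.62, 0.61, 0.63, 0.62, 0.64]
print(epochs_since_best(losses))
print(has_converged(losses, patience=5))
```

If `has_converged` returns `True`, it is probably safe to kill the process, since the weights for the minimum validation loss have already been saved.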
- Download the preprocessed logs from here.
- Navigate to the Intra-case folder.
Make the following changes in train.py for each log:
- Set the paths to the train, validation and test splits at lines 45-52.
- Set the value of the maxPrefixLength variable, located above where the train_model function is called, to the corresponding value from MaxPrefixLength.md.
- Set the path to output the weights where the train_model function is called.
Run python3 train.py.
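Because the paths are edited by hand for each log, a typo may only surface after the script has started. A quick pre-flight check along the following lines can catch that early. This is a sketch under assumptions: the helper `missing_paths` and the file names are hypothetical and should be replaced with the paths you actually set in train.py or evaluate.py.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [str(p) for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    # Hypothetical file names; replace with the paths you set in train.py.
    split_paths = ["data/train.csv", "data/val.csv", "data/test.csv"]
    missing = missing_paths(split_paths)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All split files found.")
```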
Make the following changes in evaluate.py for each log:
- Set the paths to the train, validation and test splits at lines 47-54.
- Set the value of the maxPrefixLength variable, located above where the load_state_dict function is called, to the corresponding value from MaxPrefixLength.md.
- Set the path to read the weights where the load_state_dict function is called.
Run python3 evaluate.py
- Download the datasets from here.
- Navigate to the Inter-case folder.
Navigate to the Train folder and make the following changes in the file corresponding to the dataset for each sub-dataset:
- Set the path to the dataset at line 12.
- For the BPIC 2015 and Hospital Billing datasets, the variable dataset_name needs to be set depending on which sub-dataset is being used since the preprocessing for the sub-datasets is slightly different.
- Set the value of the maxPrefixLength variable, located above where the train_model function is called, to the corresponding value from MaxPrefixLength.md. Note that even for datasets with only one sub-dataset, this value may not be set correctly, so please check it.
- Set the path to output the weights where the train_model function is called.
Run python3 {filename.py} where filename.py is the name of the file.
Navigate to the Test folder and make the following changes in the file corresponding to the dataset for each sub-dataset:
- Set the path to the dataset at line 13.
- For the BPIC 2015 and Hospital Billing datasets, the variable dataset_name needs to be set depending on which sub-dataset is being used since the preprocessing for the sub-datasets is slightly different.
- Set the value of the maxPrefixLength variable, located above where the load_state_dict function is called, to the corresponding value from MaxPrefixLength.md. Note that even for datasets with only one sub-dataset, this value may not be set correctly, so please check it.
- Set the path to read the weights where the load_state_dict function is called.
Run python3 {filename.py} where filename.py is the name of the file.