This directory contains information on how to perform embedding extraction using ALIGNN.
The following files are required to perform embedding extraction:
- Structure files: contain the structure information for a given material (format: POSCAR, .cif, .xyz or .pdb)
- Input-Property file: contains the name of each structure file and its corresponding property value (format: .csv)
- Pre-trained model: a model trained with ALIGNN on a specific materials property (format: .zip)
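Before running the extraction, it can help to confirm that every structure file named in the Input-Property file actually exists. The sketch below assumes, as in the usual ALIGNN setup, that id_prop.csv has no header row and that each row reads structure_file,property_value. The helper names parse_id_prop, missing_structures and check_id_prop are hypothetical and not part of ALIGNN.

```python
import csv
from pathlib import Path


def parse_id_prop(lines):
    """Parse rows of the form 'structure_file,property_value' (no header row assumed)."""
    entries = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        entries.append((row[0].strip(), float(row[1])))
    return entries


def missing_structures(entries, structure_dir):
    """Return the structure file names in entries that do not exist in structure_dir."""
    return [name for name, _ in entries if not (Path(structure_dir) / name).is_file()]


def check_id_prop(id_prop_path, structure_dir):
    """Read an id_prop.csv file; return (entries, missing structure file names)."""
    with open(id_prop_path, newline="") as f:
        entries = parse_id_prop(f)
    return entries, missing_structures(entries, structure_dir)
```

For example, check_id_prop("examples/id_prop.csv", "examples") would return the parsed entries together with any structure files missing from the examples folder (the folder layout here is an assumption).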
Example structure files (POSCAR files) and an example Input-Property file (id_prop.csv) are provided in examples. The pre-trained model trained on large datasets can be downloaded from here.
To perform feature extraction, first add the model details to the all_models dictionary in the train.py file, as shown below:
all_models = {
    # model name: [link to the pre-trained model (optional), number of outputs]
    "name of the file": ["link to the pre-trained model (optional)", number_of_outputs],
    "name of the file 2": ["link to the pre-trained model 2 (optional)", number_of_outputs],
    # ...
}
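To illustrate the shape of each all_models entry, the sketch below looks up a model name and returns either its download link or, when no link is given, a local zip path inside the alignn folder. The helper resolve_model, the example entries, the example.com URL and the assumption that the local zip is named after the model are all hypothetical; this is not the lookup code used in train.py.

```python
from pathlib import Path

# Hypothetical registry in the same shape as all_models:
# model name -> [link to the pre-trained model zip (or None), number of outputs]
all_models = {
    "model_with_link": ["https://example.com/model_with_link.zip", 1],
    "local_model": [None, 1],
}


def resolve_model(name, models, alignn_dir="alignn"):
    """Return (source, n_outputs) for a model entry.

    source is the download link if one is given; otherwise it is
    <alignn_dir>/<name>.zip, where the zip file is assumed to have been placed by hand.
    """
    if name not in models:
        raise KeyError(f"{name!r} is not in all_models")
    link, n_outputs = models[name]
    if link:
        return link, n_outputs
    return str(Path(alignn_dir) / f"{name}.zip"), n_outputs
```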
If no link to the pre-trained model is provided in the all_models dictionary, place the zip file of the pre-trained model inside the alignn folder. Once the pre-trained model is set up, run the create_features.sh script to perform feature extraction. The script contains the following code:
# Extract features for each .vasp (POSCAR) file, one file at a time
for filename in ../examples/*.vasp; do
    python alignn/pretrained_activation.py --model_name mp_e_form_alignnn --file_format poscar --file_path "$filename" --output_path "examples/data"
done
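If a Python driver is preferred over the shell script, the same loop can be sketched as follows. It only mirrors the flags shown in create_features.sh; the function names build_command and extract_all are hypothetical, and using sys.executable in place of python (so the current interpreter is reused) is a choice made here, not something the original script does.

```python
import glob
import subprocess
import sys


def build_command(filename, model_name="mp_e_form_alignnn", output_path="examples/data"):
    """Return the pretrained_activation.py command for one structure file (mirrors create_features.sh)."""
    return [
        sys.executable, "alignn/pretrained_activation.py",
        "--model_name", model_name,
        "--file_format", "poscar",
        "--file_path", filename,
        "--output_path", output_path,
    ]


def extract_all(pattern="../examples/*.vasp", **kwargs):
    """Run feature extraction on each matching file one at a time; stop at the first failure."""
    for filename in sorted(glob.glob(pattern)):
        subprocess.run(build_command(filename, **kwargs), check=True)
```

Running the files sequentially with check=True matches the one-by-one behaviour of the shell loop while surfacing a failed conversion immediately instead of silently continuing.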
The script converts the structure files into atom-level encodings one file at a time (batch-wise conversion is not implemented yet). For example, abc.vasp produces abc_1.csv to abc_9.csv.
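Because each structure produces several numbered output files, a plain alphabetical sort would place abc_10.csv before abc_2.csv if a structure ever produced ten or more files. The sketch below collects the outputs for one structure and orders them by their numeric suffix. The function names are hypothetical, and the sketch makes no assumption about what each numbered file contains.

```python
import re
from pathlib import Path


def select_feature_names(names, stem):
    """Keep names of the form <stem>_<k>.csv and sort them by the integer k."""
    pattern = re.compile(rf"^{re.escape(stem)}_(\d+)\.csv$")
    indexed = []
    for name in names:
        m = pattern.match(name)
        if m:
            indexed.append((int(m.group(1)), name))
    return [name for _, name in sorted(indexed)]


def feature_files(output_dir, stem):
    """Return the paths of <stem>_<k>.csv files in output_dir, in numeric order."""
    names = [p.name for p in Path(output_dir).iterdir() if p.is_file()]
    return [Path(output_dir) / n for n in select_feature_names(names, stem)]
```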
Once all the structure files listed in the Input-Property file id_prop.csv have been converted with the script, proceed to GNN for model training.
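Before moving on to training, a quick check can list any structure that has no output files yet. This sketch assumes, based on the abc.vasp example, that outputs are named after the structure file with its extension removed; unconverted is a hypothetical helper.

```python
import re
from pathlib import Path


def unconverted(structure_files, output_names):
    """Return the structure files that have no <stem>_<k>.csv output.

    Assumes outputs are named after the structure file without its extension,
    as in abc.vasp -> abc_1.csv ... abc_9.csv.
    """
    converted = set()
    for name in output_names:
        m = re.match(r"^(.+)_\d+\.csv$", name)
        if m:
            converted.add(m.group(1))
    return [f for f in structure_files if Path(f).stem not in converted]
```

Here structure_files would be the first column of id_prop.csv and output_names the file names found in the output folder (examples/data in create_features.sh).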