ImmunoStruct

ImmunoStruct enables multimodal deep learning for immunogenicity prediction

Table of Contents
  1. About The Project
  2. Citation
  3. Getting Started
  4. Usage
  5. Model Architecture
  6. Troubleshooting
  7. Contributing
  8. License
  9. Contact
  10. Acknowledgments

About The Project

[Figure: ImmunoStruct architecture]

ImmunoStruct is a multimodal deep learning framework that integrates sequence, structural, and biochemical information to predict multi-allele class-I peptide-MHC immunogenicity. By leveraging multimodal data from 26,049 peptide-MHCs and jointly modeling sequence and structure, ImmunoStruct significantly improves immunogenicity prediction performance for both infectious disease epitopes and cancer neoepitopes.

Key Features

  • Multimodal Integration: Combines peptide-MHC protein sequence, structure, and biochemical properties
  • Novel Cancer-Wildtype Contrastive Learning: Enhances specificity for cancer neoepitope detection
  • Enhanced Interpretability: Provides insights into the substructural basis of immunogenicity

[Figure: Contrastive learning approach]

Citation

If you use ImmunoStruct in your research, please cite our paper:

@article{givechian2024immunostruct,
  title={ImmunoStruct: a multimodal neural network framework for immunogenicity prediction from peptide-MHC sequence, structure, and biochemical properties},
  author={Givechian, Kevin Bijan and Rocha, Joao Felipe and Yang, Edward and Liu, Chen and Greene, Kerrie and Ying, Rex and Caron, Etienne and Iwasaki, Akiko and Krishnaswamy, Smita},
  journal={bioRxiv},
  pages={2024--11},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

Getting Started

To get ImmunoStruct up and running locally, follow these steps.

Pre-requisites

Before installation, ensure you have:

  • Python 3.10+
  • CUDA-compatible GPU (recommended)
  • Conda package manager
  • Weights & Biases account for experiment tracking

Dependencies

  • python 3.10
  • torch 2.1.2
  • dgl
  • torch_geometric 2.5.3

Installation

  1. Clone the repository

    git clone https://github.com/KrishnaswamyLab/ImmunoStruct.git
    cd ImmunoStruct
  2. Create and activate conda environment

    conda create --name immuno python=3.10 -c anaconda -c conda-forge
    conda activate immuno
  3. Install core dependencies

    conda install cudatoolkit=11.2 wandb pydantic -c conda-forge
    conda install scikit-image pillow matplotlib seaborn tqdm -c anaconda
  4. Install PyTorch

    python -m pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
  5. Install DGL

    python -m pip install dgl -f https://data.dgl.ai/wheels/torch-2.1/cu118/repo.html
    python -m pip install torchdata==0.7.1
  6. Install PyTorch Geometric and related packages

    python -m pip install torch-scatter==2.1.2+pt21cu118 torch-sparse==0.6.18+pt21cu118 torch-cluster==1.6.3+pt21cu118 torch-spline-conv==1.2.2+pt21cu118 torch_geometric==2.5.3 numpy==1.26.3 -f https://data.pyg.org/whl/torch-2.1.2+cu118.html
  7. Install additional packages

    python -m pip install graphein[extras]
    python -m pip install lifelines
    python -m pip install -U phate
    python -m pip install multiscale-phate
  8. Set up environment variables (if needed)

    export LD_LIBRARY_PATH=/path/to/conda/envs/immuno/lib:$LD_LIBRARY_PATH
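
After finishing the steps above, it can help to confirm that the pinned versions actually landed in the environment. The following is a minimal sketch, not part of the repository: the helper name check_environment and the EXPECTED table are illustrative, and the pins are copied from the installation commands above (dgl is left unpinned because no version is given for it).

```python
from importlib import metadata

# Package names and the versions pinned in the installation steps above.
# None means no version is pinned for that package.
EXPECTED = {
    "torch": "2.1.2",
    "torch_geometric": "2.5.3",
    "dgl": None,
    "torchdata": "0.7.1",
    "numpy": "1.26.3",
}


def check_environment(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_pin)}."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Local suffixes such as "+cu118" are ignored when comparing to the pin.
        base = installed.split("+")[0] if installed else None
        ok = installed is not None and (pinned is None or base == pinned)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "CHECK"
        print(f"{name:16s} {installed or 'not installed':16s} {status}")
```

Any line marked CHECK points at a package that is missing or whose version differs from the one pinned above.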

Usage

Data Preparation

Place the following files in the data/ folder:

  • cedar_data_final_with_mprop1_mprop2_v2.txt
  • complete_score_Mprops_1_2_smoothed_sasa_v2.txt
  • HLA_27_seqs_csv.csv

Additionally, ensure you have these folders:

  • graph_pyg_Cancer
  • graph_pyg_IEDB
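
A quick pre-flight check can catch a missing input before a training run starts. The sketch below is illustrative only: the function name missing_inputs is hypothetical, and it assumes the two graph folders sit inside data/ alongside the three files, which the list above does not state explicitly. Adjust the paths if your layout differs.

```python
from pathlib import Path

# File and folder names copied from the lists above.
REQUIRED_FILES = [
    "cedar_data_final_with_mprop1_mprop2_v2.txt",
    "complete_score_Mprops_1_2_smoothed_sasa_v2.txt",
    "HLA_27_seqs_csv.csv",
]
# Assumption: the graph folders live under the same data directory.
REQUIRED_DIRS = ["graph_pyg_Cancer", "graph_pyg_IEDB"]


def missing_inputs(data_dir="data"):
    """Return the paths of required files and folders that are absent."""
    root = Path(data_dir)
    missing = [str(root / f) for f in REQUIRED_FILES if not (root / f).is_file()]
    missing += [str(root / d) for d in REQUIRED_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    problems = missing_inputs()
    if problems:
        print("Missing inputs:")
        for path in problems:
            print("  " + path)
    else:
        print("All expected inputs found.")
```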

Generate PyG graph files:

The PyG graph files can be generated from the corresponding AlphaFold folders with the command below:

python immunostruct/preprocessing/cancer_graph_construction_new_KBG.py

Training and Testing

  1. Set up Weights & Biases

    Create a project on Weights & Biases whose name matches the project name used by the training script.

  2. Run Experiments

    # HybridModelv2 with full sequence and sequence loss
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model HybridModelv2 --wandb-username YOUR_WANDB_USERNAME
    
    # HybridModel with full sequence and sequence loss
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model HybridModel --wandb-username YOUR_WANDB_USERNAME
    
    # Sequence with fingerprint model
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model SequenceFpModel --wandb-username YOUR_WANDB_USERNAME
    
    # Sequence-only model
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --sequence-loss --model SequenceModel --wandb-username YOUR_WANDB_USERNAME
    
    # Structure-only model
    python train_PropIEDB_PropCancer_ImmunoCancer.py --full-sequence --model StructureModel --wandb-username YOUR_WANDB_USERNAME
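
To run the five configurations above back to back, the commands can be generated from a small table. This is a sketch under stated assumptions rather than a script shipped with the repository: build_command and run_all are hypothetical helpers, and only the flags shown in the commands above are used. Note that StructureModel is the one configuration listed without --sequence-loss.

```python
import subprocess
import sys

SCRIPT = "train_PropIEDB_PropCancer_ImmunoCancer.py"

# (model name, whether --sequence-loss is passed), mirroring the commands above.
EXPERIMENTS = [
    ("HybridModelv2", True),
    ("HybridModel", True),
    ("SequenceFpModel", True),
    ("SequenceModel", True),
    ("StructureModel", False),
]


def build_command(model, sequence_loss, wandb_username):
    """Assemble the argument list for one training run."""
    cmd = [sys.executable, SCRIPT, "--full-sequence"]
    if sequence_loss:
        cmd.append("--sequence-loss")
    cmd += ["--model", model, "--wandb-username", wandb_username]
    return cmd


def run_all(wandb_username, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for model, sequence_loss in EXPERIMENTS:
        cmd = build_command(model, sequence_loss, wandb_username)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("YOUR_WANDB_USERNAME")
```

With dry_run left at its default, the script only prints the commands, which makes it easy to review them before launching anything.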

Troubleshooting

Common Issues

GLIBCXX Error

ImportError: $some_path/libstdc++.so.6: version 'GLIBCXX_3.4.29' not found

Solution: Add your conda environment path to LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=/path/to/conda/envs/immuno/lib:$LD_LIBRARY_PATH
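
If you are unsure what to substitute for /path/to/conda/envs/immuno, the path can be derived from the active environment. The helper below is a hypothetical convenience, not part of the repository. It assumes the environment is activated, so that conda has set CONDA_PREFIX, and falls back to sys.prefix otherwise.

```python
import os
import sys


def ld_library_export(env_prefix=None):
    """Build the export line for the given (or currently active) environment."""
    # CONDA_PREFIX is set by `conda activate`; sys.prefix is the fallback.
    prefix = env_prefix or os.environ.get("CONDA_PREFIX") or sys.prefix
    lib_dir = os.path.join(prefix, "lib")
    return f"export LD_LIBRARY_PATH={lib_dir}:$LD_LIBRARY_PATH"


if __name__ == "__main__":
    # Paste the printed line into your shell or shell profile.
    print(ld_library_export())
```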

CUDA Compatibility Issues

  • Ensure your CUDA version matches the PyTorch installation
  • Verify GPU availability with torch.cuda.is_available()
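
Both checks above can be done in one short script. The sketch below uses standard PyTorch attributes (torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count()); the function name cuda_report is illustrative. The import is guarded so the script still reports something useful when PyTorch itself fails to import.

```python
def cuda_report():
    """Summarize the PyTorch build and whether a GPU is visible to it."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

With the installation steps above, built_for_cuda would be expected to read 11.8, since the wheels come from the cu118 index. If cuda_available is False on a machine with a GPU, the driver or library path is the usual place to look.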

Memory Issues

  • Reduce batch size in training scripts
  • Use gradient checkpointing for large models

Wandb Authentication

  • Log in to Weights & Biases: wandb login
  • Ensure project names match between script and Wandb dashboard

License

Distributed under the Yale License. See LICENSE.txt for more information.

Contact

Krishnaswamy Lab - @KrishnaswamyLab

Project Link: https://github.com/KrishnaswamyLab/ImmunoStruct
