Pathfinder: Protein Structure Ensemble Clustering and Representative Selection

Fig. 1: Overview of the Pathfinder clustering pipeline, from PDB input to ranked representatives.

Pathfinder is a tool for clustering protein structure ensembles (e.g., from AlphaFold predictions) and selecting representative conformations. It extracts features from residue distance maps or TM-score distance matrices, optionally reduces their dimensionality with PCA, and clusters the result with algorithms such as K-means, hierarchical clustering, DBSCAN, spectral clustering, or Gaussian mixture models (GMM). When reference structures are supplied, the representatives can additionally be ranked by their TM-align scores against those references.

The pipeline reads the PDB files of each ensemble, extracts features, clusters the structures, and outputs ranked representatives together with their metrics.
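The stages named above can be pictured with a small NumPy-only sketch. It is an illustration under stated assumptions, not Pathfinder's implementation: the function names are hypothetical, coordinates are assumed to be already parsed into an array of shape (n_models, n_residues, 3), PCA is done with a plain SVD, and a simple k-means with farthest-first initialisation stands in for the clustering backends.

```python
import numpy as np


def distance_map_features(coords):
    """Flatten the upper triangle of every model's residue distance map.

    coords: array of shape (n_models, n_residues, 3), e.g. C-alpha positions.
    """
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dmaps = np.linalg.norm(diff, axis=-1)            # (n_models, L, L)
    rows, cols = np.triu_indices(coords.shape[1], k=1)
    return dmaps[:, rows, cols]                      # (n_models, L*(L-1)/2)


def pca_project(X, n_components):
    """Project mean-centred features onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    chosen = [0]
    while len(chosen) < k:
        # Distance of every point to its nearest already-chosen centre.
        d = np.linalg.norm(X[:, None] - X[chosen][None], axis=-1).min(axis=1)
        chosen.append(int(d.argmax()))
    centers = X[chosen].copy()
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
    return labels, centers


def select_representatives(coords, n_clusters, n_pca_components):
    """Cluster the models and return (labels, one representative per cluster)."""
    X = pca_project(distance_map_features(coords), n_pca_components)
    labels, centers = kmeans(X, n_clusters)
    reps = []
    for j in range(n_clusters):
        members = np.flatnonzero(labels == j)
        if members.size == 0:        # skip clusters that ended up empty
            continue
        nearest = np.linalg.norm(X[members] - centers[j], axis=1).argmin()
        reps.append(int(members[nearest]))
    return labels, reps
```

Here the representative of a cluster is simply the member closest to the cluster centre in PCA space. Any confidence weighting or reference-based ranking would be applied on top of this step.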

Features

Feature Extraction: Residue distance maps or TM-score distance matrices of the given ensemble(s).
Clustering: Multiple algorithms with/without PCA dimensionality reduction.
Ranking: Confidence-weighted selection and ranking; optional TM-score comparison to references.
Parallelism: Multi-process support for efficiency.
Batch Processing: Handle a single protein, multiple proteins, or whole directories via a wrapper script.
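As an illustration of the TM-score feature option, the sketch below turns a pairwise TM-score matrix into a distance matrix and picks the medoid of a set of models. The conversion 1 - TM, the symmetrisation step, and the function names are assumptions made for this example. The README does not state which conversion Pathfinder uses.

```python
import numpy as np


def tm_to_distance(tm):
    """Convert a pairwise TM-score matrix into a symmetric distance matrix.

    Assumption: distance = 1 - TM. TM-scores can differ depending on which
    structure they are normalised by, so the matrix is symmetrised first.
    """
    tm = np.asarray(tm, dtype=float)
    sym = 0.5 * (tm + tm.T)
    dist = 1.0 - sym
    np.fill_diagonal(dist, 0.0)
    return dist


def medoid(dist, members):
    """Index of the member with the smallest summed distance to the others."""
    members = np.asarray(members)
    sub = dist[np.ix_(members, members)]
    return int(members[sub.sum(axis=1).argmin()])


# Toy 3-model example with made-up scores.
tm = [[1.0, 0.9, 0.4],
      [0.8, 1.0, 0.5],
      [0.4, 0.5, 1.0]]
dist = tm_to_distance(tm)
print(medoid(dist, [0, 1, 2]))
```

A distance matrix of this kind can be passed to clustering methods that accept precomputed distances, and the medoid gives a representative that is an actual member of the ensemble.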

Prerequisites

Python 3.8+ with NumPy, Pandas, scikit-learn, and SciPy.
External tools (installed with conda):
- Foldseek (for TM-score matrices).
- TM-align (binary in PATH).
A conda environment (example provided in scripts).
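Before a long run it can help to confirm that the external tools are reachable. The following is a minimal sketch using only the standard library. The executable names `foldseek` and `TMalign` are assumptions about how the binaries are called on your system, and `missing_tools` is a hypothetical helper that is not part of Pathfinder.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = ("foldseek", "TMalign")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```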

Installation

Clone the repository:

git clone https://github.com/yourusername/pathfinder.git
cd pathfinder

Create and activate a mamba environment:

mamba env create -n pathfinder -f environment.yml
mamba activate pathfinder

Quick Start

Activate your environment and make sure src/ and the scripts are on your PATH or in the current directory.

Run the test script to see an example:

chmod +x run_test.sh
./run_test.sh

To run the pipeline directly on your own ensemble:

python src/main.py \
    --ensemble_dir /path/to/ensemble/dir \
    --output_dir /path/to/output/dir \
    --cluster_method kmeans \
    --n_clusters 10 \
    --n_pca_components 10 \
    --transformer tmscore \
    --alpha 1.0 \
    --n_processes 32 \
    --ref_list_txt /path/to/refs.txt  # Optional
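When the same command has to be repeated for several ensembles, it can be assembled programmatically. The sketch below builds the argument list shown above from Python. The flag names and default values are copied from the example command, while `build_command`, `batch_commands`, and the one-subdirectory-per-protein layout are hypothetical and only serve the illustration.

```python
import shlex
from pathlib import Path


def build_command(ensemble_dir, output_dir, ref_list_txt=None, **overrides):
    """Assemble the argument list for src/main.py (flags from the example above)."""
    options = {
        "cluster_method": "kmeans",
        "n_clusters": 10,
        "n_pca_components": 10,
        "transformer": "tmscore",
        "alpha": 1.0,
        "n_processes": 32,
    }
    options.update(overrides)
    cmd = ["python", "src/main.py",
           "--ensemble_dir", str(ensemble_dir),
           "--output_dir", str(output_dir)]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    if ref_list_txt is not None:          # the reference list is optional
        cmd += ["--ref_list_txt", str(ref_list_txt)]
    return cmd


def batch_commands(ensemble_root, output_root):
    """One command per subdirectory of ensemble_root (assumed layout)."""
    root = Path(ensemble_root)
    return [build_command(sub, Path(output_root) / sub.name)
            for sub in sorted(root.iterdir()) if sub.is_dir()]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(build_command("/path/to/ensemble/dir", "/path/to/output/dir")))
```

Keeping the command as a list avoids shell quoting problems with paths that contain spaces.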

Interactive ensemble analysis and state identification

cd dashapp
python app.py

Fig. 2: Overview of the interactive ensemble analysis and state identification utility.

Cite

To be added
