ProtSpace is a visualization tool for exploring protein embeddings or similarity matrices alongside 3D protein structures. It lets users interactively visualize high-dimensional protein language model data in 2D or 3D space, color-code proteins by various features, and view protein structures when available.
Web Interface: https://protspace.rostlab.org/
New JavaScript Frontend (in development): https://tsenoner.github.io/protspace_web (drag and drop .parquetbundle files)
Note: Use Chrome or Firefox for the best experience.
# Basic installation (backend - dimensionality reduction only)
pip install protspace
# Full installation (backend + frontend - including visualization interface)
pip install "protspace[frontend]"
# Retrieve and analyze proteins from UniProt using sequence similarity (MMseqs2)
protspace-query -q "(ft_domain:phosphatase) AND (reviewed:true)" -o output_dir -m pca2,pca3,umap2 -f "protein_families,fragment,kingdom,superfamily" --n_neighbors 30 --min_dist 0.4
# Analyze and visualize your locally stored embeddings
protspace-local -i embeddings.h5 -o output_dir -m pca2,umap2
# Launch the visualization interface
protspace output_dir
Then open http://localhost:8050 in your browser.
- Multiple projections: PCA, UMAP, t-SNE, MDS, PaCMAP in 2D/3D
- Automatic feature extraction: Use -f to color-code proteins by UniProt, InterPro, or Taxonomy features
- 3D structure viewer: Integrated protein structure visualization
- Export: SVG (2D) and HTML (3D) formats
UniProt: annotation_score, cc_subcellular_location, fragment, length_fixed, length_quantile, protein_existence, protein_families, reviewed, xref_pdb
InterPro: cath, pfam, signal_peptide, superfamily
Taxonomy: root, domain, kingdom, phylum, class, order, family, genus, species
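To catch typos in -f before a long run, the feature names above can be checked up front. The following is a minimal Python sketch; `build_feature_arg` and `source_of` are hypothetical helpers, not part of ProtSpace, and only the feature names themselves come from the lists above.

```python
# Hypothetical helpers: only the feature names are taken from the lists above.
FEATURES = {
    "UniProt": {
        "annotation_score", "cc_subcellular_location", "fragment",
        "length_fixed", "length_quantile", "protein_existence",
        "protein_families", "reviewed", "xref_pdb",
    },
    "InterPro": {"cath", "pfam", "signal_peptide", "superfamily"},
    "Taxonomy": {
        "root", "domain", "kingdom", "phylum", "class",
        "order", "family", "genus", "species",
    },
}


def build_feature_arg(names):
    """Join feature names into the comma-separated value passed to -f,
    rejecting any name that is not in the documented lists."""
    known = set().union(*FEATURES.values())
    unknown = [n for n in names if n not in known]
    if unknown:
        raise ValueError(f"unknown feature(s): {', '.join(unknown)}")
    return ",".join(names)


def source_of(name):
    """Return the source (UniProt, InterPro, or Taxonomy) of a feature."""
    for source, names in FEATURES.items():
        if name in names:
            return source
    raise KeyError(name)


print(build_feature_arg(["pfam", "cath", "cc_subcellular_location"]))
# pfam,cath,cc_subcellular_location
```

Note that the taxonomy rank `family` and the UniProt feature `protein_families` are distinct names, which is one of the easier mix-ups this check catches.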
Examples:
# Extract Pfam domains and subcellular location
protspace-local -i data.h5 -f pfam,cath,cc_subcellular_location
# Extract reviewed status, length, and taxonomy
protspace-query -q "..." -f reviewed,length_quantile,kingdom
protspace-local (Local data):
- -i, --input: HDF5 embeddings or CSV similarity matrix (required)
- -o, --output: Output file or directory (optional; default: derived from the input filename)
- -f, --features: Features to extract (comma-separated) or path to a CSV metadata file
- -m, --methods: Reduction methods (e.g., pca2, umap3, tsne2)
- --non-binary: Use the legacy JSON format
- --keep-tmp: Cache intermediate files for reuse
- --bundled: Bundle output files (true/false; default: true)
protspace-query (UniProt search):
- -q, --query: UniProt search query (required)
- -o, --output: Output file or directory (optional; default: protspace.parquetbundle)
- -f, --features: Features to extract (comma-separated)
- -m, --methods: Reduction methods (e.g., pca2, umap3, tsne2)
- --non-binary: Use the legacy JSON format
- --keep-tmp: Cache intermediate files for reuse
- --bundled: Bundle output files (true/false; default: true)
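The options above can also be assembled from Python, for example when running ProtSpace from a workflow script. This sketch builds an argument list for protspace-local using only the documented flags. `parse_method` and `build_local_command` are hypothetical helpers, and the method names `mds` and `pacmap` are assumed by analogy with the documented `pca2`, `umap3`, and `tsne2` examples.

```python
import shlex

# Assumed lowercase method names; "mds" and "pacmap" are inferred, not documented.
METHODS = {"pca", "umap", "tsne", "mds", "pacmap"}


def parse_method(spec):
    """Split a method spec such as 'umap3' into ('umap', 3)."""
    name, dim = spec[:-1], spec[-1:]
    if name not in METHODS or dim not in {"2", "3"}:
        raise ValueError(f"invalid method spec: {spec!r}")
    return name, int(dim)


def build_local_command(input_path, methods, output=None, features=None,
                        keep_tmp=False, non_binary=False):
    """Assemble an argv list for protspace-local from the documented flags."""
    for spec in methods:
        parse_method(spec)  # fail fast on a typo such as 'umap4'
    cmd = ["protspace-local", "-i", input_path, "-m", ",".join(methods)]
    if output:
        cmd += ["-o", output]
    if features:
        cmd += ["-f", features]
    if keep_tmp:
        cmd.append("--keep-tmp")
    if non_binary:
        cmd.append("--non-binary")
    return cmd


cmd = build_local_command("embeddings.h5", ["pca2", "umap2"], output="output_dir")
print(shlex.join(cmd))
# protspace-local -i embeddings.h5 -m pca2,umap2 -o output_dir
```

Passing the list to `subprocess.run(cmd, check=True)` runs the tool without any shell quoting.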
Each method uses the default parameters listed below. Override them to fine-tune the dimensionality reduction:
- UMAP: --n_neighbors 15 --min_dist 0.1
- t-SNE: --perplexity 30 --learning_rate 200
- PaCMAP: --mn_ratio 0.5 --fp_ratio 2.0
- MDS: --n_init 4 --max_iter 300 --eps 1e-3
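When tuning, it helps to emit only the parameters that actually differ from these defaults. A minimal sketch follows. The `DEFAULTS` table is transcribed from the list above, while `override_args` is a hypothetical helper that produces flags in the documented `--name value` form.

```python
# Defaults transcribed from the list above; the dict keys are labels for this sketch.
DEFAULTS = {
    "umap": {"n_neighbors": 15, "min_dist": 0.1},
    "tsne": {"perplexity": 30, "learning_rate": 200},
    "pacmap": {"mn_ratio": 0.5, "fp_ratio": 2.0},
    "mds": {"n_init": 4, "max_iter": 300, "eps": 1e-3},
}


def override_args(method, **overrides):
    """Return CLI flags only for parameters whose value differs from the default."""
    defaults = DEFAULTS[method]
    args = []
    for key, value in overrides.items():
        if key not in defaults:
            raise KeyError(f"{key!r} is not a {method} parameter")
        if value != defaults[key]:
            args += [f"--{key}", str(value)]
    return args


# Same UMAP overrides as in the protspace-query quick-start example.
print(override_args("umap", n_neighbors=30, min_dist=0.4))
# ['--n_neighbors', '30', '--min_dist', '0.4']
```

Rejecting unknown keys per method catches mistakes such as passing `--perplexity` while only UMAP projections are requested.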
protspace-feature-colors input.json output.json --feature_styles '{
"feature_name": {
"colors": {"value1": "#FF0000", "value2": "#00FF00"},
"shapes": {"value1": "circle", "value2": "square"}
}
}'
Available shapes: circle, circle-open, cross, diamond, diamond-open, square, square-open, x
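Hand-writing the --feature_styles JSON inside shell quotes is error-prone. The sketch below builds it with Python's json module and rejects shapes outside the list above. `build_feature_styles` is a hypothetical helper. The #RRGGBB color check and leaving out "shapes" when none are given are assumptions based only on the example above.

```python
import json

# Shapes listed above as accepted by protspace-feature-colors.
SHAPES = {"circle", "circle-open", "cross", "diamond",
          "diamond-open", "square", "square-open", "x"}
HEX_DIGITS = set("0123456789abcdefABCDEF")


def build_feature_styles(feature, colors, shapes=None):
    """Return the JSON string for --feature_styles covering one feature."""
    shapes = shapes or {}
    bad = {value: s for value, s in shapes.items() if s not in SHAPES}
    if bad:
        raise ValueError(f"unsupported shape(s): {bad}")
    for value, color in colors.items():
        # Conservative assumption: require #RRGGBB, as in the example above.
        if not (len(color) == 7 and color.startswith("#")
                and set(color[1:]) <= HEX_DIGITS):
            raise ValueError(f"not a #RRGGBB color for {value!r}: {color!r}")
    style = {"colors": colors}
    if shapes:
        style["shapes"] = shapes
    return json.dumps({feature: style})


styles = build_feature_styles(
    "feature_name",
    colors={"value1": "#FF0000", "value2": "#00FF00"},
    shapes={"value1": "circle", "value2": "square"},
)
print(styles)
```

Passing `styles` as one element of an argument list, e.g. `subprocess.run(["protspace-feature-colors", "input.json", "output.json", "--feature_styles", styles], check=True)`, avoids escaping the JSON for the shell.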
- UniProt queries: Text queries using UniProt syntax
- Embeddings: HDF5 files (.h5, .hdf5)
- Similarity matrices: CSV files with symmetric matrices
- Metadata: CSV with 'identifier' column + feature columns
- Structures: ZIP files containing PDB/CIF files
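Before a run, the two CSV inputs can be sanity-checked with Python's standard csv module. The sketch assumes a similarity matrix with a header row of identifiers and the identifier in the first column of each data row; ProtSpace's exact expected layout may differ. `write_metadata_csv`, `is_symmetric_matrix_csv`, the column name `group`, and the identifiers are all hypothetical.

```python
import csv
import io


def write_metadata_csv(rows, feature_names, fh):
    """Write a metadata CSV: an 'identifier' column followed by feature columns."""
    writer = csv.DictWriter(fh, fieldnames=["identifier", *feature_names])
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


def is_symmetric_matrix_csv(fh, tol=1e-9):
    """Check that a square similarity matrix CSV is symmetric.

    Assumed layout: header row of identifiers, identifier in column 0 of each row.
    """
    reader = csv.reader(fh)
    header = next(reader, [])[1:]
    ids, values = [], []
    for row in reader:
        ids.append(row[0])
        values.append([float(x) for x in row[1:]])
    # Row and column identifiers must match, and every row must be full length.
    if ids != header or any(len(r) != len(ids) for r in values):
        return False
    n = len(ids)
    return all(abs(values[i][j] - values[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))


meta = io.StringIO()
write_metadata_csv(
    [{"identifier": "P12345", "group": "kinase"},       # hypothetical rows
     {"identifier": "Q67890", "group": "phosphatase"}],
    ["group"], meta)
print(meta.getvalue())

matrix = io.StringIO("id,A,B\nA,1.0,0.3\nB,0.3,1.0\n")
print(is_symmetric_matrix_csv(matrix))  # True
```

When writing to a real file, open it with `open(path, "w", newline="")` so the csv module controls line endings.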
- Default: Parquet files (projections_data.parquet, projections_metadata.parquet, selected_features.parquet)
- Legacy: JSON format with the --non-binary flag
- Temporary files: FASTA sequences, similarity matrices, all features (with --keep-tmp)
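After a run with --bundled false, a quick check can confirm that the three default Parquet files were written. This is a minimal sketch. It assumes that unbundled output places the files listed above directly in the output directory, and `missing_outputs` is a hypothetical helper.

```python
from pathlib import Path

# File names listed above for the default Parquet output.
EXPECTED_PARQUET = (
    "projections_data.parquet",
    "projections_metadata.parquet",
    "selected_features.parquet",
)


def missing_outputs(output_dir):
    """Return the expected Parquet files that are absent from output_dir.

    Assumes --bundled false, so files are written separately rather than
    packed into a single .parquetbundle file.
    """
    out = Path(output_dir)
    return [name for name in EXPECTED_PARQUET if not (out / name).is_file()]


missing = missing_outputs("output_dir")
if missing:
    print("missing:", ", ".join(missing))
```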
@article{SENONER2025168940,
title = {ProtSpace: A Tool for Visualizing Protein Space},
journal = {Journal of Molecular Biology},
pages = {168940},
year = {2025},
issn = {0022-2836},
doi = {10.1016/j.jmb.2025.168940},
url = {https://www.sciencedirect.com/science/article/pii/S0022283625000063},
author = {Tobias Senoner and Tobias Olenyi and Michael Heinzinger and Anton Spannagl and George Bouras and Burkhard Rost and Ivan Koludarov}
}