Phylo2Vec (or phylo2vec) is a high-performance software package for encoding, manipulating, and analysing binary phylogenetic trees. At its core, the package contains a representation of binary trees that defines a bijection from any tree topology with 𝑛 leaves to an integer vector of size 𝑛 − 1. Compared to the traditional Newick format, phylo2vec was designed with fast sampling, fast conversion/compression from Newick-format trees to the Phylo2Vec format, and rapid tree comparison in mind.
The current version features a core implementation in Rust, providing significant performance improvements and memory efficiency. It remains available in Python (superseding the version described in the original paper) and R via dedicated wrappers, making it accessible to a broad audience in the bioinformatics community.
Link to the paper: https://doi.org/10.1093/sysbio/syae030
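To make the vector encoding more concrete, here is a minimal pure-Python sketch that does not use the phylo2vec package itself. It assumes the constraint described in the paper, namely that entry `v[i]` (0-indexed) lies between 0 and 2i inclusive. The helper names `is_valid_vector`, `count_vectors`, and `sample_valid_vector` are hypothetical and are not part of the phylo2vec API.

```python
import random
from math import prod


def is_valid_vector(v):
    """Check the assumed constraint 0 <= v[i] <= 2*i at every position i."""
    return all(0 <= x <= 2 * i for i, x in enumerate(v))


def count_vectors(n_leaves):
    """Count valid vectors for n_leaves leaves: 1 * 3 * 5 * ... * (2n - 3)."""
    return prod(2 * i + 1 for i in range(n_leaves - 1))


def sample_valid_vector(n_leaves, rng=random):
    """Draw each entry uniformly from its allowed range (hypothetical helper)."""
    return [rng.randint(0, 2 * i) for i in range(n_leaves - 1)]


if __name__ == "__main__":
    print(is_valid_vector([0, 1, 2, 3, 4]))  # True
    print(count_vectors(6))                  # 945
    print(sample_valid_vector(6))            # a random list of 5 integers
```

Under this assumption, the number of valid vectors for 𝑛 leaves is the product of the odd numbers up to 2𝑛 − 3, which matches the number of rooted binary trees with 𝑛 labelled leaves. This is consistent with the bijection mentioned above, and it is also why sampling a random tree reduces to drawing each entry independently.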
The easiest way to install the standard Python package is using pip:
pip install phylo2vec
Several optimization schemes based on Phylo2Vec are also available but require extra dependencies (see this notebook for a demo). To avoid bloating the standard package, these dependencies must be installed separately. To do so, run:
pip install "phylo2vec[opt]"
- We recommend setting up the pixi package management tool.
- Clone the repository and install using
pixi:
git clone https://github.com/sbhattlab/phylo2vec.git
cd phylo2vec
pixi run -e py-phylo2vec install-python
This will compile and install the package, as the core functionality is written in Rust.
Retrieve one of the compiled binaries from the
releases that fits your OS.
Once the file is downloaded, simply run install.packages in your R command
line.
install.packages("/path/to/package_file", repos = NULL, type = 'source')
Alternatively, install directly from GitHub using devtools:
devtools::install_github("sbhattlab/phylo2vec", subdir="./r-phylo2vec", build = FALSE)
Note: to download a specific version, use:
devtools::install_github("sbhattlab/phylo2vec@vX.Y.Z", subdir="./r-phylo2vec", build = FALSE)
Alternatively, clone the repository and run the following install.packages command in your R command line.
Note: to download a specific version, you can use git checkout to a desired
tag.
git clone https://github.com/sbhattlab/phylo2vec
cd phylo2vec
install.packages("./r-phylo2vec", repos = NULL, type = 'source')
import numpy as np
from phylo2vec import from_newick, to_newick
# Convert a vector to Newick string
v = np.array([0, 1, 2, 3, 4])
newick = to_newick(v) # '(0,(1,(2,(3,(4,5)6)7)8)9)10;'
# Convert Newick string back to vector
v_converted = from_newick(newick) # array([0, 1, 2, 3, 4], dtype=int16)
from phylo2vec.utils.vector import add_leaf, remove_leaf, reroot_at_random
# Add a leaf to an existing tree
v_new = add_leaf(v, 2) # Add a leaf to the third position
# Remove a leaf
v_reduced = remove_leaf(v, 1) # Remove the second leaf
# Random rerooting
v_rerooted = reroot_at_random(v)
To run the hill-climbing-based optimisation scheme presented in the original Phylo2Vec paper, run:
# A hill-climbing scheme to optimize Phylo2Vec vectors
from phylo2vec.opt import HillClimbing
hc = HillClimbing(verbose=True)
hc_result = hc.fit("/path/to/your_fasta_file.fa")
We also provide a command-line interface for quick experimentation on phylo2vec-derived objects.
To see the available functions, run:
phylo2vec --help
Examples:
phylo2vec samplev 5 # Sample a vector with 5 leaves
phylo2vec samplem 5 # Sample a matrix with 5 leaves
phylo2vec from_newick '((0,1),2);' # Convert a Newick to a vector
phylo2vec from_newick '((0:0.3,1:0.1):0.5,2:0.4);' # Convert a Newick to a matrix
phylo2vec to_newick 0,1,2 # Convert a vector to Newick
phylo2vec to_newick $'0.0,1.0,2.0\n0.0,3.0,4.0' # Convert a matrix to Newick
Descriptions of the datasets, as well as download links, are available in the datasets directory.
Datasets for which a FASTA file is available can be downloaded and loaded into Biopython:
from phylo2vec.datasets import load_alignment
load_alignment("zika")
Readily downloadable datasets can be listed using:
from phylo2vec.datasets import list_datasets
list_datasets()
For comprehensive documentation, tutorials, and API reference, visit: https://phylo2vec.readthedocs.io
Found a bug or want a new feature? We welcome contributions to phylo2vec! 🤗 Feel free to report any bugs or feature requests on our Issues page. If you want to contribute directly to the project, fork the repository, create a new branch, and open a pull request (PR) on our Pull requests page.
Please refer to our Contributing guidelines for more details on how to report bugs, request features, or submit code improvements.
Thanks to all our contributors so far!
This project is distributed under the GNU Lesser General Public License v3.0 (LGPL).
If you use Phylo2Vec in your research, please cite:
@article{10.1093/sysbio/syae030,
author = {Penn, Matthew J and Scheidwasser, Neil and Khurana, Mark P and Duchêne, David A and Donnelly, Christl A and Bhatt, Samir},
title = {Phylo2Vec: a vector representation for binary trees},
journal = {Systematic Biology},
year = {2024},
month = {03},
doi = {10.1093/sysbio/syae030},
url = {https://doi.org/10.1093/sysbio/syae030},
}
If you use the software, please cite:
@article{10.21105/joss.09040,
doi = {10.21105/joss.09040},
url = {https://doi.org/10.21105/joss.09040},
year = {2025},
publisher = {The Open Journal},
volume = {10},
number = {114},
pages = {9040},
author = {Scheidwasser, Neil and Nag, Ayush and Penn, Matthew J. and Jakob, Anthony and Andersen, Frederik Mølkjær and Khurana, Mark Poulsen and Setiawan, Landung and Duchêne, David A. and Bhatt, Samir},
title = {phylo2vec: a library for vector-based phylogenetic tree manipulation},
journal = {Journal of Open Source Software}
}
- Preprint repository (core functions are deprecated): https://github.com/Neclow/phylo2vec_preprint
- C++ version (deprecated): https://github.com/Neclow/phylo2vec_cpp
- GradME (phylo2vec + minimum evolution + gradient descent): https://github.com/Neclow/GradME