enz wraps protein structure prediction methods from pyrosetta and molecular docking from autodock vina for template-based enzyme design.
enz can align an amino acid sequence to a structure to handle your preferred numbering system. any differences between the input sequence and the structure sequences will be mutated.
enz automatically cleans .pdb structures by removing water, duplicated chains and other small molecules that aren't specified as cofactors.
enz uses pyrosetta for structure prediction functions. currently enz only repacks side-chains around the mutation site. planned feature: loop remodelling for flexible regions.
enz uses autodock vina for molecular docking, since it's fairly fast and straightforward. file conversion is handled automatically and the target area is defined by residue numbers. results are scored using vina's built-in scoring system. results objects, like protein objects, contain pandas DataFrames of atomic coordinates which can be used to generate custom score functions. results objects can also be saved fairly easily. currently, side chains are treated as rigid. a planned feature is to enable side chain flexibility in the target site. a known issue is that some atom types (e.g. boron) are rejected by vina. I'll see what I can do about that.
all molecule objects have a mol.df property, which displays a pandas dataframe of the molecule's .pdb, including its coordinates, which can be very useful for creating custom docking scores.
import enz
wt = 'MTIKEMPQPKTFGELKNLPLLNTDKPVQALMKIADELGEIFKFEAPGRVTRYLSSQRLIKE\
ACDESRFDKNLSQAWKFVRDFAGDGLVTSWTHEKNWKKAHNILLPSFSQQAMKGYHAMMVDIAVQLVQ\
KWERLNADEHIEVPEDMTRLTLDTIGLCGFNYRFNSFYRDQPHPFITSMVRALDEAMNKSQRANPDDP\
AYDENKRQFQEDIKVMNDLVDKIIADRKASGEQSDDLLTHMLNGKDPETGEPLDDENIRYQIITFLIA\
GHETTSGLLSFALYFLVKNPHVLQKAAEEAARVLVDPVPSYKQVKQLKYVGMVLNEALRLWPTAPAFS\
LYAKEDTVLGGEYPLEKGDELMVLIPQLHRDKTIWGDDVEEFRPERFENPSAIPQHAFKPFGNGQRAC\
IGQQFALHEATLVLGMMLKHFDFEDHTNYELDIKETLTLKPEGFVVKAKSKKIPLGGIPSPSTEQSAKKVRK'
p = enz.protein('1jme.pdb', # pdb path
                seq = wt, # optional - aligns to structure
                key_sites = [78, 82, 87, 330, 181, 188]) # optional - constrain docking to this region
p.mutate(87,'V') # any residue
p.mutate(330, 'I')
p.refold() # repack side chains
p.save('new_structure.pdb') # save any molecule object as a pdb with the save() method
results = p.dock('CCCCCCCCCCCC=O') # returns a results object
# which contains the docking poses as enz.mol objects and
# the calculated binding energy of each
results.save('docking_results') # save .pdb structures in new dir docking_results

if you have git then clone this repository to your machine
git clone https://github.com/UoMMIB/enz.git

get in the repo!

cd enz

You'll need conda. I've made an environment file that you can use to automatically install most of the dependencies. set it up with:
conda env create -f env.yml # execute from the enz directory

then activate it with

conda activate enz

you'll need to activate this environment before using enz.
Install pyrosetta - download here - requires a "username" and "password", which are sent to you when you apply for a license. Make sure you get the right version for your machine's operating system. I've set up the environment for python 3.7, so best get the python 3.7 version. Note that the download is really slow. On macOS / linux, pyrosetta is distributed as a .tar.bz2 archive. You can decompress these like this:

tar xfvj pypy3.7-v7.3.2-linux64.tar.bz2

then install by cd'ing into PyRosetta4.Release.python37.linux.release-269/setup and running:

pip install .

Navigate back to enz and install with

pip install .

at the base of the enz file tree
that's it. enz is a pain in the ass to install because of pyrosetta, but for now there's not much i can do about that.
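If you want a quick sanity check that the install worked, something like the sketch below can help. It only checks that the modules can be found in the active environment, using the standard library. `check_installed` is a made-up helper name, not part of enz, and the module names `enz` and `pyrosetta` are assumed to be the import names.

```python
import importlib.util


def check_installed(modules=("enz", "pyrosetta")):
    """Return {module name: True/False} depending on whether each can be found.

    Hypothetical helper, not part of enz. It looks modules up without
    importing them, so it will not trigger pyrosetta's start-up output.
    """
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_installed().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

Run it with the enz conda environment activated. A `missing` next to either name suggests the matching `pip install .` step didn't go through.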
You'll need a pdb file template of your protein to work on. Optionally, you can also supply the amino acid sequence and the names of the cofactors as they occur in the pdb file. Initialising an enz.protein object automatically cleans the pdb structure by removing duplicated chains, water and any other molecules not specified in the cofactors = [...] argument. Cleaning the protein in this way is necessary for rosetta and vina compatibility.
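To give a feel for what this kind of cleaning involves, here's a minimal sketch in plain Python. It is not enz's actual implementation. `clean_pdb_lines` is a made-up helper, it assumes standard fixed-width PDB columns, and it treats "duplicated chains" as "any chain other than the first one", which may differ from what enz does.

```python
def clean_pdb_lines(lines, cofactors=()):
    """Keep one protein chain plus named cofactors; drop water and other HETATMs.

    Hypothetical helper, not enz's own code.
    lines     : iterable of PDB-format lines
    cofactors : residue names to keep, as written in the pdb (e.g. 'HEM')
    """
    keep = set(cofactors)
    first_chain = None
    cleaned = []
    for line in lines:
        record = line[:6].strip()        # columns 1-6: record name
        if record not in ("ATOM", "HETATM"):
            continue                     # skip headers, remarks, END, ...
        res_name = line[17:20].strip()   # columns 18-20: residue name
        chain_id = line[21:22]           # column 22: chain identifier
        if record == "ATOM":
            if first_chain is None:
                first_chain = chain_id   # assumption: keep the first chain only
            if chain_id == first_chain:
                cleaned.append(line)
        elif res_name in keep:
            cleaned.append(line)         # cofactor named by the user
    return cleaned
```

Water (HOH) is a HETATM record, so it drops out unless you list it as a cofactor. Note that the sketch keeps a named cofactor from any chain.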
You might want to include an amino acid sequence if you'd like to use a residue numbering system that differs from that of the pdb file. The sequence you provide will be aligned to that of the structure, and any differences between the two will be resolved at refold().
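The idea of turning sequence differences into mutations can be sketched like this. It's a toy version that assumes the two sequences have already been aligned to the same length, with '-' marking gaps. How enz performs the alignment internally isn't shown here, and `differences` is a made-up helper name.

```python
def differences(user_seq, structure_seq):
    """List (position, residue) pairs where two pre-aligned sequences disagree.

    Hypothetical helper, not enz's own code. Positions are 1-based and count
    residues of user_seq only, so the numbering follows the sequence you supply.
    """
    if len(user_seq) != len(structure_seq):
        raise ValueError("sequences must be pre-aligned to the same length")
    mutations = []
    position = 0
    for user_aa, struct_aa in zip(user_seq.upper(), structure_seq.upper()):
        if user_aa == "-":
            continue                     # residue only in the structure
        position += 1
        if struct_aa != "-" and user_aa != struct_aa:
            mutations.append((position, user_aa))
    return mutations
```

For example, `differences('MTIKEM', '--IKEA')` gives `[(6, 'M')]`: the structure lacks the first two residues, but position 6 is still numbered according to the sequence you passed in.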
import enz
sequence = 'MSAKBNGFUIAUIEA...'
p = enz.protein('XXXX.pdb', # essential
                cofactors = ['NADP'], # optional, must be a list
                seq = sequence) # optional

enz.protein and enz.mol objects (docking results) both have a df property, which gives a pandas.DataFrame of the coordinates of each atom. This includes the x_coord, y_coord and z_coord for each atom, which is useful to know if you want to score docking on the basis of how close two atoms are together, for example.
p.df
>>> record_name  atom_number  blank_1  ...  element_symbol  charge  line_idx
0          ATOM            1           ...               N     NaN       645
1          ATOM            2           ...               C     NaN       646
2          ATOM            3           ...               C     NaN       647
3          ATOM            4           ...               O     NaN       648
4          ATOM            5           ...               C     NaN       649
... ... ... ... ... ... ... ...

Mutate positions in the protein with the mutate(<position int>, <amino acid letter str>) method. The amino acid letters are case insensitive. Nothing is actually calculated until you call the refold() method. refold() replaces side chains in the structure at all positions where the aligned enz.protein.seq differs from that of the structure. Side chains are repacked within a radius of 5 Angstroms of the mutation site by default, but this can be tweaked with refold(10), for example. This method calls pyrosetta, which spits out a huge amount of text.
p.mutate(55, 'A')
p.mutate(99,'P')
for i in [100, 122, 188]:
    p.mutate(i, 'A')
p.refold()

the enz.protein.save('filename.pdb') method saves a .pdb file of the protein to a desired location.
dock ligands to your protein with the enz.protein.dock(...) method. The method requires the SMILES code for your compound and target_residues, a list of amino acid positions (int) around which the simulation box will be drawn. If you want the compounds to bind to the active site, provide the numbers of some residues that are in the active site. you don't need all of them, just enough to draw a box around.
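To picture what "drawing a box around" some residues means, here's a rough sketch that computes the centre and size of an axis-aligned box enclosing a set of atom coordinates. This illustrates the idea and is not how enz necessarily does it. `docking_box` and the 5 Angstrom `padding` default are assumptions made for the example.

```python
def docking_box(coords, padding=5.0):
    """Return (center, size) of an axis-aligned box enclosing coords.

    Hypothetical helper, not enz's own code.
    coords  : iterable of (x, y, z) tuples, e.g. atoms of the target residues
    padding : extra margin in Angstroms added on every side (assumed default)
    """
    xs, ys, zs = zip(*coords)
    axes = (xs, ys, zs)
    center = tuple((min(axis) + max(axis)) / 2 for axis in axes)
    size = tuple((max(axis) - min(axis)) + 2 * padding for axis in axes)
    return center, size


# two atoms at opposite corners of the region of interest
center, size = docking_box([(0.0, 0.0, 0.0), (10.0, 4.0, 2.0)], padding=1.0)
print(center, size)
```

A handful of residues spread across the pocket is enough, since only the extreme coordinates set the box.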
an optional argument to dock() is exhaustiveness - use an integer 1-16, for low-high resolution docking respectively.
the dock() method returns a results object that wraps the poses and the calculated affinity of each, as well as a save() method.
results = p.dock('CCCCCCCCCC=O', target_residues = [50, 55, 80, 199, 330])

results contains the docking poses and a DataFrame of the scores.
- .scores - pandas.DataFrame of each ligand's binding energy as calculated by vina. mode refers to the ligand id.
- .dictionary provides access to each docking pose and its binding energy individually.
- the enz.vina.results object has a save(...) method, which saves the docking results into a new directory specified as an argument. the directory contains a csv of the scores dataframe, the receptor as a pdb and a pdb of each pose.
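
Since every pose carries a df of atomic coordinates, a custom score based on how close two atoms are could look roughly like the sketch below. It works on plain dicts so it runs without enz. Each atom is assumed to be one row of a df turned into a dict (for instance via pandas' `df.to_dict('records')`), with the `atom_number`, `x_coord`, `y_coord` and `z_coord` columns. `atom_distance`, `rank_poses` and the shape of the `poses` dict are made up for illustration.

```python
import math

COORDS = ("x_coord", "y_coord", "z_coord")


def atom_distance(atom_a, atom_b):
    """Euclidean distance in Angstroms between two atom records (dict-like rows)."""
    return math.sqrt(sum((atom_a[k] - atom_b[k]) ** 2 for k in COORDS))


def rank_poses(poses, target_atom, ligand_atom_number):
    """Sort poses by how close one ligand atom sits to target_atom, closest first.

    Hypothetical helper, not part of enz.
    poses              : dict mapping a pose name to a list of atom records
    target_atom        : atom record from the protein, e.g. a catalytic atom
    ligand_atom_number : atom_number of the ligand atom of interest
    """
    distances = {}
    for name, atoms in poses.items():
        ligand_atom = next(a for a in atoms if a["atom_number"] == ligand_atom_number)
        distances[name] = atom_distance(ligand_atom, target_atom)
    return sorted(distances.items(), key=lambda item: item[1])
```

You could then combine the distance with the vina energy from the scores DataFrame in whatever way suits your question. The sketch sticks to math.sqrt so it also runs on python 3.7.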