Ligand Flexibility

From DISI
Jump to: navigation, search

The internal degrees of freedom of the ligand can be sampled with the anchor-and-grow incremental construction approach. This conformational search algorithm has been validated for binding mode prediction on sets of ligands that have no more than seven rotatable bonds (Moustakas et al., 2006).


Anchor-and-Grow

The process of docking a molecule using the anchor-first strategy is shown in the Workflow for Anchor-and-Grow Algorithm Ewing et al. 2001. First, the largest rigid substructure of the ligand (anchor) is identified (seeIdentification of Rigid Segments) and rigidly oriented in the active site (orientation) by matching its heavy atoms centers to the receptor sphere centers (see Orienting the Ligand). The anchor orientations are evaluated and optimized using the scoring function (see Scoring) and the energy minimizer (see Minimization). In general, the orientations are then ranked according to their score, spatially clustered by heavy atom root mean squared deviation (RMSD), and pruned (see Pruning the Conformation Search Tree). Next, the remaining flexible portion of the ligand (seeIdentification of Flexible Layers) is built onto the best anchor orientations within the context of the receptor (grow). It is assumed that the shape of the binding site will help restrict the sampling of ligand conformations to those that are most relevant for the receptor geometry.


==Workflow for Anchor-and-Grow Algorithm==

The conformation of a flexible molecule may be searched or relaxed using the flexible_ligand option. Only the torsion angles are modified, not the bond lengths or angles. Therefore, the input geometry of the molecule needs to be of good quality. A structure generated by ZINC is sufficient.

The torsion angle positions reside in an editable file (see flex_drive.tbl on page 111) which is identified with the flex_drive_file parameter. Internal clashes are detected during the torsion drive search based on the clash_overlap or internal_energy parameters, which are independent of scoring function.

Identification of Rigid Segments

A flexible molecule is treated as a collection of rigid segments. Each segment contains the largest set of adjacent atoms separated by non-rotatable bonds. Segments are separated by rotatable bonds.

The first step in segmentation is ring identification. All bonds within molecular rings are treated as rigid. This classification scheme is a first-order approximation of molecular flexibility, since some amount of flexibility can exist in non-aromatic rings. To treat such phenomenon as sugar puckering and chair-boat hexane conformations, the user will need to supply each ring conformation as a separate input molecule. Additional bonds may be specified as rigid by the user (see Manual Specification of Non-rotatable Bonds).


Identification of Rigid Anchor and Flexible Bonds

The second step is flexible bond identification. Each flexible bond is associated with a label defined in an editable file (see flex.defn). The parameter file is identified with the flex_definition_file parameter. Each label in the file contains a definition based on the atom types (and chemical environment) of the bonded atoms. Each label is also flagged as minimizable. Typically, bonds with some degree of double bond character are excluded from minimization so that planarity is preserved. Each label is also associated with a set of preferred torsion positions. The location of each flexible bond is used to partition the molecule into rigid segments. A segment is the largest local set of atoms that contains only non-flexible bonds.


Manual Specification of Non-rotatable Bonds

Currently this functionality is not available!

The user can specify additional bonds to be non-rotatable, to supplement the ring bonds automatically identified by DOCK. Such a technique could be used to preserve the conformation of part of a molecule and isolate it from the conformation search.

Non-rotatable bonds are identified in the Tripos MOL2 format file containing the molecule. The bonds are designated as members of a STATIC BOND SET named RIGID (see Tripos MOL2 Format).

Creation of the RIGID set can be done within Chimera. With the molecule of interest loaded into Chimera, select the portion of the ligand you would like to remain rigid. Then select on File > Save MOL2. Make sure the "Write current selection to @ SETS section of file" is checked and save the file.

Alternatively, the RIGID set can be entered into the MOL2 file by hand. To do this, go to the end of the MOL2 file. If no sets currently exist, then add a SET identifier on a new line. It should contain the text "@<TRIPOS>SET". On a new line add the text "RIGID STATIC BONDS <user> **** Comment". On the next line enter the number of bonds that will be included in the set, followed by the numerical identifier of each bond in the set.

Identification of Flexible Layers

Anchor Selection

An anchor segment is normally selected from the rigid segments in an automatic fashion (see Manual Specification of Non-rotatable Bonds to override this behavior). The molecule is divided into segments that overlap at each rotatable bond. The segment with the largest number of heavy atoms is selected as the first anchor, number of attachment points are also considered. All segments with more heavy atoms then min_anchor_size are tried separately as anchors. The number of anchors can be limited by setting the limit_max_anchors flag to "yes"; max_anchor_num is used to specify the maximum number of anchors to be used (anchors are ordered by heavy atoms and attachment points):

 min_anchor_size 5
 limit_max_anchors yes
 max_anchor_num 5

At most 5 anchors are used and all anchors have at least 5 heavy atoms.

To use a single specific anchor (e.g scaffold with known bonding pose), specify an atom name and its corresponding atom number in the chosen fragment (e.g. if atom number 10 is C16):


 user_specified_anchor yes
 atom_in_anchor C16,10


Identification of Overlapping Segments

When an anchor has been selected, then the molecule is redivided into non-overlapping segments, which are then arranged concentrically about the anchor segment. Segments are reattached to the anchor according to the innermost layer first and within a layer to the largest segment first.

Layered Non-Overlapping Segments

The anchor is processed separately (either oriented, scored, and/or minimized). The remaining segments are subsequently re-attached during the conformation search. The interaction energy between the receptor and the ligand can be optimized with a simplex minimizer (see Minimization).

Pruning the Conformation Search Tree

Starting with version 6.1, there are two methods for pruning. The first method is the one that existed in earlier versions; it is the default and corresponds to input parameter pruning_use_clustering = yes. In this method pruning attempts to retain the best, most diverse configurations using a top-first pruning algorithm, which proceeds as follows. The configurations are ranked according to score. The top-ranked configuration is set aside and used as a reference configuration for the first round of pruning. All remaining configurations are considered candidates for removal. A root-mean-squared distance (RMSD) between each candidate and the reference configuration is computed. Each candidate is then evaluated for removal based on its rank and RMSD using the inequality:

If the factor is greater than number_confs_for_next_growth, as appropriate, the candidate is removed. Based on this factor, a configuration with rank 2 and 0.2 angstroms RMSD is comparable to a configuration with rank 20 and 2.0 angstroms RMSD. The next best scoring configuration which survives the first pass of removal is then set aside and used as a reference configuration for the second round of pruning, and so on. This pruning method biases its search time towards molecules that sample a more diverse set of binding modes. As the values of num_anchors_orients_for_growth and number_confs_for_next_growth are increased, the anchor-first method approaches an exhaustive search.

In the second method, the goal is to bias the sampling towards conformations that are close to the correct binding mode (as optimized using a test set of experimentally solved structures). Much as the method above, the algorithm ranks the generated poses and conformations. Then, all poses that violate a user-defined score cutoff are removed. To facilitate the speed of the calculation, the remaining list is additionally pared back to a user-defined length. In this method, the sampling is driven towards molecules that sample closer to the experimentally determined binding site, and the result is a significantly less diverse set of final poses.

=Time Requirements

The time demand grows linearly with the num_anchors_orients_for_growth, the number_confs_for_next_growth, the number of flexible bonds and the number of torsion positions per bond, as well as the number of anchor segments explored for a given molecule. Using the notation in the Workflow for Anchor-and-Grow Algorithm, the time demand can be expressed as

where the additional terms are:

* NA is the number of anchor segments tried per

molecule.

* NB is the number of rotatable bonds per molecule.

Growth Tree and Statistics

Dock uses Breadth First Search to sample the conformational space of the ligand. The tree is pruned at every stage of growth to remove unsuitable conformations. In order to be as space efficient as possible, DOCK only saves one level of growth at a time unless "write_growth_tree" is turned on. In order to construct the growth tree it was necessary to do the following: (1) Retain all levels of growth (before and after minimization) in memory. (2) Link every conformer to its parent conformer during growth. (3) While writing out the tree, the traversal starts from a fully grown ligand (leaf), moving up the branch (parent conformer) until the ligand anchor (root) is reached. Finally, the growth tree branch is printed as a multi-mol2 file starting from the anchor to the fully grown ligand, including minimizations. This newly implemented feature allows visualization of all stages of growth and optimize behavior of current DOCK routines. Note that the growth trees can easily be visualized using the Viewdock module in the UCSF chimera program. Extra information regarding conformer number, anchor number, parent conformer etc. can also be accessed directly using this tool.

Format for branch files name is as follows:

${Ligand name}_anchor${anchor number}_branch${conformer number of fully grown mol.}.mol2

e.g. LIG1_anchor1_branch4.mol2

The ligand name is that specified in the mol2 file. The anchor number indicates what fragment or portion of the molecule was used as the anchor. The every conformer (both partially and fully grown) is assigned a unique number.

we recommend that users cat files together and compress them.

 cat *_branch*.mol2 > growth_tree.mol2; gzip growth_tree.mol2

In addition, growth statistics are printed to the output files if the verbose flag is used.

 -----------------------------------
 VERBOSE MOLECULE STATS
 
   Number of heavy atoms = 30
   Number of rotatable bonds = 7
   Formal Charge = 1.00
   Molecular Weight = 429.56
   Heavy Atoms = 30
 -----------------------------------
 VERBOSE ORIENTING STATS :
 
 Orienting 10 anchor heavy atom centers
 Sphere Center Matching Parameters:
    tolerance: 0.25; dist_min: 2; min_nodes: 3; max_nodes: 10
 Num of cliques generated: 2298
    Residual Info:
    min residual:    0.0261
    median residual: 0.3932
    max residual:    0.5000
    mean residual:   0.3737
    std residual:    0.0935
  Node Sizes:
    min nodes:    3
    max nodes:    4
    mean nodes:   3.0070
   # of anchor positions: 1000
   -----------------------------------
 VERBOSE GROWTH STATS : ANCHOR #1
   
 32/1000 anchor orients retained (max 1000) t=9.06s
 Lyr 1-1 Segs|Lyr 2-1 Segs|Lyr 3-2 Segs|Lyr 4-2 Segs|Lyr 5-1 Segs|
 Lyr:1 Seg:0 Bond:8 : Sampling 6 dihedrals C6(C.ar)  C4(C.ar)  C3(C.3)  C1(C.3)
 Lyr:1 Seg:0 24/192 retained, Pruning: 6-score 162-clustered  t=10.68s
 Lyr:2 Seg:0 Bond:5 : Sampling 3 dihedrals C4(C.ar)  C3(C.3)  C1(C.3)  N1(N.3)
 Lyr:2 Seg:0 51/72 retained, Pruning: 21-clustered  t=11.38s
 Lyr:3 Seg:0 Bond:1 : Sampling 3 dihedrals C3(C.3)  C1(C.3)  N1(N.3)  S1(S.o2)
 Lyr:3 Seg:0 105/153 retained, Pruning: 7-score 41-clustered  t=13.37s
 Lyr:3 Seg:1 Bond:3 : Sampling 6 dihedrals N4(N.am)  C2(C.2)  C1(C.3)  C3(C.3)
 Lyr:3 Seg:1 86/630 retained, Pruning: 8-score 536-clustered  t=23.93s
 Lyr:4 Seg:0 Bond:43 : Sampling 3 dihedrals C16(C.ar)  S1(S.o2)  N1(N.3)  C1(C.3)
 Lyr:4 Seg:0 90/258 retained, Pruning: 168-clustered  t=28.85s
 Lyr:4 Seg:1 Bond:26 : Sampling 2 dihedrals C11(C.3)  N4(N.am)  C2(C.2)  C1(C.3)
 Lyr:4 Seg:1 147/180 retained, Pruning: 5-score 28-clustered  t=35.28s
 Lyr:5 Seg:0 Bond:46 : Sampling 6 dihedrals C17(C.ar)  C16(C.ar)  S1(S.o2)  N1(N.3)
 Lyr:5 Seg:0 104/882 retained, Pruning: 15-outside grid 22-score 741-clustered  t=77.71s  
 

These are the verbose growth statistics for flexible docking to 1PPH (thrombin). These are printed only when the verbose flag is enabled in the command line. This feature is useful for debugging incomplete growths and other possible issues with the growth routines. This feature is also useful to show progress when docking in larger peptide-like ligands (20+ rotatable bonds) which can take several hours. Cumulative timing in seconds (e.g. t=13.37s) is shown at the end of each line to allow quick profiling of the slowest steps during docking. A separate section is printed for each anchor sampled when using multiple anchors. For anchor #1, the orienting routine produces 1000 orients, and 37 are retained after clustering and minimization. The ligand has 7 rotatable bonds. The second line shows the assignment of layers and segments. For details on the terminology, please consult the DOCK 4 paper. subsequently, two lines of information are printed for each torsion sampled.


Lyr:1 Seg:0 indicates that this is Layer #1 and Segment #0. Layer and segment number starts from zero, and corresponds to the array indices used internally. Bond:8 refers to bond number in the mol2 file read in. "Sampling 6 dihedrals C6(C.ar)  C4(C.ar)  C3(C.3)  C1(C.3)" specifies the exact torsion being sampled. Six dihedral positions are being sampled in this case, as determined by the drive_id in flex_drive.tbl. 21/246 retained means 21 conformers were retained from the 246 conformers generated during growth (41 conformers x 6 dihedral positions = 246 new conformers). The Pruning: section demonstrates how these (246-21) or 225 conformers were pruned: 2 conformers were outside the energy grid, 5 conformers exceeded the score cut-off (see pruning_conformer_score_cutoff) and 218 conformers were clustered. Typically clustering removes the greatest number of conformers during each torsion grown as controlled by the pruning_clustering_cutoff parameter. The reader is encouraged to verify that the number of conformers retained can be calculated as above at each stage of growth. If the growth tree is turned on, the total number of conformers stored in the growth tree are also reported.

NOTE: The following parameter definitions will use the format below:

parameter_name [default] (value):

  1. description


In some cases, parameters are only needed (questions will only be asked) if the parameter above is enforced. These parameters are indicated below by additional indentation.


Flexible Ligand Parameters

       flexible_ligand [yes] (yes, no):
       #Flag to perform ligand conformational searching
           user_specified_anchor [no] (yes no):
           #Flag to let the user choose the anchor segment for growth.
               atom_in_anchor [C1,1] ():
               #Specify an atom within the chosen anchor segment.
           limit_max_anchors [no] (yes no):
           #Flag to limit the number of anchors used during multi-anchor docking.
               max_anchor_num [1] ()::
               #The maximum number of anchor segments to be used.
           min_anchor_size [5] (int):
           #The minimum number of heavy atoms for an anchor segment. Set this to a high number (40) to use a single anchor. For multi-anchor docking, use 5 or 6 (pyrrole or benzene ring respectively).
           pruning_use_clustering [yes] (yes, no):
           #Flag to enable clustering during pruning
               (if pruning_use_clustering = yes)
               pruning_max_orients [1000] (int):
               #The maximum number of anchor orientations carried forward in the
               #anchor & grow search (previously num_anchor_orients_for_growth)
               pruning_clustering_cutoff [100] (int):
               #The pruning value cutoff for anchor orientations promoted to the
               #conformational search (previously number_confs_for_next_growth and
               #referred to as N_c in the equation in Pruning the Conformation Search Tree)
               (if pruning_use_clustering = no)
               pruning_max_orients [1000] (int):
               #Maximum number of anchor orientations promoted to the conformational
               #search
               pruning_orient_score_cutoff [25.0] (float):
               #Maximum score for anchor after minimization
               pruning_max_conformers [75] (int):
               #Maximum number of anchor orientations promoted to the next layer of growth
               pruning_conformer_score_cutoff [100.0] (float):
               #Maximum score for conformation after minimization
           use_clash_overlap [no] (yes, no):
           #Flag to check for overlapping atomic volumes during anchor and grow
               clash_overlap [0.5] (float):
               #The relative threshold for overlapping atomic volumes;
               #a clash exists if the distance between a pair of atoms is less than
               #the clash_overlap times the sum of their atom type radii;
               #thus, a clash_overlap of 0.75 allows 25% (1 - 0.75) of relative overlap.
           write_growth_tree [no] (yes, no):
           #Warning: Writing the growth tree increases memory usage and can generate lots of files. Concatenating and compressing growth tree branches is recommended