Dock Sampling

From DISI
Jump to navigation Jump to search

Pose Sampling in DOCK

Don't get right DOCK pose? Well, this happens a lot... Docking is a complicated process, but it can be broken down to two main processes: scoring and sampling. This page is about sampling (in DOCK).

To investigate why you don't get the right DOCK pose (in the case you know the answer from a crystal structure), you can consider the following things:

1. Check if the bioactive conformation is sampled in docking. This is described here: http://wiki.uoft.bkslab.org/index.php/Screen3d
2. Dock the bioactive conformation rigidly. This is described here: http://wiki.uoft.bkslab.org/index.php/Multimol2db.py
3. Run scoreopt on the bioactive conformation to get a score for the crystallographic pose of your ligand. This is described here: http://wiki.uoft.bkslab.org/index.php/Scoreopt
4. Check if a bioactive orientation (or an orientation close to it is actually sampled in the docking process. This is what is described on this page.


Bioactive orientation sampled?

Prediction by DOCK depicted in yellow, actual crystal structure depicted in green.
Prediction by DOCK depicted in yellow, actual crystal structure depicted in green.

I want to make the following example: I predicted a fragment binding to the oxyanion hole, but when we determined the crystal structure, the fragment bound to the so-called distal site of AmpC (see picture on the right). In this case I was wondering whether a orientation similar to the crystallographic pose was actually sampled during the docking process.

It requires two steps to address this question:
1. Use Niu Huang's ligand clustering tool (described here: http://wiki.uoft.bkslab.org/index.php/Dock_Ligand_Clustering) to output all orientations sampled during the docking process.
2. Calculate RMSD values for all orientations compared to the crystal structure pose. You can do this easily and fast using DOCK 6.6 as described here: http://wiki.uoft.bkslab.org/index.php/Calculate_DOCK6_RMSD

In this process you will re-dock your ligand and there are (at least) two caveats to this process:
1. Niu Huang's ligand clustering tool is a branch of DOCK 3.6 and therefore (most likely) not the same DOCK binary which you have been using in your docking.
2. If you re-dock your ligand starting from a new db file for your ligand (because your ligand was part of a big library and the original db-file is no more available which is often the case when you docked a big ZINC library), then you won't dock the same Omega conformations and the partial charges generated by Amsol (and therefore also the desolvation energies) will also be changed.

That said: it is the best approximation at this point.

1. Output all orientations sampled during docking

To output all orientations sampled during docking, you can add the ~30 lines of code at the bottom of your original INDOCK file (see here how it works: http://wiki.uoft.bkslab.org/index.php/Dock_Ligand_Clustering), and re-dock the ligand into the original grids. Make sure you set pose_clustering_method to 0 in your INDOCK for ligand clustering (to ensure all poses get written out).

This will output a file called test.eel1.clu, which is a eel1 file with all poses sampled during the docking process. If you want to know how many poses were sampled, type:

grep -c family test.eel1.clu

The answer was 2293 for the small fragment discussed here.

2. Calculate RMSD values for all orientations compared to the crystal structure pose

Prediction by DOCK depicted in yellow, actual crystal structure depicted in green, orientation with lowest RMSD value (1.3A) depicted in magenta.
Prediction by DOCK depicted in yellow, actual crystal structure depicted in green, orientation with lowest RMSD value (1.3A) depicted in magenta.

To calculate RMSD values for all orientations generated above (test.eel1.clu) using DOCK 6.6. you have to do execute two programs:

1. Convert test.eel1.clu to a mol2 file without hydrogens (you need the same number of atoms to calculate RMSD values and your crystal structure likely does not have hydrogens). You can use the script eel12pdb_no_H.csh (see code below) to do this:

./eel12pdb_no_H.csh test.eel1.clu

This will output a file called test.eel1.clu_no_H.mol2.

Code of eel12pdb_no_H.csh:

#!/bin/sh -f

# awk script to convert eel1 file to pdb
# run:
# ./eel12pdb.csh file.eel1

awk '{
  if ($1=="ATOM") 
    print substr($0,0,56)"1.00 20.00           "substr($0,14,1);
  else if ($1=="TER")
    print $0"\n""END";
  else
    print $0;
}' $1 > $1.pdb

# Remove hydrogens 
awk '($3!~"H")' $1.pdb > $1_no_H.pdb 

# convert to mol2 format
convert.py --i $1_no_H.pdb --o $1_no_H.mol2


2. Calculate RMSD values for each orientation using DOCK 6.6 as described here: http://wiki.uoft.bkslab.org/index.php/Calculate_DOCK6_RMSD

calc_dock6_rmsd.csh test.eel1.clu_no_H.mol2 nz14_xtal_pose.mol2 | grep RMSDh | sort -rgk 3

The result of this analysis was that there about a dozen orientations sampled with a RMSD < 2 Angstroem. The closest orientation had a RMSD value of 1.3 Angstroem compared to the crystal structure and is depicted on the right in magenta. Its DOCK score was -30, which was much lower than than the best scoring orientation (DOCK score: -80).


3. Optional: calculate electrostatics, vdw and desolvation for a DOCK orientation

One little problem of the ligand clustering DOCK binary is that it outputs only the total DOCK score for each orientation sampled. And for some strange reason it prefers not to output the desolvation and atom type columns in test.eel1.clu. But you may be interested in the individual terms (electrostatics, vdw and desolvation) contributing to the total DOCK score of an individual orientation!

To get the individual terms of a orientation output in test.eel1.clu, you can use the the script append_solv2eel1_file.csh (see code below). Before running this script, you have to copy a individual orientation from test.eel1.clu into a new file. In the example below I copied the fifth highest ranking orientation. Copy the three REMARK lines, all ATOM lines, and the TER statement.
Then run:

./append_solv2eel1_file.csh test.eel1.clu5

This creates a file called test.eel1.clu5_solv, with desolvation energies and atom types extracted from test.eel1, which is output for the highest ranking orientation.
You can then run Scoreopt:

/raid5/people/mysinger/code/mud/trunk/doscoreopt.csh test.eel1.clu5_solv ../../../grids > test.eel1.clu5_solv.scoreopt.log

The output file test.eel1.clu5_solv.scores summarizes then the individual terms (electrostatics, vdw and desolvation) contributing to the total DOCK score.

Code of append_solv2eel1_file.csh:

#!/bin/bash

awk '{
     if ($1=="ATOM") 
       print substr($0,65,10);
     if ($1=="TER")
       print $0;

}' test.eel1 > solvation.txt

awk '{
     if (($1=="ATOM") || ($1=="TER"))
      print $0; 
}' $1 > $1_ATOM

awk  '
  BEGIN {
          # load array with contents f1.txt
          while ( getline < "solvation.txt" > 0 )
            {
              f1_counter++
              f1[f1_counter] = $0
            }
  }
  { 
    if ($1=="ATOM") 
      print $0,"",f1[NR];
    else
      print $0;
   } ' $1_ATOM > $1_solv


Acknowledgements

Thanks to RGC for lots of docking advice and Multimol2db.py. Thanks to MM for Scoreopt. Thanks to Niu Huang and his group for Dock Ligand Clustering: it's awesome!