Analyzing DOCK Results

From DISI
Revision as of 21:41, 13 February 2014 by Frodo.

The dock37tools scripts in $d37 include several analysis programs. Once your docking jobs have finished, you can run:

$d37/extract_all.py

If you ran a prospective screen, it can be advantageous to run extract_all.py so that it ignores bad poses with scores greater than -20.0 (adjust this cutoff for your system), like this:

$d37/extract_all.py -s -20.0
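To make the cutoff concrete, here is a minimal Python sketch of this kind of filtering: poses whose Total score is greater than the cutoff are discarded (lower scores are better). It also assumes you want a single entry per molecule, so it keeps each molecule's best score and sorts best-first. The function name `filter_and_merge`, the (ID, score) tuple input, and the sample IDs are hypothetical; this is not the code of extract_all.py.

```python
# Hypothetical illustration of a score cutoff plus "best pose per molecule"
# merge. NOT the implementation of extract_all.py.
from typing import Dict, Iterable, List, Tuple

Pose = Tuple[str, float]  # (molecule ID, Total score); lower is better


def filter_and_merge(poses: Iterable[Pose], cutoff: float = -20.0) -> List[Pose]:
    """Drop poses scoring worse (higher) than `cutoff`, keep the best score
    per molecule ID, and return (ID, score) pairs sorted best-first."""
    best: Dict[str, float] = {}
    for mol_id, score in poses:
        if score > cutoff:  # e.g. -15.0 > -20.0, so this pose is ignored
            continue
        if mol_id not in best or score < best[mol_id]:
            best[mol_id] = score
    return sorted(best.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up example data.
    run = [("ZINC01", -35.2), ("ZINC02", -15.0), ("ZINC01", -40.1), ("ZINC03", -22.7)]
    print(filter_and_merge(run))  # prints [('ZINC01', -40.1), ('ZINC03', -22.7)]
```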

Running extract_all.py may take a while, but it pulls all of your results into a single file. To calculate enrichment:

$d37/enrich.py -l ligand-file -d decoy-file 

Here ligand-file and decoy-file are single-column files listing the ligand and decoy IDs, one per line. Plotting is also possible:

$d37/plots.py -i . -l label --ligand-file=ligand-file -d decoy-file 

A common use is to compare several runs on a single plot, like so:

$d37/plots.py -i run1 -l label1 -i run2 -l label2 --ligand-file=ligand-file -d decoy-file 
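Enrichment measures how early known ligands appear ahead of decoys in the ranked list. As one illustration, the sketch below computes a plain ROC AUC: the fraction of (ligand, decoy) pairs in which the ligand is ranked first. It assumes a best-first list of unique molecule IDs and ID sets read from single-column files like those described above. The function names are hypothetical, and the metrics that enrich.py and plots.py actually report may differ.

```python
# Minimal ROC AUC sketch for docking enrichment. Assumes `ranked_ids` is
# best-first with one entry per molecule. NOT the implementation of enrich.py.
from pathlib import Path
from typing import List, Set


def read_ids(path: str) -> Set[str]:
    """Read a single-column file with one molecule ID per line."""
    return {line.strip() for line in Path(path).read_text().splitlines() if line.strip()}


def roc_auc(ranked_ids: List[str], ligands: Set[str], decoys: Set[str]) -> float:
    """Fraction of ranked (ligand, decoy) pairs where the ligand ranks first.
    IDs in neither set are ignored; an ID in both sets counts as a ligand."""
    n_lig = sum(1 for m in ranked_ids if m in ligands)
    n_dec = sum(1 for m in ranked_ids if m in decoys and m not in ligands)
    if n_lig == 0 or n_dec == 0:
        raise ValueError("need at least one ranked ligand and one ranked decoy")
    decoys_above = 0  # decoys seen so far, i.e. ranked above the current position
    pairs_won = 0     # (ligand, decoy) pairs with the ligand ranked higher
    for mol_id in ranked_ids:
        if mol_id in ligands:
            pairs_won += n_dec - decoys_above
        elif mol_id in decoys:
            decoys_above += 1
    return pairs_won / (n_lig * n_dec)


if __name__ == "__main__":
    ranked = ["L1", "D1", "L2", "D2", "D3"]  # made-up ranking, best first
    print(round(roc_auc(ranked, {"L1", "L2"}, {"D1", "D2", "D3"}), 3))  # prints 0.833
```

A value of 1.0 means every ligand outranks every decoy and 0.5 is roughly random ordering.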

If you want to compare the scores from two runs, try:

$d37/two_run_plot.py run1 run2

The plotting scripts must be run on a machine with the required libraries installed, such as sgehead.
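If you want the raw numbers behind a two-run comparison, a hypothetical sketch is to pair each molecule's score from both runs by ID and compute run2 - run1, so negative values mean the molecule scored better in run2. Molecules scored in only one run are skipped. The function name and the dict-of-scores input are assumptions; two_run_plot.py may compare runs differently.

```python
# Hypothetical sketch: per-molecule score differences between two runs.
# NOT the implementation of two_run_plot.py.
from typing import Dict, List, Tuple


def score_differences(run1: Dict[str, float], run2: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (ID, run2 - run1) for IDs scored in both runs, with the largest
    improvement (most negative difference) first; ties broken by ID."""
    shared = run1.keys() & run2.keys()
    diffs = [(mol_id, run2[mol_id] - run1[mol_id]) for mol_id in shared]
    return sorted(diffs, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Made-up best scores per molecule for two runs.
    run1 = {"A": -30.0, "B": -25.0, "C": -10.0}
    run2 = {"A": -32.5, "B": -20.0, "D": -40.0}
    print(score_differences(run1, run2))  # prints [('A', -2.5), ('B', 5.0)]
```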

Another common use is to look at top poses in the ViewDock module of UCSF Chimera or with PyMOL. You can make a mol2 output file that can be read by these programs with the following command:

$d37/getposes.py

By default, this script writes a poses.mol2 file containing the top 500 poses from the entire run, with a single pose per molecule ID. Run it with the "-h" flag to see its many other options. A more complex example is:

$d37/getposes.py -z -l 1000 -x 2 -f ligands.txt -o ligands.1000.mol2

In order: '-z' connects to ZINC for vendor information, '-l 1000' gets only the first 1000 ligands in the file, '-x 2' gets the top 2 poses, '-f ligands.txt' designates the ligand file to use, and '-o ligands.1000.mol2' designates the output filename.
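The selection these options describe can be sketched as follows, under stated assumptions: molecules are ranked by their best pose, an optional `wanted` set restricts the IDs (as a ligand file would), only the top `limit` molecules are kept, and each contributes at most `max_poses` poses. The parameter names are hypothetical and only loosely mirror -l, -x and -f; the sketch does not reproduce getposes.py, its ZINC lookup, or its mol2 writing.

```python
# Hypothetical sketch of top-pose selection. NOT the implementation of getposes.py.
from typing import Dict, Iterable, List, Optional, Set, Tuple

Pose = Tuple[str, float]  # (molecule ID, Total score); lower is better


def select_poses(poses: Iterable[Pose], limit: int = 500, max_poses: int = 1,
                 wanted: Optional[Set[str]] = None) -> List[Pose]:
    """Return up to `max_poses` best poses for each of the top `limit` molecules."""
    by_mol: Dict[str, List[float]] = {}
    for mol_id, score in poses:
        if wanted is None or mol_id in wanted:
            by_mol.setdefault(mol_id, []).append(score)
    # Rank molecules by their best (lowest) score; break ties by ID.
    ranked = sorted(by_mol, key=lambda m: (min(by_mol[m]), m))[:limit]
    selected: List[Pose] = []
    for mol_id in ranked:
        for score in sorted(by_mol[mol_id])[:max_poses]:
            selected.append((mol_id, score))
    return selected


if __name__ == "__main__":
    # Made-up poses; two poses each for A and B.
    poses = [("A", -30.0), ("B", -35.0), ("A", -33.0), ("C", -20.0), ("B", -31.0)]
    print(select_poses(poses, limit=2, max_poses=2))
    # prints [('B', -35.0), ('B', -31.0), ('A', -33.0), ('A', -30.0)]
```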

If you're curious about the OUTDOCK file format, here is the header:

mol# id_num flexiblecode matched nscored time hac setnum matnum rank cloud elect + vdW + psol + asol + inter + rec_e + rec_d + r_hyd = Total
mol# is just the number of the molecule, read in from the docking db2 files.
id_num is the ZINC code or other identifier for the molecule
flexiblecode is the combination of flexible receptor parts this molecule was docked to
matched is the number of matched orientations actually found by the matching algorithm
nscored is the number of atoms that were scored
time is the time in seconds for this molecule
hac is the heavy atom count for this ligand
setnum is the conformation number this ligand represents
matnum is the match number this ligand represents
rank is the rank of this pose's score among all poses of the same ligand (if you keep the top 10 poses, this number increases from 1 to 10)
cloud is the cloud number, for an experimental matching scheme still under development
elect is the electrostatics score
vdW is the van der Waals score (both attractive and repulsive together)
psol is the ligand polar desolvation
asol is the ligand apolar desolvation
inter is the internal energy
rec_e is the receptor energy (used in flexible docking)
rec_d is the receptor desolvation, not yet supported
r_hyd is the receptor hydrophobic effect, not yet supported
Total is the total score for this ligand pose
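The header also encodes an identity: elect + vdW + psol + asol + inter + rec_e + rec_d + r_hyd = Total. A minimal sketch that parses one pose line into named fields and checks that identity follows. It assumes each pose line holds exactly the 20 header columns as whitespace-separated tokens, with id_num and flexiblecode kept as strings; real OUTDOCK files also contain other lines, which the sketch rejects by returning None. The function names and sample line are made up, and because printed terms are rounded, the check allows a small tolerance.

```python
# Minimal OUTDOCK pose-line parser, assuming whitespace-separated columns in
# the header order above. Column types are assumptions, not a format spec.
from typing import Dict, Optional, Union

COLUMNS = ["mol#", "id_num", "flexiblecode", "matched", "nscored", "time",
           "hac", "setnum", "matnum", "rank", "cloud", "elect", "vdW",
           "psol", "asol", "inter", "rec_e", "rec_d", "r_hyd", "Total"]
STR_COLUMNS = {"id_num", "flexiblecode"}
INT_COLUMNS = {"mol#", "matched", "nscored", "hac", "setnum", "matnum", "rank", "cloud"}
TERMS = ["elect", "vdW", "psol", "asol", "inter", "rec_e", "rec_d", "r_hyd"]

Field = Union[int, float, str]


def parse_outdock_line(line: str) -> Optional[Dict[str, Field]]:
    """Return a dict keyed by header column name, or None if the line does
    not look like a pose line under the assumptions above."""
    parts = line.split()
    if len(parts) != len(COLUMNS):
        return None  # e.g. the header line, which has extra "+" and "=" tokens
    record: Dict[str, Field] = {}
    try:
        for name, value in zip(COLUMNS, parts):
            if name in STR_COLUMNS:
                record[name] = value
            elif name in INT_COLUMNS:
                record[name] = int(value)
            else:
                record[name] = float(value)
    except ValueError:
        return None  # a non-numeric token where a number was expected
    return record


def terms_match_total(record: Dict[str, Field], tol: float = 0.05) -> bool:
    """Check that the energy terms sum to Total, within rounding tolerance."""
    return abs(sum(float(record[t]) for t in TERMS) - float(record["Total"])) <= tol


if __name__ == "__main__":
    line = "1 ZINC00000001 1 1234 567 0.12 20 12 345 1 1 -10.50 -30.25 5.10 2.30 0.00 0.00 0.00 0.00 -33.35"
    rec = parse_outdock_line(line)
    if rec is not None:
        print(rec["id_num"], rec["Total"], terms_match_total(rec))  # prints ZINC00000001 -33.35 True
```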

This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ This page is adapted from "DOCK3.7 Documentation" by Ryan G. Coleman. Based on a work at https://sites.google.com/site/dock37wiki/.