http://wiki.docking.org/api.php?action=feedcontributions&user=Mysinger&feedformat=atomDISI - User contributions [en]2024-03-28T16:47:05ZUser contributionsMediaWiki 1.39.1http://wiki.docking.org/index.php?title=Preparing_the_protein&diff=4125Preparing the protein2012-05-10T21:44:01Z<p>Mysinger: switch to solvmap_sev</p>
<hr />
<div>=Preparing the protein=<br />
<br />
Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins. <br />
<br />
==Modifying the PDB file==<br />
<br />
*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', all columns to the right of the z-coordinate and the TER statements. <br />
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE&curren;) with sulphur (&curren;SD). Be careful about the correct alignment! <br />
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably. <br />
*select the protonation states of HIS residues to be either &delta;- (rename residue to HID), &epsilon;- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.<br />
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS. <br />
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!<br />
<br />
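The clean-up rules above can be sketched in Python. This is only an illustration, not part of the DOCK tool chain: the function name <tt>clean_pdb_lines</tt> is hypothetical, the column positions assume the standard fixed-width PDB layout (atom name in columns 13-16, residue name in 18-20, z-coordinate ending at column 54), and it assumes that selenomethionine atoms stored as HETATM records should be promoted to ATOM before the filter is applied.

```python
def clean_pdb_lines(lines):
    """Return cleaned receptor lines: ATOM records only, truncated after z."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: MSE is often stored as HETATM; promote it so it survives.
        if line[:6] == "HETATM" and line[17:20] == "MSE":
            line = "ATOM  " + line[6:]
        # Drop everything that is not an ATOM record (TER, HETATM, REMARK, ...).
        if not line.startswith("ATOM"):
            continue
        if line[17:20] == "MSE":
            atom_name = line[12:16]
            # Selenium is written 'SE  ', sulphur ' SD ': note the alignment.
            if atom_name == "SE  ":
                atom_name = " SD "
            line = line[:12] + atom_name + line[16] + "MET" + line[20:]
        # Keep columns 1-54 only, i.e. everything up to the z-coordinate.
        cleaned.append(line[:54])
    return cleaned
```

A typical use would read <tt>rec.pdb</tt> with <tt>open(...).readlines()</tt>, pass the lines through this function and write the result back joined with newlines. Histidine renaming and the AH-specific edits still need to be done by hand.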
==Running startdockblaster5==<br />
<br />
*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove all columns to the right of the z-coordinate and the TER statements. Change HETATM to ATOM.<br />
*generate the file <tt>.only_spheres</tt> and, if you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt>, also the file <tt>.useligsph</tt>, and write 'on' to the latter. Be careful not to add blank lines at the end; <tt>makespheres2.pl</tt> will not understand them. In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt>. <br />
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand. <br />
*if <tt>startdockblaster5</tt> fails to finish for no obvious reason and without a clear error message, or <tt>rec.crg</tt> has odd hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less</tt>. The only character with a backslash should be \n; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to re-prepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch, since it is likely that some blanks or hidden characters are causing the problems. <br />
*Take any WARNING messages emitted seriously, and continue only if you know why each one is there. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms. <br />
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].<br />
<br />
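The <tt>od -c</tt> inspection described above can also be automated. The following is a minimal Python sketch under the assumption that a clean file contains only printable ASCII and newline characters; the helper name <tt>find_suspicious_bytes</tt> is hypothetical and not part of any DOCK script.

```python
def find_suspicious_bytes(data):
    """Return (line_number, column, byte_repr) for each unexpected byte.

    'Unexpected' here means anything that is neither printable ASCII
    (0x20-0x7E) nor a newline, e.g. tabs or carriage returns.
    """
    hits = []
    line_no, col = 1, 0
    for byte in data:
        if byte == 0x0A:  # newline: the only control character allowed
            line_no += 1
            col = 0
            continue
        col += 1
        if byte < 0x20 or byte > 0x7E:
            hits.append((line_no, col, repr(bytes([byte]))))
    return hits
```

Open the file in binary mode (<tt>open("rec.pdb", "rb").read()</tt>) before passing it in, so that carriage returns are not silently translated. An empty result means the file passes this particular check; it does not guarantee that the column alignment is correct.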
==Removing and modifying files==<br />
<br />
*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs):<br><tt>rm -f PDBPARM chem.* rec+sph.phi solvmap_sev tart.txt OUT*</tt><br />
*modify <tt>rec.crg</tt>: <br />
**AH: CYK: put the three missing atoms, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK. <br />
**remove all TER statements that might have been added. <br />
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN. <br />
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX. <br />
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble. <br />
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively &rArr; do not tart any residues in this file!<br />
<br />
==Running <tt>[[chemgrid]]</tt> ==<br />
<br />
*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct van der Waals parameters of all residues. <br />
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3<sup>rd</sup> and 4<sup>th</sup> columns, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals maps. There will be no other errors; the docking will finish, but some "bumping" ligands will show extremely favorable energies (&le; -200).<br />
*any 'WARNING' issued in <tt>OUTPARM</tt> is another sign of a problem with atomic radii.<br />
*if one has to run <tt>chemgrid</tt> again, first remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.<br />
<br />
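The <tt>PDBPARM</tt> check above can be sketched as a small Python filter. This is an illustration only: it assumes that the columns of <tt>PDBPARM</tt> are separated by whitespace, which may not hold for the real file layout, and the function name <tt>unrecognized_atoms</tt> is hypothetical.

```python
def unrecognized_atoms(lines):
    """Return lines whose 3rd and 4th whitespace-separated fields are both zero.

    Assumption: PDBPARM columns are whitespace-delimited. If the file uses
    fixed-width fields that run together, slice by column instead.
    """
    flagged = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            third, fourth = float(fields[2]), float(fields[3])
        except ValueError:
            continue  # header or other non-numeric line
        if third == 0.0 and fourth == 0.0:
            flagged.append(line.rstrip("\n"))
    return flagged
```

Any line returned by this function points to an atom that should be added to <tt>prot.table.ambcrg.ambH</tt> before <tt>chemgrid</tt> is run again.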
==Tarting the protein==<br />
<br />
*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.<br />
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>, they are the files with the extension <tt>prot2</tt>.<br />
* add the relevant residues to the bottom of your <tt>prot.table.ambcrg.ambH</tt> file, taking care to match the existing formatting exactly.<br />
* generate the new <tt>amb.crg.oxt</tt> from the edited <tt>prot.table.ambcrg.ambH</tt> using:<br><tt>$mud/prot2crg.py < prot.table.ambcrg.ambH > amb.crg.oxt</tt><br />
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt>, where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt>. <br />
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt> <br />
*optionally tart the residues that are in contact with a crystallographic ligand, if any. <br />
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.<br />
<br />
==Modifying the Delphi spheres==<br />
<br />
*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres). <br />
*delete the spheres that are too close to the solvent. <br />
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues. <br />
*a good number for DelPhi spheres is 120. <br />
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi! <br />
<br />
==Modifying the Matching spheres==<br />
<br />
*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.<br />
*If you selected <tt>.useligsph</tt> be careful not to move any spheres based on the ligand atoms. <br />
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.<br />
*a good number for matching spheres is 50-60. <br />
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]]. <br />
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres. <br />
*run <tt>cat $mud/header.sph match2.sph</tt> .<br />
<br />
==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==<br />
<br />
*if you changed <tt>rec+sph.crg</tt> above, you need to run DelPhi. <br />
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate. <br />
*run <tt>delphi.com > delphi.log</tt> and check the output.<br />
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.<br />
<br />
==Running <tt>[[newsolv.sev]]</tt> ==<br />
<br />
*if you changed rec.crg or the box above, you need to run newsolv.sev <br />
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>newsolv.sev</tt> .<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Preparing_the_protein&diff=4124Preparing the protein2012-05-10T21:42:58Z<p>Mysinger: change solvmap to newsolv.sev in 2 places</p>
<hr />
<div>=Preparing the protein=<br />
<br />
Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins. <br />
<br />
==Modifying the PDB file==<br />
<br />
*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', all columns to the right of the z-coordinate and the TER statements. <br />
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE&curren;) with sulphur (&curren;SD). Be careful about the correct alignment! <br />
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably. <br />
*select the protonation states of HIS residues to be either &delta;- (rename residue to HID), &epsilon;- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.<br />
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS. <br />
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!<br />
<br />
==Running startdockblaster5==<br />
<br />
*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove all columns to the right of the z-coordinate and the TER statements. Change HETATM to ATOM.<br />
*generate the file <tt>.only_spheres</tt> and, if you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt>, also the file <tt>.useligsph</tt>, and write 'on' to the latter. Be careful not to add blank lines at the end; <tt>makespheres2.pl</tt> will not understand them. In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt>. <br />
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand. <br />
*if <tt>startdockblaster5</tt> fails to finish for no obvious reason and without a clear error message, or <tt>rec.crg</tt> has odd hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less</tt>. The only character with a backslash should be \n; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to re-prepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch, since it is likely that some blanks or hidden characters are causing the problems. <br />
*Take any WARNING messages emitted seriously, and continue only if you know why each one is there. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms. <br />
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].<br />
<br />
==Removing and modifying files==<br />
<br />
*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs):<br><tt>rm -f PDBPARM chem.* rec+sph.phi solvmap tart.txt OUT*</tt><br />
*modify <tt>rec.crg</tt>: <br />
**AH: CYK: put the three missing atoms, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK. <br />
**remove all TER statements that might have been added. <br />
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN. <br />
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX. <br />
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble. <br />
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively &rArr; do not tart any residues in this file! <br />
<br />
==Running <tt>[[chemgrid]]</tt> ==<br />
<br />
*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct van der Waals parameters of all residues. <br />
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3<sup>rd</sup> and 4<sup>th</sup> columns, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals maps. There will be no other errors; the docking will finish, but some "bumping" ligands will show extremely favorable energies (&le; -200).<br />
*any 'WARNING' issued in <tt>OUTPARM</tt> is another sign of a problem with atomic radii.<br />
*if one has to run <tt>chemgrid</tt> again, first remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.<br />
<br />
==Tarting the protein==<br />
<br />
*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.<br />
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>, they are the files with the extension <tt>prot2</tt>.<br />
* add the relevant residues to the bottom of your <tt>prot.table.ambcrg.ambH</tt> file, taking care to match the existing formatting exactly.<br />
* generate the new <tt>amb.crg.oxt</tt> from the edited <tt>prot.table.ambcrg.ambH</tt> using:<br><tt>$mud/prot2crg.py < prot.table.ambcrg.ambH > amb.crg.oxt</tt><br />
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt>, where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt>. <br />
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt> <br />
*optionally tart the residues that are in contact with a crystallographic ligand, if any. <br />
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.<br />
<br />
==Modifying the Delphi spheres==<br />
<br />
*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres). <br />
*delete the spheres that are too close to the solvent. <br />
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues. <br />
*a good number for DelPhi spheres is 120. <br />
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi! <br />
<br />
==Modifying the Matching spheres==<br />
<br />
*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.<br />
*If you selected <tt>.useligsph</tt> be careful not to move any spheres based on the ligand atoms. <br />
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.<br />
*a good number for matching spheres is 50-60. <br />
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]]. <br />
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres. <br />
*run <tt>cat $mud/header.sph match2.sph</tt> .<br />
<br />
==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==<br />
<br />
*if you changed <tt>rec+sph.crg</tt> above, you need to run DelPhi. <br />
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate. <br />
*run <tt>delphi.com > delphi.log</tt> and check the output.<br />
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.<br />
<br />
==Running <tt>[[newsolv.sev]]</tt> ==<br />
<br />
*if you changed rec.crg or the box above, you need to run newsolv.sev <br />
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>newsolv.sev</tt> .<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=ZINC_processing_pipeline&diff=4909ZINC processing pipeline2012-03-15T20:52:04Z<p>Mysinger: Adjust current ring puckering settings</p>
<hr />
<div>Each molecule in ZINC is processed via our ZINC processing pipeline. This process is embodied in a set of scripts that we continue to refine as we discover problems.<br />
<br />
Frankly, we hope people will simply use ZINC rather than trying to reproduce it. Still, in the interests of clarity, transparency, truth, justice and the Canadian Way (TM), here is our current protocol.<br />
<br />
* 1. If you have 2D SDF, convert it to isomeric SMILES.<br />
<br />
* 2. sed -e 's/N=S=N/nsn/g' 2.ism > 2-out.ism <br />
<br />
* 3. Use molinspiration mitools/mib to eliminate broken SMILES:<br />
java -jar /raid1/soft/mitools/mib.jar -singlepart -onlyOrganic -normalizeCharges -f $1 -out smi<br />
<br />
* 4. Use OEChem to remove molecules with problematic functional groups: <br />
filter.py rules.txt 4.ism 4-out.ism > filterlog.txt<br />
see http://blaster.docking.org/filtering/rules_default.txt for current rules.<br />
<br />
* 5. select only 4 of the stereochemical expansions from the previous step. We just take the first 4, but you can imagine better ways of making the selection.<br />
<br />
* 6. get rid of bogus stereochemistry at nitrogen:<br />
sed -e 's/\[N@\]/N/g' -e 's/\[N@@\]/N/g' -e 's/\[N@H+\]/\[NH+\]/g' -e 's/\[N@@H+\]/\[NH+\]/g' -e 's/\[N@@+\]/\[N+\]/g' -e 's/\[N@+\]/\[N+\]/g' $1 > d.ism<br />
<br />
* 7. If the molecule is already in ZINC, eliminate it from the list.<br />
<br />
* 8. Generate trial 3D structure with corina.<br />
corina -d neu,wh,rc,mc=1,canon -i t=smiles -o t=sdf < 1a.ism > 2.sdf<br />
<br />
* 9. generate reference pH state using Schrodinger's Epik. <br />
epik -ph 7.05 -ms 1 -imae A.mae -omae B.mae -WAIT<br />
<br />
* 10. generate mid, hi and lo pH subsets<br />
mid: setenv EPIK "-ph 7.0 -pht 1 -tp 0.20"<br />
hi: setenv EPIK "-ph 8.5 -pht 0.75 -tp 0.20"<br />
lo: setenv EPIK "-ph 5.5 -pht 0.75 -tp 0.20"<br />
epik $EPIK -imae A.mae -omae B.mae -WAIT<br />
<br />
* 11. For each subset (ref, mid, hi, lo) use Corina to generate 3D model of the relevant protonated state.<br />
corina -d rc,flapn,de=6,mc=4 -i t=mol2 -o t=mol2<br />
<br />
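The <tt>sed</tt> command in step 6 can be mirrored in Python when the pipeline is scripted rather than run from the shell. This is a minimal sketch that reproduces only the six literal substitutions shown in that step; the function name <tt>strip_nitrogen_stereo</tt> is hypothetical and is not part of the ZINC scripts.

```python
# Literal rewrites taken from the sed expressions in step 6.
NITROGEN_STEREO_REWRITES = [
    ("[N@@H+]", "[NH+]"),
    ("[N@H+]", "[NH+]"),
    ("[N@@+]", "[N+]"),
    ("[N@+]", "[N+]"),
    ("[N@@]", "N"),
    ("[N@]", "N"),
]


def strip_nitrogen_stereo(smiles):
    """Remove chirality marks from nitrogen atoms in one SMILES string."""
    for old, new in NITROGEN_STEREO_REWRITES:
        # Each pattern is a complete bracket atom, so order does not matter.
        smiles = smiles.replace(old, new)
    return smiles
```

Because every pattern is a complete bracket atom, plain string replacement is enough and stereocentres on carbon are left untouched. Like the original <tt>sed</tt> call, it does not handle other forms such as isotope-labelled or atom-mapped nitrogens.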
That's really it. There is more to do with loading ZINC, but to generate the models, that is what we think you need to know. Good luck!<br />
<br />
-- John Irwin. March 2009.</div>Mysingerhttp://wiki.docking.org/index.php?title=Preparing_the_protein&diff=4123Preparing the protein2012-01-25T23:39:57Z<p>Mysinger: /* Running <tt>solvmap</tt> */</p>
<hr />
<div>=Preparing the protein=<br />
<br />
Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins. <br />
<br />
==Modifying the PDB file==<br />
<br />
*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', all columns to the right of the z-coordinate and the TER statements. <br />
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE&curren;) with sulphur (&curren;SD). Be careful about the correct alignment! <br />
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably. <br />
*select the protonation states of HIS residues to be either &delta;- (rename residue to HID), &epsilon;- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.<br />
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS. <br />
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!<br />
<br />
==Running startdockblaster5==<br />
<br />
*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove all columns to the right of the z-coordinate and the TER statements. Change HETATM to ATOM.<br />
*generate the file <tt>.only_spheres</tt> and, if you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt>, also the file <tt>.useligsph</tt>, and write 'on' to the latter. Be careful not to add blank lines at the end; <tt>makespheres2.pl</tt> will not understand them. In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt>. <br />
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand. <br />
*if <tt>startdockblaster5</tt> fails to finish for no obvious reason and without a clear error message, or <tt>rec.crg</tt> has odd hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less</tt>. The only character with a backslash should be \n; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to re-prepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch, since it is likely that some blanks or hidden characters are causing the problems. <br />
*Take any WARNING messages emitted seriously, and continue only if you know why each one is there. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms. <br />
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].<br />
<br />
==Removing and modifying files==<br />
<br />
*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs):<br><tt>rm -f PDBPARM chem.* rec+sph.phi solvmap tart.txt OUT*</tt><br />
*modify <tt>rec.crg</tt>: <br />
**AH: CYK: put the three missing atoms, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK. <br />
**remove all TER statements that might have been added. <br />
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN. <br />
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX. <br />
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble. <br />
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively &rArr; do not tart any residues in this file! <br />
<br />
==Running <tt>[[chemgrid]]</tt> ==<br />
<br />
*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct van der Waals parameters of all residues. <br />
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3<sup>rd</sup> and 4<sup>th</sup> columns, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals maps. There will be no other errors; the docking will finish, but some "bumping" ligands will show extremely favorable energies (&le; -200).<br />
*any 'WARNING' issued in <tt>OUTPARM</tt> is another sign of a problem with atomic radii.<br />
*if one has to run <tt>chemgrid</tt> again, first remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.<br />
<br />
==Tarting the protein==<br />
<br />
*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.<br />
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>, they are the files with the extension <tt>prot2</tt>.<br />
* add the relevant residues to the bottom of your <tt>prot.table.ambcrg.ambH</tt> file, taking care to match the existing formatting exactly.<br />
* generate the new <tt>amb.crg.oxt</tt> from the edited <tt>prot.table.ambcrg.ambH</tt> using:<br><tt>$mud/prot2crg.py < prot.table.ambcrg.ambH > amb.crg.oxt</tt><br />
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt>, where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt>. <br />
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt> <br />
*optionally tart the residues that are in contact with a crystallographic ligand, if any. <br />
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.<br />
<br />
==Modifying the Delphi spheres==<br />
<br />
*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres). <br />
*delete the spheres that are too close to the solvent. <br />
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues. <br />
*a good number for DelPhi spheres is 120. <br />
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi! <br />
<br />
==Modifying the Matching spheres==<br />
<br />
*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.<br />
*If you selected <tt>.useligsph</tt> be careful not to move any spheres based on the ligand atoms. <br />
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.<br />
*a good number for matching spheres is 50-60. <br />
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]]. <br />
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres. <br />
*run <tt>cat $mud/header.sph match2.sph</tt> .<br />
<br />
==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==<br />
<br />
*if you changed <tt>rec+sph.crg</tt> above, you need to run DelPhi. <br />
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate. <br />
*run <tt>delphi.com > delphi.log</tt> and check the output.<br />
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.<br />
<br />
==Running <tt>[[solvmap]]</tt> ==<br />
<br />
*if you changed rec.crg or the box above, you need to run solvmap <br />
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>newsolv.sev</tt> .<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=INDOCK_for_DOCK_3.6&diff=3356INDOCK for DOCK 3.62011-10-26T20:45:32Z<p>Mysinger: change file date to match sphere change</p>
<hr />
<div>What follows is a documented sample INDOCK file for [[DOCK 3.6]]. Many lines are required; lines starting with # are comments.<br />
<br />
Required first line:<br />
<br />
DOCK 3.5 parameter<br />
###############################################################################<br />
################## DOCK 3.5 INPUT PARAMETERS 2011/10/26 #######################<br />
###############################################################################<br />
###############################################################################<br />
# INPUT/OUTPUT<br />
#<br />
<br />
This is the path to the receptor matching spheres file. Most scripts make a set of directories and copy the INDOCK file into them, so this path sometimes has an extra set of "../" in it compared to what you might think. If you use [[DOCK Blaster]]. Generally, match3 has more spheres than match2, so produces more possible orientations. These spheres are matched to ligand spheres, generated from heavy atoms in the "rigid component" of each ligand. For more about the rigid component, see [[Flexibase Format]].<br />
<br />
receptor_sphere_file ../../sph/match2.sph<br />
<br />
The next line is always 1, and is marked for deprecation.<br />
<br />
cluster_numbers 1<br />
<br />
The next line refers to which ligand file to use. If using many of the automated scripts, split_database_index is used, as this allows many ligand files (or just 1) to be placed in the split_database_index file and read in one after another during a DOCK run. If docking small things on your own, you can change this to any file.<br />
<br />
# NOTE: split_database_index is reserved to specify a list of files<br />
ligand_atom_file split_database_index<br />
<br />
This controls the prefix of the output files; again, many of the automated scripts expect it to be set as shown. OUTDOCK files are always named OUTDOCK.<br />
<br />
output_file_prefix test.<br />
<br />
This controls the random seed used in the minimization procedure. Changing this will produce slightly different results.<br />
<br />
random_seed 777<br />
#<br />
###############################################################################<br />
# MATCHING<br />
#<br />
<br />
distance_tolerance is how different the distances can be between a pair of receptor matching spheres and a pair of ligand matching spheres for them to still be considered matched.<br />
<br />
distance_tolerance 1.5<br />
<br />
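The tolerance test described above can be written out as a few lines of Python. This is only a sketch of the idea as stated in the text, not DOCK's actual matching code: the function name <tt>pair_within_tolerance</tt> is hypothetical, and whether DOCK treats the boundary as inclusive is an assumption.

```python
import math


def pair_within_tolerance(rec_a, rec_b, lig_a, lig_b, tolerance=1.5):
    """Compare one receptor sphere pair with one ligand sphere pair.

    Each argument is an (x, y, z) centre. The pairs count as matched when
    their inter-sphere distances differ by no more than `tolerance`.
    """
    rec_dist = math.dist(rec_a, rec_b)
    lig_dist = math.dist(lig_a, lig_b)
    # Assumption: the comparison is inclusive (<=).
    return abs(rec_dist - lig_dist) <= tolerance
```

A larger tolerance lets more pairs pass this test, so more orientations are generated and scored; a smaller one prunes the search earlier.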
This changes how many spheres must be matched to generate an orientation. 3 as a minimum, 4 as a maximum is generally accepted as the right thing to use. Less than 3 is too degenerate to generate an actual orientation, and requiring more than 4 matched spheres does not work well, since we only use heavy atoms in ring systems to generate ligand matching spheres.<br />
<br />
nodes_maximum 4<br />
nodes_minimum 3<br />
<br />
The next 4 parameters control how the histograms of distance differences are generated. The binsize sets how big the bins are; the overlap controls whether a sphere can be put into multiple bins. The ligand and receptor parameters are not required to be the same.<br />
<br />
ligand_binsize 0.4<br />
ligand_overlap 0.2<br />
receptor_binsize 0.4<br />
receptor_overlap 0.2<br />
<br />
Bumping uses a quick check of distances when placing ligand atoms in the binding site to determine whether they have a steric clash. The maximum is how many atoms can be 'bumped' (in close steric contact) per rigid or flexible component of the ligand, as per the [[Flexibase Format]]. Even ligands with some steric clashes can sometimes be rescued by minimization. Setting this number very high will cause many clashed orientations to be scored, which can be prohibitively slow.<br />
<br />
bump_maximum 1<br />
<br />
The next four parameters are unused and unsupported.<br />
<br />
focus_cycles 0<br />
focus_bump 0 <br />
focus_type energy<br />
critical_clusters no<br />
#<br />
###############################################################################<br />
# COLORING<br />
#<br />
<br />
This controls whether chemical matching or coloring is used at all. If yes, many match lines are necessary. These may not be perfect, but [[DOCK Blaster]] has been using these for a long time. Setting this to no produces many more matched orientations, which can be slow, but can help you understand exactly what the energy function is doing.<br />
<br />
chemical_matching yes<br />
case_sensitive no<br />
# ligand color, receptor color<br />
match positive negative<br />
match positive negative_or_acceptor<br />
match positive not_neutral<br />
match negative positive<br />
match negative positive_or_donor<br />
match negative not_neutral<br />
match donor acceptor<br />
match donor donacc<br />
match donor negative_or_acceptor<br />
match donor neutral_or_acceptor_or_donor<br />
match donor not_neutral<br />
match acceptor donor<br />
match acceptor donacc<br />
match acceptor positive_or_donor<br />
match acceptor neutral_or_acceptor_or_donor<br />
match acceptor not_neutral<br />
match neutral neutral<br />
match neutral neutral_or_acceptor_or_donor<br />
match ester_o donor<br />
match ester_o donacc<br />
match ester_o positive_or_donor<br />
match ester_o not_neutral<br />
match amide_o donor<br />
match amide_o donacc<br />
match amide_o positive_or_donor<br />
match amide_o not_neutral<br />
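The match lines can be read as a table of allowed (ligand color, receptor color) pairs. The sketch below parses such lines into a Python set and looks up a pair; it only illustrates the file layout shown above, and the function name is hypothetical. Colors are lowercased here because case_sensitive is set to no.<br />

```python
def parse_match_lines(lines):
    """Collect (ligand_color, receptor_color) pairs from INDOCK match lines."""
    pairs = set()
    for line in lines:
        fields = line.split()
        # Only lines of the form "match <ligand color> <receptor color>" count.
        if len(fields) == 3 and fields[0] == "match":
            pairs.add((fields[1].lower(), fields[2].lower()))
    return pairs


sample = """
# ligand color, receptor color
match positive negative
match donor acceptor
match neutral neutral
""".splitlines()

allowed = parse_match_lines(sample)
print(("donor", "acceptor") in allowed)
print(("donor", "positive") in allowed)
```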
<br />
Single mode is deprecated; these parameters won't work. See [[Dock Ligand Clustering]].<br />
#<br />
###############################################################################<br />
# SINGLE MODE<br />
#<br />
#rmsd_override 0.0<br />
#contact_minimum 0<br />
#energy_maximum 1.0e+6<br />
##truncate_output 1000.0<br />
#<br />
<br />
Search mode is now the default/only mode of docking. Each parameter is described below.<br />
<br />
###############################################################################<br />
# SEARCH MODE<br />
#<br />
<br />
The ratio_minimum parameter has been slated for deprecation.<br />
<br />
ratio_minimum 0.0<br />
<br />
These parameters control how many atoms are necessary in the ligand for it to be docked.<br />
<br />
atom_minimum 5 <br />
atom_maximum 100<br />
<br />
How many of the top molecules will be saved in the output test.* file. <br />
<br />
number_save 50000<br />
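Conceptually, number_save keeps only the best-scoring molecules of a run. The following sketch shows that idea with the Python standard library; it assumes that a lower (more negative) score is better and uses made-up molecule names and scores. It does not reflect how DOCK stores its results internally.<br />

```python
import heapq


def keep_best(scored_molecules, number_save=50000):
    """Return up to number_save (name, score) pairs, best score first.

    Assumption for this sketch: lower scores are better.
    """
    return heapq.nsmallest(number_save, scored_molecules, key=lambda item: item[1])
```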
<br />
The maximum number of molecules that will be scored in any given run.<br />
<br />
molecules_maximum 300000 <br />
<br />
How many molecules will be skipped. This feature currently does not work.<br />
<br />
initial_skip 0<br />
<br />
How long a molecule is processed before quitting. This feature currently may not work as expected.<br />
<br />
timeout 180<br />
<br />
There are many scoring options:<br />
<br />
# <br />
###############################################################################<br />
# SCORING<br />
#<br />
<br />
Valid options for ligand_desolvation are 'volume' (partial desolvation a la Mysinger & Shoichet 2010), 'full' meaning that the entire ligand is assumed to be desolvated in the binding site and 'none', where no desolvation penalties are applied.<br />
<br />
ligand_desolvation volume<br />
<br />
See the note about relative paths for the matching spheres above; the same comments apply here. There are 2 ways to run 'volume' (partial) desolvation. One is to use one grid for every ligand atom, like this:<br />
<br />
solvmap_file ../../grids/solvmap_sev<br />
<br />
The other option is to use one grid for ligand heavy atoms and one for ligand hydrogen atoms. To use it, uncomment these lines and comment out the other solvmap_file line.<br />
<br />
#solvmap_file ../../grids/solvmap_sev.heavy<br />
#hydrogen_solvmap_file ../../grids/solvmap.sev.hydrogen<br />
<br />
This is the phimap file used for electrostatic scoring. For a better understanding of this grid, see [[Visualizing delphi]]. This path may change if you are using the new Qnifft Delphi maps; see [[Qnifft DOCK 3.6 conversion]].<br />
<br />
delphi_file ../../grids/rec+sph.phi<br />
<br />
This controls the chemgrid file, which contains the van der Waals scoring for every coordinate (chem.vdw will be called) as well as the distance map grids that will be used for deciphering bumping (chem.bmp will be called).<br />
<br />
chemgrid_file_prefix ../../grids/chem<br />
<br />
This is the parameter file that contains the atom type definitions:<br />
<br />
vdw_parameter_file ../../grids/vdw.parms.amb.mindock<br />
<br />
The following options allow the electrostatics and van der Waals parameters to be scaled relative to each other and the solvation scoring.<br />
<br />
electrostatic_scale 1.0<br />
vdw_scale 1.0<br />
<br />
The following parameter lets ligands with internal steric clashes attempt to find a ligand conformation that scores well but does not have any internal clashes. Sometimes this procedure will fail in circumstances where there are many flexible branches, or where a ligand that is too large for the binding site is being docked.<br />
<br />
check_clashes yes<br />
<br />
If set to yes, this removes the positive solvation from each ligand atom and spreads it evenly over the molecule. This is deprecated because it does unexpected things to solvation, and will be removed entirely soon.<br />
<br />
remove_positive_solvation no<br />
<br />
After each orientation of the rigid component is processed and the many ligand conformations have been examined, the best ligand conformation for that orientation can be minimized using the following parameters.<br />
<br />
#<br />
###############################################################################<br />
# MINIMIZATION<br />
#<br />
<br />
Setting this to no turns off minimization completely.<br />
<br />
minimize yes<br />
<br />
Don't minimize molecules that score above the minimization_max.<br />
<br />
minimization_max 1.0e15<br />
<br />
If set to yes, this checks to see if the orientation has already been scored and quits. This has not been tested recently.<br />
<br />
check_degeneracy no<br />
<br />
How many iterations of minimization to do. More means longer run times, but potentially better poses.<br />
<br />
simplex_iterations 250<br />
<br />
How much the total energy can change and still be considered converged. Setting this higher will stop sooner; setting it lower will cause more iterations before converging (or potentially hitting the iteration maximum above).<br />
<br />
simplex_convergence 0.1<br />
<br />
If the energy changes by this much, restart the minimizer from this newest position.<br />
<br />
simplex_restart 1.0<br />
<br />
This is the initial distance in angstroms the molecule is translated (note that translation and rotation used to be swapped for many releases of DOCK).<br />
<br />
simplex_initial_translation 0.2<br />
<br />
How many degrees of initial rotation are done.<br />
<br />
simplex_initial_rotation 5.0<br />
#<br />
###############################################################################<br />
###############################################################################</div>Mysingerhttp://wiki.docking.org/index.php?title=INDOCK_for_DOCK_3.6&diff=3355INDOCK for DOCK 3.62011-10-26T20:41:22Z<p>Mysinger: change match3 to match2</p>
<hr />
<div>What follows is a documented sample INDOCK file for [[DOCK 3.6]]. Many lines are required; lines starting with # are comments.<br />
<br />
Required first line:<br />
<br />
DOCK 3.5 parameter<br />
###############################################################################<br />
################## DOCK 3.5 INPUT PARAMETERS 2011/09/07 #######################<br />
###############################################################################<br />
###############################################################################<br />
# INPUT/OUTPUT<br />
#<br />
<br />
This is the path to the receptor matching spheres file. Most scripts make a set of directories and copy the INDOCK file into them, so this path sometimes has an extra set of "../" in it compared to what you might expect. If you use [[DOCK Blaster]]. Generally, match3 has more spheres than match2, so it produces more possible orientations. These spheres are matched to ligand spheres, which are generated from heavy atoms in the "rigid component" of each ligand. For more about the rigid component, see [[Flexibase Format]].<br />
<br />
receptor_sphere_file ../../sph/match2.sph<br />
<br />
The next line is always 1, and is marked for deprecation.<br />
<br />
cluster_numbers 1<br />
<br />
The next line refers to which ligand file to use. If using many of the automated scripts, split_database_index is used, as this allows many ligand files (or just 1) to be placed in the split_database_index file and read in one after another during a DOCK run. If docking small things on your own, you can change this to any file.<br />
<br />
# NOTE: split_database_index is reserved to specify a list of files<br />
ligand_atom_file split_database_index<br />
<br />
This controls the prefix of the output files; again, many of the automated scripts expect it to be "test." (with the trailing dot). OUTDOCK files are always named OUTDOCK.<br />
<br />
output_file_prefix test.<br />
<br />
This controls the random seed used in the minimization procedure. Changing this will produce slightly different results.<br />
<br />
random_seed 777<br />
#<br />
###############################################################################<br />
# MATCHING<br />
#<br />
<br />
distance_tolerance is how different the distances can be between a pair of receptor matching spheres and a pair of ligand matching spheres for them to still be considered matched.<br />
<br />
distance_tolerance 1.5<br />
<br />
This changes how many spheres must be matched to generate an orientation. A minimum of 3 and a maximum of 4 is generally accepted as the right setting. Fewer than 3 matched spheres is too degenerate to generate an actual orientation, and requiring more than 4 does not work well, since we only use heavy atoms in ring systems to generate ligand matching spheres.<br />
<br />
nodes_maximum 4<br />
nodes_minimum 3<br />
<br />
The next 4 parameters control how the histograms of distance differences are generated. The binsize sets how big the bins are; the overlap controls whether a sphere can be put into multiple bins. The ligand and receptor parameters are not required to be the same.<br />
<br />
ligand_binsize 0.4<br />
ligand_overlap 0.2<br />
receptor_binsize 0.4<br />
receptor_overlap 0.2<br />
<br />
Bumping uses a quick check of distances when placing ligand atoms in the binding site to determine whether they have a steric clash. The maximum is how many atoms can be 'bumped' (in close steric contact) per rigid or flexible component of the ligand, as per the [[Flexibase Format]]. Even ligands with some steric clashes can sometimes be rescued by minimization. Setting this number very high will cause many clashed orientations to be scored, which can be prohibitively slow.<br />
<br />
bump_maximum 1<br />
<br />
The next four parameters are unused and unsupported.<br />
<br />
focus_cycles 0<br />
focus_bump 0 <br />
focus_type energy<br />
critical_clusters no<br />
#<br />
###############################################################################<br />
# COLORING<br />
#<br />
<br />
This controls whether chemical matching or coloring is used at all. If yes, many match lines are necessary. These may not be perfect, but [[DOCK Blaster]] has been using these for a long time. Setting this to no produces many more matched orientations, which can be slow, but can help you understand exactly what the energy function is doing.<br />
<br />
chemical_matching yes<br />
case_sensitive no<br />
# ligand color, receptor color<br />
match positive negative<br />
match positive negative_or_acceptor<br />
match positive not_neutral<br />
match negative positive<br />
match negative positive_or_donor<br />
match negative not_neutral<br />
match donor acceptor<br />
match donor donacc<br />
match donor negative_or_acceptor<br />
match donor neutral_or_acceptor_or_donor<br />
match donor not_neutral<br />
match acceptor donor<br />
match acceptor donacc<br />
match acceptor positive_or_donor<br />
match acceptor neutral_or_acceptor_or_donor<br />
match acceptor not_neutral<br />
match neutral neutral<br />
match neutral neutral_or_acceptor_or_donor<br />
match ester_o donor<br />
match ester_o donacc<br />
match ester_o positive_or_donor<br />
match ester_o not_neutral<br />
match amide_o donor<br />
match amide_o donacc<br />
match amide_o positive_or_donor<br />
match amide_o not_neutral<br />
<br />
Single mode is deprecated; these parameters won't work. See [[Dock Ligand Clustering]].<br />
#<br />
###############################################################################<br />
# SINGLE MODE<br />
#<br />
#rmsd_override 0.0<br />
#contact_minimum 0<br />
#energy_maximum 1.0e+6<br />
##truncate_output 1000.0<br />
#<br />
<br />
Search mode is now the default/only mode of docking. Each parameter is described below.<br />
<br />
###############################################################################<br />
# SEARCH MODE<br />
#<br />
<br />
The ratio_minimum parameter has been slated for deprecation.<br />
<br />
ratio_minimum 0.0<br />
<br />
These parameters control how many atoms are necessary in the ligand for it to be docked.<br />
<br />
atom_minimum 5 <br />
atom_maximum 100<br />
<br />
How many of the top molecules will be saved in the output test.* file. <br />
<br />
number_save 50000<br />
<br />
The maximum number of molecules that will be scored in any given run.<br />
<br />
molecules_maximum 300000 <br />
<br />
How many molecules will be skipped. This feature currently does not work.<br />
<br />
initial_skip 0<br />
<br />
How long a molecule is processed before quitting. This feature currently may not work as expected.<br />
<br />
timeout 180<br />
<br />
There are many scoring options:<br />
<br />
# <br />
###############################################################################<br />
# SCORING<br />
#<br />
<br />
Valid options for ligand_desolvation are 'volume' (partial desolvation a la Mysinger & Shoichet 2010), 'full' meaning that the entire ligand is assumed to be desolvated in the binding site and 'none', where no desolvation penalties are applied.<br />
<br />
ligand_desolvation volume<br />
<br />
See the note about relative paths for the matching spheres above; the same comments apply here. There are 2 ways to run 'volume' (partial) desolvation. One is to use one grid for every ligand atom, like this:<br />
<br />
solvmap_file ../../grids/solvmap_sev<br />
<br />
The other option is to use one grid for ligand heavy atoms and one for ligand hydrogen atoms. To use it, uncomment these lines and comment out the other solvmap_file line.<br />
<br />
#solvmap_file ../../grids/solvmap_sev.heavy<br />
#hydrogen_solvmap_file ../../grids/solvmap.sev.hydrogen<br />
<br />
This is the phimap file used for electrostatic scoring. For a better understanding of this grid, see [[Visualizing delphi]]. This path may change if you are using the new Qnifft Delphi maps; see [[Qnifft DOCK 3.6 conversion]].<br />
<br />
delphi_file ../../grids/rec+sph.phi<br />
<br />
This controls the chemgrid file, which contains the van der Waals scoring for every coordinate (chem.vdw will be called) as well as the distance map grids that will be used for deciphering bumping (chem.bmp will be called).<br />
<br />
chemgrid_file_prefix ../../grids/chem<br />
<br />
This is the parameter file that contains the atom type definitions:<br />
<br />
vdw_parameter_file ../../grids/vdw.parms.amb.mindock<br />
<br />
The following options allow the electrostatics and van der Waals parameters to be scaled relative to each other and the solvation scoring.<br />
<br />
electrostatic_scale 1.0<br />
vdw_scale 1.0<br />
<br />
The following parameter lets ligands with internal steric clashes attempt to find a ligand conformation that scores well but does not have any internal clashes. Sometimes this procedure will fail in circumstances where there are many flexible branches, or where a ligand that is too large for the binding site is being docked.<br />
<br />
check_clashes yes<br />
<br />
If set to yes, this removes the positive solvation from each ligand atom and spreads it evenly over the molecule. This is deprecated because it does unexpected things to solvation, and will be removed entirely soon.<br />
<br />
remove_positive_solvation no<br />
<br />
After each orientation of the rigid component is processed and the many ligand conformations have been examined, the best ligand conformation for that orientation can be minimized using the following parameters.<br />
<br />
#<br />
###############################################################################<br />
# MINIMIZATION<br />
#<br />
<br />
Setting this to no turns off minimization completely.<br />
<br />
minimize yes<br />
<br />
Don't minimize molecules that score above the minimization_max.<br />
<br />
minimization_max 1.0e15<br />
<br />
If set to yes, this checks to see if the orientation has already been scored and quits. This has not been tested recently.<br />
<br />
check_degeneracy no<br />
<br />
How many iterations of minimization to do. More means longer run times, but potentially better poses.<br />
<br />
simplex_iterations 250<br />
<br />
How much the total energy can change and still be considered converged. Setting this higher will stop sooner; setting it lower will cause more iterations before converging (or potentially hitting the iteration maximum above).<br />
<br />
simplex_convergence 0.1<br />
<br />
If the energy changes by this much, restart the minimizer from this newest position.<br />
<br />
simplex_restart 1.0<br />
<br />
This is the initial distance in angstroms the molecule is translated (note that translation and rotation used to be swapped for many releases of DOCK).<br />
<br />
simplex_initial_translation 0.2<br />
<br />
How many degrees of initial rotation are done.<br />
<br />
simplex_initial_rotation 5.0<br />
#<br />
###############################################################################<br />
###############################################################################</div>Mysingerhttp://wiki.docking.org/index.php?title=INDOCK_for_DOCK_3.6&diff=3353INDOCK for DOCK 3.62011-10-17T01:43:10Z<p>Mysinger: note that remove_positive_solvation will be removed</p>
<hr />
<div>What follows is a documented sample INDOCK file for [[DOCK 3.6]]. Many lines are required; lines starting with # are comments.<br />
<br />
Required first line:<br />
<br />
DOCK 3.5 parameter<br />
###############################################################################<br />
################## DOCK 3.5 INPUT PARAMETERS 2011/09/07 #######################<br />
###############################################################################<br />
###############################################################################<br />
# INPUT/OUTPUT<br />
#<br />
<br />
This is the path to the receptor matching spheres file. Most scripts make a set of directories and copy the INDOCK file into them, so this path sometimes has an extra set of "../" in it compared to what you might expect. If you use [[DOCK Blaster]]. Generally, match3 has more spheres than match2, so it produces more possible orientations. These spheres are matched to ligand spheres, which are generated from heavy atoms in the "rigid component" of each ligand. For more about the rigid component, see [[Flexibase Format]].<br />
<br />
receptor_sphere_file ../../sph/match3.sph<br />
<br />
The next line is always 1, and is marked for deprecation.<br />
<br />
cluster_numbers 1<br />
<br />
The next line refers to which ligand file to use. If using many of the automated scripts, split_database_index is used, as this allows many ligand files (or just 1) to be placed in the split_database_index file and read in one after another during a DOCK run. If docking small things on your own, you can change this to any file.<br />
<br />
# NOTE: split_database_index is reserved to specify a list of files<br />
ligand_atom_file split_database_index<br />
<br />
This controls the prefix of the output files; again, many of the automated scripts expect it to be "test." (with the trailing dot). OUTDOCK files are always named OUTDOCK.<br />
<br />
output_file_prefix test.<br />
<br />
This controls the random seed used in the minimization procedure. Changing this will produce slightly different results.<br />
<br />
random_seed 777<br />
#<br />
###############################################################################<br />
# MATCHING<br />
#<br />
<br />
distance_tolerance is how different the distances can be between a pair of receptor matching spheres and a pair of ligand matching spheres for them to still be considered matched.<br />
<br />
distance_tolerance 1.5<br />
<br />
This changes how many spheres must be matched to generate an orientation. A minimum of 3 and a maximum of 4 is generally accepted as the right setting. Fewer than 3 matched spheres is too degenerate to generate an actual orientation, and requiring more than 4 does not work well, since we only use heavy atoms in ring systems to generate ligand matching spheres.<br />
<br />
nodes_maximum 4<br />
nodes_minimum 3<br />
<br />
The next 4 parameters control how the histograms of distance differences are generated. The binsize sets how big the bins are; the overlap controls whether a sphere can be put into multiple bins. The ligand and receptor parameters are not required to be the same.<br />
<br />
ligand_binsize 0.4<br />
ligand_overlap 0.2<br />
receptor_binsize 0.4<br />
receptor_overlap 0.2<br />
<br />
Bumping uses a quick check of distances when placing ligand atoms in the binding site to determine whether they have a steric clash. The maximum is how many atoms can be 'bumped' (in close steric contact) per rigid or flexible component of the ligand, as per the [[Flexibase Format]]. Even ligands with some steric clashes can sometimes be rescued by minimization. Setting this number very high will cause many clashed orientations to be scored, which can be prohibitively slow.<br />
<br />
bump_maximum 1<br />
<br />
The next four parameters are unused and unsupported.<br />
<br />
focus_cycles 0<br />
focus_bump 0 <br />
focus_type energy<br />
critical_clusters no<br />
#<br />
###############################################################################<br />
# COLORING<br />
#<br />
<br />
This controls whether chemical matching or coloring is used at all. If yes, many match lines are necessary. These may not be perfect, but [[DOCK Blaster]] has been using these for a long time. Setting this to no produces many more matched orientations, which can be slow, but can help you understand exactly what the energy function is doing.<br />
<br />
chemical_matching yes<br />
case_sensitive no<br />
# ligand color, receptor color<br />
match positive negative<br />
match positive negative_or_acceptor<br />
match positive not_neutral<br />
match negative positive<br />
match negative positive_or_donor<br />
match negative not_neutral<br />
match donor acceptor<br />
match donor donacc<br />
match donor negative_or_acceptor<br />
match donor neutral_or_acceptor_or_donor<br />
match donor not_neutral<br />
match acceptor donor<br />
match acceptor donacc<br />
match acceptor positive_or_donor<br />
match acceptor neutral_or_acceptor_or_donor<br />
match acceptor not_neutral<br />
match neutral neutral<br />
match neutral neutral_or_acceptor_or_donor<br />
match ester_o donor<br />
match ester_o donacc<br />
match ester_o positive_or_donor<br />
match ester_o not_neutral<br />
match amide_o donor<br />
match amide_o donacc<br />
match amide_o positive_or_donor<br />
match amide_o not_neutral<br />
<br />
Single mode is deprecated; these parameters won't work. See [[Dock Ligand Clustering]].<br />
#<br />
###############################################################################<br />
# SINGLE MODE<br />
#<br />
#rmsd_override 0.0<br />
#contact_minimum 0<br />
#energy_maximum 1.0e+6<br />
##truncate_output 1000.0<br />
#<br />
<br />
Search mode is now the default/only mode of docking. Each parameter is described below.<br />
<br />
###############################################################################<br />
# SEARCH MODE<br />
#<br />
<br />
The ratio_minimum parameter has been slated for deprecation.<br />
<br />
ratio_minimum 0.0<br />
<br />
These parameters control how many atoms are necessary in the ligand for it to be docked.<br />
<br />
atom_minimum 5 <br />
atom_maximum 100<br />
<br />
How many of the top molecules will be saved in the output test.* file. <br />
<br />
number_save 50000<br />
<br />
The maximum number of molecules that will be scored in any given run.<br />
<br />
molecules_maximum 300000 <br />
<br />
How many molecules will be skipped. This feature currently does not work.<br />
<br />
initial_skip 0<br />
<br />
How long a molecule is processed before quitting. This feature currently may not work as expected.<br />
<br />
timeout 180<br />
<br />
There are many scoring options:<br />
<br />
# <br />
###############################################################################<br />
# SCORING<br />
#<br />
<br />
Valid options for ligand_desolvation are 'volume' (partial desolvation a la Mysinger & Shoichet 2010), 'full' meaning that the entire ligand is assumed to be desolvated in the binding site and 'none', where no desolvation penalties are applied.<br />
<br />
ligand_desolvation volume<br />
<br />
See the note about relative paths for the matching spheres above; the same comments apply here. There are 2 ways to run 'volume' (partial) desolvation. One is to use one grid for every ligand atom, like this:<br />
<br />
solvmap_file ../../grids/solvmap_sev<br />
<br />
The other option is to use one grid for ligand heavy atoms and one for ligand hydrogen atoms. To use it, uncomment these lines and comment out the other solvmap_file line.<br />
<br />
#solvmap_file ../../grids/solvmap_sev.heavy<br />
#hydrogen_solvmap_file ../../grids/solvmap.sev.hydrogen<br />
<br />
This is the phimap file used for electrostatic scoring. For a better understanding of this grid, see [[Visualizing delphi]].<br />
<br />
delphi_file ../../grids/rec+sph.phi<br />
<br />
This controls the chemgrid file, which contains the van der Waals scoring for every coordinate (chem.vdw will be called) as well as the distance map grids that will be used for deciphering bumping (chem.bmp will be called).<br />
<br />
chemgrid_file_prefix ../../grids/chem<br />
<br />
This is the parameter file that contains the atom type definitions:<br />
<br />
vdw_parameter_file ../../grids/vdw.parms.amb.mindock<br />
<br />
The following options allow the electrostatics and van der Waals parameters to be scaled relative to each other and the solvation scoring.<br />
<br />
electrostatic_scale 1.0<br />
vdw_scale 1.0<br />
<br />
The following parameter lets ligands with internal steric clashes attempt to find a ligand conformation that scores well but does not have any internal clashes. Sometimes this procedure will fail in circumstances where there are many flexible branches, or where a ligand that is too large for the binding site is being docked.<br />
<br />
check_clashes yes<br />
<br />
If set to yes, this removes the positive solvation from each ligand atom and spreads it evenly over the molecule. This is deprecated because it does unexpected things to solvation, and will be removed entirely soon.<br />
<br />
remove_positive_solvation no<br />
<br />
After each orientation of the rigid component is processed and the many ligand conformations have been examined, the best ligand conformation for that orientation can be minimized using the following parameters.<br />
<br />
#<br />
###############################################################################<br />
# MINIMIZATION<br />
#<br />
<br />
Setting this to no turns off minimization completely.<br />
<br />
minimize yes<br />
<br />
Don't minimize molecules that score above the minimization_max.<br />
<br />
minimization_max 1.0e15<br />
<br />
If set to yes, this checks to see if the orientation has already been scored and quits. This has not been tested recently.<br />
<br />
check_degeneracy no<br />
<br />
How many iterations of minimization to do. More means longer run times, but potentially better poses.<br />
<br />
simplex_iterations 250<br />
<br />
How much the total energy can change and still be considered converged. Setting this higher will stop sooner; setting it lower will cause more iterations before converging (or potentially hitting the iteration maximum above).<br />
<br />
simplex_convergence 0.1<br />
<br />
If the energy changes by this much, restart the minimizer from this newest position.<br />
<br />
simplex_restart 1.0<br />
<br />
This is the initial distance in angstroms the molecule is translated (note that translation and rotation used to be swapped for many releases of DOCK).<br />
<br />
simplex_initial_translation 0.2<br />
<br />
How many degrees of initial rotation are done.<br />
<br />
simplex_initial_rotation 5.0<br />
#<br />
###############################################################################<br />
###############################################################################</div>Mysingerhttp://wiki.docking.org/index.php?title=Running_DOCK&diff=4283Running DOCK2011-09-20T23:58:46Z<p>Mysinger: /* Running DOCK */</p>
<hr />
<div>=Running DOCK=<br />
<br />
*modify <tt>$mud/INDOCK</tt> and set up the desired directory structure &ndash; either manually or by running '<tt>md4db.csh bysubset N<sub>1</sub> N<sub>2</sub> Type</tt>', where <tt>N<sub>1</sub></tt> is the identifier of the library (1: lead-like; 2: fragment-like), <tt>N<sub>2</sub></tt> is the number of chunks (i.e., jobs you can run in parallel), and <tt>Type</tt> is the category of library (e.g., bysubset, byvendor, etc.).<br />
*if it hasn't been generated by a script, create the file <tt>dirlist</tt>, which contains the list of the directories (i.e., chunks of the database) that you want to dock.<br />
*if you plan to use any of John's scripts in the downstream processing, leave the output file prefixes at <tt>test.</tt>. <br />
*take care that the paths to the <tt>.db.gz</tt> files in <tt>split_database_index</tt> do not get too long. If they do, go via links. <br />
*submit the calculations to the cluster with <tt>$mud/submit.csh</tt> from the directory in which your data (most importantly, <tt>dirlist</tt>) resides. See [[MUD - Michael's Utilities for Docking]] for setting the $mud variable.<br />
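If <tt>dirlist</tt> has to be created by hand, a short script can do it. The Python sketch below writes the names of all subdirectories of a parent directory into <tt>dirlist</tt>, one per line. The one-name-per-line layout and the function name are assumptions for illustration; check an existing <tt>dirlist</tt> produced by the scripts before relying on it.<br />

```python
from pathlib import Path


def write_dirlist(parent, dirlist_name="dirlist"):
    """Write the sorted names of all subdirectories of parent to dirlist."""
    parent = Path(parent)
    chunks = sorted(p.name for p in parent.iterdir() if p.is_dir())
    # Assumed format: one directory name per line.
    (parent / dirlist_name).write_text("".join(name + "\n" for name in chunks))
    return chunks
```

Edit the resulting file to remove any directories you do not want to dock before submitting.<br />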
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Preparing_the_protein&diff=4122Preparing the protein2011-09-20T23:57:09Z<p>Mysinger: Remove distmap references, add prot2crg.py</p>
<hr />
<div>=Preparing the protein=<br />
<br />
Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins. <br />
<br />
==Modifying the PDB file==<br />
<br />
*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', all columns to the right of the z-coordinate and the TER statements. <br />
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE&curren;) with sulphur (&curren;SD). Be careful about the correct alignment! <br />
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably. <br />
*select the protonation states of HIS residues to be either &delta;- (rename residue to HID), &epsilon;- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.<br />
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS. <br />
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!<br />
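<br />
The generic clean-up steps above (not the AH-specific ones) can be sketched in Python. This is a minimal illustration, not part of the DOCK tool chain: the function name <tt>clean_pdb_lines</tt> is hypothetical, and it assumes standard fixed-column PDB records (atom name in columns 13-16, residue name in columns 18-20, z-coordinate ending at column 54). It also assumes that selenomethionine atoms stored as HETATM records should be kept and relabelled as ATOM.<br />

```python
def clean_pdb_lines(lines):
    """Return receptor lines reduced to ATOM records that end at the z-coordinate."""
    cleaned = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        record, resname = line[:6], line[17:20]
        if record == "HETATM" and resname == "MSE":
            # Assumption: selenomethionine is stored as HETATM and should be kept.
            line = "ATOM  " + line[6:]
        elif record != "ATOM  ":
            continue  # drops TER, REMARK, other HETATM records, ...
        line = line[:54]  # the z-coordinate ends at column 54
        if line[17:20] == "MSE":
            atom = line[12:16]
            if atom == "SE  ":
                atom = " SD "  # selenium -> sulphur, shifted one column to the right
            line = line[:12] + atom + line[16:17] + "MET" + line[20:]
        cleaned.append(line)
    return cleaned
```

Feeding it the lines of the original PDB file and writing the result to <tt>rec.pdb</tt> covers the first two bullets; protonation states still have to be chosen by hand.<br />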
<br />
==Running startdockblaster5==<br />
<br />
*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove all columns to the right of the z-coordinate and the TER statements. Change HETATM to ATOM.<br />
*generate the files <tt>.only_spheres</tt> and &ndash; in case you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt> &ndash; <tt>.useligsph</tt> and write `on' to the latter. Be careful not to add blank lines at the end, this will not be understood by <tt>makespheres2.pl</tt> . In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt> . <br />
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand. <br />
*if <tt>startdockblaster5</tt> doesn't finish for no obvious reason and with no clear error message, or <tt>rec.crg</tt> has very funny hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less </tt>. The only character with a backslash should be \n &mdash; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to re-prepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch &mdash; it is likely that there are some blanks or hidden characters that are causing the problems. <br />
*Take any WARNING messages emitted seriously, and continue only if you know why each one is there. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms. <br />
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].<br />
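<br />
The check for non-printing characters described above can also be scripted. The following is a minimal sketch, not an existing utility: <tt>find_suspicious_bytes</tt> is a hypothetical name, and the rule that only printable ASCII and the newline are acceptable is an assumption taken from the <tt>od -c</tt> advice.<br />

```python
def find_suspicious_bytes(data):
    """Return (line, column, byte value) for each byte that is neither printable ASCII nor a newline."""
    hits = []
    line_no, col = 1, 0
    for byte in data:
        if byte == 0x0A:  # newline is the only control character allowed
            line_no += 1
            col = 0
            continue
        col += 1
        if byte < 0x20 or byte > 0x7E:  # tabs, carriage returns, non-ASCII, ...
            hits.append((line_no, col, byte))
    return hits
```

Read the file in binary mode (for example <tt>open("rec.pdb", "rb").read()</tt>) and pass the bytes to the function; an empty list means the file is clean by this criterion.<br />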
<br />
==Removing and modifying files==<br />
<br />
*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs):<br><tt>rm -f PDBPARM chem.* rec+sph.phi solvmap tart.txt OUT*</tt><br />
*modify <tt>rec.crg</tt>: <br />
**AH: CYK: put back the three atoms that were stored and deleted earlier, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK. <br />
**remove all TER statements that might have been added. <br />
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN. <br />
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX. <br />
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble. <br />
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively &rArr; do not tart any residues in this file! <br />
<br />
==Running <tt>[[chemgrid]]</tt> ==<br />
<br />
*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct van der Waals parameters of all residues. <br />
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3<sup>rd</sup> and 4<sup>th</sup> column, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals-maps. There will be no other errors, the docking will finish showing some "bumping" ligands which have extremely favorable energies (&le; -200).<br />
*any 'WARNING' issued in <tt>OUTPARM</tt> is another sign of a problem with atomic radii.<br />
*if one has to run <tt>chemgrid</tt> again, first remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.<br />
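<br />
The <tt>grep</tt> for <tt>0.000</tt> can be narrowed to the two columns that matter. The sketch below is hedged: it assumes that <tt>PDBPARM</tt> columns are separated by whitespace, which may not hold for every atom name, and <tt>unrecognized_atoms</tt> is a hypothetical helper, not part of <tt>chemgrid</tt>.<br />

```python
def unrecognized_atoms(lines):
    """Return the lines whose 3rd and 4th whitespace-separated columns are both 0.000."""
    hits = []
    for line in lines:
        fields = line.split()
        # Assumption: columns are whitespace-delimited; adjust for fixed-width layouts.
        if len(fields) >= 4 and fields[2] == "0.000" and fields[3] == "0.000":
            hits.append(line.rstrip("\r\n"))
    return hits
```

Any line returned here should be checked against <tt>prot.table.ambcrg.ambH</tt> before the grids are used.<br />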
<br />
==Tarting the protein==<br />
<br />
*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.<br />
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>, they are the files with the extension <tt>prot2</tt>.<br />
* add the relevant residues to the bottom of your <tt>prot.table.ambcrg.ambH</tt> file, taking care to match the existing formatting exactly.<br />
* generate the new <tt>amb.crg.oxt</tt> from the edited <tt>prot.table.ambcrg.ambH</tt> using:<br><tt>$mud/prot2crg.py < prot.table.ambcrg.ambH > amb.crg.oxt</tt><br />
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt>, where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt>. <br />
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt> <br />
*optionally tart the residues that are in contact with a crystallographic ligand, if any. <br />
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.<br />
<br />
==Modifying the Delphi spheres==<br />
<br />
*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres). <br />
*delete the spheres that are too close to the solvent. <br />
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues. <br />
*a good number for DelPhi spheres is 120. <br />
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi! <br />
<br />
==Modifying the Matching spheres==<br />
<br />
*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.<br />
*If you selected <tt>.useligsph</tt> be careful not to move any spheres based on the ligand atoms. <br />
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.<br />
*a good number for matching spheres is 50-60. <br />
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]]. <br />
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres. <br />
*run <tt>cat $mud/header.sph match2.sph</tt> .<br />
<br />
==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==<br />
<br />
*if you changed rec+sph.crg above, you need to run Delphi <br />
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate. <br />
*run <tt>delphi.com > delphi.log</tt> and check the output.<br />
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.<br />
<br />
==Running <tt>[[solvmap]]</tt> ==<br />
<br />
*if you changed rec.crg or the box above, you need to run solvmap <br />
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>solvmap</tt> .<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Preparing_the_protein&diff=4121Preparing the protein2011-06-08T06:06:02Z<p>Mysinger: </p>
<hr />
<div>=Preparing the protein=<br />
<br />
Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins. <br />
<br />
==Modifying the PDB file==<br />
<br />
*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', the chain column, all columns to the right of the z-coordinate and the TER statements. <br />
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE&curren;) with sulphur (&curren;SD). Be careful about the correct alignment! <br />
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably. <br />
*select the protonation states of HIS residues to be either &delta;- (rename residue to HID), &epsilon;- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.<br />
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS. <br />
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!<br />
<br />
==Running startdockblaster5==<br />
<br />
*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove the chain column, all columns to the right of the z-coordinate and the TER statements. <br />
*generate the files <tt>.only_spheres</tt> and &ndash; in case you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt> &ndash; <tt>.useligsph</tt> and write `on' to the latter. Be careful not to add blank lines at the end, this will not be understood by <tt>makespheres2.pl</tt> . In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt> . <br />
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand. <br />
*if <tt>startdockblaster5</tt> doesn't finish for no obvious reason and with no clear error message, or <tt>rec.crg</tt> has very funny hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less </tt>. The only character with a backslash should be \n &mdash; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to re-prepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch &mdash; it is likely that there are some blanks or hidden characters that are causing the problems. <br />
*check the files <tt>stdout</tt> and <tt>stderr</tt> after the run for potential mistakes and error messages. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms. <br />
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].<br />
<br />
==Removing and modifying files==<br />
<br />
*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs):<br><tt>rm -f PDBPARM chem.* distmap.box distmap distmap.log rec+sph.phi solvmap tart.txt OUT*</tt><br />
*modify <tt>rec.crg</tt>: <br />
**AH: CYK: put back the three atoms that were stored and deleted earlier, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK. <br />
**remove all TER statements that might have been added. <br />
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN. <br />
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX. <br />
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble. <br />
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively &rArr; do not tart any residues in this file! <br />
<br />
==Running <tt>[[chemgrid]]</tt> ==<br />
<br />
*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct charges of all residues. <br />
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3<sup>rd</sup> and 4<sup>th</sup> column, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals-maps. There will be no other errors, the docking will finish showing some "bumping" ligands which have extremely favorable energies (&le; -200).<br />
*any 'WARNING' issued in <tt>OUTPARM</tt> is another sign of a problem with atomic radii.<br />
*if one has to run <tt>chemgrid</tt> again, remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.<br />
<br />
==Running <tt>distmap</tt> ==<br />
<br />
* the default is to run <tt>distmap</tt> on <tt>rec.crg</tt>. If you modified this file, rerun by simply typing <tt>distmap</tt>.<br />
* AH: cp <tt>rec.crg</tt> to <tt>rec-dist.crg</tt> and remove the Zn atoms in the latter file (otherwise there will be lots of bumping ligands). Edit <tt>INDIST</tt> to update the filename.<br />
*run <tt>distmap</tt><br />
<br />
==Tarting the protein==<br />
<br />
*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.<br />
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>, they are the files with the extension <tt>prot2</tt>.<br />
* take care that the format of the <tt>.prot2</tt> file is consistent with the format in the <tt>amb.crg.oxt</tt> file, e.g., that there is no leading space before an atom name etc.<br />
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt> , where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt> . <br />
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt> <br />
*tart the residues that are in contact with a crystallographic ligand, if any. <br />
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.<br />
<br />
==Modifying the Delphi spheres==<br />
<br />
*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres). <br />
*delete the spheres that are too close to the solvent. <br />
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues. <br />
*a good number for DelPhi spheres is 120. <br />
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi! <br />
<br />
==Modifying the Matching spheres==<br />
<br />
*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.<br />
*If you selected <tt>.useligsph</tt> be careful not to move any spheres based on the ligand atoms. <br />
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.<br />
*a good number for matching spheres is 50-60. <br />
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]]. <br />
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres. <br />
*run <tt>cat $mud/header.sph match2.sph</tt> .<br />
<br />
==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==<br />
<br />
*if you changed rec+sph.crg above, you need to run Delphi <br />
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate. <br />
*run <tt>delphi.com > delphi.log</tt> and check the output.<br />
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.<br />
<br />
==Running <tt>[[solvmap]]</tt> ==<br />
<br />
*if you changed rec.crg or the box above, you need to run solvmap <br />
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>solvmap</tt> .<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Preparing_the_protein&diff=4120Preparing the protein2011-06-08T06:02:26Z<p>Mysinger: /* Modifying the spheres */</p>
<hr />
<div>=Preparing the protein=<br />
<br />
Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins. <br />
<br />
==Modifying the PDB file==<br />
<br />
*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', the chain column, all columns to the right of the z-coordinate and the TER statements. <br />
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE&curren;) with sulphur (&curren;SD). Be careful about the correct alignment! <br />
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably. <br />
*select the protonation states of HIS residues to be either &delta;- (rename residue to HID), &epsilon;- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.<br />
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS. <br />
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!<br />
<br />
==Running startdockblaster5==<br />
<br />
*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove the chain column, all columns to the right of the z-coordinate and the TER statements. <br />
*generate the files <tt>.only_spheres</tt> and &ndash; in case you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt> &ndash; <tt>.useligsph</tt> and write `on' to the latter. Be careful not to add blank lines at the end, this will not be understood by <tt>makespheres2.pl</tt> . In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt> . <br />
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand. <br />
*if <tt>startdockblaster5</tt> doesn't finish for no obvious reason and with no clear error message, or <tt>rec.crg</tt> has very funny hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less </tt>. The only character with a backslash should be \n &mdash; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to re-prepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch &mdash; it is likely that there are some blanks or hidden characters that are causing the problems. <br />
*check the files <tt>stdout</tt> and <tt>stderr</tt> after the run for potential mistakes and error messages. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms. <br />
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].<br />
<br />
==Removing and modifying files==<br />
<br />
*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs):<br><tt>rm -f PDBPARM chem.* distmap.box distmap distmap.log rec+sph.phi solvmap tart.txt OUT*</tt><br />
*modify <tt>rec.crg</tt>: <br />
**AH: CYK: put back the three atoms that were stored and deleted earlier, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK. <br />
**remove all TER statements that might have been added. <br />
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN. <br />
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX. <br />
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble. <br />
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively &rArr; do not tart any residues in this file! <br />
<br />
==Running <tt>[[chemgrid]]</tt> ==<br />
<br />
*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct charges of all residues. <br />
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3<sup>rd</sup> and 4<sup>th</sup> column, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals-maps. There will be no other errors, the docking will finish showing some "bumping" ligands which have extremely favorable energies (&le; -200).<br />
*any 'WARNING' issued in <tt>OUTPARM</tt> is another sign of a problem with atomic radii.<br />
*if one has to run <tt>chemgrid</tt> again, remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.<br />
<br />
==Running <tt>distmap</tt> ==<br />
<br />
* the default is to run <tt>distmap</tt> on <tt>rec.crg</tt>. If you modified this file, rerun by simply typing <tt>distmap</tt>.<br />
* AH: cp <tt>rec.crg</tt> to <tt>rec-dist.crg</tt> and remove the Zn atoms in the latter file (otherwise there will be lots of bumping ligands). Edit <tt>INDIST</tt> to update the filename.<br />
*run <tt>distmap</tt><br />
<br />
==Tarting the protein==<br />
<br />
*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.<br />
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>, they are the files with the extension <tt>prot2</tt>.<br />
* take care that the format of the <tt>.prot2</tt> file is consistent with the format in the <tt>amb.crg.oxt</tt> file, e.g., that there is no leading space before an atom name etc.<br />
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt> , where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt> . <br />
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt> <br />
*tart the residues that are in contact with a crystallographic ligand, if any. <br />
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.<br />
<br />
==Modifying the Delphi spheres==<br />
<br />
*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres). <br />
*delete the spheres that are too close to the solvent. <br />
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues. <br />
*a good number for DelPhi spheres is 120. <br />
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi! <br />
<br />
==Modifying the Matching spheres==<br />
<br />
*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.<br />
*If you selected <tt>.useligsph</tt> be careful not to move any spheres based on the ligand atoms. <br />
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.<br />
*a good number for matching spheres is 50-60. <br />
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]]. <br />
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres. <br />
*run <tt>cat $mud/header.sph match2.sph</tt> .<br />
<br />
==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==<br />
<br />
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate. <br />
*run <tt>delphi.com > delphi.log</tt> and check the output.<br />
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.<br />
<br />
==Running <tt>[[solvmap]]</tt> ==<br />
<br />
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>solvmap</tt> . <br />
*after the run, make sure that the file <tt>solvmap</tt> contains '''no''' blank lines.<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=103Analysing the results2011-05-28T03:47:18Z<p>Mysinger: /* Atomic contributions to the desolvation */</p>
<hr />
<div>=Some analyses that can be performed=<br />
<br />
See [[MUD - Michael's Utilities for Docking]] for a lot of tools to help with analyzing DOCK runs.<br />
<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===First you need an <tt>.eel1</tt> file to be scored===<br />
<br />
=====For the xtal-lig.mol2 in its crystallographic pose=====<br />
<br />
New way that outputs your.eel1 starting from your.pdb directly<br />
*run '<tt>$mud/to_eel1.csh your.pdb</tt>'. <br />
<br />
If that fails, use the old way to convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.<br />
<br />
=====For molecules that have already been docked=====<br />
<br />
*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.<br />
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'<br />
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.<br />
<br />
===Overall molecular score compiled from all scoreopt_so options===<br />
<br />
For default grids<br />
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids'</tt><br />
Or for custom grids, used below to run SEV-based desolvation grids<br />
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids rec+sph.phi chem solvmap_sev'</tt> <br />
The summary for the whole molecule is output to your.eel1.scores in combine.scores format <br />
<br />
===Atomic contributions to the coulombic energy===<br />
<br />
In your.eel1.delphi from the wrapper <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
Or to generate this data yourself<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>. <br />
*enter the name of the output file, e.g. <tt>ligand.delphi</tt> .<br />
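<br />
The summation described at the start of this section can be sketched as follows. This is an illustration under assumptions: the columns of each ATOM line are taken to be whitespace-separated, the input is taken to hold a single molecule, and <tt>delphi_score</tt> is a hypothetical name rather than a MUD script.<br />

```python
KT_TO_KCAL_PER_MOL = 0.5924  # conversion factor quoted in the text


def delphi_score(lines):
    """Sum column 11 (energy in kT) over all ATOM lines and convert to kcal/mol."""
    total_kt = 0.0
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        fields = line.split()  # assumption: whitespace-separated columns
        total_kt += float(fields[10])  # column 11 = partial charge x field
    return total_kt * KT_TO_KCAL_PER_MOL
```

The result is the number to compare with the elect column in OUTDOCK. For a file with several molecules, the lines would first have to be split per molecule.<br />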
<br />
===Atomic contributions to the van der Waals energy===<br />
<br />
In your.eel1.vdw from the wrapper <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
Or to generate this data yourself<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
<br />
<br />
===Atomic contributions to the desolvation===<br />
<br />
In your.eel1.solv from the wrapper <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
Or to generate this data yourself<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> or <tt>grids/solvmap_sev</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> .<br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=102Analysing the results2011-05-28T03:46:45Z<p>Mysinger: /* Atomic contributions to the coulombic energy */</p>
<hr />
<div>=Some analyses that can be performed=<br />
<br />
See [[MUD - Michael's Utilities for Docking]] for a lot of tools to help with analyzing DOCK runs.<br />
<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===First you need an <tt>.eel1</tt> file to be scored===<br />
<br />
=====For the xtal-lig.mol2 in its crystallographic pose=====<br />
<br />
New way that outputs your.eel1 starting from your.pdb directly<br />
*run '<tt>$mud/to_eel1.csh your.pdb</tt>'. <br />
<br />
If that fails, use the old way to convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.<br />
<br />
=====For molecules that have already been docked=====<br />
<br />
*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.<br />
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'<br />
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.<br />
<br />
===Overall molecular score compiled from all scoreopt_so options===<br />
<br />
For default grids<br />
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids'</tt><br />
Or for custom grids, used below to run SEV-based desolvation grids<br />
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids rec+sph.phi chem solvmap_sev'</tt> <br />
The summary for the whole molecule is output to your.eel1.scores in combine.scores format <br />
<br />
===Atomic contributions to the coulombic energy===<br />
<br />
In your.eel1.delphi from the wrapper <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
Or to generate this data yourself<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>. <br />
*enter the name of the output file, e.g. <tt>ligand.delphi</tt> .<br />
<br />
===Atomic contributions to the van der Waals energy===<br />
<br />
In your.eel1.vdw from the wrapper <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
Or to generate this data yourself<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
<br />
<br />
===Atomic contributions to the desolvation===<br />
<br />
In your.eel1.solv from the wrapper <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
Or to generate this data yourself<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=101Analysing the results2011-05-28T03:46:00Z<p>Mysinger: Update to the modern way to scoreopt</p>
<hr />
<div>=Some analyses that can be performed=<br />
<br />
See [[MUD - Michael's Utilities for Docking]] for a lot of tools to help with analyzing DOCK runs.<br />
<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===First you need an <tt>.eel1</tt> file to be scored===<br />
<br />
=====For the xtal-lig.mol2 in its crystallographic pose=====<br />
<br />
New way that outputs your.eel1 starting from your.pdb directly<br />
*run '<tt>$mud/to_eel1.csh your.pdb</tt>'. <br />
<br />
If that fails, use the old way to convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.<br />
<br />
=====For molecules that have already been docked=====<br />
<br />
*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.<br />
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'<br />
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.<br />
<br />
===Overall molecular score compiled from all scoreopt_so options===<br />
<br />
For default grids<br />
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids'</tt><br />
Or for custom grids, used below to run SEV-based desolvation grids<br />
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids rec+sph.phi chem solvmap_sev'</tt> <br />
The summary for the whole molecule is output to your.eel1.scores in combine.scores format <br />
<br />
===Atomic contributions to the coulombic energy===<br />
<br />
In your.eel1.delphi from the wrapper <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
Or to generate this data yourself<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>. <br />
*enter the name of the output file, e.g. <tt>ligand.elec</tt> . <br />
<br />
<br />
===Atomic contributions to the van der Waals energy===<br />
<br />
In your.eel1.vdw from the wrapper <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
Or to generate this data yourself<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
<br />
<br />
===Atomic contributions to the desolvation===<br />
<br />
In your.eel1.solv from the wrapper <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
Or to generate this data yourself<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Chembl2pdb&diff=320Chembl2pdb2011-03-18T21:43:09Z<p>Mysinger: /* GENERATION PROCEDURE */</p>
<hr />
<div>== CURRENT DATA ==<br />
<br />
__ Updated 02/24/2011 __<br />
<br />
The current data relating the ChEMBL09 protein targets to structures on the PDB can be found at:<br />
<br />
'''/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09'''<br />
<br />
There are 3 subfolders:<br />
<br />
- '''uniprot''': categorized by target uniprot id<br />
<br />
- '''pdb_ligand''': all pdb codes that have a bound ligand (as defined by be_blasti.csh script from DOCKBlaster)<br />
with the corresponding activity data from ChEMBL (actives.smi)<br />
<br />
- '''pdb_other''': all pdb codes that do NOT have a bound crystal ligand (as defined by be_blasti.csh script from DOCKBlaster) <br />
with the corresponding actives from ChEMBL (actives.smi)<br />
<br />
In order to get some statistics: how many pdb codes, how many targets have ChEMBL ligands, you can simply count the number of subfolders in each "byXXX" folder.<br />
<br />
eg: How many UniProt targets have ChEMBL ligands?<br />
% cd uniprot<br />
% wc -l uniprot<br />
<br />
eg: How many pdb structures have ChEMBL actives and a bound crystal ligand?<br />
% cd bypdb_ligand/<br />
% ls -d ????| wc -l<br />
<br />
eg: How many pdb structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?<br />
% cd pdb_other/<br />
% ls -d ???? | wc -l<br />
<br />
== GENERATION PROCEDURE ==<br />
<br />
In future, if you want to generate the data again, you need to do the following:<br />
<br />
*Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release) <br />
*Step II: Make a new directory, run the script pointing to the new SQL database name, and wait a day or two for it to finish<br />
mkdir chembl10<br />
cd chembl10<br />
/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10<br />
<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Chembl2pdb&diff=319Chembl2pdb2011-03-18T21:42:29Z<p>Mysinger: /* GENERATION PROCEDURE */</p>
<hr />
<div>== CURRENT DATA ==<br />
<br />
__ Updated 02/24/2011 __<br />
<br />
The current data relating the ChEMBL09 protein targets to structures on the PDB can be found at:<br />
<br />
'''/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09'''<br />
<br />
There are 3 subfolders:<br />
<br />
- '''uniprot''': categorized by target uniprot id<br />
<br />
- '''pdb_ligand''': all pdb codes that have a bound ligand (as defined by be_blasti.csh script from DOCKBlaster)<br />
with the corresponding activity data from ChEMBL (actives.smi)<br />
<br />
- '''pdb_other''': all pdb codes that do NOT have a bound crystal ligand (as defined by be_blasti.csh script from DOCKBlaster) <br />
with the corresponding actives from ChEMBL (actives.smi)<br />
<br />
In order to get some statistics: how many pdb codes, how many targets have ChEMBL ligands, you can simply count the number of subfolders in each "byXXX" folder.<br />
<br />
eg: How many UniProt targets have ChEMBL ligands?<br />
% cd uniprot<br />
% wc -l uniprot<br />
<br />
eg: How many pdb structures have ChEMBL actives and a bound crystal ligand?<br />
% cd bypdb_ligand/<br />
% ls -d ????| wc -l<br />
<br />
eg: How many pdb structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?<br />
% cd pdb_other/<br />
% ls -d ???? | wc -l<br />
<br />
== GENERATION PROCEDURE ==<br />
<br />
In future, if you want to generate the data again, you need to do the following:<br />
<br />
*Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release) <br />
*Step II: Make a new directory, run the script, and wait a day or two for it to finish<br />
mkdir chembl10<br />
cd chembl10<br />
/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10<br />
<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Chembl2pdb&diff=318Chembl2pdb2011-03-18T21:41:50Z<p>Mysinger: /* GENERATION PROCEDURE */</p>
<hr />
<div>== CURRENT DATA ==<br />
<br />
__ Updated 02/24/2011 __<br />
<br />
The current data relating the ChEMBL09 protein targets to structures on the PDB can be found at:<br />
<br />
'''/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09'''<br />
<br />
There are 3 subfolders:<br />
<br />
- '''uniprot''': categorized by target uniprot id<br />
<br />
- '''pdb_ligand''': all pdb codes that have a bound ligand (as defined by be_blasti.csh script from DOCKBlaster)<br />
with the corresponding activity data from ChEMBL (actives.smi)<br />
<br />
- '''pdb_other''': all pdb codes that do NOT have a bound crystal ligand (as defined by be_blasti.csh script from DOCKBlaster) <br />
with the corresponding actives from ChEMBL (actives.smi)<br />
<br />
In order to get some statistics: how many pdb codes, how many targets have ChEMBL ligands, you can simply count the number of subfolders in each "byXXX" folder.<br />
<br />
eg: How many UniProt targets have ChEMBL ligands?<br />
% cd uniprot<br />
% wc -l uniprot<br />
<br />
eg: How many pdb structures have ChEMBL actives and a bound crystal ligand?<br />
% cd bypdb_ligand/<br />
% ls -d ????| wc -l<br />
<br />
eg: How many pdb structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?<br />
% cd pdb_other/<br />
% ls -d ???? | wc -l<br />
<br />
== GENERATION PROCEDURE ==<br />
<br />
In future, if you want to generate the data again, you need to do the following:<br />
<br />
*Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release) <br />
*Step II: Make a new directory, run the script, and wait a day or two for it to finish<br />
```mkdir chembl10```<br />
```cd chembl10```<br />
```/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10```<br />
<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Chembl2pdb&diff=317Chembl2pdb2011-03-18T21:41:30Z<p>Mysinger: /* GENERATION PROCEDURE */</p>
<hr />
<div>== CURRENT DATA ==<br />
<br />
__ Updated 02/24/2011 __<br />
<br />
The current data relating the ChEMBL09 protein targets to structures on the PDB can be found at:<br />
<br />
'''/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09'''<br />
<br />
There are 3 subfolders:<br />
<br />
- '''uniprot''': categorized by target uniprot id<br />
<br />
- '''pdb_ligand''': all pdb codes that have a bound ligand (as defined by be_blasti.csh script from DOCKBlaster)<br />
with the corresponding activity data from ChEMBL (actives.smi)<br />
<br />
- '''pdb_other''': all pdb codes that do NOT have a bound crystal ligand (as defined by be_blasti.csh script from DOCKBlaster) <br />
with the corresponding actives from ChEMBL (actives.smi)<br />
<br />
In order to get some statistics: how many pdb codes, how many targets have ChEMBL ligands, you can simply count the number of subfolders in each "byXXX" folder.<br />
<br />
eg: How many UniProt targets have ChEMBL ligands?<br />
% cd uniprot<br />
% wc -l uniprot<br />
<br />
eg: How many pdb structures have ChEMBL actives and a bound crystal ligand?<br />
% cd bypdb_ligand/<br />
% ls -d ????| wc -l<br />
<br />
eg: How many pdb structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?<br />
% cd pdb_other/<br />
% ls -d ???? | wc -l<br />
<br />
== GENERATION PROCEDURE ==<br />
<br />
In future, if you want to generate the data again, you need to do the following:<br />
<br />
Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release) <br />
Step II: Make a new directory, run the script, and wait a day or two for it to finish<br />
```mkdir chembl10```<br />
```cd chembl10```<br />
```/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10```<br />
<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Chembl2pdb&diff=316Chembl2pdb2011-03-18T21:41:04Z<p>Mysinger: New simplified generation procedure</p>
<hr />
<div>== CURRENT DATA ==<br />
<br />
__ Updated 02/24/2011 __<br />
<br />
The current data relating the ChEMBL09 protein targets to structures on the PDB can be found at:<br />
<br />
'''/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09'''<br />
<br />
There are 3 subfolders:<br />
<br />
- '''uniprot''': categorized by target uniprot id<br />
<br />
- '''pdb_ligand''': all pdb codes that have a bound ligand (as defined by be_blasti.csh script from DOCKBlaster)<br />
with the corresponding activity data from ChEMBL (actives.smi)<br />
<br />
- '''pdb_other''': all pdb codes that do NOT have a bound crystal ligand (as defined by be_blasti.csh script from DOCKBlaster) <br />
with the corresponding actives from ChEMBL (actives.smi)<br />
<br />
In order to get some statistics: how many pdb codes, how many targets have ChEMBL ligands, you can simply count the number of subfolders in each "byXXX" folder.<br />
<br />
eg: How many UniProt targets have ChEMBL ligands?<br />
% cd uniprot<br />
% wc -l uniprot<br />
<br />
eg: How many pdb structures have ChEMBL actives and a bound crystal ligand?<br />
% cd bypdb_ligand/<br />
% ls -d ????| wc -l<br />
<br />
eg: How many pdb structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?<br />
% cd pdb_other/<br />
% ls -d ???? | wc -l<br />
<br />
== GENERATION PROCEDURE ==<br />
<br />
In future, if you want to generate the data again, you need to do the following:<br />
<br />
Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release) <br />
Step II: Make a new directory, run the script, and wait a day or two for it to finish<br />
```mkdir chembl10```<br />
```cd chembl10```<br />
```/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10```<br />
<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=MUD_-_Michael%27s_Utilities_for_Docking&diff=3545MUD - Michael's Utilities for Docking2010-01-13T00:39:31Z<p>Mysinger: /* Computing Enrichments */</p>
<hr />
<div>==What's in MUD?==<br />
<br />
*Tools to start, check, and restart dock jobs<br />
*Tools to combine, enrich, plot, and view docking results<br />
<br />
==Setting up MUD==<br />
<br />
*For convenience, point a shell variable to the base mud directory to save typing<br />
set mud=~mysinger/code/mud/trunk<br />
*If you use MUD a lot, you can add this to your ~/.login<br />
*Then simply run commands like this:<br />
$mud/submit.csh<br />
$mud/check.py -h<br />
*Use -h or --help to get full help information for the .py (python) scripts<br />
*The .csh scripts will automatically print usage information if mis-used<br />
*The scripts automatically use their invocation path to find other scripts and libraries they depend on.<br />
<br />
==Job Control==<br />
<br />
===Main Workflow===<br />
<br />
For a quick summary of what to do first see [[SGE_Cluster_Docking]]. For a detailed look at how to get the details right see [[How to run and analyze a DOCK run by hand]].<br />
<br />
*Submit a parallel job to the cluster<br />
$mud/submit.csh<br />
Uses 'dirlist' to determine which directories to run. Similar to startdockbksX, but also indicates job submission by touching a submitted file in each directory.<br />
*Check parallel job status<br />
$mud/check.py<br />
Indicates the status of unfinished (or unsubmitted) jobs. Note that it simply returns nothing if everything is finished.<br />
*Restart all failed subjobs<br />
$mud/restart.py<br />
This works even if some subjobs are still running. Occasionally, however, jobs can fail with no detectable remnants. To force those jobs to restart you can use the -f option, but beware that this will also restart all subjobs that are still running.<br />
<br />
===Specialized Commands===<br />
*Submit job to the local machine<br />
$mud/sublocal.csh<br />
*Submit a single directory to the cluster<br />
qsub $mud/runsge.csh<br />
*Submit a single directory to the local machine<br />
$mud/runsubdir.csh<br />
*Remove docking output leaving only input - will DELETE even completed jobs<br />
$mud/clean.py<br />
*Restart single directory<br />
$mud/restartdir.py<br />
<br />
==Job Analysis==<br />
<br />
*Enrichment plots are sensitive to consistent treatment and proper accounting for all docked molecules. The combine script properly accounts for all docked molecules by detecting bumped out, no matched, and timed out molecules. <br />
<br />
To achieve consistency, you have two options:<br />
1. Write coordinates for all molecules (what I use)<br />
In INDOCK, set number_save to 50000 or something high enough to capture all dockable hierarchies. DOCK output is now gzipped so this is cheaper in disk space than it used to be.<br />
2. Do not check for broken molecules<br />
Use the -b option when running combine.py<br />
<br />
===Combining Parallel Jobs===<br />
*Merge all parallel jobs into a single set of unique scores.<br />
$mud/combine.py<br />
This combine carefully accounts for all docked molecules, for more informative enrichment plots.<br />
<br />
*Options:<br />
Use -b or --broken to skip finding broken molecules. Use -d or --done to indicate that all subjobs are complete, for the case where you did not submit with a MUD submission script. Use -p or --prefix if your output files are named something other than test. Use --box if your box file is not at ../../grids/box relative to your subjob directories.<br />
<br />
*Creates:<br />
#combine.scores - fully processed scores, using the best one for each id<br />
#combine.raw - contains all scores as scrapped from DOCK output<br />
#combine.broken - broken molecules and the reason they failed<br />
#combine.zeroes - important sanity check<br />
<br />
format of combine.scores:<br />
<id> <shape> <elect> <VdW> <polar solv> <apolar solv> <total> <subdir><br />
<br />
The .zeroes file is a sanity check because it lists the number of molecules followed by the number of zeroes in each scoring column. Past experience has shown that when DOCK fails randomly and silently, it often generates a large number of zero scores. If this happens, simply re-running the job will give better results. <br />
<br />
===Computing Enrichments===<br />
*Compute enrichment starting from the combined scores.<br />
$mud/enrich.py -s -l LIGAND_FILE<br />
< or ><br />
$mud/enrich.py -l LIGAND_FILE -d DECOY_FILE<br />
Generates both enrichment and roc curves, both for the ligands against all molecules and for the ligands versus just the decoys. It will try to run combine if it has not been run yet, but will do so only with defaults for every option.<br />
<br />
*Input:<br />
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.<br />
<br />
The identifier files simply contain one id for each known ligand (or decoy), matching the id used in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.<br />
<br />
*Options:<br />
Use -s or --skip-own-curves to skip consideration of decoys and thus generation of _own curves. Use -f to force combine to run again.<br />
<br />
<span id="Enrich_Types"></span><br />
*Creates:<br />
#enrich.txt - Enrichment curve for ligands versus all molecules<br />
#roc.txt - ROC curve for ligands versus all molecules<br />
#enrich_own.txt - Enrichment curve for ligands versus only the decoys<br />
#roc_own.txt - ROC curve for ligands versus only the decoys<br />
_own files are not generated if the -s option is used.<br />
<br />
format for output files:<br />
#AUC 50.00 LogAUC 0.00<br />
<x> <y><br />
<x> <y><br />
...<br />
AUC is area under the curve and the random expectation value is 50%. [[LogAUC]] is the area between the log curve and the log random curve, so the random expectation value is 0%. <y> is always "% ligands found", and <x> is either "% database searched" for enrichment plots or "% non-ligands found" for ROC plots.<br />
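To work with these files outside of MUD, the format above can be read back with a short script. The following is a minimal sketch, not MUD code: the names <tt>read_curve</tt> and <tt>trapezoid_auc</tt> are hypothetical, and it assumes that the header looks exactly like the example line and that both axes are stored as percentages (0 to 100). The trapezoid rule is one simple way to integrate the curve; the AUC that <tt>enrich.py</tt> writes into the header may be computed differently.

```python
def read_curve(lines):
    """Parse an enrich.txt / roc.txt style file.

    Returns (auc, log_auc, points), where points is a list of (x, y) pairs.
    Assumes a header of the form '#AUC 50.00 LogAUC 0.00'.
    """
    auc = log_auc = None
    points = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if line.startswith("#"):
            auc, log_auc = float(fields[1]), float(fields[3])
            continue
        points.append((float(fields[0]), float(fields[1])))
    return auc, log_auc, points


def trapezoid_auc(points):
    """Area under a curve given in percent on both axes, returned in percent."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2.0
    return area / 100.0  # %*% divided by 100 gives percent of the unit square


if __name__ == "__main__":
    example = ["#AUC 50.00 LogAUC 0.00", "0 0", "50 50", "100 100"]
    header_auc, header_log_auc, pts = read_curve(example)
    print(header_auc, header_log_auc, trapezoid_auc(pts))
```

For the diagonal (random) curve used in the example, the recomputed AUC agrees with the random expectation value of 50% stated above.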
<br />
===Plotting Enrichments===<br />
Easily plot enrichment and roc curves from one or more jobs.<br />
$mud/plots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC<br />
< or ><br />
$mud/plots.py -i .<br />
Generates plots with one curve for each -i input_directory.<br />
<br />
*Options:<br />
Use -s or --skip-own-curves to skip _own curves, especially if they don't exist because enrich.py was run with -s. You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory. Use -n to get normal instead of semi-log plots (and AUC in place of LogAUC).<br />
<br />
*Creates:<br />
#[title_]enrich.png<br />
#[title_]roc.png<br />
#[title_]enrich_own.png<br />
#[title_]roc_own.png<br />
<br />
The various graphs have the same meaning as their respective curves from [[#Computing Enrichments]]. The [title_] prefix is optional and is present only when a custom title is given with the -t option.<br />
<br />
===Computing Energy Histograms===<br />
*Compute energy distributions starting from the combined scores.<br />
$mud/energies.py -s -l LIGAND_FILE<br />
< or ><br />
$mud/energies.py -l LIGAND_FILE -d DECOY_FILE<br />
Generates the energy distributions for the ligands, decoys, and all the other molecules.<br />
<br />
*Input:<br />
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.<br />
<br />
The identifier files simply contain one id for each known ligand (or decoy), matching the id used in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.<br />
<br />
*Options:<br />
Use -s or --skip-own-curves to skip consideration of decoys.<br />
<br />
*Creates:<br />
#counts.txt - Energy distributions<br />
<br />
format for output:<br />
number_of_sections number_of_bins min_energy_threshold max_energy_threshold<br />
##### section_name<br />
bin_upper_edge1 count_below_edge1<br />
...<br />
bin_upper_edgeN count_below_edgeN<br />
ABOVE count_above_last_edge<br />
The sections are for ligands, decoys (optional), and others. The bins and counts define the energy histogram. The bins are finely spaced here in order to have more resolution when combined with other runs, whose energy ranges may be different.<br />
<br />
===Plotting Energy Histograms===<br />
Easily plot energy histograms from one or more jobs.<br />
$mud/eplots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC<br />
< or ><br />
$mud/eplots.py -i .<br />
Generates plots with energy distributions for each -i input_directory.<br />
<br />
*Options:<br />
You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory.<br />
<br />
*Creates:<br />
#[title_]counts.png<br />
<br />
===Visualizing Molecule by Molecule Results===<br />
Create a DOCK 4,5,6 type pdb file for use in Chimera's ViewDOCK.<br />
$mud/topdock.py -o topdock.pdb<br />
<br />
*Options:<br />
Use -o to specify an output file besides stdout. Use -t NUMBER to set how many top-scoring molecules are written.<br />
<br />
&rarr; Back to [[Tutorials]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=LogAUC&diff=3479LogAUC2010-01-13T00:37:18Z<p>Mysinger: </p>
<hr />
<div>==What is LogAUC?==<br />
<br />
LogAUC is a metric to evaluate virtual screening performance that has many of the same advantages as area under the curve (AUC), but is based on a plot where the x-axis is semilog in order to focus on early enrichment.<br />
<br />
==Motivation==<br />
<br />
When we look at virtual screening performance, we plot an ROC curve (or enrichment curve) with a base 10 semilog x-axis, because this has the advantage of focusing the graph on "early enrichment", where molecules are most likely to be selected for further testing. If we had instead plotted the curve with the usual linear x-axis, then the area under the curve (AUC) is a well-regarded metric to summarize the overall performance of a virtual screening campaign as a single number<sup>1</sup>. While AUC can be formulated alternate ways<sup>2,3</sup>, it can be mechanically constructed by simply integrating under the curve, and interpreted as the fraction of the area under the curve over the area under the best possible ROC curve. It just happens that in a linear ROC plot, the AUC of the best possible curve is the entire unit square, with an area of 1. By analogy, in our typical semilog plots, we can construct the same fraction of the area under the log curve, over the area under the perfect log curve, and define that fraction as the logAUC. The lone nuisance is that the area under the log curve is infinite in general. However, if we are practical and limit our focus to a region of log space that we can actually measure, say above a certain threshold <math>\lambda</math>, then the perfect log area is finite.<br />
<br />
==Definition==<br />
<br />
Formally, we define <math>logAUC_\lambda</math>, where the log area computations run from <math>\lambda</math> to 1.0, and we typically refer to <math>logAUC_{0.001}</math> as simply <math>logAUC</math>, where the area is integrated from 0.1 percent (0.001) to 100 percent (1.0) of decoys found. For integrating the area under the curve, we use the trapezoidal rule as follows:<br />
<br />
<math>LogAUC_\lambda=\frac{\displaystyle \sum_{i}^{where~x_i\ge\lambda} (\log_{10} x_{i+1} - \log_{10} x_i)(\frac{y_{i+1}+y_i}{2})}{\log_{10}\frac{1}{\lambda}}</math><br />
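The trapezoidal sum above can be sketched in Python as follows. This is an illustration, not the code used by <tt>enrich.py</tt>: the function name <tt>log_auc</tt> is hypothetical, and the linear interpolation used to clip the segment that crosses <math>\lambda</math> is an assumption, since the formula only states that terms with <math>x_i\ge\lambda</math> are summed.<br />

```python
import math


def log_auc(xs, ys, lam=0.001):
    """LogAUC_lambda by the trapezoidal rule on a log10 x-axis.

    xs: fraction of decoys found (0..1), ascending.
    ys: fraction of ligands found (0..1), same length as xs.
    A segment that straddles lam is clipped at lam by linear
    interpolation (an assumption of this sketch).
    """
    area = 0.0
    for i in range(len(xs) - 1):
        x0, y0, x1, y1 = xs[i], ys[i], xs[i + 1], ys[i + 1]
        if x1 <= lam:
            continue  # segment lies entirely below the threshold
        if x0 < lam:
            # Clip the left end of the segment to x = lam
            y0 = y0 + (y1 - y0) * (lam - x0) / (x1 - x0)
            x0 = lam
        area += (math.log10(x1) - math.log10(x0)) * (y0 + y1) / 2.0
    # Divide by the area under the perfect curve, log10(1/lam)
    return area / -math.log10(lam)
```

A perfect curve (all ligands found before <math>\lambda</math>) gives 1.0, and a finely sampled diagonal <tt>ys = xs</tt> gives the random value of about 0.14462.<br />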
<br />
==Discussion==<br />
<br />
From similar reasoning based on semilog ROC plots, Clark and Webster-Clark construct the pROC AUC metric<sup>2</sup>, which is similar to the numerator of logAUC except that the integration is done over horizontal bars instead of vertical trapezoids. The advantage of constructing logAUC as a fraction over the ideal area is that the choice of base for the logarithm is irrelevant, because changing base simply results in a constant that cancels between numerator and denominator. Also, by explicitly defining the area of interest using λ and integrating vertically, we are able to avoid the singularity at <math>x_i=0</math> encountered in pROC. More importantly, the fixed integration area means we can more directly compare <math>logAUC_\lambda</math> values across databases of different sizes and across targets with different ratios of actives to inactives. The final advantage of logAUC is that if you are used to looking at semilog ROC plots plotted from λ to 1, and understand that logAUC is just the percentage of the total area below the curve, then you can at some point gain the same intuitive feel as AUC has for linear ROC plots. In a semilog ROC plot the random line occupies only a sliver of the total area, and indeed its logAUC is just 14.462%. In order to more easily compare a given logAUC to this random value, we instead report the “adjusted logAUC” as the calculated value minus 14.462%, so that positive values mean overall enrichments better than random. <br />
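The random value can be checked by hand. As a sketch, take the random ROC line to be <math>y=x</math> and replace the trapezoidal sum by the exact integral:<br />
<math>LogAUC_\lambda^{random}=\frac{\int_\lambda^1 x\,d(\log_{10}x)}{\log_{10}\frac{1}{\lambda}}=\frac{(1-\lambda)/\ln 10}{\log_{10}\frac{1}{\lambda}}=\frac{1-\lambda}{\ln\frac{1}{\lambda}}</math><br />
For <math>\lambda=0.001</math> this gives <math>\frac{0.999}{6.9078}\approx 0.14462</math>, i.e. 14.462%.<br />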
<br />
<math>Adjusted~LogAUC=LogAUC_{0.001}-0.14462</math><br />
<br />
==References==<br />
## Nicholls, A., What do we know and when do we know it? J Comput Aided Mol Des 2008, 22, (3-4), 239-55.<br />
## Clark, R. D.; Webster-Clark, D. J., Managing bias in ROC curves. J Comput Aided Mol Des 2008, 22, (3-4), 141-6.<br />
## Truchon, J. F.; Bayly, C. I., Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model 2007, 47, (2), 488-508.<br />
<br />
==Citation==<br />
Michael Mysinger, Brian Shoichet. "Rapid Context-Dependent Ligand Desolvation in Molecular Docking". 2010. (in preparation for J Chem Inf Model)</div>Mysingerhttp://wiki.docking.org/index.php?title=LogAUC&diff=3478LogAUC2010-01-13T00:32:03Z<p>Mysinger: </p>
<hr />
<div>==What is LogAUC?==<br />
<br />
LogAUC is a metric to evaluate virtual screening performance that has many of the same advantages as area under the curve (AUC), but is based on a plot where the x-axis is semilog in order to focus on early enrichment.<br />
<br />
==Motivation==<br />
<br />
When we look at virtual screening performance, we plot an ROC curve (or enrichment curve) with a base 10 semilog x-axis, because this has the advantage of focusing the graph on "early enrichment", where molecules are most likely to be selected for further testing. If we had instead plotted the curve with the usual linear x-axis, then the area under the curve (AUC) is a well-regarded metric to summarize the overall performance of a virtual screening campaign as a single number<sup>1</sup>. While AUC can be formulated alternate ways<sup>2,3</sup>, it can be mechanically constructed by simply integrating under the curve, and interpreted as the fraction of the area under the curve over the area under the best possible ROC curve. It just happens that in a linear ROC plot, the AUC of the best possible curve is the entire unit square, with an area of 1. By analogy, in our typical semilog plots, we can construct the same fraction of the area under the log curve, over the area under the perfect log curve, and define that fraction as the logAUC. The lone nuisance is that the area under the log curve is infinite in general. However, if we are practical and limit our focus to a region of log space that we can actually measure, say above a certain threshold <math>\lambda</math>, then the perfect log area is finite.<br />
<br />
==Definition==<br />
<br />
Formally, we define <math>logAUC_\lambda</math>, where the log area computations run from <math>\lambda</math> to 1.0, and we typically refer to <math>logAUC_{0.001}</math> as simply <math>logAUC</math>, where the area is integrated from 0.1 percent (0.001) to 100 percent (1.0) of decoys found. For integrating the area under the curve, we use the trapezoidal rule as follows:<br />
<br />
<math>LogAUC_\lambda=\frac{\displaystyle \sum_{i}^{where~x_i\ge\lambda} (\log_{10} x_{i+1} - \log_{10} x_i)(\frac{y_{i+1}+y_i}{2})}{\log_{10}\frac{1}{\lambda}}</math><br />
<br />
==Discussion==<br />
<br />
From similar reasoning based on semilog ROC plots, Clark and Webster-Clark construct the pROC AUC metric<sup>2</sup>, which is similar to the numerator of logAUC except that the integration is done over horizontal bars instead of vertical trapezoids. The advantage of constructing logAUC as a fraction over the ideal area is that the choice of base for the logarithm is irrelevant, because changing base simply results in a constant that cancels between numerator and denominator. Also, by explicitly defining the area of interest using λ and integrating vertically, we are able to avoid the singularity at <math>x_i=0</math> encountered in pROC. More importantly, the fixed integration area means we can more directly compare <math>logAUC_\lambda</math> values across databases of different sizes and across targets with different ratios of actives to inactives. The final advantage of logAUC is that if you are used to looking at semilog ROC plots plotted from λ to 1, and understand that logAUC is just the percentage of the total area below the curve, then you can at some point gain the same intuitive feel as AUC has for linear ROC plots. In a semilog ROC plot the random line occupies only a sliver of the total area, and indeed its logAUC is just 14.462%. In order to more easily compare a given logAUC to this random value, we instead report the “adjusted logAUC” as the calculated value minus 14.462%, so that positive values mean overall enrichments better than random. <br />
<br />
<math>Adjusted~LogAUC=LogAUC_{0.001}-0.14462</math><br />
<br />
==References==<br />
## Nicholls, A., What do we know and when do we know it? J Comput Aided Mol Des 2008, 22, (3-4), 239-55.<br />
## Clark, R. D.; Webster-Clark, D. J., Managing bias in ROC curves. J Comput Aided Mol Des 2008, 22, (3-4), 141-6.<br />
## Truchon, J. F.; Bayly, C. I., Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model 2007, 47, (2), 488-508.</div>Mysingerhttp://wiki.docking.org/index.php?title=LogAUC&diff=3477LogAUC2010-01-13T00:15:46Z<p>Mysinger: </p>
<hr />
<div>==What is LogAUC?==<br />
<br />
LogAUC is a metric to evaluate virtual screening performance that has some nice characteristics. It is intuitive to use <br />
<br />
==Motivation==<br />
<br />
When we look at virtual screening performance, we plot an ROC curve (or enrichment curve) with a base 10 semilog x-axis, because this has the advantage of focusing the graph on "early enrichment", where molecules are most likely to be selected for further testing. If we had instead plotted the curve with the usual linear x-axis, then the area under the curve (AUC) is a well-regarded metric to summarize the overall performance of a virtual screening campaign as a single number<sup>1</sup>. While AUC can be formulated alternate ways<sup>2,3</sup>, it can be mechanically constructed by simply integrating under the curve, and interpreted as the fraction of the area under the curve over the area under the best possible ROC curve. It just happens that in a linear ROC plot, the AUC of the best possible curve is the entire unit square, with an area of 1. By analogy, in our typical semilog plots, we can construct the same fraction of the area under the log curve, over the area under the perfect log curve, and define that fraction as the logAUC. The lone nuisance is that the area under the log curve is infinite in general. However, if we are practical and limit our focus to a region of log space that we can actually measure, say above a certain threshold <math>\lambda</math>, then the perfect log area is finite.<br />
<br />
==Definition==<br />
<br />
Formally, we define <math>logAUC_\lambda</math>, where the log area computations run from <math>\lambda</math> to 1.0, and we typically refer to <math>logAUC_{0.001}</math> as simply logAUC, where the area is integrated from 0.1 percent (0.001) to 100 percent (1.0) of decoys found. For integrating the area under the curve, we use the trapezoidal rule as follows:<br />
<br />
<math>LogAUC_\lambda=\frac{\displaystyle \sum_{i}^{where~x_i\ge\lambda} (\log_{10} x_{i+1} - \log_{10} x_i)(\frac{y_{i+1}+y_i}{2})}{\log_{10}\frac{1}{\lambda}}</math><br />
<br />
==References==<br />
1. Nicholls, A., What do we know and when do we know it? J Comput Aided Mol Des 2008, 22, (3-4), 239-55.<br />
2.</div>Mysingerhttp://wiki.docking.org/index.php?title=LogAUC&diff=3476LogAUC2010-01-13T00:00:40Z<p>Mysinger: </p>
<hr />
<div>==What is LogAUC?==<br />
<br />
LogAUC is a metric to evaluate virtual screening performance that has some nice characteristics. It is intuitive to use <br />
<br />
==Motivation==<br />
<br />
When we look at virtual screening performance, we plot an ROC curve (or enrichment curve) with a base 10 semilog x-axis, because this has the advantage of focusing the graph on "early enrichment", where molecules are most likely to be selected for further testing. If we had instead plotted the curve with the usual linear x-axis, then the area under the curve (AUC) is a well-regarded metric to summarize the overall performance of a virtual screening campaign as a single number<sup>1</sup>. While ROC AUC can be formulated alternate ways, it can be <br />
<br />
<math>LogAUC_\lambda=\frac{\displaystyle \sum_{i}^{where~x_i\ge\lambda} (\log_{10} x_{i+1} - \log_{10} x_i)(\frac{y_{i+1}+y_i}{2})}{\log_{10}\frac{1}{\lambda}}</math><br />
<br />
==References==</div>Mysingerhttp://wiki.docking.org/index.php?title=LogAUC&diff=3475LogAUC2010-01-12T23:41:33Z<p>Mysinger: </p>
<hr />
<div><br />
==What is LogAUC?==<br />
<br />
LogAUC is a metric to evaluate virtual screening performance that has some nice characteristics. It is intuitive to use <br />
<br />
==Motivation==<br />
<br />
When we look at virtual screening performance, we plot an ROC curve (or enrichment curve) with a base 10 semilog x-axis, because this has the advantage of focusing the graph on "early enrichment", where molecules are most likely to be selected for further testing. If we had instead plotted the curve with the usual linear x-axis, then the area under the curve (AUC) is a well-regarded metric to summarize the overall performance of a virtual screening campaign as a single number.</div>Mysingerhttp://wiki.docking.org/index.php?title=How_To_Guides&diff=3191How To Guides2010-01-12T22:56:00Z<p>Mysinger: </p>
<hr />
<div>What do you want to do? <br />
<br />
{{TOCright}}<br />
<br />
= Tutorials =<br />
<br />
* [[Tutorials]]<br />
<br />
= Protocols =<br />
<br />
* [[Automated Database Preparation]] - unix protocol<br />
* [[LogAUC]] - metric to measure virtual screening performance<br />
* [[DOCK Blaster:Protocols]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Running_DOCK&diff=4282Running DOCK2009-12-05T02:56:41Z<p>Mysinger: /* Running DOCK */</p>
<hr />
<div>=Running DOCK=<br />
<br />
*modify <tt>INDOCK</tt> and set up the desired directory structure &ndash; either manually or by running '<tt>md4db.csh bysubset N<sub>1</sub> N<sub>2</sub> Type</tt>', where <tt>N<sub>1</sub></tt> is the identifier of the library (1: lead-like; 2: fragment-like), <tt>N<sub>2</sub></tt> is the number of chunks (i.e., jobs you can run in parallel), and <tt>Type</tt> is the category of library (i.e., bysubset, byvendor, etc).<br />
* if it hasn't been generated by a script, create the file <tt>dirlist</tt>, which contains the list of the directories (i.e., chunks of the database) that you want to dock.<br />
*if you plan to use any of John's scripts in the downstream processing, leave the output file prefixes at <tt>test.</tt>. <br />
*take care that the paths to the <tt>.db.gz</tt> files in <tt>split_database_index</tt> do not get too long. If they do, go via links. <br />
*submit the calculations to the cluster with <tt>$mud/submit.csh</tt> from the directory in which your data (most importantly, <tt>dirlist</tt>) resides. See [[MUD - Michael's Utilities for Docking]] for setting the $mud variable.<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Running_DOCK&diff=4281Running DOCK2009-12-05T02:56:27Z<p>Mysinger: /* Running DOCK */</p>
<hr />
<div>=Running DOCK=<br />
<br />
*modify <tt>INDOCK</tt> and set up the desired directory structure &ndash; either manually or by running <tt>md4db.csh bysubset N<sub>1</sub> N<sub>2</sub> Type</tt>, where <tt>N<sub>1</sub></tt> is the identifier of the library (1: lead-like; 2: fragment-like), <tt>N<sub>2</sub></tt> is the number of chunks (i.e., jobs you can run in parallel), and <tt>Type</tt> is the category of library (i.e., bysubset, byvendor, etc).<br />
* if it hasn't been generated by a script, create the file <tt>dirlist</tt>, which contains the list of the directories (i.e., chunks of the database) that you want to dock.<br />
*if you plan to use any of John's scripts in the downstream processing, leave the output file prefixes at <tt>test.</tt>. <br />
*take care that the paths to the <tt>.db.gz</tt> files in <tt>split_database_index</tt> do not get too long. If they do, go via links. <br />
*submit the calculations to the cluster with <tt>$mud/submit.csh</tt> from the directory in which your data (most importantly, <tt>dirlist</tt>) resides. See [[MUD - Michael's Utilities for Docking]] for setting the $mud variable.<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Running_DOCK&diff=4280Running DOCK2009-12-05T02:55:58Z<p>Mysinger: </p>
<hr />
<div>=Running DOCK=<br />
<br />
*modify <tt>INDOCK</tt> and set up the desired directory structure &ndash; either manually or by running <tt>mksdir3.csh N<sub>1</sub> N<sub>2</sub> Type</tt>, where <tt>N<sub>1</sub></tt> is the identifier of the library (1: lead-like; 2: fragment-like), <tt>N<sub>2</sub></tt> is the number of chunks (i.e., jobs you can run in parallel), and <tt>Type</tt> is the category of library (i.e., bysubset, byvendor, etc).<br />
* if it hasn't been generated by a script, create the file <tt>dirlist</tt>, which contains the list of the directories (i.e., chunks of the database) that you want to dock.<br />
*if you plan to use any of John's scripts in the downstream processing, leave the output file prefixes at <tt>test.</tt>. <br />
*take care that the paths to the <tt>.db.gz</tt> files in <tt>split_database_index</tt> do not get too long. If they do, go via links. <br />
*submit the calculations to the cluster with <tt>$mud/submit.csh</tt> from the directory in which your data (most importantly, <tt>dirlist</tt>) resides. See [[MUD - Michael's Utilities for Docking]] for setting the $mud variable.<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Preparing_the_ligand&diff=4089Preparing the ligand2009-12-05T02:55:07Z<p>Mysinger: </p>
<hr />
<div>=Preparing a ligand=<br />
<br />
==Automatic way, starting from SMILES==<br />
<br />
This way will make use of John's automatic scripts for database<br />
preparation and actually upload new molecules to a special section of<br />
[http://zinc.docking.org/ ZINC].<br />
<br />
*it is advisable to create a special subdirectory, since many new files will be generated. <br />
*the file containing the [http://www.daylight.com/smiles/ SMILES] strings should contain a string followed by an identifier on each line. <br />
*OPTIONAL: run <tt>convert.py --i=yourname.smi --o=yourname.ism</tt> . This will convert your SMILES to ''isomeric'' SMILES.<br />
*run <tt>dbgen.csh yourname.smi</tt>. <br />
*you should obtain a file <tt>somename.db.gz</tt> .<br />
<br />
==Manual way==<br />
<br />
===Isolating the ligand as <tt>.mol2</tt> file===<br />
<br />
*extract the ligand structure from the <tt>.pdb</tt> file. <br />
*assign hydrogens. <br />
*assign all atom ([http://www.tripos.com/mol2/atom_types.html Sybyl/TAFF]) and bond types. <br />
*save it as <tt>ligandname.mol2</tt> file. <br />
<br />
===Running <tt>omega</tt> ===<br />
<br />
*run [http://www.eyesopen.com/products/applications/omega.html OMEGA], but don't ask me how to do that yet.<br />
<br />
===Running amsol===<br />
<br />
*find more information about amsol [http://comp.chem.umn.edu/amsol/ on its homepage]. <br />
*<tt>mkdir ./amsol2</tt> <br />
*Use file2file.py to get the right formal charge to feed to AMSOL. It is also important to change the name, otherwise the original <tt>.mol2</tt> file will be overwritten!<br />
<tt>file2file.py -g ligandname.mol2 ./amsol2/someothername.mol2</tt> <br />
*edit <tt>./amsol2/someothername.mol2</tt> : <br />
*<br />
*delete all lines prior to <tt>@<TRIPOS>MOLECULE</tt> <br />
*<br />
*change line 2 (molecule name) to something of the format <tt>ABCD12345678</tt> (four capital letters followed by eight numbers). <br />
*<br />
*line 3 should be <tt>n<sub>atoms</sub> n<sub>bonds</sub> 0 0 0</tt><br />
*<br />
*the <tt>@<TRIPOS>MOLECULE</tt> section must consist of exactly '''5''' lines (adjust by adding/deleting blanks). <br />
*<br />
*remove all sections after the <tt>@<TRIPOS>BOND</tt> section.<br />
*<br />
*delete the blank lines between the <tt>ATOM</tt> and <tt>BOND</tt> sections, if there are any. <br />
*run <tt>RunAMSOL3.csh WAIT</tt> <br />
*the output <tt>someothername.solv</tt> file will contain the following:<br />
{| style="text-align: center; border:1px solid #aaa; margin: 1em 1em 1em 0; background: #f9f9f9; border-collapse: collapse;" cellpadding="5" cellspacing="0" <br />
|+ '''AMSOL output'''<br />
|-<br />
! style="border:1px #aaa solid; padding: 0.2em;" | line #1<br />
| style="border:1px #aaa solid; padding: 0.2em;" | molname || style="border:1px #aaa solid; padding: 0.2em;" | <math>n_{atoms}</math> || style="border:1px #aaa solid; padding: 0.2em;" | charge || style="border:1px #aaa solid; padding: 0.2em;" | pol_solv || style="border:1px #aaa solid; padding: 0.2em;" | ? || style="border:1px #aaa solid; padding: 0.2em;" | apol_solv || style="border:1px #aaa solid; padding: 0.2em;" | total_solv<br />
|-<br />
! style="border:1px #aaa solid; padding: 0.2em;" | other lines <br />
| style="border:1px #aaa solid; padding: 0.2em;" | charge || style="border:1px #aaa solid; padding: 0.2em;" | pol_solv || style="border:1px #aaa solid; padding: 0.2em;" | ? || style="border:1px #aaa solid; padding: 0.2em;" | apol_solv || style="border:1px #aaa solid; padding: 0.2em;" | total_solv<br />
|-<br />
| style="border:1px #aaa solid; padding: 0.2em;" | ''(per_atom)''<br />
|}<br />
<br />
<br />
*furthermore, there will be <tt>someothername.nmol2</tt> file which contains the correct partial charges.<br />
<br />
===Running <tt>mol2db</tt> ===<br />
<br />
*edit <tt>someothername.nmol2</tt> so that the <tt>@<TRIPOS>MOLECULE</tt> section consists of exactly '''6''' lines. <br />
*edit the <tt>inhier</tt> file so that the 'mol2_file', 'db_file' and 'solvation_table' entries are correct. <br />
*run <tt>mol2db inhier</tt> <br />
*add the preamble at the top of the file. <br />
*<tt>gzip</tt> the resulting file so that it can be used by <tt>DOCK</tt> .<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=100Analysing the results2009-12-05T02:54:53Z<p>Mysinger: </p>
<hr />
<div>=Some analyses that can be performed=<br />
<br />
See [[MUD - Michael's Utilities for Docking]] for a lot of tools to help with analyzing DOCK runs.<br />
<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===First you need an <tt>.eel1</tt> file to be scored===<br />
<br />
=====For the xtal-lig.mol2 in its crystallographic pose=====<br />
<br />
Convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.<br />
<br />
=====For molecules that have already been docked=====<br />
<br />
*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.<br />
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'<br />
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.<br />
<br />
===Individual contributions to the coulombic energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>. <br />
*enter the name of the output file, e.g. <tt>ligand.elec</tt> . <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
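The summation in the last step can be sketched in Python. This assumes that the columns of the <tt>scoreopt_so</tt> output are whitespace-separated, so that "column 11" is the 11th field of each ATOM line; that layout is an assumption and should be checked against your own output file. <tt>delphi_score</tt> is a hypothetical helper.<br />

```python
KT_TO_KCAL_PER_MOL = 0.5924  # conversion factor quoted above


def delphi_score(lines):
    """Sum column 11 (energy in kT) over ATOM lines and convert to kcal/mol.

    Assumes whitespace-separated columns; lines that are not ATOM records
    or that have fewer than 11 fields are skipped.
    """
    total_kt = 0.0
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        fields = line.split()
        if len(fields) < 11:
            continue
        total_kt += float(fields[10])  # 11th column, zero-based index 10
    return total_kt * KT_TO_KCAL_PER_MOL
```

For example, <tt>delphi_score(open("ligand.elec"))</tt>. Note that for a file holding several molecules (such as one made from <tt>top500.eel1</tt>) this sums over all of them, so split the file per molecule first.<br />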
<br />
===Individual contributions to the van der Waals energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
<br />
===Individual contributions to the desolvation===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.<br />
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Preparing_the_protein&diff=4117Preparing the protein2009-12-05T02:54:32Z<p>Mysinger: </p>
<hr />
<div>=Preparing the protein=<br />
<br />
Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins. <br />
<br />
==Modifying the PDB file==<br />
<br />
*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', the chain column, all columns to the right of the z-coordinate and the TER statements. <br />
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE&curren;) with sulphur (&curren;SD). Be careful about the correct alignment! <br />
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably. <br />
*select the protonation states of HIS residues to be either &delta;- (rename residue to HID), &epsilon;- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.<br />
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS. <br />
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!<br />
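The first step (keeping only ATOM lines, dropping the chain column and everything to the right of the z-coordinate) can be sketched in Python. Two assumptions are made here: the file follows the standard fixed-width PDB layout (chain identifier in column 22, z-coordinate ending in column 54), and "removing the chain column" means blanking it so that the remaining columns keep their alignment. <tt>clean_pdb_lines</tt> is a hypothetical helper, not part of the DOCK tools.<br />

```python
def clean_pdb_lines(lines):
    """Keep only ATOM records, blank the chain id, cut after the z-coordinate.

    Assumes standard fixed-width PDB columns: chain id in column 22,
    z-coordinate in columns 47-54. TER, HETATM and all other records
    are dropped because they do not start with 'ATOM'.
    """
    cleaned = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        record = line.rstrip("\n")[:54]            # keep through the z-coordinate
        record = record[:21] + " " + record[22:]   # blank column 22 (chain id)
        cleaned.append(record)
    return cleaned
```

Note that HETATM records are dropped too, so selenomethionine (MSE) residues, which are often stored as HETATM, need to be converted by hand before this step.<br />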
<br />
==Running startdockblaster5==<br />
<br />
*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove the chain column, all columns to the right of the z-coordinate and the TER statements. <br />
*generate the files <tt>.only_spheres</tt> and &ndash; in case you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt> &ndash; <tt>.useligsph</tt> and write 'on' to the latter. Be careful not to add blank lines at the end; <tt>makespheres2.pl</tt> will not understand them. In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt> . <br />
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand. <br />
*if <tt>startdockblaster5</tt> fails to finish for no obvious reason and with no clear error message, or <tt>rec.crg</tt> has very funny hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less </tt>. The only character with a backslash should be \n; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to reprepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch, since it is likely that some blanks or hidden characters are causing the problems. <br />
*check the files <tt>stdout</tt> and <tt>stderr</tt> after the run for potential mistakes and error messages. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms. <br />
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].<br />
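As an alternative to reading <tt>od -c</tt> output by eye, the check for non-printing characters can be sketched in Python. This is only an illustration; <tt>find_suspect_bytes</tt> is a hypothetical helper, and it treats anything other than printable ASCII and the newline character as suspect.<br />

```python
def find_suspect_bytes(data):
    """Locate bytes that are neither printable ASCII (0x20-0x7E) nor newline.

    data: the raw file contents as bytes.
    Returns a list of (line_number, column, byte_value), both 1-based,
    so tabs (9) and carriage returns (13) are reported with their position.
    """
    hits = []
    line_no, col = 1, 0
    for byte in data:
        if byte == 0x0A:  # newline: move to the next line
            line_no += 1
            col = 0
            continue
        col += 1
        if not 0x20 <= byte <= 0x7E:
            hits.append((line_no, col, byte))
    return hits
```

For example, <tt>find_suspect_bytes(open("rec.pdb", "rb").read())</tt> should return an empty list for a clean file.<br />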
<br />
==Removing and modifying files==<br />
<br />
*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs):<br><tt>rm -f PDBPARM chem.* distmap.box distmap distmap.log rec+sph.phi solvmap tart.txt OUT*</tt><br />
*modify <tt>rec.crg</tt>: <br />
*AH: CYK: put back the three atoms that were stored earlier, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK. <br />
*remove all TER statements that might have been added. <br />
*AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN. <br />
*take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX. <br />
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble. <br />
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively &rArr; do not tart any residues in this file! <br />
<br />
==Running <tt>[[chemgrid]]</tt> ==<br />
<br />
*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct charges of all residues. <br />
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3<sup>rd</sup> and 4<sup>th</sup> column, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals-maps. There will be no other errors; the docking will finish, showing some "bumping" ligands which have extremely favorable energies (&le; -200).<br />
*any 'WARNING' issued in <tt>OUTPARM</tt> is another sign of a problem with atomic radii.<br />
*if one has to run <tt>chemgrid</tt> again, remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.<br />
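The grep step above can be narrowed to the two columns that matter. This is a rough sketch under one loud assumption: that <tt>PDBPARM</tt> columns are separated by whitespace, which should be checked against a real file before relying on it. The function name <tt>unrecognized_atoms</tt> is hypothetical.

```python
def unrecognized_atoms(lines):
    """Return the lines whose 3rd and 4th whitespace-separated
    fields are both numerically zero (ASSUMED column layout)."""
    flagged = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # too short to carry both columns
        try:
            third, fourth = float(fields[2]), float(fields[3])
        except ValueError:
            continue  # header or other non-numeric line
        if third == 0.0 and fourth == 0.0:
            flagged.append(line.rstrip("\n"))
    return flagged


# Hypothetical usage:
# with open("PDBPARM") as handle:
#     for line in unrecognized_atoms(handle):
#         print(line)
```

Any line printed this way points at an atom that probably needs an entry in <tt>prot.table.ambcrg.ambH</tt>.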
<br />
==Running <tt>distmap</tt> ==<br />
<br />
* the default is to run <tt>distmap</tt> on <tt>rec.crg</tt>. If you modified this file, rerun by simply typing <tt>distmap</tt>.<br />
* AH: cp <tt>rec.crg</tt> to <tt>rec-dist.crg</tt> and remove the Zn atoms in the latter file (otherwise there will be lots of bumping ligands). Edit <tt>INDIST</tt> to update the filename.<br />
*run <tt>distmap</tt><br />
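For the AH case, removing the Zn atoms from the copy can be sketched as below. It assumes that <tt>rec.crg</tt> keeps the fixed PDB column layout (residue name in columns 18 to 20) and that the metal ions carry the residue name ZN as set earlier; the function name <tt>strip_zinc</tt> is made up for this example.

```python
def strip_zinc(lines):
    """Return all lines except ATOM/HETATM records whose residue name
    (PDB columns 18-20, ASSUMED to apply to rec.crg) is ZN."""
    kept = []
    for line in lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[17:20].strip() == "ZN":
            continue  # drop the metal ion
        kept.append(line)
    return kept


# Hypothetical usage:
# with open("rec.crg") as src:
#     remaining = strip_zinc(src.readlines())
# with open("rec-dist.crg", "w") as dst:
#     dst.writelines(remaining)
```

<tt>INDIST</tt> still has to be edited by hand to point at <tt>rec-dist.crg</tt>.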
<br />
==Tarting the protein==<br />
<br />
*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.<br />
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>, they are the files with the extension <tt>prot2</tt>.<br />
* take care that the format of the <tt>.prot2</tt> file is consistent with the format in the <tt>amb.crg.oxt</tt> file, e.g., that there is no leading space before an atom name etc.<br />
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt> , where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt> . <br />
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt> <br />
*tart the residues that are in contact with a crystallographic ligand, if any. <br />
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.<br />
<br />
==Modifying the spheres==<br />
<br />
*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres). <br />
*delete the spheres that are too close to the solvent. <br />
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues. <br />
*a good number for DelPhi spheres is 120. <br />
*append the spheres to the end of <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi! <br />
*do the same for <tt>match2.sph.pdb</tt> (i.e., the matching spheres); put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there. If you selected <tt>.useligsph</tt>, be careful not to move any spheres based on the ligand atoms. <br />
*a good number for matching spheres is 40. <br />
*run <tt>pdb_to_spheres.py matchN.sph.pdb matchN.sph</tt> to generate the files that will be read by DelPhi/[[DOCK]]. <br />
*if desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>grids</tt> ) to put some color on your spheres. <br />
*in any case, put the preamble ("DOCK 5.2 ligand_atoms...") into <tt>match2.sph</tt> . <br />
<br />
==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==<br />
<br />
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate. <br />
*run <tt>delphi.com > delphi.log</tt> and check the output.<br />
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.<br />
<br />
==Running <tt>[[solvmap]]</tt> ==<br />
<br />
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>solvmap</tt> . <br />
*after the run, make sure that the file <tt>solvmap</tt> contains '''no''' blank lines.<br />
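The blank-line check can be automated with a few lines. This is only a sketch; <tt>blank_line_numbers</tt> is a hypothetical helper, and it treats lines containing nothing but whitespace as blank.

```python
def blank_line_numbers(text):
    """Return the 1-based numbers of all empty or whitespace-only lines."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if not line.strip()
    ]


# Hypothetical usage:
# with open("solvmap") as handle:
#     print(blank_line_numbers(handle.read()))
```

An empty list means the file contains no blank lines.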
<br />
[[Category:Manual_DOCK]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=99Analysing the results2009-12-05T02:53:11Z<p>Mysinger: /* Getting individual atom contributions with scoreopt_so */</p>
<hr />
<div>=Some analyses that can be performed=<br />
<br />
See [[MUD - Michael's Utilities for Docking]] for a lot of tools to help with analyzing DOCK runs.<br />
<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===First you need an <tt>.eel1</tt> file to be scored===<br />
<br />
=====For the xtal-lig.mol2 in its crystallographic pose=====<br />
<br />
Convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.<br />
<br />
=====For molecules that have already been docked=====<br />
<br />
*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.<br />
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'<br />
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.<br />
<br />
===Individual contributions to the coulombic energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>. <br />
*enter the name of the output file, e.g. <tt>ligand.elec</tt> . <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
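The summation in the last step can be sketched as follows. The sketch assumes that the columns of the <tt>scoreopt_so</tt> output are separated by whitespace and that the ATOM keyword counts as column 1; both should be verified against a real <tt>ligand.elec</tt> file. The function name <tt>electrostatic_score</tt> is made up, and only the factor 0.5924 comes from the text.

```python
KT_TO_KCAL_PER_MOL = 0.5924  # conversion factor quoted in the text


def electrostatic_score(lines):
    """Sum column 11 (energy in kT) over all ATOM lines and convert
    the total to kcal/mol. Column layout is an ASSUMPTION."""
    total_kt = 0.0
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        fields = line.split()
        if len(fields) < 11:
            continue  # line does not carry the per-atom columns
        total_kt += float(fields[10])
    return total_kt * KT_TO_KCAL_PER_MOL


# Hypothetical usage:
# with open("ligand.elec") as handle:
#     print(electrostatic_score(handle))
```

For a file holding several molecules, such as <tt>top500.eel1</tt>, the lines would first have to be split per molecule, which this sketch does not do.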
<br />
===Individual contributions to the van der Waals energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
<br />
===Individual contributions to the desolvation===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.<br />
<br />
[[Category:Manual_DOCK]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=98Analysing the results2009-12-05T02:52:29Z<p>Mysinger: /* Some analyses that can be performed */</p>
<hr />
<div>=Some analyses that can be performed=<br />
<br />
See [[MUD - Michael's Utilities for Docking]] for a lot of tools to help with analyzing DOCK runs.<br />
<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===First you need an <tt>.eel1</tt> file to be scored===<br />
<br />
=====For the xtal-lig.mol2 in its crystallographic pose=====<br />
<br />
Convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.<br />
<br />
=====For molecules that have already been docked=====<br />
<br />
*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.<br />
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'<br />
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.<br />
<br />
===Individual contributions to the coulombic energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>. <br />
*enter the name of the output file, e.g. <tt>ligand.elec</tt> . <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
<br />
===Individual contributions to the van der Waals energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
<br />
===Individual contributions to the desolvation===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.<br />
<br />
[[Category:Manual_DOCK]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=97Analysing the results2009-12-05T02:50:23Z<p>Mysinger: /* Getting individual atom contributions with scoreopt_so */</p>
<hr />
<div>=Some analyses that can be performed=<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===First you need an <tt>.eel1</tt> file to be scored===<br />
<br />
=====For the xtal-lig.mol2 in its crystallographic pose=====<br />
<br />
Convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.<br />
<br />
=====For molecules that have already been docked=====<br />
<br />
*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.<br />
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'<br />
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.<br />
<br />
===Individual contributions to the coulombic energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>. <br />
*enter the name of the output file, e.g. <tt>ligand.elec</tt> . <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
<br />
===Individual contributions to the van der Waals energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
<br />
===Individual contributions to the desolvation===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.<br />
<br />
[[Category:Manual_DOCK]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=96Analysing the results2009-12-05T02:50:01Z<p>Mysinger: </p>
<hr />
<div>=Some analyses that can be performed=<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===First you need an <tt>.eel1</tt> file to be scored===<br />
<br />
====For the xtal-lig.mol2 in its crystallographic pose====<br />
<br />
Convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.<br />
<br />
====For molecules that have already been docked====<br />
<br />
*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.<br />
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'<br />
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.<br />
<br />
===Individual contributions to the coulombic energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>. <br />
*enter the name of the output file, e.g. <tt>ligand.elec</tt> . <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
<br />
===Individual contributions to the van der Waals energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
<br />
===Individual contributions to the desolvation===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.<br />
<br />
[[Category:Manual_DOCK]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=95Analysing the results2009-12-05T02:49:07Z<p>Mysinger: /* Getting individual atom contributions with scoreopt_so */</p>
<hr />
<div>=Some analyses that can be performed=<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===First you need an <tt>.eel1</tt> file to be scored===<br />
<br />
=For the xtal-lig.mol2 in its crystallographic pose=<br />
<br />
Convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.<br />
<br />
=For molecules that have already been docked=<br />
<br />
*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.<br />
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'<br />
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.<br />
<br />
===Individual contributions to the coulombic energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>. <br />
*enter the name of the output file, e.g. <tt>ligand.elec</tt> . <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
<br />
===Individual contributions to the van der Waals energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
<br />
===Individual contributions to the desolvation===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.<br />
<br />
[[Category:Manual_DOCK]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=94Analysing the results2009-12-05T02:39:01Z<p>Mysinger: /* Obtaining the net charge of a docked molecule */</p>
<hr />
<div>=Some analyses that can be performed=<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===Converting a <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file===<br />
<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run <tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>.<br />
<br />
===Individual contributions to the coulombic energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.elec</tt> . <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
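
The summation in the last step can be sketched in Python. This is a minimal sketch, not part of <tt>scoreopt_so</tt>: it assumes that the "columns" are whitespace-separated fields counted from 1, and the function names <tt>sum_column</tt> and <tt>electrostatic_score</tt> are hypothetical.

```python
KT_TO_KCAL_PER_MOL = 0.5924  # conversion factor quoted in the step above


def sum_column(lines, column=11):
    """Sum one whitespace-separated column (1-based) over all ATOM lines.

    Assumption: every ATOM line has at least `column` fields and the
    requested field is a plain number.
    """
    total = 0.0
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "ATOM" and len(fields) >= column:
            total += float(fields[column - 1])
    return total


def electrostatic_score(lines):
    """Sum of column 11 (energy in kT) converted to kcal/mol."""
    return sum_column(lines, 11) * KT_TO_KCAL_PER_MOL
```

Calling <tt>electrostatic_score(open("ligand.elec"))</tt> should then give a number to hold against the elect column in OUTDOCK, provided the field layout really matches the assumption above.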
<br />
===Individual contributions to the van der Waals energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
<br />
===Individual contributions to the desolvation===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.<br />
<br />
[[Category:Manual_DOCK]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=93Analysing the results2009-12-05T02:37:08Z<p>Mysinger: /* Combining the results of all subdirectories */</p>
<hr />
<div>=Some analyses that can be performed=<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===Converting a <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file===<br />
<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run <tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>.<br />
<br />
===Individual contributions to the coulombic energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.elec</tt> . <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
<br />
===Individual contributions to the van der Waals energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
<br />
===Individual contributions to the desolvation===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file. This script is called by <tt>combine10.csh</tt> and the output is called <tt>FF.new.chg</tt> (cf. section [[#Combining the results of all subdirectories|5.1]]).<br />
<br />
[[Category:Manual_DOCK]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=92Analysing the results2009-12-05T02:36:36Z<p>Mysinger: /* Combining the results of all subdirectories */</p>
<hr />
<div>=Some analyses that can be performed=<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using <tt>$mud/topdock.py -o top500.pdb</tt>, which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===Converting a <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file===<br />
<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run <tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>.<br />
<br />
===Individual contributions to the coulombic energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.elec</tt> . <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
<br />
===Individual contributions to the van der Waals energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
<br />
===Individual contributions to the desolvation===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file. This script is called by <tt>combine10.csh</tt> and the output is called <tt>FF.new.chg</tt> (cf. section [[#Combining the results of all subdirectories|5.1]]).<br />
<br />
[[Category:Manual_DOCK]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Analysing_the_results&diff=91Analysing the results2009-12-05T02:36:05Z<p>Mysinger: /* Combining the results of all subdirectories */</p>
<hr />
<div>=Some analyses that can be performed=<br />
==Combining the results of all subdirectories==<br />
<br />
*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using <tt>$mud/topdock.py -o top500.pdb</tt>, which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.<br />
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.<br />
<br />
==Getting individual atom contributions with scoreopt_so==<br />
<br />
===Converting a <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file===<br />
<br />
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.<br />
*run <tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>.<br />
<br />
===Individual contributions to the coulombic energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu. <br />
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.elec</tt> . <br />
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 &times; 10) of the atom, respectively. <br />
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.<br />
<br />
===Individual contributions to the van der Waals energy===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu. <br />
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> . <br />
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> . <br />
*answer the question about interpolation with 'yes'. <br />
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000. <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> . <br />
*be adequately [http://www.merriam-webster.com/dictionary/scared scared]. <br />
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.<br />
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.<br />
<br />
===Individual contributions to the desolvation===<br />
<br />
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu. <br />
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> . <br />
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> . <br />
*enter the name of the output file, e.g. <tt>ligand.solv</tt> . <br />
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 &times; 10) of the atom, respectively.<br />
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.<br />
<br />
==Other small useful things==<br />
===Obtaining the net charge of a docked molecule===<br />
<br />
*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file. This script is called by <tt>combine10.csh</tt> and the output is called <tt>FF.new.chg</tt> (cf. section [[#Combining the results of all subdirectories|5.1]]).<br />
<br />
[[Category:Manual_DOCK]]</div>Mysingerhttp://wiki.docking.org/index.php?title=Running_DOCK&diff=4279Running DOCK2009-12-05T02:28:43Z<p>Mysinger: /* Running DOCK */</p>
<hr />
<div>=Running DOCK=<br />
<br />
*modify <tt>INDOCK</tt> and set up the desired directory structure &ndash; either manually or by running <tt>mksdir3.csh N<sub>1</sub> N<sub>2</sub> Type</tt>, where <tt>N<sub>1</sub></tt> is the identifier of the library (1: lead-like; 2: fragment-like), <tt>N<sub>2</sub></tt> is the number of chunks (i.e., jobs you can run in parallel), and <tt>Type</tt> is the category of library (i.e., bysubset, byvendor, etc).<br />
* if it hasn't been generated by a script, create the file <tt>dirlist</tt>, which contains the list of the directories (i.e., chunks of the database) that you want to dock.<br />
*if you plan to use any of John's scripts in the downstream processing, leave the output file prefixes at <tt>test.</tt>. <br />
*take care that the paths to the <tt>.db.gz</tt> files in <tt>split_database_index</tt> do not get too long. If they do, go via links. <br />
*submit the calculations to the cluster with <tt>$mud/submit.csh</tt> from the directory in which your data (most importantly, <tt>dirlist</tt>) resides. See [[MUD - Michael's Utilities for Docking]] for setting the $mud variable.<br />
[[Category:Manual_DOCK]]</div>Mysingerhttp://wiki.docking.org/index.php?title=MUD_-_Michael%27s_Utilities_for_Docking&diff=3544MUD - Michael's Utilities for Docking2009-12-05T01:59:28Z<p>Mysinger: </p>
<hr />
<div>==What's in MUD?==<br />
<br />
*Tools to start, check, and restart dock jobs<br />
*Tools to combine, enrich, plot, and view docking results<br />
<br />
==Setting up MUD==<br />
<br />
*For convenience, point a shell variable to the base mud directory to save typing<br />
set mud=~mysinger/code/mud/trunk<br />
*If you use MUD a lot, you can add this to your ~/.login<br />
*Then simply run commands like this:<br />
$mud/submit.csh<br />
$mud/check.py -h<br />
*Use -h or --help to get full help information for the .py (python) scripts<br />
*The .csh scripts will automatically print usage information if mis-used<br />
*The scripts automatically use their invocation path to find other scripts and libraries they depend on.<br />
<br />
==Job Control==<br />
<br />
===Main Workflow===<br />
<br />
For a quick summary of what to do first see [[SGE_Cluster_Docking]]. For a detailed look at how to get the details right see [[How to run and analyze a DOCK run by hand]].<br />
<br />
*Submit a parallel job to the cluster<br />
$mud/submit.csh<br />
Uses 'dirlist' to determine which directories to run. Similar to startdockbksX, but also indicates job submission by touching a submitted file in each directory.<br />
*Check parallel job status<br />
$mud/check.py<br />
Indicates the status of unfinished (or unsubmitted) jobs. Note that it simply returns nothing if everything is finished.<br />
*Restart all failed subjobs<br />
$mud/restart.py<br />
This works even if some subjobs are still running. Occasionally, however, jobs can fail with no detectable remnants. To force those jobs to restart you can use the -f option, but beware that this will also restart all subjobs that are still running.<br />
<br />
===Specialized Commands===<br />
*Submit job to the local machine<br />
$mud/sublocal.csh<br />
*Submit a single directory to the cluster<br />
qsub $mud/runsge.csh<br />
*Submit a single directory to the local machine<br />
$mud/runsubdir.csh<br />
*Remove docking output leaving only input - will DELETE even completed jobs<br />
$mud/clean.py<br />
*Restart single directory<br />
$mud/restartdir.py<br />
<br />
==Job Analysis==<br />
<br />
*Enrichment plots are sensitive to consistent treatment and proper accounting for all docked molecules. The combine script properly accounts for all docked molecules by detecting bumped out, no matched, and timed out molecules. <br />
<br />
To achieve consistency, you have two options:<br />
1. Write coordinates for all molecules (what I use)<br />
In INDOCK, set number_save to 50000 or something high enough to capture all dockable hierarchies. DOCK output is now gzipped so this is cheaper in disk space than it used to be.<br />
2. Do not check for broken molecules<br />
Use the -b option when running combine.py<br />
<br />
===Combining Parallel Jobs===<br />
*Merge all parallel jobs into a single set of unique scores.<br />
$mud/combine.py<br />
This script carefully accounts for all docked molecules, which gives more informative enrichment plots.<br />
<br />
*Options:<br />
Use -b or --broken to skip finding broken molecules. Use -d or --done to indicate that all subjobs are complete, for the case where you did not submit with a MUD submission script. Use -p or --prefix if your output files are named something other than test. Use --box if your box file is not at ../../grids/box relative to your subjob directories.<br />
<br />
*Creates:<br />
#combine.scores - fully processed scores, using the best one for each id<br />
#combine.raw - contains all scores as scraped from DOCK output<br />
#combine.broken - broken molecules and the reason they failed<br />
#combine.zeroes - important sanity check<br />
<br />
format of combine.scores:<br />
<id> <shape> <elect> <VdW> <polar solv> <apolar solv> <total> <subdir><br />
<br />
The .zeroes file is a sanity check because it lists the number of molecules followed by the number of zeroes in each scoring column. Past experience has shown that when DOCK fails randomly and silently, it often generates a large number of zero scores. If this happens, simply re-running the job will give better results. <br />
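
The same kind of check can be sketched directly against combine.scores. This is a minimal sketch, not the code combine.py uses: it assumes the eight fields listed above are whitespace-separated, and the names <tt>SCORE_COLUMNS</tt> and <tt>count_zero_scores</tt> are hypothetical. It makes no claim about the exact layout of combine.zeroes.

```python
# Names for the six scoring fields that follow <id> in combine.scores.
SCORE_COLUMNS = ["shape", "elect", "vdw", "polar_solv", "apolar_solv", "total"]


def count_zero_scores(lines):
    """Return (number of molecules, {column name: number of zero scores})."""
    n_molecules = 0
    zeroes = {name: 0 for name in SCORE_COLUMNS}
    for line in lines:
        fields = line.split()
        # Skip blank or truncated lines: need <id> plus six scores.
        if len(fields) < 1 + len(SCORE_COLUMNS):
            continue
        n_molecules += 1
        for name, value in zip(SCORE_COLUMNS, fields[1:]):
            if float(value) == 0.0:
                zeroes[name] += 1
    return n_molecules, zeroes
```

A zero count that approaches the number of molecules in any column is the pattern described above, and suggests re-running the job.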
<br />
===Computing Enrichments===<br />
*Compute enrichment starting from the combined scores.<br />
$mud/enrich.py -s -l LIGAND_FILE<br />
< or ><br />
$mud/enrich.py -l LIGAND_FILE -d DECOY_FILE<br />
Generates both enrichment and roc curves, both for the ligands against all molecules and for the ligands versus just the decoys. It will try to run combine if it has not been run yet, but will do so only with defaults for every option.<br />
<br />
*Input:<br />
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.<br />
<br />
The identifier files simply contain an id for each known ligand (or decoy) that matches the one in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.<br />
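
One way such matching could work is sketched below. This is an illustration only and is not taken from enrich.py: the helper names <tt>normalize_id</tt> and <tt>load_ids</tt> are hypothetical, and the sketch assumes one identifier per line.

```python
def normalize_id(identifier):
    """Map 'ZINC12345678' to 'C12345678'; leave other ids unchanged."""
    identifier = identifier.strip()
    if identifier.startswith("ZINC"):
        # Drop the leading 'ZIN' so that only 'C' plus the digits remain.
        return identifier[3:]
    return identifier


def load_ids(lines):
    """Collect the normalized ids from an identifier file, ignoring blanks."""
    return {normalize_id(line) for line in lines if line.strip()}
```

With both forms reduced to the short one, a plain set lookup is enough to decide whether a docked molecule is a ligand or a decoy.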
<br />
*Options:<br />
Use -s or --skip-own-curves to skip consideration of decoys and thus generation of _own curves. Use -f to force combine to run again.<br />
<br />
<span id="Enrich_Types"></span><br />
*Creates:<br />
#enrich.txt - Enrichment curve for ligands versus all molecules<br />
#roc.txt - ROC curve for ligands versus all molecules<br />
#enrich_own.txt - Enrichment curve for ligands versus only the decoys<br />
#roc_own.txt - ROC curve for ligands versus only the decoys<br />
_own files are not generated if the -s option is used.<br />
<br />
format for output files:<br />
#AUC 50.00 LogAUC 0.00<br />
<x> <y><br />
<x> <y><br />
...<br />
AUC is area under the curve and the random expectation value is 50%. LogAUC is the area between the log curve and the log random curve, so the random expectation value is 0%. <y> is always "% ligands found", and <x> is either "% database searched" for enrichment plots or "% non-ligands found" for ROC plots.<br />
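
To make the AUC definition concrete, the following sketch reads a curve file in the format above and integrates it with the trapezoid rule. It is a minimal sketch under stated assumptions, not the code enrich.py uses: <x> and <y> are assumed to be percentages from 0 to 100, the function names <tt>read_curve</tt> and <tt>trapezoid_auc</tt> are hypothetical, and LogAUC is not computed here.

```python
def read_curve(lines):
    """Parse a curve file into (header dict, list of (x, y) points)."""
    header = {}
    points = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("#"):
            # "#AUC 50.00 LogAUC 0.00" -> {"AUC": 50.0, "LogAUC": 0.0}
            tokens = [fields[0].lstrip("#")] + fields[1:]
            tokens = [t for t in tokens if t]
            for key, value in zip(tokens[0::2], tokens[1::2]):
                header[key] = float(value)
        else:
            points.append((float(fields[0]), float(fields[1])))
    return header, points


def trapezoid_auc(points):
    """Area under the curve in percent, for x and y both given in percent."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / 100.0
```

For the diagonal from (0, 0) to (100, 100), which is the random expectation, this yields 50, in line with the value stated above.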
<br />
===Plotting Enrichments===<br />
Easily plot enrichment and roc curves from one or more jobs.<br />
$mud/plots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC<br />
< or ><br />
$mud/plots.py -i .<br />
Generates plots with one curve for each -i input_directory.<br />
<br />
*Options:<br />
Use -s or --skip-own-curves to skip _own curves, especially if they don't exist because enrich.py was run with -s. You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory. Use -n to get normal instead of semi-log plots (and AUC in place of LogAUC).<br />
<br />
*Creates:<br />
#[title_]enrich.png<br />
#[title_]roc.png<br />
#[title_]enrich_own.png<br />
#[title_]roc_own.png<br />
<br />
The various graphs have the same meaning as their respective curves from [[#Computing Enrichments]]. [title_] is optional and exists when a custom title is given with the -t option.<br />
<br />
===Computing Energy Histograms===<br />
*Compute energy distributions starting from the combined scores.<br />
$mud/energies.py -s -l LIGAND_FILE<br />
< or ><br />
$mud/energies.py -l LIGAND_FILE -d DECOY_FILE<br />
Generates the energy distributions for the ligands, decoys, and all the other molecules.<br />
<br />
*Input:<br />
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.<br />
<br />
The identifier files simply contain an id for each known ligand that matched the one in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.<br />
<br />
*Options:<br />
Use -s or --skip-own-curves to skip consideration of decoys.<br />
<br />
*Creates:<br />
#counts.txt - Energy distributions<br />
<br />
format for output:<br />
number_of_sections number_of_bins min_energy_threshold max_energy_threshold<br />
##### section_name<br />
bin_upper_edge1 count_below_edge1<br />
...<br />
bin_upper_edgeN count_below_edgeN<br />
ABOVE count_above_last_edge<br />
The sections are for ligands, decoys (optional), and others. The bins and counts define the energy histogram. The bins are finely spaced here in order to have more resolution when combined with other runs, whose energy ranges may be different.<br />
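
A reader for this layout could be sketched as follows. This is a minimal sketch, not the code eplots.py uses: it assumes that section header lines start with a literal '#', that fields are whitespace-separated, and that counts are integers. The function name <tt>read_counts</tt> is hypothetical.

```python
def read_counts(lines):
    """Parse counts.txt into (header tuple, {section name: histogram})."""
    rows = [line.split() for line in lines if line.strip()]
    # First line: number_of_sections number_of_bins min_energy max_energy
    n_sections, n_bins = int(rows[0][0]), int(rows[0][1])
    e_min, e_max = float(rows[0][2]), float(rows[0][3])
    sections = {}
    current = None
    for fields in rows[1:]:
        if fields[0].startswith("#"):
            # Start of a new section, e.g. "##### ligands"
            current = {"bins": [], "above": 0}
            sections[" ".join(fields[1:])] = current
        elif fields[0] == "ABOVE":
            current["above"] = int(fields[1])
        else:
            # (bin upper edge, count below that edge)
            current["bins"].append((float(fields[0]), int(fields[1])))
    return (n_sections, n_bins, e_min, e_max), sections
```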
<br />
===Plotting Energy Histograms===<br />
Easily plot energy histograms from one or more jobs.<br />
$mud/eplots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC<br />
< or ><br />
$mud/eplots.py -i .<br />
Generates plots with energy distributions for each -i input_directory.<br />
<br />
*Options:<br />
You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory.<br />
<br />
*Creates:<br />
#[title_]counts.png<br />
<br />
===Visualizing Molecule by Molecule Results===<br />
Create a DOCK 4,5,6 type pdb file for use in Chimera's ViewDOCK.<br />
$mud/topdock.py -o topdock.pdb<br />
<br />
*Options:<br />
Use -o to specify an output file besides stdout. Use -t NUMBER to get whatever number of top scoring molecules.<br />
<br />
&rarr; Back to [[Tutorials]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=SGE_Cluster_Docking&diff=4312SGE Cluster Docking2009-12-05T01:58:23Z<p>Mysinger: </p>
<hr />
<div>== SGE Cluster Information ==<br />
<br />
*'sgehead.compbio.ucsf.edu' is the submit machine for the Sun Grid Engine (SGE) cluster. wilco is also authorized to submit jobs.<br />
*There are around 250 cluster nodes providing 600 total cores to run jobs in the sge queue as of May, 2009, named like 'node-1-1' through 'node-3-36' where the first number is the rack # and the second is the slot # in that rack.<br />
<br />
== SGE Commands ==<br />
*sgestat: high level overview of cluster status<br />
*qsub: submit jobs<br><br />
*qstat: check job status<br><br />
*qdel: remove jobs<br><br />
*qhost: check cluster status<br> <br />
*man sge_intro: start of manpage documentation<br><br />
<br />
== Typical Docking Workflow ==<br />
<br />
*Generate spheres and grids - See [[Using MakeDOCK]] for more information, including how to prepare the receptor and ligand<br />
ssh sgehead.compbio.ucsf.edu # ssh to SGE submit machine<br />
mkdir example # make docking directory<br />
cd example # change to docking directory<br />
cp <somedir>/rec.pdb . # copy or create rec.pdb<br />
cp <somedir>/xtal-lig.mol2 . # copy or create xtal-lig.mol2 (or even xtal-lig.pdb)<br />
startdockblaster5 # create spheres and grids <br />
# Check output for WARNING messages, correct as needed<br />
<br />
* Setting up a docking run<br />
cp calibrate/INDOCK.1.A INDOCK # copy or create INDOCK<br />
md4db.csh bysubset 2 100 # create directories for docking run with 100 chunks<br />
# 2 indicates we want the fragment-like subset of ZINC (See http://zinc.docking.org/subset1)<br />
cd run.2 # chdir into run.2 directory<br />
<br />
* Everything else<br />
See [[MUD - Michael's Utilities for Docking]] for how to submit, check, and analyse the docking run.<br />
<br />
For information on which ZINC<br />
<br />
[[Category:Internal]]<br />
[[Category:Tutorials]]<br />
[[Category:Cluster]]<br />
[[Category:Unix]]</div>Mysingerhttp://wiki.docking.org/index.php?title=SGE_Cluster_Docking&diff=4311SGE Cluster Docking2009-12-05T01:54:22Z<p>Mysinger: Update to modern workflow</p>
<hr />
<div>== SGE Cluster Information ==<br />
<br />
*'sgehead.compbio.ucsf.edu' is the submit machine for the Sun Grid Engine (SGE) cluster. wilco is also authorized to submit jobs.<br />
*There are around 250 cluster nodes providing 600 total cores to run jobs in the sge queue as of May, 2009, named like 'node-1-1' through 'node-3-36' where the first number is the rack # and the second is the slot # in that rack.<br />
<br />
== SGE Commands ==<br />
*sgestat: high level overview of cluster status<br />
*qsub: submit jobs<br><br />
*qstat: check job status<br><br />
*qdel: remove jobs<br><br />
*qhost: check cluster status<br> <br />
*man sge_intro: start of manpage documentation<br><br />
<br />
== Typical Docking Workflow ==<br />
<br />
*Generate spheres and grids - See [[Using MakeDOCK]] for more information, including how to prepare the receptor and ligand<br />
ssh sgehead.compbio.ucsf.edu # ssh to SGE submit machine<br />
mkdir example # make docking directory<br />
cd example # change to docking directory<br />
cp <somedir>/rec.pdb . # copy or create rec.pdb<br />
cp <somedir>/xtal-lig.mol2 . # copy or create xtal-lig.mol2 (or even xtal-lig.pdb)<br />
startdockblaster5 # create spheres and grids <br />
# Check output for WARNING messages, correct as needed<br />
<br />
* Setting up a docking run<br />
cp calibrate/INDOCK.1.A INDOCK # copy or create INDOCK<br />
md4db.csh bysubset 2 100 # create directories for docking run with 100 chunks<br />
# 2 indicates we want the fragment-like subset of ZINC (See http://zinc.docking.org/subset1)<br />
cd run.2 # chdir into run.2 directory<br />
<br />
* Everything else<br />
See [[MUD - Michael's Utilities for Docking]] for how to submit, check, and analyse the docking run.<br />
<br />
For information on which ZINC<br />
<br />
[[Category:Internal]]<br />
[[Category:Cluster]]<br />
[[Category:Unix]]</div>Mysingerhttp://wiki.docking.org/index.php?title=MUD_-_Michael%27s_Utilities_for_Docking&diff=3543MUD - Michael's Utilities for Docking2009-12-05T01:44:35Z<p>Mysinger: Add energy histogram programs</p>
<hr />
<div>==What's in MUD?==<br />
<br />
*Tools to start, check, and restart dock jobs<br />
*Tools to combine, enrich, plot, and view docking results<br />
<br />
==Setting up MUD==<br />
<br />
*For convenience, point a shell variable to the base mud directory to save typing<br />
set mud=~mysinger/code/mud/trunk<br />
*If you use MUD a lot, you can add this to your ~/.login<br />
*Then simply run commands like this:<br />
$mud/submit.csh<br />
$mud/check.py -h<br />
*Use -h or --help to get full help information for the .py (python) scripts<br />
*The .csh scripts will automatically print usage information if mis-used<br />
*The scripts automatically use their invocation path to find other scripts and libraries they depend on.<br />
<br />
==Job Control==<br />
<br />
===Main Workflow===<br />
*Submit a parallel job to the cluster<br />
$mud/submit.csh<br />
Uses 'dirlist' to determine which directories to run. Similar to startdockbksX, but also indicates job submission by touching a submitted file in each directory.<br />
*Check parallel job status<br />
$mud/check.py<br />
Indicates the status of unfinished (or unsubmitted) jobs. Note that it simply returns nothing if everything is finished.<br />
*Restart all failed subjobs<br />
$mud/restart.py<br />
This works even if some subjobs are still running. Occasionally, however, jobs can fail with no detectable remnants. To force those jobs to restart you can use the -f option, but beware that this will also restart all subjobs that are still running.<br />
<br />
===Specialized Commands===<br />
*Submit job to the local machine<br />
$mud/sublocal.csh<br />
*Submit a single directory to the cluster<br />
qsub $mud/runsge.csh<br />
*Submit a single directory to the local machine<br />
$mud/runsubdir.csh<br />
*Remove docking output leaving only input - will DELETE even completed jobs<br />
$mud/clean.py<br />
*Restart single directory<br />
$mud/restartdir.py<br />
<br />
==Job Analysis==<br />
<br />
*Enrichment plots are sensitive to consistent treatment and proper accounting for all docked molecules. The combine script properly accounts for all docked molecules by detecting bumped out, no matched, and timed out molecules. <br />
<br />
To achieve consistency, you have two options:<br />
1. Write coordinates for all molecules (what I use)<br />
In INDOCK, set number_save to 50000 or something high enough to capture all dockable hierarchies. DOCK output is now gzipped so this is cheaper in disk space than it used to be.<br />
2. Do not check for broken molecules<br />
Use the -b option when running combine.py<br />
<br />
===Combining Parallel Jobs===<br />
*Merge all parallel jobs into a single set of unique scores.<br />
$mud/combine.py<br />
 This combine step carefully accounts for all docked molecules, which gives more informative enrichment plots.<br />
<br />
*Options:<br />
Use -b or --broken to skip finding broken molecules. Use -d or --done to indicate that all subjobs are complete, for the case where you did not submit with a MUD submission script. Use -p or --prefix if your output files are named something other than test. Use --box if your box file is not at ../../grids/box relative to your subjob directories.<br />
<br />
*Creates:<br />
#combine.scores - fully processed scores, using the best one for each id<br />
#combine.raw - contains all scores as scraped from DOCK output<br />
#combine.broken - broken molecules and the reason they failed<br />
#combine.zeroes - important sanity check<br />
<br />
format of combine.scores:<br />
<id> <shape> <elect> <VdW> <polar solv> <apolar solv> <total> <subdir><br />
<br />
The .zeroes file is a sanity check because it lists the number of molecules followed by the number of zeroes in each scoring column. Past experience has shown that when DOCK fails randomly and silently, it often generates a large number of zero scores. If this happens, simply re-running the job will give better results. <br />
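The zero-count check described above can be reproduced by hand on a combine.scores file. The following is only a minimal sketch, not the code used by combine.py: it assumes the eight fields are whitespace-separated in the order given above, and the function and column names are made up for illustration.<br />

```python
from typing import Dict, Iterable, Tuple

# Hypothetical column names, following the combine.scores field order above.
SCORE_COLUMNS = ("shape", "elect", "vdw", "polar_solv", "apolar_solv", "total")


def count_zero_scores(lines: Iterable[str]) -> Tuple[int, Dict[str, int]]:
    """Return (number of molecules, number of zero scores per column)."""
    zeroes = {name: 0 for name in SCORE_COLUMNS}
    molecules = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 8:  # assumed layout: id + 6 scores + subdir
            continue
        molecules += 1
        for name, value in zip(SCORE_COLUMNS, fields[1:7]):
            if float(value) == 0.0:
                zeroes[name] += 1
    return molecules, zeroes


if __name__ == "__main__":
    # Made-up example rows, not real docking output.
    example = [
        "C12345678 0.00 -12.50 -20.10 3.40 -1.20 -30.40 0001",
        "C87654321 0.00 0.00 0.00 0.00 0.00 0.00 0002",
    ]
    print(count_zero_scores(example))
```

A large zero count in the total column relative to the number of molecules is the pattern to look for before re-running the job.<br />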
<br />
===Computing Enrichments===<br />
*Compute enrichment starting from the combined scores.<br />
$mud/enrich.py -s -l LIGAND_FILE<br />
< or ><br />
$mud/enrich.py -l LIGAND_FILE -d DECOY_FILE<br />
Generates both enrichment and roc curves, both for the ligands against all molecules and for the ligands versus just the decoys. It will try to run combine if it has not been run yet, but will do so only with defaults for every option.<br />
<br />
*Input:<br />
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.<br />
<br />
Each identifier file simply contains one id per known ligand (or decoy), matching the id used in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.<br />
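To illustrate the id matching, here is one way the two forms could be reduced to a common key. This is a guess for illustration only, not the logic in enrich.py; the function names are hypothetical, and the sketch assumes one id per line in the identifier file.<br />

```python
from typing import Iterable, Set


def normalize_zinc_id(identifier: str) -> str:
    """Reduce 'ZINC12345678' to the short 'C12345678' form; leave other ids alone."""
    identifier = identifier.strip()
    if identifier.startswith("ZINC"):
        return identifier[3:]  # drop 'ZIN', keep the trailing 'C' and digits
    return identifier


def load_ids(lines: Iterable[str]) -> Set[str]:
    """Collect normalized ids, assuming one id per non-empty line."""
    return {normalize_zinc_id(line) for line in lines if line.strip()}


if __name__ == "__main__":
    print(load_ids(["ZINC12345678", "C12345678", "C00000001"]))
```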
<br />
*Options:<br />
Use -s or --skip-own-curves to skip consideration of decoys and thus generation of _own curves. Use -f to force combine to run again.<br />
<br />
<span id="Enrich_Types"></span><br />
*Creates:<br />
#enrich.txt - Enrichment curve for ligands versus all molecules<br />
#roc.txt - ROC curve for ligands versus all molecules<br />
#enrich_own.txt - Enrichment curve for ligands versus only the decoys<br />
#roc_own.txt - ROC curve for ligands versus only the decoys<br />
 _own files are not generated if the -s option is used.<br />
<br />
format for output files:<br />
#AUC 50.00 LogAUC 0.00<br />
<x> <y><br />
<x> <y><br />
...<br />
AUC is area under the curve and the random expectation value is 50%. LogAUC is the area between the log curve and the log random curve, so the random expectation value is 0%. <y> is always "% ligands found", and <x> is either "% database searched" for enrichment plots or "% non-ligands found" for ROC plots.<br />
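The output format above is easy to read back for custom analysis. The sketch below parses the header and the points, then recomputes a plain AUC with the trapezoid rule. It assumes that x and y are both percentages and that the header is a line of name/value pairs after the leading #. Whether enrich.py itself integrates this way is not stated here, and LogAUC is not recomputed.<br />

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def parse_curve(lines: Iterable[str]) -> Tuple[Dict[str, float], List[Point]]:
    """Split a curve file into header values (AUC, LogAUC) and (x, y) points."""
    header: Dict[str, float] = {}
    points: List[Point] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            tokens = line.lstrip("#").split()
            # Header is assumed to be "name value name value ..."
            for i in range(0, len(tokens) - 1, 2):
                header[tokens[i]] = float(tokens[i + 1])
            continue
        x, y = line.split()[:2]
        points.append((float(x), float(y)))
    return header, points


def trapezoid_auc(points: List[Point]) -> float:
    """Area under the curve in percent, with both axes running from 0 to 100."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / 100.0


if __name__ == "__main__":
    # Made-up diagonal curve, i.e. the random expectation.
    text = ["#AUC 50.00 LogAUC 0.00", "0.0 0.0", "50.0 50.0", "100.0 100.0"]
    header, points = parse_curve(text)
    print(header, trapezoid_auc(points))
```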
<br />
===Plotting Enrichments===<br />
Easily plot enrichment and roc curves from one or more jobs.<br />
$mud/plots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC<br />
< or ><br />
$mud/plots.py -i .<br />
Generates plots with one curve for each -i input_directory.<br />
<br />
*Options:<br />
Use -s or --skip-own-curves to skip _own curves, especially if they don't exist because enrich.py was run with -s. You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory. Use -n to get normal instead of semi-log plots (and AUC in place of LogAUC).<br />
<br />
*Creates:<br />
#[title_]enrich.png<br />
#[title_]roc.png<br />
#[title_]enrich_own.png<br />
#[title_]roc_own.png<br />
<br />
The various graphs have the same meaning as their respective curves from [[#Computing Enrichments]]. [title_] is optional and exists when a custom title is given with the -t option.<br />
<br />
===Computing Energy Histograms===<br />
*Compute energy distributions starting from the combined scores.<br />
$mud/energies.py -s -l LIGAND_FILE<br />
< or ><br />
$mud/energies.py -l LIGAND_FILE -d DECOY_FILE<br />
Generates the energy distributions for the ligands, decoys, and all the other molecules.<br />
<br />
*Input:<br />
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.<br />
<br />
Each identifier file simply contains one id per known ligand (or decoy), matching the id used in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.<br />
<br />
*Options:<br />
Use -s or --skip-own-curves to skip consideration of decoys.<br />
<br />
*Creates:<br />
#counts.txt - Energy distributions<br />
<br />
format for output:<br />
number_of_sections number_of_bins min_energy_threshold max_energy_threshold<br />
##### section_name<br />
bin_upper_edge1 count_below_edge1<br />
...<br />
bin_upper_edgeN count_below_edgeN<br />
ABOVE count_above_last_edge<br />
 The sections are for ligands, decoys (optional), and others. The bins and counts define the energy histogram. The bins are finely spaced here in order to have more resolution when combined with other runs, whose energy ranges may be different.<br />
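A counts.txt file in the layout above can be read back with a few lines of Python. This is a minimal sketch under stated assumptions, not part of MUD: fields are taken to be whitespace-separated, counts are taken to be integers, and the function name and dictionary keys are invented for the example.<br />

```python
from typing import Dict, Iterable, List, Tuple


def parse_counts(lines: Iterable[str]) -> Tuple[Dict[str, float], Dict[str, dict]]:
    """Return (file-level metadata, per-section bins and ABOVE count)."""
    rows: List[str] = [ln.strip() for ln in lines if ln.strip()]
    n_sections, n_bins, e_min, e_max = rows[0].split()[:4]
    meta = {
        "sections": int(n_sections),
        "bins": int(n_bins),
        "min_energy": float(e_min),
        "max_energy": float(e_max),
    }
    sections: Dict[str, dict] = {}
    current = None
    for row in rows[1:]:
        if row.startswith("#"):
            # A "##### section_name" line starts a new section.
            current = {"bins": [], "above": 0}
            sections[row.lstrip("#").strip()] = current
        elif current is not None:
            key, count = row.split()[:2]
            if key == "ABOVE":
                current["above"] = int(count)
            else:
                current["bins"].append((float(key), int(count)))
    return meta, sections


if __name__ == "__main__":
    # Made-up example with two sections and two bins each.
    text = [
        "2 2 -100.0 0.0",
        "##### ligands", "-50.0 3", "0.0 5", "ABOVE 1",
        "##### others", "-50.0 10", "0.0 40", "ABOVE 7",
    ]
    print(parse_counts(text))
```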
<br />
===Plotting Energy Histograms===<br />
Easily plot energy histograms from one or more jobs.<br />
$mud/eplots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC<br />
< or ><br />
$mud/eplots.py -i .<br />
Generates plots with energy distributions for each -i input_directory.<br />
<br />
*Options:<br />
You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory.<br />
<br />
*Creates:<br />
#[title_]counts.png<br />
<br />
===Visualizing Molecule by Molecule Results===<br />
Create a DOCK 4,5,6 type pdb file for use in Chimera's ViewDOCK.<br />
$mud/topdock.py -o topdock.pdb<br />
<br />
*Options:<br />
 Use -o to specify an output file instead of stdout. Use -t NUMBER to set how many top-scoring molecules are written.<br />
<br />
&rarr; Back to [[Tutorials]]<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=How_to_compile_DOCK&diff=3204How to compile DOCK2009-12-05T01:19:03Z<p>Mysinger: </p>
<hr />
<div><br />
This is for the Shoichet Lab local version of DOCK 3.5.54 trunk. <br />
<br />
'''Checking out the source files'''<br />
<br />
Commands:<br />
csh<br />
mkdir /where/to/put<br />
cd /where/to/put<br />
svn checkout file:///raid4/svn/dock<br />
svn checkout file:///raid4/svn/libfgz<br />
<br />
'''Compiling the program on our cluster'''<br />
<br />
Commands:<br />
ssh sgehead<br />
# You should see "Enabling pgf compiler" when you login, otherwise seek help<br />
cd /where/to/put/libfgz/trunk<br />
make<br />
cd ../../dock/trunk/i386<br />
make<br />
<br />
'''Compiling the program on the shared QB3 cluster'''<br />
<br />
On one of the compilation nodes on the shared QB3 cluster (optint1 or optint2):<br />
<br />
ssh optint2<br />
cd /where/to/put/libfgz/trunk<br />
cp Makefile Makefile.old<br />
modify Makefile:<br />
uncomment the following:<br />
FC = ifort -O3<br />
CC = icc -O3<br />
make<br />
cd ../../dock/trunk/i386<br />
cp Makefile Makefile.old<br />
modify Makefile<br />
uncomment the following:<br />
F77 = ifort<br />
FFLAGS = -O3 -convert big_endian<br />
make dock<br />
<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=How_to_compile_DOCK&diff=3203How to compile DOCK2009-12-05T01:14:15Z<p>Mysinger: Change to subversion</p>
<hr />
<div>'''Checking out the source files'''<br />
<br />
Commands:<br />
csh<br />
mkdir /where/to/put<br />
cd /where/to/put<br />
svn checkout file:///raid4/svn/dock<br />
svn checkout file:///raid4/svn/libfgz<br />
<br />
'''Compiling the program on our cluster'''<br />
<br />
Commands:<br />
cd /where/to/put/libfgz/trunk<br />
make<br />
cd ../../dock/trunk/i386<br />
make<br />
<br />
'''Compiling the program on the shared QB3 cluster'''<br />
<br />
On one of the compilation nodes on the shared QB3 cluster (optint1 or optint2):<br />
<br />
ssh optint2<br />
cd /where/to/put/libfgz/trunk<br />
cp Makefile Makefile.old<br />
modify Makefile:<br />
uncomment the following:<br />
FC = ifort -O3<br />
CC = icc -O3<br />
make<br />
cd ../../dock/trunk/i386<br />
cp Makefile Makefile.old<br />
modify Makefile<br />
uncomment the following:<br />
F77 = ifort<br />
FFLAGS = -O3 -convert big_endian<br />
make dock<br />
<br />
[[Category:Tutorials]]</div>Mysingerhttp://wiki.docking.org/index.php?title=SGE_Cluster_Docking&diff=4310SGE Cluster Docking2009-09-25T21:15:23Z<p>Mysinger: /* Typical Docking Workflow */</p>
<hr />
<div>== SGE Cluster Information ==<br />
<br />
*'sgehead.compbio.ucsf.edu' is the submit machine for the Sun Grid Engine (SGE) cluster. wilco is also authorized to submit jobs.<br />
*'sgemaster.compbio.ucsf.edu' is the admin machine for the SGE cluster.<br />
*There are around 250 cluster nodes providing 600 total cores to run jobs in the sge queue as of May, 2009, named like 'node-1-1' through 'node-3-36' where the first number is the rack # and the second is the slot # in that rack.<br />
<br />
== SGE Commands ==<br />
*qsub: submit jobs<br />
*qstat: check job status<br />
*qdel: remove jobs<br />
*qhost: check cluster status<br />
*man sge_intro: start of manpage documentation<br />
<br />
== Typical Docking Workflow ==<br />
<br />
*Generate spheres and grids - See [[Using MakeDOCK]] for more information, including how to prepare the receptor and ligand<br />
ssh sgehead.compbio.ucsf.edu # ssh to SGE submit machine<br />
mkdir example # make docking directory<br />
cd example # change to docking directory<br />
cp <somedir>/rec.pdb . # copy or create rec.pdb<br />
cp <somedir>/xtal-lig.mol2 . # copy or create xtal-lig.mol2 (or even xtal-lig.pdb)<br />
startdockblaster4 # create spheres and grids <br />
# Check output for WARNING messages, correct as needed<br />
<br />
* Submit docking run<br />
cp calibrate/INDOCK.1.A INDOCK # copy or create INDOCK<br />
md4db.csh bysubset 2 50 # create directories for docking run with 50 chunks<br />
# 2 indicates we want the fragment-like subset of ZINC (See http://zinc.docking.org/subset1)<br />
cd run.2 # chdir into run.2 directory<br />
startdockbks3 . # submit database chunks to SGE cluster<br />
<br />
<br />
For information on which ZINC<br />
<br />
[[Category:Internal]]<br />
[[Category:Cluster]]<br />
[[Category:Unix]]</div>Mysingerhttp://wiki.docking.org/index.php?title=How_to_compile_DOCK&diff=3202How to compile DOCK2009-08-28T00:30:52Z<p>Mysinger: bugfix</p>
<hr />
<div>'''Checking out the source files'''<br />
* change to cshell.<br />
* create a directory for the source files.<br />
* change to this directory.<br />
* set the environment variable for CVS.<br />
* check out the dock sources.<br />
* check out the auxiliary libraries.<br />
<br />
As commands:<br />
csh<br />
mkdir /where/to/put/dock35<br />
cd /where/to/put/dock35<br />
setenv CVSROOT /raid1/cvs<br />
cvs co dock<br />
cvs co libfgz<br />
<br />
'''Compiling the program'''<br />
<br />
On a 64-bit machine, e.g. one of the compilation nodes on the shared QB3 cluster (optint1 or optint2):<br />
<br />
ssh optint2<br />
cd /where/to/put/dock35<br />
cd libfgz/<br />
cp Makefile Makefile.old<br />
modify Makefile:<br />
comment out the following:<br />
#FC = gfortran -O3<br />
#CC = gcc -O3<br />
uncomment the following:<br />
FC = ifort -O3<br />
CC = icc -O3<br />
make<br />
cd ../dock/i386/<br />
cp Makefile Makefile.old<br />
modify Makefile<br />
comment out the following:<br />
#F77 = pgf77<br />
#FFLIBS = -lc -lgcc_eh -lgfortran<br />
#FFLAGS = -byteswapio ...<br />
uncomment the following:<br />
F77 = ifort<br />
FFLAGS = -O3 -convert big_endian<br />
make<br />
<br />
<br />
[[Category:Tutorials]]</div>Mysinger