DISI - User contributions [en]

Preparing the protein

2012-05-10T21:44:01Z

Mysinger: switch to solvmap_sev

=Preparing the protein=

Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins.

==Modifying the PDB file==

*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', all columns to the right of the z-coordinate and the TER statements.
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE¤) with sulphur (¤SD). Be careful about the correct alignment!
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably.
*select the protonation states of HIS residues to be either δ- (rename residue to HID), ε- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS.
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!
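
The generic cleanup steps above can be sketched in Python. This is only a minimal sketch, not a script from the DOCK distribution: the function name <tt>clean_pdb_lines</tt> is hypothetical, it assumes standard fixed-column PDB records (atom name in columns 13-16, residue name in columns 18-20, z-coordinate ending at column 54), and it assumes that MSE residues stored as HETATM should be turned into ATOM records before the ATOM filter is applied. Histidine protonation and the AH-specific edits still have to be done by hand.

```python
def clean_pdb_lines(lines):
    """Return cleaned ATOM records: MSE mapped to MET, nothing right of z."""
    cleaned = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Assumption: selenomethionine may appear as ATOM or HETATM records.
        if line[:6] in ("ATOM  ", "HETATM") and line[17:20] == "MSE":
            atom = line[12:16]
            if atom == "SE  ":   # selenium is left-aligned in the name field
                atom = " SD "    # the sulphur name starts one column later
            line = "ATOM  " + line[6:12] + atom + line[16:17] + "MET" + line[20:]
        if not line.startswith("ATOM"):
            continue             # drops TER, HETATM, REMARK and similar lines
        cleaned.append(line[:54])  # columns 1-54 end with the z-coordinate
    return cleaned
```

Reading <tt>rec.pdb</tt> with <tt>readlines()</tt>, passing the result through this function and writing the returned lines back with newline endings would give a candidate file to inspect before running <tt>startdockblaster5</tt>.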

==Running startdockblaster5==

*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove all columns to the right of the z-coordinate and the TER statements. Change HETATM to ATOM.
*generate the files <tt>.only_spheres</tt> and – in case you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt> – <tt>.useligsph</tt> and write `on' to the latter. Be careful not to add blank lines at the end, as these will not be understood by <tt>makespheres2.pl</tt>. In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt>.
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand.
*if <tt>startdockblaster5</tt> fails to finish for no obvious reason and with no clear error message, or <tt>rec.crg</tt> has very odd hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less</tt>. The only character with a backslash should be \n; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to re-prepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch, since it is likely that some blanks or hidden characters are causing the problems.
*Take any WARNING messages emitted seriously, and continue only if you know why each one is there. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms.
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].
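
As a complement to the <tt>od -c</tt> inspection described above, the check for hidden characters can be sketched in Python. The function name <tt>find_nonprinting</tt> is hypothetical, and the sketch assumes that a clean file contains only printable ASCII characters and newlines.

```python
def find_nonprinting(data: bytes):
    """List (line, column, byte value) for every byte that is neither
    printable ASCII nor a newline."""
    problems = []
    line_no, col = 1, 0
    for byte in data:
        if byte == 0x0A:          # newline: the only control character allowed
            line_no += 1
            col = 0
            continue
        col += 1
        if byte < 0x20 or byte > 0x7E:   # tabs, carriage returns, non-ASCII
            problems.append((line_no, col, byte))
    return problems
```

Calling it on the bytes of <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt> (opened in binary mode) should return an empty list for a clean file.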

==Removing and modifying files==

*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs): <tt>rm -f PDBPARM chem.* rec+sph.phi solvmap_sev tart.txt OUT*</tt>
*modify <tt>rec.crg</tt>:
**AH: CYK: put the three missing atoms, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK.
**remove all TER statements that might have been added.
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN.
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX.
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble.
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively ⇒ do not tart any residues in this file!

==Running <tt>[[chemgrid]]</tt> ==

*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct van der Waals parameters of all residues.
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3rd and 4th column, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals maps. There will be no other errors; the docking will finish, but some "bumping" ligands will show extremely favorable energies (≤ -200).
*Any 'WARNING' issued in <tt>OUTPARM</tt> is another sign of a problem with atomic radii.
*if one has to run <tt>chemgrid</tt> again, first remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.
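
The <tt>PDBPARM</tt> check above can be made stricter than a plain grep by testing the 3rd and 4th columns explicitly. The following is a sketch only: the function name <tt>unrecognized_atoms</tt> is hypothetical, and it assumes that the columns of <tt>PDBPARM</tt> are separated by whitespace, which should be verified against a real file.

```python
def unrecognized_atoms(pdbparm_lines):
    """Return the lines whose 3rd and 4th whitespace-separated columns
    are both exactly 0.000."""
    hits = []
    for line in pdbparm_lines:
        cols = line.split()
        # Assumption: columns are whitespace-separated and counted from 1.
        if len(cols) >= 4 and cols[2] == "0.000" and cols[3] == "0.000":
            hits.append(line.rstrip("\n"))
    return hits
```

An empty result suggests that every atom was assigned van der Waals parameters; any returned line points at an atom that would be ignored in the maps.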

==Tarting the protein==

*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>; they are the files with the extension <tt>prot2</tt>.
* add the relevant residues to the bottom of your <tt>prot.table.ambcrg.ambH</tt> file, taking great care to match the existing formatting.
* generate the new <tt>amb.crg.oxt</tt> from the edited <tt>prot.table.ambcrg.ambH</tt> using: <tt>$mud/prot2crg.py < prot.table.ambcrg.ambH > amb.crg.oxt</tt>
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt>, where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt>.
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt>
*optionally tart the residues that are in contact with a crystallographic ligand, if any.
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.

==Modifying the Delphi spheres==

*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres).
*delete the spheres that are too close to the solvent.
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues.
*a good number for DelPhi spheres is 120.
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi!
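
The final step of this list (appending the spheres, one TER after each, no tabs) can be sketched as follows. The function name <tt>append_spheres</tt> is hypothetical; the sketch assumes the sphere records have already been exported as PDB-style lines and does not check their format.

```python
def append_spheres(receptor_lines, sphere_lines):
    """Return the lines of rec+sph.crg: receptor records followed by each
    sphere record with its own TER statement."""
    out = [l.rstrip("\r\n") for l in receptor_lines if l.strip()]
    for sphere in sphere_lines:
        sphere = sphere.rstrip("\r\n")
        if not sphere.strip():
            continue              # skip blank lines in the sphere file
        out.append(sphere)
        out.append("TER")         # one TER statement after each sphere
    for number, line in enumerate(out, start=1):
        if "\t" in line:          # tabs are reported rather than silently kept
            raise ValueError(f"tab character in output line {number}")
    return out
```

Raising an error on tabs, rather than replacing them, is a deliberate choice here: replacing a tab with spaces would shift the fixed columns.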

==Modifying the Matching spheres==

*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.
*If you selected <tt>.useligsph</tt>, be careful not to move any spheres based on the ligand atoms.
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.
*a good number for matching spheres is 50-60.
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]].
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres.
*run <tt>cat $mud/header.sph match2.sph</tt> .

==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==

*if you changed rec+sph.crg above, you need to run Delphi
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate.
*run <tt>delphi.com > delphi.log</tt> and check the output.
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.

==Running <tt>[[newsolv.sev]]</tt> ==

*if you changed rec.crg or the box above, you need to run newsolv.sev
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>newsolv.sev</tt> .

[[Category:Manual_DOCK]]
[[Category:Tutorials]]

Preparing the protein

2012-05-10T21:42:58Z

Mysinger: change solvmap to newsolv.sev in 2 places

=Preparing the protein=

Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins.

==Modifying the PDB file==

*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', all columns to the right of the z-coordinate and the TER statements.
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE¤) with sulphur (¤SD). Be careful about the correct alignment!
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably.
*select the protonation states of HIS residues to be either δ- (rename residue to HID), ε- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS.
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!

==Running startdockblaster5==

*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove all columns to the right of the z-coordinate and the TER statements. Change HETATM to ATOM.
*generate the files <tt>.only_spheres</tt> and – in case you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt> – <tt>.useligsph</tt> and write `on' to the latter. Be careful not to add blank lines at the end, as these will not be understood by <tt>makespheres2.pl</tt>. In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt>.
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand.
*if <tt>startdockblaster5</tt> fails to finish for no obvious reason and with no clear error message, or <tt>rec.crg</tt> has very odd hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less</tt>. The only character with a backslash should be \n; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to re-prepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch, since it is likely that some blanks or hidden characters are causing the problems.
*Take any WARNING messages emitted seriously, and continue only if you know why each one is there. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms.
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].

==Removing and modifying files==

*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs): <tt>rm -f PDBPARM chem.* rec+sph.phi solvmap tart.txt OUT*</tt>
*modify <tt>rec.crg</tt>:
**AH: CYK: put the three missing atoms, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK.
**remove all TER statements that might have been added.
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN.
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX.
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble.
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively ⇒ do not tart any residues in this file!

==Running <tt>[[chemgrid]]</tt> ==

*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct van der Waals parameters of all residues.
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3rd and 4th column, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals maps. There will be no other errors; the docking will finish, but some "bumping" ligands will show extremely favorable energies (≤ -200).
*Any 'WARNING' issued in <tt>OUTPARM</tt> is another sign of a problem with atomic radii.
*if one has to run <tt>chemgrid</tt> again, first remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.

==Tarting the protein==

*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>; they are the files with the extension <tt>prot2</tt>.
* add the relevant residues to the bottom of your <tt>prot.table.ambcrg.ambH</tt> file, taking great care to match the existing formatting.
* generate the new <tt>amb.crg.oxt</tt> from the edited <tt>prot.table.ambcrg.ambH</tt> using: <tt>$mud/prot2crg.py < prot.table.ambcrg.ambH > amb.crg.oxt</tt>
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt>, where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt>.
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt>
*optionally tart the residues that are in contact with a crystallographic ligand, if any.
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.

==Modifying the Delphi spheres==

*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres).
*delete the spheres that are too close to the solvent.
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues.
*a good number for DelPhi spheres is 120.
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi!

==Modifying the Matching spheres==

*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.
*If you selected <tt>.useligsph</tt>, be careful not to move any spheres based on the ligand atoms.
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.
*a good number for matching spheres is 50-60.
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]].
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres.
*run <tt>cat $mud/header.sph match2.sph</tt> .

==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==

*if you changed rec+sph.crg above, you need to run Delphi
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate.
*run <tt>delphi.com > delphi.log</tt> and check the output.
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.

==Running <tt>[[newsolv.sev]]</tt> ==

*if you changed rec.crg or the box above, you need to run newsolv.sev
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>newsolv.sev</tt> .

[[Category:Manual_DOCK]]
[[Category:Tutorials]]

ZINC processing pipeline

2012-03-15T20:52:04Z

Mysinger: Adjust current ring puckering settings

Each molecule in ZINC is processed via our ZINC processing pipeline. This process is embodied in a set of scripts that we continue to refine as we discover problems.

Frankly, we hope people will simply use ZINC rather than trying to reproduce it. Still, in the interests of clarity, transparency, truth, justice and the Canadian Way (TM), here is our current protocol.

* 1. If you have 2D SDF, convert it to isomeric SMILES.

* 2. sed -e 's/N=S=N/nsn/g' 2.ism > 2-out.ism

* 3. Use molinspiration mitools/mib to eliminate broken SMILES:
java -jar /raid1/soft/mitools/mib.jar -singlepart -onlyOrganic -normalizeCharges -f $1 -out smi

* 4. Use OEChem to remove molecules with problematic functional groups:
filter.py rules.txt 4.ism 4-out.ism > filterlog.txt
see http://blaster.docking.org/filtering/rules_default.txt for current rules.

* 5. Select only 4 of the stereochemical expansions from the previous step. We just take the first 4, but you can imagine better ways of making the selection.

* 6. get rid of bogus stereochemistry at nitrogen:
sed -e 's/\[N@\]/N/g' -e 's/\[N@@\]/N/g' -e 's/\[N@H+\]/\[NH+\]/g' -e 's/\[N@@H+\]/\[NH+\]/g' -e 's/\[N@@+\]/\[N+\]/g' -e 's/\[N@+\]/\[N+\]/g' $1 > d.ism

* 7. If the molecule is already in ZINC, eliminate it from the list.

* 8. Generate trial 3D structure with corina.
corina -d neu,wh,rc,mc=1,canon -i t=smiles -o t=sdf < 1a.ism > 2.sdf

* 9. generate reference pH state using Schrodinger's Epik.
epik -ph 7.05 -ms 1 -imae A.mae -omae B.mae -WAIT

* 10. generate mid, hi and lo pH subsets
mid: setenv EPIK "-ph 7.0 -pht 1 -tp 0.20"
hi: setenv EPIK "-ph 8.5 -pht 0.75 -tp 0.20"
lo: setenv EPIK "-ph 5.5 -pht 0.75 -tp 0.20"
epik $EPIK -imae A.mae -omae B.mae -WAIT

* 11. For each subset (ref, mid, hi, lo) use Corina to generate 3D model of the relevant protonated state.
corina -d rc,flapn,de=6,mc=4 -i t=mol2 -o t=mol2
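
For readers who prefer to script the text-only steps, the two <tt>sed</tt> substitutions (steps 2 and 6) can be sketched in Python. The function names <tt>fix_nsn</tt> and <tt>strip_nitrogen_stereo</tt> are hypothetical; the patterns are meant to reproduce the <tt>sed</tt> expressions shown above and nothing more.

```python
import re

# Each rule covers both the @ and @@ form of the corresponding sed expression.
_N_STEREO_RULES = [
    (re.compile(r"\[N@@?\]"), "N"),
    (re.compile(r"\[N@@?H\+\]"), "[NH+]"),
    (re.compile(r"\[N@@?\+\]"), "[N+]"),
]


def fix_nsn(smiles):
    """Step 2: rewrite N=S=N as nsn."""
    return smiles.replace("N=S=N", "nsn")


def strip_nitrogen_stereo(smiles):
    """Step 6: remove stereo marks on nitrogen, leaving carbon centres alone."""
    for pattern, replacement in _N_STEREO_RULES:
        smiles = pattern.sub(replacement, smiles)
    return smiles
```

Stereo marks on carbon, such as <tt>[C@@H]</tt>, are not matched by any of the three patterns and pass through unchanged.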

That's really it. There is more to do with loading ZINC, but to generate the models, that is what we think you need to know. Good luck!

-- John Irwin. March 2009.

Preparing the protein

2012-01-25T23:39:57Z

Mysinger: /* Running <tt>solvmap</tt> */

=Preparing the protein=

Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins.

==Modifying the PDB file==

*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', all columns to the right of the z-coordinate and the TER statements.
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE¤) with sulphur (¤SD). Be careful about the correct alignment!
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably.
*select the protonation states of HIS residues to be either δ- (rename residue to HID), ε- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS.
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!

==Running startdockblaster5==

*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove all columns to the right of the z-coordinate and the TER statements. Change HETATM to ATOM.
*generate the files <tt>.only_spheres</tt> and – in case you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt> – <tt>.useligsph</tt> and write `on' to the latter. Be careful not to add blank lines at the end, as these will not be understood by <tt>makespheres2.pl</tt>. In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt>.
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand.
*if <tt>startdockblaster5</tt> fails to finish for no obvious reason and with no clear error message, or <tt>rec.crg</tt> has very odd hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less</tt>. The only character with a backslash should be \n; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to re-prepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch, since it is likely that some blanks or hidden characters are causing the problems.
*Take any WARNING messages emitted seriously, and continue only if you know why each one is there. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms.
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].

==Removing and modifying files==

*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs): <tt>rm -f PDBPARM chem.* rec+sph.phi solvmap tart.txt OUT*</tt>
*modify <tt>rec.crg</tt>:
**AH: CYK: put the three missing atoms, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK.
**remove all TER statements that might have been added.
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN.
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX.
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble.
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively ⇒ do not tart any residues in this file!

==Running <tt>[[chemgrid]]</tt> ==

*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct van der Waals parameters of all residues.
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3rd and 4th column, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals maps. There will be no other errors; the docking will finish, but some "bumping" ligands will show extremely favorable energies (≤ -200).
*Any 'WARNING' issued in <tt>OUTPARM</tt> is another sign of a problem with atomic radii.
*if one has to run <tt>chemgrid</tt> again, first remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.

==Tarting the protein==

*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>; they are the files with the extension <tt>prot2</tt>.
* add the relevant residues to the bottom of your <tt>prot.table.ambcrg.ambH</tt> file, taking great care to match the existing formatting.
* generate the new <tt>amb.crg.oxt</tt> from the edited <tt>prot.table.ambcrg.ambH</tt> using: <tt>$mud/prot2crg.py < prot.table.ambcrg.ambH > amb.crg.oxt</tt>
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt>, where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt>.
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt>
*optionally tart the residues that are in contact with a crystallographic ligand, if any.
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.

==Modifying the Delphi spheres==

*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres).
*delete the spheres that are too close to the solvent.
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues.
*a good number for DelPhi spheres is 120.
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi!

==Modifying the Matching spheres==

*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.
*If you selected <tt>.useligsph</tt>, be careful not to move any spheres based on the ligand atoms.
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.
*a good number for matching spheres is 50-60.
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]].
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres.
*run <tt>cat $mud/header.sph match2.sph</tt> .

==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==

*if you changed rec+sph.crg above, you need to run Delphi
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate.
*run <tt>delphi.com > delphi.log</tt> and check the output.
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.

==Running <tt>[[solvmap]]</tt> ==

*if you changed rec.crg or the box above, you need to run solvmap
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>newsolv.sev</tt> .

[[Category:Manual_DOCK]]
[[Category:Tutorials]]

INDOCK for DOCK 3.6

2011-10-26T20:45:32Z

Mysinger: change file date to match sphere change

What follows is a documented sample INDOCK file for [[DOCK 3.6]]. Many lines are required; lines starting with # are comments.

Required first line:

DOCK 3.5 parameter
###############################################################################
################## DOCK 3.5 INPUT PARAMETERS 2011/10/26 #######################
###############################################################################
###############################################################################
# INPUT/OUTPUT
#

This is the path to the receptor matching spheres file. Most scripts make a set of directories and copy the INDOCK file into them, so this path sometimes has an extra set of "../" in it compared to what you might think. If you use [[DOCK Blaster]]. Generally, match3 has more spheres than match2, so produces more possible orientations. These spheres are matched to ligand spheres, generated from heavy atoms in the "rigid component" of each ligand. For more about the rigid component, see [[Flexibase Format]].

receptor_sphere_file ../../sph/match2.sph

The next line is always 1, and is marked for deprecation.

cluster_numbers 1

The next line refers to which ligand file to use. If using many of the automated scripts, split_database_index is used, as this allows many ligand files (or just 1) to be placed in the split_database_index file and read in one after another during a DOCK run. If docking small things on your own, you can change this to any file.

# NOTE: split_database_index is reserved to specify a list of files
ligand_atom_file split_database_index

This controls the prefix of the output file; again, many of the automated scripts expect it to be 'test.'. OUTDOCK files are always named OUTDOCK.

output_file_prefix test.

This controls the random seed used in the minimization procedure. Changing this will produce slightly different results.

random_seed 777
#
###############################################################################
# MATCHING
#

distance_tolerance is how different the distances can be between a pair of receptor matching spheres and a pair of ligand matching spheres for them to still be considered matched.

distance_tolerance 1.5
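
To make the role of distance_tolerance concrete, the comparison described above can be sketched in Python. This is an illustration under stated assumptions, not DOCK's own code: the function name <tt>pair_distances_match</tt> is hypothetical, sphere centres are given as (x, y, z) tuples, and whether the comparison is inclusive of the tolerance itself is an assumption.

```python
import math


def pair_distances_match(rec_a, rec_b, lig_a, lig_b, distance_tolerance=1.5):
    """Compare the distance between two receptor spheres with the distance
    between two ligand spheres."""
    d_receptor = math.dist(rec_a, rec_b)
    d_ligand = math.dist(lig_a, lig_b)
    # Assumption: a difference exactly equal to the tolerance still matches.
    return abs(d_receptor - d_ligand) <= distance_tolerance
```

With the default of 1.5, a receptor pair 5.0 apart is compatible with a ligand pair 4.0 apart but not with one 3.0 apart, so a larger tolerance admits more candidate matches.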

This changes how many spheres must be matched to generate an orientation. A minimum of 3 and a maximum of 4 is generally accepted as the right choice. Fewer than 3 is too degenerate to generate an actual orientation, and requiring more than 4 matched spheres does not work well, since we only use heavy atoms in ring systems to generate ligand matching spheres.

nodes_maximum 4
nodes_minimum 3

The next 4 parameters control how the histograms of distance differences are generated. The binsize is how big the bins are; the overlap controls whether a sphere can be put into multiple bins. The ligand and receptor parameters are not required to be the same.

ligand_binsize 0.4
ligand_overlap 0.2
receptor_binsize 0.4
receptor_overlap 0.2

Bumping is a quick distance check, made when ligand atoms are placed in the binding site, to determine whether they have a steric clash. The maximum is how many atoms can be 'bumped' (in close steric contact) per rigid or flexible component of the ligand, as per the [[Flexibase Format]]. Even ligands with some steric clashes can sometimes be rescued by minimization. Setting this number very high will cause many clashed orientations to be scored, which can be prohibitively slow.

bump_maximum 1

The next four parameters are unused and unsupported.

focus_cycles 0
focus_bump 0
focus_type energy
critical_clusters no
#
###############################################################################
# COLORING
#

This controls whether chemical matching or coloring is used at all. If yes, many match lines are necessary. These may not be perfect, but [[DOCK Blaster]] has been using these for a long time. Setting this to no produces many more matched orientations, which can be slow, but can help you understand exactly what the energy function is doing.

chemical_matching yes
case_sensitive no
# ligand color, receptor color
match positive negative
match positive negative_or_acceptor
match positive not_neutral
match negative positive
match negative positive_or_donor
match negative not_neutral
match donor acceptor
match donor donacc
match donor negative_or_acceptor
match donor neutral_or_acceptor_or_donor
match donor not_neutral
match acceptor donor
match acceptor donacc
match acceptor positive_or_donor
match acceptor neutral_or_acceptor_or_donor
match acceptor not_neutral
match neutral neutral
match neutral neutral_or_acceptor_or_donor
match ester_o donor
match ester_o donacc
match ester_o positive_or_donor
match ester_o not_neutral
match amide_o donor
match amide_o donacc
match amide_o positive_or_donor
match amide_o not_neutral
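
The match lines above form a lookup table from ligand color to the receptor colors it may pair with. A minimal sketch of reading that table in Python follows; the function names <tt>parse_match_table</tt> and <tt>colors_compatible</tt> are hypothetical, and the sketch assumes every rule has exactly the three-field form "match ligand_color receptor_color" shown here. It compares names exactly and does not model the case_sensitive setting.

```python
def parse_match_table(indock_lines):
    """Collect 'match <ligand color> <receptor color>' lines into a dict
    mapping each ligand color to the set of allowed receptor colors."""
    table = {}
    for line in indock_lines:
        fields = line.split()
        # Comment lines and other INDOCK parameters do not start with 'match'.
        if len(fields) == 3 and fields[0] == "match":
            table.setdefault(fields[1], set()).add(fields[2])
    return table


def colors_compatible(table, ligand_color, receptor_color):
    """True if the table contains a rule pairing the two colors."""
    return receptor_color in table.get(ligand_color, set())
```

Building the table once and querying it per sphere pair keeps the rule list in INDOCK as the single source of truth for which pairings are allowed.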

Single mode is deprecated; these parameters won't work. See [[Dock Ligand Clustering]].
#
###############################################################################
# SINGLE MODE
#
#rmsd_override 0.0
#contact_minimum 0
#energy_maximum 1.0e+6
##truncate_output 1000.0
#

Search mode is now the default/only mode of docking. Each parameter is described below.

###############################################################################
# SEARCH MODE
#

The ratio_minimum parameter has been slated for deprecation.

ratio_minimum 0.0

These parameters control how many atoms are necessary in the ligand for it to be docked.

atom_minimum 5
atom_maximum 100

How many of the top molecules will be saved in the output test.* file.

number_save 50000

The maximum number of molecules that will be scored in any given run.

molecules_maximum 300000

How many molecules will be skipped. This feature currently does not work.

initial_skip 0

How long a molecule is processed before quitting. This feature currently may not work as expected.

timeout 180

There are many scoring options:

#
###############################################################################
# SCORING
#

Valid options for ligand_desolvation are 'volume' (partial desolvation a la Mysinger & Shoichet 2010), 'full' meaning that the entire ligand is assumed to be desolvated in the binding site and 'none', where no desolvation penalties are applied.

ligand_desolvation volume

See the note about relative paths for the matching spheres above; the same comments apply here. There are two ways to run 'volume' (partial) desolvation. One is to use a single grid for every ligand atom, like this:

solvmap_file ../../grids/solvmap_sev

The other option is to use one grid for ligand heavy atoms and one for ligand hydrogen atoms. To use it, uncomment these lines and comment out the other solvmap_file line.

#solvmap_file ../../grids/solvmap_sev.heavy
#hydrogen_solvmap_file ../../grids/solvmap.sev.hydrogen

This is the phimap file used for electrostatic scoring. For a better understanding of this grid, see [[Visualizing delphi]]. Sometimes this will change if you are using the new Qnifft Delphi maps, see [[Qnifft DOCK 3.6 conversion]].

delphi_file ../../grids/rec+sph.phi

This controls the chemgrid file, which contains the van der Waals scoring for every coordinate (chem.vdw will be called) as well as the distance map grids that will be used for deciphering bumping (chem.bmp will be called).

chemgrid_file_prefix ../../grids/chem

This is the parameter file that contains the atom type definitions:

vdw_parameter_file ../../grids/vdw.parms.amb.mindock

The following options allow the electrostatics and van der Waals parameters to be scaled relative to each other and the solvation scoring.

electrostatic_scale 1.0
vdw_scale 1.0
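
As a rough illustration of what these two scale factors do, the sketch below combines three energy terms into one score. It assumes the total is a simple weighted sum in which only the electrostatic and van der Waals terms are scaled and the desolvation term is added unscaled; the function name and the example energies are hypothetical, and the actual DOCK scoring code may combine the terms differently.

```python
def total_score(electrostatic, vdw, desolvation,
                electrostatic_scale=1.0, vdw_scale=1.0):
    """Weighted sum of energy terms (assumed form, not taken from DOCK).

    Only the electrostatic and van der Waals terms are scaled; the
    desolvation term is added as-is.
    """
    return electrostatic_scale * electrostatic + vdw_scale * vdw + desolvation


# Hypothetical energies for one pose.
default_total = total_score(-10.0, -20.0, 5.0)
# Halving the electrostatic weight changes the balance between the terms.
damped_total = total_score(-10.0, -20.0, 5.0, electrostatic_scale=0.5)
print(default_total, damped_total)
```

With both factors left at 1.0 the terms are simply summed, which is why the sample file keeps the defaults.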

The following parameter lets ligands with internal steric clashes attempt to find a ligand conformation that scores well but does not have any internal clashes. Sometimes this procedure will fail in circumstances where there are many flexible branches, or where a ligand that is too large for the binding site is being docked.

check_clashes yes

If set to yes, this removes the positive solvation from each ligand atom and spreads it evenly over the molecule. This is deprecated because it does unexpected things to solvation, and will be removed entirely soon.

remove_positive_solvation no

After each orientation of the rigid component is processed and the many ligand conformations have been examined, the best ligand conformation for that orientation can be minimized using the following parameters.

#
###############################################################################
# MINIMIZATION
#

No turns off minimization completely.

minimize yes

Don't minimize molecules that score above the minimization_max.

minimization_max 1.0e15

If set to yes, this checks to see if the orientation has already been scored and quits. This has not been tested recently.

check_degeneracy no

How many iterations of minimization to do. More means longer run times, but potentially better poses.

simplex_iterations 250

How much the total energy can change and still be considered converged. Setting this higher stops the minimizer sooner; setting it lower causes more iterations before convergence (or before hitting the iteration maximum above).

simplex_convergence 0.1

If the energy changes by this much, restart the minimizer from this newest position.

simplex_restart 1.0

This is the initial distance in angstroms the molecule is translated (note that translation and rotation used to be swapped for many releases of DOCK).

simplex_initial_translation 0.2

How many degrees of initial rotation are done.

simplex_initial_rotation 5.0
#
###############################################################################
###############################################################################
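
For scripting around a file like the sample above, a small reader can be handy. The following is a minimal sketch, assuming every non-comment line has the form "keyword value", that the first line is the required header, and that <tt>match</tt> is the only keyword that may repeat. The function name <tt>parse_indock</tt> is hypothetical and not part of DOCK.

```python
def parse_indock(text):
    """Split INDOCK text into (header, parameters, match pairs).

    Assumptions: the first line is the required header, '#' starts a
    comment line, and 'match' is the only keyword that repeats.
    """
    lines = text.splitlines()
    header = lines[0].strip() if lines else ""
    params = {}
    matches = []
    for line in lines[1:]:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        fields = stripped.split()
        key, values = fields[0], fields[1:]
        if key == "match":
            matches.append(tuple(values))  # (ligand color, receptor color)
        else:
            params[key] = " ".join(values)
    return header, params, matches


sample = """DOCK 3.5 parameter
# INPUT/OUTPUT
receptor_sphere_file ../../sph/match2.sph
bump_maximum 1
chemical_matching yes
match positive negative
match donor acceptor
"""
header, params, matches = parse_indock(sample)
print(header, params["bump_maximum"], len(matches))
```

Values are kept as strings so that paths and yes/no flags survive unchanged; convert them to numbers only where needed.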

INDOCK for DOCK 3.6

2011-10-26T20:41:22Z

Mysinger: change match3 to match2

What follows is a documented sample INDOCK file for [[DOCK 3.6]]. Many lines are required; lines starting with # are comments.

Required first line:

DOCK 3.5 parameter
###############################################################################
################## DOCK 3.5 INPUT PARAMETERS 2011/09/07 #######################
###############################################################################
###############################################################################
# INPUT/OUTPUT
#

This is the path to the receptor matching spheres file. Most scripts make a set of directories and copy the INDOCK file into them, so this path sometimes has an extra "../" compared to what you might expect. Generally, match3 has more spheres than match2, so it produces more possible orientations. These spheres are matched to ligand spheres, which are generated from heavy atoms in the "rigid component" of each ligand. For more about the rigid component, see [[Flexibase Format]].

receptor_sphere_file ../../sph/match2.sph

The next line is always 1, and is marked for deprecation.

cluster_numbers 1

The next line refers to which ligand file to use. If using many of the automated scripts, split_database_index is used, as this allows many ligand files (or just 1) to be placed in the split_database_index file and read in one after another during a DOCK run. If docking small things on your own, you can change this to any file.

# NOTE: split_database_index is reserved to specify a list of files
ligand_atom_file split_database_index

This will control the file output, again many of the automated scripts expect it to be test. OUTDOCK files are always named OUTDOCK.

output_file_prefix test.

This controls the random seed used in the minimization procedure. Changing this will produce slightly different results.

random_seed 777
#
###############################################################################
# MATCHING
#

distance_tolerance is how different the distances can be between a pair of receptor matching spheres and a pair of ligand matching spheres for them to still be considered matched.

distance_tolerance 1.5
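
The test described above can be sketched as follows. This is a minimal illustration, assuming the comparison is an absolute difference of Euclidean distances checked with "less than or equal"; the function name and coordinates are hypothetical, and the real matching code works on histograms of distances rather than on single pairs.

```python
import math


def pair_matches(rec_a, rec_b, lig_a, lig_b, distance_tolerance=1.5):
    """True if the receptor and ligand sphere pairs have similar spacing.

    Each argument is an (x, y, z) tuple in angstroms. The inclusive
    comparison is an assumption of this sketch.
    """
    receptor_distance = math.dist(rec_a, rec_b)
    ligand_distance = math.dist(lig_a, lig_b)
    return abs(receptor_distance - ligand_distance) <= distance_tolerance


# Receptor spheres 4.0 apart, ligand spheres 3.0 apart: difference 1.0.
close_pair = pair_matches((0, 0, 0), (4, 0, 0), (0, 0, 0), (0, 3, 0))
# Ligand spheres only 1.0 apart: difference 3.0 exceeds the tolerance.
far_pair = pair_matches((0, 0, 0), (4, 0, 0), (0, 0, 0), (0, 1, 0))
print(close_pair, far_pair)
```

A larger tolerance accepts more sphere pairs and therefore generates more candidate orientations.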

This changes how many spheres must be matched to generate an orientation. 3 as a minimum, 4 as a maximum is generally accepted as the right thing to use. Less than 3 is too degenerate to generate an actual orientation, and requiring more than 4 matched spheres does not work well, since we only use heavy atoms in ring systems to generate ligand matching spheres.

nodes_maximum 4
nodes_minimum 3

The next 4 parameters control how the histograms of distance differences are generated. The binsize is how big the bins are, the overlap controls if a sphere can be put into multiple bins. The ligand & receptor parameters are not required to be the same.

ligand_binsize 0.4
ligand_overlap 0.2
receptor_binsize 0.4
receptor_overlap 0.2

Bumping is a quick distance check, performed when placing ligand atoms in the binding site, that determines whether they have a steric clash. bump_maximum is how many atoms can be 'bumped' (in close steric contact) per rigid or flexible component of the ligand, as per the [[Flexibase Format]]. Even ligands with some steric clashes can sometimes be rescued by minimization. Setting this number very high will cause many clashed orientations to be scored, which can be prohibitively slow.

bump_maximum 1

The next four parameters are unused and unsupported.

focus_cycles 0
focus_bump 0
focus_type energy
critical_clusters no
#
###############################################################################
# COLORING
#

This controls whether chemical matching or coloring is used at all. If yes, many match lines are necessary. These may not be perfect, but [[DOCK Blaster]] has been using these for a long time. Setting this to no produces many more matched orientations, which can be slow, but can help you understand exactly what the energy function is doing.

chemical_matching yes
case_sensitive no
# ligand color, receptor color
match positive negative
match positive negative_or_acceptor
match positive not_neutral
match negative positive
match negative positive_or_donor
match negative not_neutral
match donor acceptor
match donor donacc
match donor negative_or_acceptor
match donor neutral_or_acceptor_or_donor
match donor not_neutral
match acceptor donor
match acceptor donacc
match acceptor positive_or_donor
match acceptor neutral_or_acceptor_or_donor
match acceptor not_neutral
match neutral neutral
match neutral neutral_or_acceptor_or_donor
match ester_o donor
match ester_o donacc
match ester_o positive_or_donor
match ester_o not_neutral
match amide_o donor
match amide_o donacc
match amide_o positive_or_donor
match amide_o not_neutral

Single mode is deprecated; these parameters no longer work. See [[Dock Ligand Clustering]].
#
###############################################################################
# SINGLE MODE
#
#rmsd_override 0.0
#contact_minimum 0
#energy_maximum 1.0e+6
##truncate_output 1000.0
#

Search mode is now the default/only mode of docking. Each parameter is described below.

###############################################################################
# SEARCH MODE
#

The ratio_minimum parameter has been slated for deprecation.

ratio_minimum 0.0

These parameters control how many atoms are necessary in the ligand for it to be docked.

atom_minimum 5
atom_maximum 100

How many of the top molecules will be saved in the output test.* file.

number_save 50000

The maximum number of molecules that will be scored in any given run.

molecules_maximum 300000

How many molecules will be skipped. This feature currently does not work.

initial_skip 0

How long a molecule is processed before quitting. This feature currently may not work as expected.

timeout 180

There are many scoring options:

#
###############################################################################
# SCORING
#

Valid options for ligand_desolvation are 'volume' (partial desolvation a la Mysinger & Shoichet 2010), 'full' meaning that the entire ligand is assumed to be desolvated in the binding site and 'none', where no desolvation penalties are applied.

ligand_desolvation volume

See the note about relative paths for the matching spheres above; the same comments apply here. There are two ways to run 'volume' (partial) desolvation. One is to use a single grid for every ligand atom, like this:

solvmap_file ../../grids/solvmap_sev

The other option is to use one grid for ligand heavy atoms and one for ligand hydrogen atoms. To use it, uncomment these lines and comment out the other solvmap_file line.

#solvmap_file ../../grids/solvmap_sev.heavy
#hydrogen_solvmap_file ../../grids/solvmap.sev.hydrogen

This is the phimap file used for electrostatic scoring. For a better understanding of this grid, see [[Visualizing delphi]]. Sometimes this will change if you are using the new Qnifft Delphi maps, see [[Qnifft DOCK 3.6 conversion]].

delphi_file ../../grids/rec+sph.phi

This controls the chemgrid file, which contains the van der Waals scoring for every coordinate (chem.vdw will be called) as well as the distance map grids that will be used for deciphering bumping (chem.bmp will be called).

chemgrid_file_prefix ../../grids/chem

This is the parameter file that contains the atom type definitions:

vdw_parameter_file ../../grids/vdw.parms.amb.mindock

The following options allow the electrostatics and van der Waals parameters to be scaled relative to each other and the solvation scoring.

electrostatic_scale 1.0
vdw_scale 1.0

The following parameter lets ligands with internal steric clashes attempt to find a ligand conformation that scores well but does not have any internal clashes. Sometimes this procedure will fail in circumstances where there are many flexible branches, or where a ligand that is too large for the binding site is being docked.

check_clashes yes

If set to yes, this removes the positive solvation from each ligand atom and spreads it evenly over the molecule. This is deprecated because it does unexpected things to solvation, and will be removed entirely soon.

remove_positive_solvation no

After each orientation of the rigid component is processed and the many ligand conformations have been examined, the best ligand conformation for that orientation can be minimized using the following parameters.

#
###############################################################################
# MINIMIZATION
#

No turns off minimization completely.

minimize yes

Don't minimize molecules that score above the minimization_max.

minimization_max 1.0e15

If set to yes, this checks to see if the orientation has already been scored and quits. This has not been tested recently.

check_degeneracy no

How many iterations of minimization to do. More means longer run times, but potentially better poses.

simplex_iterations 250

How much the total energy can change and still be considered converged. Setting this higher stops the minimizer sooner; setting it lower causes more iterations before convergence (or before hitting the iteration maximum above).

simplex_convergence 0.1

If the energy changes by this much, restart the minimizer from this newest position.

simplex_restart 1.0

This is the initial distance in angstroms the molecule is translated (note that translation and rotation used to be swapped for many releases of DOCK).

simplex_initial_translation 0.2

How many degrees of initial rotation are done.

simplex_initial_rotation 5.0
#
###############################################################################
###############################################################################

INDOCK for DOCK 3.6

2011-10-17T01:43:10Z

Mysinger: note that remove_positive_solvation will be removed

What follows is a documented sample INDOCK file for [[DOCK 3.6]]. Many lines are required; lines starting with # are comments.

Required first line:

DOCK 3.5 parameter
###############################################################################
################## DOCK 3.5 INPUT PARAMETERS 2011/09/07 #######################
###############################################################################
###############################################################################
# INPUT/OUTPUT
#

This is the path to the receptor matching spheres file. Most scripts make a set of directories and copy the INDOCK file into them, so this path sometimes has an extra "../" compared to what you might expect. Generally, match3 has more spheres than match2, so it produces more possible orientations. These spheres are matched to ligand spheres, which are generated from heavy atoms in the "rigid component" of each ligand. For more about the rigid component, see [[Flexibase Format]].

receptor_sphere_file ../../sph/match3.sph

The next line is always 1, and is marked for deprecation.

cluster_numbers 1

The next line refers to which ligand file to use. If using many of the automated scripts, split_database_index is used, as this allows many ligand files (or just 1) to be placed in the split_database_index file and read in one after another during a DOCK run. If docking small things on your own, you can change this to any file.

# NOTE: split_database_index is reserved to specify a list of files
ligand_atom_file split_database_index

This will control the file output, again many of the automated scripts expect it to be test. OUTDOCK files are always named OUTDOCK.

output_file_prefix test.

This controls the random seed used in the minimization procedure. Changing this will produce slightly different results.

random_seed 777
#
###############################################################################
# MATCHING
#

distance_tolerance is how different the distances can be between a pair of receptor matching spheres and a pair of ligand matching spheres for them to still be considered matched.

distance_tolerance 1.5

This changes how many spheres must be matched to generate an orientation. 3 as a minimum, 4 as a maximum is generally accepted as the right thing to use. Less than 3 is too degenerate to generate an actual orientation, and requiring more than 4 matched spheres does not work well, since we only use heavy atoms in ring systems to generate ligand matching spheres.

nodes_maximum 4
nodes_minimum 3

The next 4 parameters control how the histograms of distance differences are generated. The binsize is how big the bins are, the overlap controls if a sphere can be put into multiple bins. The ligand & receptor parameters are not required to be the same.

ligand_binsize 0.4
ligand_overlap 0.2
receptor_binsize 0.4
receptor_overlap 0.2

Bumping is a quick distance check, performed when placing ligand atoms in the binding site, that determines whether they have a steric clash. bump_maximum is how many atoms can be 'bumped' (in close steric contact) per rigid or flexible component of the ligand, as per the [[Flexibase Format]]. Even ligands with some steric clashes can sometimes be rescued by minimization. Setting this number very high will cause many clashed orientations to be scored, which can be prohibitively slow.

bump_maximum 1

The next four parameters are unused and unsupported.

focus_cycles 0
focus_bump 0
focus_type energy
critical_clusters no
#
###############################################################################
# COLORING
#

This controls whether chemical matching or coloring is used at all. If yes, many match lines are necessary. These may not be perfect, but [[DOCK Blaster]] has been using these for a long time. Setting this to no produces many more matched orientations, which can be slow, but can help you understand exactly what the energy function is doing.

chemical_matching yes
case_sensitive no
# ligand color, receptor color
match positive negative
match positive negative_or_acceptor
match positive not_neutral
match negative positive
match negative positive_or_donor
match negative not_neutral
match donor acceptor
match donor donacc
match donor negative_or_acceptor
match donor neutral_or_acceptor_or_donor
match donor not_neutral
match acceptor donor
match acceptor donacc
match acceptor positive_or_donor
match acceptor neutral_or_acceptor_or_donor
match acceptor not_neutral
match neutral neutral
match neutral neutral_or_acceptor_or_donor
match ester_o donor
match ester_o donacc
match ester_o positive_or_donor
match ester_o not_neutral
match amide_o donor
match amide_o donacc
match amide_o positive_or_donor
match amide_o not_neutral

Single mode is deprecated; these parameters no longer work. See [[Dock Ligand Clustering]].
#
###############################################################################
# SINGLE MODE
#
#rmsd_override 0.0
#contact_minimum 0
#energy_maximum 1.0e+6
##truncate_output 1000.0
#

Search mode is now the default/only mode of docking. Each parameter is described below.

###############################################################################
# SEARCH MODE
#

The ratio_minimum parameter has been slated for deprecation.

ratio_minimum 0.0

These parameters control how many atoms are necessary in the ligand for it to be docked.

atom_minimum 5
atom_maximum 100

How many of the top molecules will be saved in the output test.* file.

number_save 50000

The maximum number of molecules that will be scored in any given run.

molecules_maximum 300000

How many molecules will be skipped. This feature currently does not work.

initial_skip 0

How long a molecule is processed before quitting. This feature currently may not work as expected.

timeout 180

There are many scoring options:

#
###############################################################################
# SCORING
#

Valid options for ligand_desolvation are 'volume' (partial desolvation a la Mysinger & Shoichet 2010), 'full' meaning that the entire ligand is assumed to be desolvated in the binding site and 'none', where no desolvation penalties are applied.

ligand_desolvation volume

See the note about relative paths for the matching spheres above; the same comments apply here. There are two ways to run 'volume' (partial) desolvation. One is to use a single grid for every ligand atom, like this:

solvmap_file ../../grids/solvmap_sev

The other option is to use one grid for ligand heavy atoms and one for ligand hydrogen atoms. To use it, uncomment these lines and comment out the other solvmap_file line.

#solvmap_file ../../grids/solvmap_sev.heavy
#hydrogen_solvmap_file ../../grids/solvmap.sev.hydrogen

This is the phimap file used for electrostatic scoring. For a better understanding of this grid, see [[Visualizing delphi]].

delphi_file ../../grids/rec+sph.phi

This controls the chemgrid file, which contains the van der Waals scoring for every coordinate (chem.vdw will be called) as well as the distance map grids that will be used for deciphering bumping (chem.bmp will be called).

chemgrid_file_prefix ../../grids/chem

This is the parameter file that contains the atom type definitions:

vdw_parameter_file ../../grids/vdw.parms.amb.mindock

The following options allow the electrostatics and van der Waals parameters to be scaled relative to each other and the solvation scoring.

electrostatic_scale 1.0
vdw_scale 1.0

The following parameter lets ligands with internal steric clashes attempt to find a ligand conformation that scores well but does not have any internal clashes. Sometimes this procedure will fail in circumstances where there are many flexible branches, or where a ligand that is too large for the binding site is being docked.

check_clashes yes

If set to yes, this removes the positive solvation from each ligand atom and spreads it evenly over the molecule. This is deprecated because it does unexpected things to solvation, and will be removed entirely soon.

remove_positive_solvation no

After each orientation of the rigid component is processed and the many ligand conformations have been examined, the best ligand conformation for that orientation can be minimized using the following parameters.

#
###############################################################################
# MINIMIZATION
#

No turns off minimization completely.

minimize yes

Don't minimize molecules that score above the minimization_max.

minimization_max 1.0e15

If set to yes, this checks to see if the orientation has already been scored and quits. This has not been tested recently.

check_degeneracy no

How many iterations of minimization to do. More means longer run times, but potentially better poses.

simplex_iterations 250

How much the total energy can change and still be considered converged. Setting this higher stops the minimizer sooner; setting it lower causes more iterations before convergence (or before hitting the iteration maximum above).

simplex_convergence 0.1

If the energy changes by this much, restart the minimizer from this newest position.

simplex_restart 1.0

This is the initial distance in angstroms the molecule is translated (note that translation and rotation used to be swapped for many releases of DOCK).

simplex_initial_translation 0.2

How many degrees of initial rotation are done.

simplex_initial_rotation 5.0
#
###############################################################################
###############################################################################

Running DOCK

2011-09-20T23:58:46Z

Mysinger: /* Running DOCK */

=Running DOCK=

*modify <tt>$mud/INDOCK</tt> and set up the desired directory structure, either manually or by running '<tt>md4db.csh bysubset N1 N2 Type</tt>', where <tt>N1</tt> is the identifier of the library (1: lead-like; 2: fragment-like), <tt>N2</tt> is the number of chunks (i.e., jobs you can run in parallel), and <tt>Type</tt> is the category of library (e.g., bysubset, byvendor).
* if it hasn't been generated by a script, create the file <tt>dirlist</tt>, which contains the list of the directories (i.e., chunks of the database) that you want to dock.
*if you plan to use any of John's scripts in the downstream processing, leave the output file prefixes at <tt>test.</tt>.
*take care that the paths to the <tt>.db.gz</tt> files in <tt>split_database_index</tt> do not get too long. If they do, go via links.
*submit the calculations to the cluster with <tt>$mud/submit.csh</tt> from the directory in which your data (most importantly, <tt>dirlist</tt>) resides. See [[MUD - Michael's Utilities for Docking]] for setting the $mud variable.

[[Category:Manual_DOCK]]
[[Category:Tutorials]]

Preparing the protein

2011-09-20T23:57:09Z

Mysinger: Remove distmap references, add prot2crg.py

=Preparing the protein=

Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins.

==Modifying the PDB file==

*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', all columns to the right of the z-coordinate and the TER statements.
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE¤) with sulphur (¤SD). Be careful about the correct alignment!
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably.
*select the protonation states of HIS residues to be either δ- (rename residue to HID), ε- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS.
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!
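
The first two steps above can be sketched in a few lines of Python. This is a minimal illustration, not one of the lab scripts. It assumes standard fixed-column PDB records (atom name in columns 13-16, residue name in columns 18-20, z-coordinate ending at column 54) and only handles selenomethionine atoms that are already stored as ATOM records; MSE atoms stored as HETATM would be dropped by the filter. The function name <tt>clean_pdb_lines</tt> is hypothetical.

```python
def clean_pdb_lines(lines):
    """Keep ATOM records, cut after the z-coordinate, turn MSE into MET."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.startswith("ATOM"):
            continue  # drops TER, HETATM, REMARK and everything else
        record = line[:54]  # columns 1-54 end with the z-coordinate
        if record[17:20] == "MSE":
            record = record[:17] + "MET" + record[20:]
            if record[12:16] == "SE  ":
                # selenium becomes sulphur; note the shifted alignment
                record = record[:12] + " SD " + record[16:]
        cleaned.append(record)
    return cleaned


def write_clean_pdb(source_path, target_path):
    """Read a PDB file and write the cleaned records to target_path."""
    with open(source_path) as source:
        records = clean_pdb_lines(source)
    with open(target_path, "w") as target:
        target.write("\n".join(records) + "\n")
```

Calling <tt>write_clean_pdb("original.pdb", "rec.pdb")</tt> (file names are placeholders) would produce a trimmed file; histidine protonation states still have to be chosen by hand.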

==Running startdockblaster5==

*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove all columns to the right of the z-coordinate and the TER statements. Change HETATM to ATOM.
*generate the file <tt>.only_spheres</tt> and, if you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt>, also <tt>.useligsph</tt>, and write 'on' to the latter. Be careful not to add blank lines at the end, since <tt>makespheres2.pl</tt> will not understand them. In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt> .
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand.
*if <tt>startdockblaster5</tt> fails to finish for no obvious reason and without a clear error message, or <tt>rec.crg</tt> has very odd hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less </tt>. The only character with a backslash should be \n; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to re-prepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch, since it is likely that some blanks or hidden characters are causing the problems.
*Take any WARNING messages emitted seriously, and continue only if you know why each one is there. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms.
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].
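
As an alternative to reading <tt>od -c</tt> output by eye, the check for hidden characters can be scripted. The sketch below is a hypothetical helper, not part of the DOCK tool set; it assumes the file should contain only printable ASCII plus newline and reports everything else with its position.

```python
def find_suspicious_bytes(data):
    """Return (line, column, byte value) for each byte that is neither
    printable ASCII nor a newline. Tabs and carriage returns are reported."""
    hits = []
    line_no, column = 1, 0
    for byte in data:
        column += 1
        if byte == 0x0A:  # newline: the only control character allowed
            line_no += 1
            column = 0
        elif byte < 0x20 or byte > 0x7E:
            hits.append((line_no, column, byte))
    return hits


# In-memory example with a tab and a DOS line ending on the first line.
sample = b"ATOM\t1\r\nATOM 2\n"
for line_no, column, byte in find_suspicious_bytes(sample):
    print("line %d, column %d: byte 0x%02x" % (line_no, column, byte))
```

For a real file, read it in binary mode, for example <tt>open("rec.pdb", "rb").read()</tt>, and pass the result to the function; an empty list means the file is clean by this criterion.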

==Removing and modifying files==

*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs): <tt>rm -f PDBPARM chem.* rec+sph.phi solvmap tart.txt OUT*</tt>
*modify <tt>rec.crg</tt>:
**AH: CYK: put back the three missing atoms, delete the surplus hydrogens specific to LYS and rename the carboxylated lysine residue to CYK.
**remove all TER statements that might have been added.
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN.
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX.
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble.
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively ⇒ do not tart any residues in this file!

==Running <tt>[[chemgrid]]</tt> ==

*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct van der Waals parameters of all residues.
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3rd and 4th column, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals-maps. There will be no other errors, the docking will finish showing some "bumping" ligands which have extremely favorable energies (≤ -200).
*Any 'WARNING' issued in OUTPARM is another sign of a problem with atomic radii.
*if one has to run <tt>chemgrid</tt> again, first remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.
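
The grep step above can be made stricter by looking only at the 3rd and 4th columns. The following is a rough sketch, assuming <tt>PDBPARM</tt> is whitespace-delimited and that an unrecognized atom shows the literal text 0.000 in both columns; the exact file layout is not documented here, so check a few lines of your own <tt>PDBPARM</tt> before relying on it. The function name and the sample lines are made up for illustration.

```python
def unrecognized_atoms(lines):
    """Return (line number, text) for lines whose 3rd and 4th
    whitespace-separated columns are both the literal '0.000'."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "0.000" and fields[3] == "0.000":
            flagged.append((number, line.rstrip()))
    return flagged


# Invented lines in an assumed layout, for illustration only.
sample_lines = [
    "1 N 1.234 0.567\n",
    "2 ZA 0.000 0.000\n",
    "3 CA 0.000 1.000\n",
]
for number, text in unrecognized_atoms(sample_lines):
    print("line %d not recognized: %s" % (number, text))
```

Any line reported this way points to an atom that should be added to the parameter tables before the grids are regenerated.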

==Tarting the protein==

*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>, they are the files with the extension <tt>prot2</tt>.
* add the relevant residues to the bottom of your <tt>prot.table.ambcrg.ambH</tt> file, taking care to match the existing formatting exactly
* generate the new <tt>amb.crg.oxt</tt> from the edited <tt>prot.table.ambcrg.ambH</tt> using: <tt>$mud/prot2crg.py < prot.table.ambcrg.ambH > amb.crg.oxt</tt>
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt>, where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt>.
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt>
*optionally tart the residues that are in contact with a crystallographic ligand, if any.
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.

==Modifying the Delphi spheres==

*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres).
*delete the spheres that are too close to the solvent.
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues.
*a good number for DelPhi spheres is 120.
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace, as they can cause problems with DelPhi!
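
The last step can be scripted to avoid forgetting a TER or slipping in a tab. This is a minimal sketch, assuming the edited spheres have been saved as PDB-style lines, one sphere per line; the function name <tt>append_spheres</tt> is hypothetical and the receptor lines are passed through unchanged.

```python
def append_spheres(receptor_lines, sphere_lines):
    """Return receptor lines followed by each sphere line and a TER.

    Raises ValueError if a sphere line contains a tab character.
    """
    combined = [line.rstrip("\r\n") for line in receptor_lines if line.strip()]
    for sphere in sphere_lines:
        sphere = sphere.rstrip("\r\n")
        if not sphere.strip() or sphere.startswith("TER"):
            continue  # skip blank lines and TER records already present
        if "\t" in sphere:
            raise ValueError("tab character in sphere line: %r" % sphere)
        combined.append(sphere)
        combined.append("TER")  # one TER after every sphere
    return combined


def write_rec_plus_sph(receptor_path, sphere_path, output_path):
    """Build the combined file from a receptor file and a sphere file."""
    with open(receptor_path) as receptor, open(sphere_path) as spheres:
        lines = append_spheres(receptor, spheres)
    with open(output_path, "w") as output:
        output.write("\n".join(lines) + "\n")
```

A call such as <tt>write_rec_plus_sph("rec.crg", "match1.sph.pdb", "rec+sph.crg")</tt> would then build the combined file, provided the sphere file holds only the spheres you want to keep.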

==Modifying the Matching spheres==

*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.
*If you selected <tt>.useligsph</tt> be careful not to move any spheres based on the ligand atoms.
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.
*a good number for matching spheres is 50-60.
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]].
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres.
*run <tt>cat $mud/header.sph match2.sph</tt> .

==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==

*if you changed <tt>rec+sph.crg</tt> above, you need to run DelPhi.
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate.
*run <tt>delphi.com > delphi.log</tt> and check the output.
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.

==Running <tt>[[solvmap]]</tt> ==

*if you changed <tt>rec.crg</tt> or the box above, you need to run <tt>solvmap</tt>.
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>solvmap</tt> .

[[Category:Manual_DOCK]]
[[Category:Tutorials]]

Preparing the protein

2011-06-08T06:06:02Z

Mysinger:

=Preparing the protein=

Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins.

==Modifying the PDB file==

*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', the chain column, all columns to the right of the z-coordinate and the TER statements.
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE¤) with sulphur (¤SD). Be careful about the correct alignment!
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably.
*select the protonation states of HIS residues to be either δ- (rename residue to HID), ε- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS.
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!
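
The first clean-up step above (keep only ATOM records, drop the chain identifier and everything to the right of the z-coordinate) can be sketched in a few lines. This is an illustration rather than the procedure used by the authors. It assumes the standard fixed-column PDB layout, in which the chain identifier sits in column 22 and the z-coordinate ends at column 54, and it blanks the chain column instead of deleting it so that the remaining columns stay aligned. The function name <tt>clean_pdb_lines</tt> is hypothetical.

```python
def clean_pdb_lines(lines):
    """Keep ATOM records only and cut everything after the z-coordinate."""
    cleaned = []
    for line in lines:
        if not line.startswith("ATOM"):
            continue  # drops HETATM, TER, REMARK and all other records
        # Columns 1-54 end with the z-coordinate in the standard PDB layout.
        record = line.rstrip("\r\n")[:54]
        # Assumption: blank the chain identifier (column 22) to keep alignment.
        record = record[:21] + " " + record[22:]
        cleaned.append(record)
    return cleaned
```

Writing the result back with <tt>"\n".join(...)</tt> also avoids carriage returns. The selenomethionine, histidine and metal-ion edits described above still have to be made by hand.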

==Running startdockblaster5==

*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove the chain column, all columns to the right of the z-coordinate and the TER statements.
*generate the files <tt>.only_spheres</tt> and – in case you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt> – <tt>.useligsph</tt> and write `on' to the latter. Be careful not to add blank lines at the end, this will not be understood by <tt>makespheres2.pl</tt> . In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt> .
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand.
*if <tt>startdockblaster5</tt> doesn't finish, for no obvious reason and with no clear error message, or <tt>rec.crg</tt> has very odd hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less</tt>. The only character with a backslash should be \n; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to reprepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch, as it is likely that some blanks or hidden characters are causing the problems.
*check the files <tt>stdout</tt> and <tt>stderr</tt> after the run for potential mistakes and error messages. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms.
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].
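
As a complement to inspecting the <tt>od -c</tt> output by eye, the search for hidden characters can be automated. The sketch below is an assumption-laden helper, not part of the DOCK tool chain: it treats every byte other than printable ASCII and the newline as suspicious, and the name <tt>find_suspicious_bytes</tt> is invented for this example.

```python
def find_suspicious_bytes(data):
    """Return (line, column, byte value) for each byte that is neither
    printable ASCII (0x20-0x7E) nor a newline (0x0A)."""
    hits = []
    line_no, col = 1, 0
    for byte in data:  # iterating over bytes yields integers in Python 3
        col += 1
        if byte == 0x0A:
            line_no += 1
            col = 0
        elif byte < 0x20 or byte > 0x7E:
            hits.append((line_no, col, byte))
    return hits
```

Called as <tt>find_suspicious_bytes(open("rec.pdb", "rb").read())</tt>, it reports tabs as byte value 9 and carriage returns as 13; an empty list means the file contains only printable characters and newlines.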

==Removing and modifying files==

*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs): <tt>rm -f PDBPARM chem.* distmap.box distmap distmap.log rec+sph.phi solvmap tart.txt OUT*</tt>
*modify <tt>rec.crg</tt>:
**AH: CYK: put back the three missing atoms, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK.
**remove all TER statements that might have been added.
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN.
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX.
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble.
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively ⇒ do not tart any residues in this file!

==Running <tt>[[chemgrid]]</tt> ==

*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct charges of all residues.
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3rd and 4th column, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals-maps. There will be no other errors, the docking will finish showing some "bumping" ligands which have extremely favorable energies (≤ -200).
*Another sign of a problem with atomic radii is any 'WARNING' issued in <tt>OUTPARM</tt>.
*if one has to run <tt>chemgrid</tt> again, remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.

==Running <tt>distmap</tt> ==

* the default is to run <tt>distmap</tt> on <tt>rec.crg</tt>. If you modified this file, rerun by simply typing <tt>distmap</tt>.
* AH: cp <tt>rec.crg</tt> to <tt>rec-dist.crg</tt> and remove the Zn atoms in the latter file (otherwise there will be lots of bumping ligands). Edit <tt>INDIST</tt> to update the filename.
*run <tt>distmap</tt>

==Tarting the protein==

*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>, they are the files with the extension <tt>prot2</tt>.
* take care that the format of the <tt>.prot2</tt> file is consistent with the format in the <tt>amb.crg.oxt</tt> file, e.g., that there is no leading space before an atom name etc.
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt> , where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt> .
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt>
*tart the residues that are in contact with a crystallographic ligand, if any.
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.

==Modifying the Delphi spheres==

*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres).
*delete the spheres that are too close to the solvent.
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues.
*a good number for DelPhi spheres is 120.
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi!

==Modifying the Matching spheres==

*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.
*If you selected <tt>.useligsph</tt> be careful not to move any spheres based on the ligand atoms.
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.
*a good number for matching spheres is 50-60.
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]].
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres.
*run <tt>cat $mud/header.sph match2.sph</tt> .

==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==

*if you changed <tt>rec+sph.crg</tt> above, you need to run DelPhi.
*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate.
*run <tt>delphi.com > delphi.log</tt> and check the output.
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.

==Running <tt>[[solvmap]]</tt> ==

*if you changed <tt>rec.crg</tt> or the box above, you need to run <tt>solvmap</tt>.
*check that all atoms are present in <tt>rec.crg</tt> and run <tt>solvmap</tt> .

[[Category:Manual_DOCK]]
[[Category:Tutorials]]

Preparing the protein

2011-06-08T06:02:26Z

Mysinger: /* Modifying the spheres */

=Preparing the protein=

Items which are prefixed with 'AH' are relevant for docking [[HEI]]s to amidohydrolases and can safely be ignored for most metal-free proteins.

==Modifying the PDB file==

*prepare <tt>rec.pdb</tt> by removing all lines that do not commence with 'ATOM', the chain column, all columns to the right of the z-coordinate and the TER statements.
*treat all selenomethionines (MSE) as methionines (MET) by replacing the selenium atom (SE¤) with sulphur (¤SD). Be careful about the correct alignment!
*atom enumeration does not matter, so don't bother to renumber after any of the following steps. Unique numbers are a good idea, presumably.
*select the protonation states of HIS residues to be either δ- (rename residue to HID), ε- (rename residue to HIE) or doubly protonated (rename residue to HIP). HIS on the surface should be HIP. HIS residues coordinating the metal ions should have their protons pointing away from the ions. Base your decision on the immediate environment of the HIS residue: are there potential hydrogen bonds that can be formed?; are there charged residues close by?; would a certain protonation lead to clashes with other residues?; etc.
*AH: the carboxylated LYS of subtype I is CYK, but this is not tolerated by <tt>startdockblaster5</tt> , so store and delete the 3 surplus atoms and call the residue LYS.
*AH: the more buried metal ion is ZB (charge 1.4), the other one ZA (charge 1.3). Atom names are right-aligned!

==Running startdockblaster5==

*generate the file <tt>xtal-lig.pdb</tt> , which should only contain atoms of the MMFF. Prepare it in the same way as above: remove the chain column, all columns to the right of the z-coordinate and the TER statements.
*generate the files <tt>.only_spheres</tt> and – in case you would like the matching spheres to be based on the heavy atoms in <tt>xtal-lig.pdb</tt> – <tt>.useligsph</tt> and write `on' to the latter. Be careful not to add blank lines at the end, this will not be understood by <tt>makespheres2.pl</tt> . In any case, the entry in <tt>.useligsph</tt> will be ignored by <tt>makespheres1.pl</tt> .
*on sgehead (or, as of [[dock67]], on any machine), run <tt>startdockblaster5</tt> to set up the data structure and copy all relevant files. It is a good idea to use csh and to <tt>source .login</tt> beforehand.
*if <tt>startdockblaster5</tt> doesn't finish, for no obvious reason and with no clear error message, or <tt>rec.crg</tt> has very odd hydrogen placements, make sure that you have no non-printing characters in <tt>rec.pdb</tt> or <tt>xtal-lig.pdb</tt>. Do that by running your file through <tt>pc2unix rec.pdb</tt>. Check that your file is clean by looking at it with <tt>od -c rec.pdb | less</tt>. The only character with a backslash should be \n; you should see no \t, \r, etc. If this doesn't solve the problem, your best bet is to reprepare <tt>rec.pdb</tt> and <tt>xtal-lig.pdb</tt> from scratch, as it is likely that some blanks or hidden characters are causing the problems.
*check the files <tt>stdout</tt> and <tt>stderr</tt> after the run for potential mistakes and error messages. Furthermore, verify that <tt>rec.crg</tt> still contains ''all'' atoms.
*if you do not want to do anything special with the protein, like tarting some residues or modifying the spheres, go directly to chapter [[Running DOCK|3]].

==Removing and modifying files==

*go to <tt>./grids</tt> and remove the surplus files from this directory (some would cause error messages from the subsequent programs): <tt>rm -f PDBPARM chem.* distmap.box distmap distmap.log rec+sph.phi solvmap tart.txt OUT*</tt>
*modify <tt>rec.crg</tt>:
**AH: CYK: put back the three missing atoms, delete the surplus hydrogens specific for LYS and rename the carboxylated lysine residue CYK.
**remove all TER statements that might have been added.
**AH: set the atom names of the metal ions to ZA and ZB and the residue name to ZN.
**take care of disulfide bonds. Remove the thiol hydrogens (if they have been added) and change the residue name from CYS to CYX.
*look at the <tt>box</tt> and maybe move it, so that the ligands won't stick out. Modify the 'center' and 'coordinates' statement in the preamble.
*all residues and atoms have to be listed in <tt>prot.table.ambcrg.ambH</tt> and <tt>vdw.parms.amb.mindock</tt>, respectively ⇒ do not tart any residues in this file!

==Running <tt>[[chemgrid]]</tt> ==

*run <tt>chemgrid</tt> and check <tt>OUTPARM</tt> for the correct charges of all residues.
*grep for <tt>0.000</tt> in <tt>PDBPARM</tt>: if any atom has this value in the 3rd and 4th column, it has not been recognized by <tt>chemgrid</tt> (because it is not listed in <tt>prot.table.ambcrg.ambH</tt>) and is thus ''ignored'' in the van der Waals-maps. There will be no other errors, the docking will finish showing some "bumping" ligands which have extremely favorable energies (≤ -200).
*Another sign of a problem with atomic radii is any 'WARNING' issued in <tt>OUTPARM</tt>.
*if one has to run <tt>chemgrid</tt> again, remove <tt>PDBPARM OUTPARM OUTCHEM</tt> and <tt>chem.*</tt>.

==Running <tt>distmap</tt> ==

* the default is to run <tt>distmap</tt> on <tt>rec.crg</tt>. If you modified this file, rerun by simply typing <tt>distmap</tt>.
* AH: cp <tt>rec.crg</tt> to <tt>rec-dist.crg</tt> and remove the Zn atoms in the latter file (otherwise there will be lots of bumping ligands). Edit <tt>INDIST</tt> to update the filename.
*run <tt>distmap</tt>

==Tarting the protein==

*cp <tt>rec.crg</tt> to <tt>rec+sph.crg</tt> and continue with the latter file.
* tarted residues can be found in <tt>$DOCK_BASE/scripts/grids</tt>, they are the files with the extension <tt>prot2</tt>.
* take care that the format of the <tt>.prot2</tt> file is consistent with the format in the <tt>amb.crg.oxt</tt> file, e.g., that there is no leading space before an atom name etc.
*AH: select the appropriate version of <tt>amb.crg.oxt</tt> depending on the subtype. Files are called <tt>amb.crg.oxt.N</tt> , where <tt>N</tt> can be <tt>I, III</tt> or <tt>VI</tt> .
*AH: edit the residues in the binding site (i.e., all residues complexing the metal ions in the binding site), so that their names conform to the names of the modified residues in <tt>amb.crg.oxt.N</tt>
*tart the residues that are in contact with a crystallographic ligand, if any.
*AH: check that ZA and ZB, respectively (left-aligned in the atom column), have corresponding entries in <tt>amb.crg.oxt.N</tt> and <tt>vdw.siz</tt>.

==Modifying the Delphi spheres==

*load <tt>match1.sph.pdb</tt> (i.e., the DelPhi spheres).
*delete the spheres that are too close to the solvent.
*(AH:) add spheres so that there is one sphere ''between'' the metals, several spheres ''around'' the metals and some spheres close to polar residues.
*a good number for DelPhi spheres is 120.
*append the spheres to the end of <tt>rec.crg</tt> to make <tt>rec+sph.crg</tt> and put a TER statement after each sphere. Don't use tabs for whitespace; they can cause problems with DelPhi!

==Modifying the Matching spheres==

*load <tt>match2.sph.pdb</tt> for sparse initial spheres or <tt>match3.sph.pdb</tt> for denser spheres.
*If you selected <tt>.useligsph</tt> be careful not to move any spheres based on the ligand atoms.
*(AH:) put at least one sphere between the metals and increase the sampling in the region around the metal ions by putting some spheres there.
*a good number for matching spheres is 50-60.
*run <tt>pdbtosph matchN.sph.pdb mysph.sph</tt> to generate the files that will be read by [[DOCK]].
*if color matching is desired, run <tt>colorspheres.pl sph/match2.sph</tt> in the parent directory of the docking run (i.e., <tt>..</tt> to <tt>sph</tt> ) to put some color on your spheres.
*run <tt>cat $mud/header.sph match2.sph</tt> .

==Running <tt>[http://bcr.musc.edu/manuals/delphi.htm DelPhi]</tt> ==

*if necessary, modify <tt>delphi.com</tt> so that all the paths and file names are appropriate.
*run <tt>delphi.com > delphi.log</tt> and check the output.
*any 'WARNING' in the log is an indication that some atomic charges might not be correct.

==Running <tt>[[solvmap]]</tt> ==

*check that all atoms are present in <tt>rec.crg</tt> and run <tt>solvmap</tt> .
*after the run, make sure that the file <tt>solvmap</tt> contains '''no''' blank lines.

[[Category:Manual_DOCK]]
[[Category:Tutorials]]

Analysing the results

2011-05-28T03:47:18Z

Mysinger: /* Atomic contributions to the desolvation */

=Some analyses that can be performed=

See [[MUD - Michael's Utilities for Docking]] for a lot of tools to help with analyzing DOCK runs.

==Combining the results of all subdirectories==

*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.

==Getting individual atom contributions with scoreopt_so==

===First you need an <tt>.eel1</tt> file to be scored===

=====For the xtal-lig.mol2 in its crystallographic pose=====

New way that outputs your.eel1 starting from your.pdb directly
*run '<tt>$mud/to_eel1.csh your.pdb</tt>'.

If that fails, use the old way to convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.

=====For molecules that have already been docked=====

*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.

===Overall molecular score compiled from all scoreopt_so options===

For default grids
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids'</tt>
Or for custom grids, used below to run SEV-based desolvation grids
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids rec+sph.phi chem solvmap_sev'</tt>
The summary for the whole molecule is output to your.eel1.scores in combine.scores format

===Atomic contributions to the coulombic energy===

In your.eel1.delphi from the wrapper
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., column 9 × column 10) of the atom, respectively.
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.
Or to generate this data yourself
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu.
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>.
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>.
*enter the name of the output file, e.g. <tt>ligand.delphi</tt> .
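
The summation described above (column 11 of every ATOM line, times 0.5924) can be sketched as follows. The sketch assumes that the columns are whitespace-separated fields, so that "column 11" is the eleventh field of the line, and that the file holds a single molecule; for a multi-molecule file the lines would first have to be split per molecule. The function name <tt>delphi_score</tt> is hypothetical.

```python
KT_TO_KCAL_PER_MOL = 0.5924  # conversion factor quoted in the text above


def delphi_score(lines):
    """Sum column 11 (energy in kT) over all ATOM lines, in kcal/mol.

    Assumption: columns are whitespace-separated fields.
    """
    total_kt = 0.0
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        fields = line.split()
        total_kt += float(fields[10])  # 11th field, zero-based index 10
    return total_kt * KT_TO_KCAL_PER_MOL
```

The value returned for <tt>delphi_score(open("your.eel1.delphi"))</tt> is what would be compared to the elect column in OUTDOCK.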

===Atomic contributions to the van der Waals energy===

In your.eel1.vdw from the wrapper
*be adequately [http://www.merriam-webster.com/dictionary/scared scared].
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.
Or to generate this data yourself
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu.
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> .
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> .
*answer the question about interpolation with 'yes'.
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000.
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> .
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> .

===Atomic contributions to the desolvation===

In your.eel1.solv from the wrapper
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e., −1 × column 9 × column 10) of the atom, respectively.
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.
Or to generate this data yourself
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu.
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> or <tt>grids/solvmap_sev</tt>.
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> .
*enter the name of the output file, e.g. <tt>ligand.solv</tt> .

==Other small useful things==
===Obtaining the net charge of a docked molecule===

*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.

[[Category:Manual_DOCK]]
[[Category:Tutorials]]

Analysing the results

2011-05-28T03:46:45Z

Mysinger: /* Atomic contributions to the coulombic energy */

=Some analyses that can be performed=

See [[MUD - Michael's Utilities for Docking]] for a lot of tools to help with analyzing DOCK runs.

==Combining the results of all subdirectories==

*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.

==Getting individual atom contributions with scoreopt_so==

===First you need an <tt>.eel1</tt> file to be scored===

=====For the xtal-lig.mol2 in its crystallographic pose=====

New way that outputs your.eel1 starting from your.pdb directly
*run '<tt>$mud/to_eel1.csh your.pdb</tt>'.

If that fails, use the old way to convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.

=====For molecules that have already been docked=====

*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.

===Overall molecular score compiled from all scoreopt_so options===

For default grids
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids'</tt>
Or for custom grids, used below to run SEV-based desolvation grids
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids rec+sph.phi chem solvmap_sev'</tt>
The summary for the whole molecule is output to your.eel1.scores in combine.scores format

===Atomic contributions to the coulombic energy===

In your.eel1.delphi from the wrapper
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., column 9 × column 10) of the atom, respectively.
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.
Or to generate this data yourself
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu.
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>.
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>.
*enter the name of the output file, e.g. <tt>ligand.delphi</tt> .

===Atomic contributions to the van der Waals energy===

In your.eel1.vdw from the wrapper
*be adequately [http://www.merriam-webster.com/dictionary/scared scared].
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.
Or to generate this data yourself
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu.
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> .
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> .
*answer the question about interpolation with 'yes'.
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000.
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> .
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> .

===Atomic contributions to the desolvation===

In your.eel1.solv from the wrapper
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e., −1 × column 9 × column 10) of the atom, respectively.
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.
Or to generate this data yourself
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu.
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> .
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> .
*enter the name of the output file, e.g. <tt>ligand.solv</tt> .

==Other small useful things==
===Obtaining the net charge of a docked molecule===

*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.

[[Category:Manual_DOCK]]
[[Category:Tutorials]]

Analysing the results

2011-05-28T03:46:00Z

Mysinger: Update to the modern way to scoreopt

=Some analyses that can be performed=

See [[MUD - Michael's Utilities for Docking]] for a lot of tools to help with analyzing DOCK runs.

==Combining the results of all subdirectories==

*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using '<tt>$mud/topdock.py -o top500.pdb</tt>', which you can read into ViewDOCK in chimera as a DOCK 4, 5, or 6 style file.
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.

==Getting individual atom contributions with scoreopt_so==

===First you need an <tt>.eel1</tt> file to be scored===

=====For the xtal-lig.mol2 in its crystallographic pose=====

New way that outputs your.eel1 starting from your.pdb directly
*run '<tt>$mud/to_eel1.csh your.pdb</tt>'.

If that fails, use the old way to convert an input <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file
*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.
*run '<tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>'.

=====For molecules that have already been docked=====

*run '<tt>$mud/topdock.py -e -o top500.eel1</tt>' to generate an <tt>.eel1</tt> file containing the top 500 docked molecules.
*or unzip the dock output '<tt>gunzip -c test.eel1.gz > test.eel1</tt>'
*or to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to '<tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>'.

===Overall molecular score compiled from all scoreopt_so options===

For default grids
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids'</tt>
Or for custom grids, used below to run SEV-based desolvation grids
*run <tt>'$mud/doscoreopt.csh your.eel1 ../path/to/grids rec+sph.phi chem solvmap_sev'</tt>
The summary for the whole molecule is output to your.eel1.scores in combine.scores format

===Atomic contributions to the coulombic energy===

In your.eel1.delphi from the wrapper
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., column 9 × column 10) of the atom, respectively.
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.
Or to generate this data yourself
*start <tt>scoreopt_so</tt> and choose option '2' in the first menu.
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>.
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> or <tt>top500.eel1</tt>.
*enter the name of the output file, e.g. <tt>ligand.elec</tt> .
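
The column arithmetic described for <tt>your.eel1.delphi</tt> can be sketched in a few lines of Python. This is only an illustration: it assumes the columns are whitespace-separated and counted from 1, and the function names are made up here rather than taken from MUD or <tt>scoreopt_so</tt>.

```python
KT_TO_KCAL_PER_MOL = 0.5924  # conversion factor quoted in the text above


def sum_atom_column(lines, column):
    """Sum one 1-based, whitespace-separated column over all ATOM lines."""
    total = 0.0
    for line in lines:
        fields = line.split()
        # Assumption: every ATOM line has at least `column` whitespace-separated fields.
        if fields and fields[0] == "ATOM" and len(fields) >= column:
            total += float(fields[column - 1])
    return total


def delphi_score_kcal(lines):
    """Electrostatic score in kcal/mol: sum of column 11 (kT) times 0.5924."""
    return sum_atom_column(lines, 11) * KT_TO_KCAL_PER_MOL
```

The result of <tt>delphi_score_kcal(open("your.eel1.delphi"))</tt> would then be the number to compare against the elect column in OUTDOCK. If the fixed-width PDB fields in your file run together, the whitespace split will not work and the columns would have to be sliced by position instead.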

===Atomic contributions to the van der Waals energy===

In your.eel1.vdw from the wrapper
*be adequately [http://www.merriam-webster.com/dictionary/scared scared].
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.
Or to generate this data yourself
*start <tt>scoreopt_so</tt> and choose option '3' in the first menu.
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> .
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> .
*answer the question about interpolation with 'yes'.
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000.
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> .
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> .

===Atomic contributions to the desolvation===

In your.eel1.solv from the wrapper
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 × 10) of the atom, respectively.
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.
Or to generate this data yourself
*start <tt>scoreopt_so</tt> and choose option '4' in the first menu.
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> .
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> .
*enter the name of the output file, e.g. <tt>ligand.solv</tt> .

==Other small useful things==
===Obtaining the net charge of a docked molecule===

*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file.

[[Category:Manual_DOCK]]
[[Category:Tutorials]]

Chembl2pdb

2011-03-18T21:43:09Z

Mysinger: /* GENERATION PROCEDURE */

== CURRENT DATA ==

__ Updated 02/24/2011 __

The current data relating the ChEMBL09 protein targets to structures on the PDB can be found at:

'''/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09'''

There are 3 subfolders:

- '''uniprot''': categorized by target uniprot id

- '''pdb_ligand''': all pdb codes that have a bound ligand (as defined by the be_blasti.csh script from DOCKBlaster)
with the corresponding activity data from ChEMBL (actives.smi)

- '''pdb_other''': all pdb codes that do NOT have a bound crystal ligand (as defined by the be_blasti.csh script from DOCKBlaster)
with the corresponding actives from ChEMBL (actives.smi)

To get some statistics, such as how many pdb codes or how many targets have ChEMBL ligands, you can simply count the number of subfolders in each of these folders.

eg: How many UniProt targets have ChEMBL ligands?
% cd uniprot
% wc -l uniprot

eg: How many pdb structures have ChEMBL actives and a bound crystal ligand?
% cd pdb_ligand/
% ls -d ????| wc -l

eg: How many pdb structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?
% cd pdb_other/
% ls -d ???? | wc -l
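
The same count can be scripted. The sketch below mirrors <tt>ls -d ???? | wc -l</tt> in Python, assuming that every four-character entry of interest is a directory named after a pdb code; <tt>count_pdb_dirs</tt> is a hypothetical helper name, not part of the chembl2pdb scripts.

```python
import os


def count_pdb_dirs(parent):
    """Count non-hidden subdirectories of `parent` with four-character names."""
    return sum(
        1
        for entry in os.scandir(parent)
        # Unlike the shell glob, this skips plain files that happen to have
        # four-character names and counts directories only.
        if entry.is_dir() and len(entry.name) == 4 and not entry.name.startswith(".")
    )
```

For example, <tt>count_pdb_dirs("pdb_other")</tt> run from the chembl09 directory would report the number of structures without a bound crystal ligand.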

== GENERATION PROCEDURE ==

In the future, if you want to generate the data again, you need to do the following:

*Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release)
*Step II: Make a new directory, run the script pointing to the new SQL database name, and wait a day or two for it to finish
mkdir chembl10
cd chembl10
/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10

[[Category:Tutorials]]

Chembl2pdb

2011-03-18T21:42:29Z

Mysinger: /* GENERATION PROCEDURE */

== CURRENT DATA ==

__ Updated 02/24/2011 __

The current data relating the ChEMBL09 protein targets to structures on the PDB can be found at:

'''/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09'''

There are 3 subfolders:

- '''uniprot''': categorized by target uniprot id

- '''pdb_ligand''': all pdb codes that have a bound ligand (as defined by the be_blasti.csh script from DOCKBlaster)
with the corresponding activity data from ChEMBL (actives.smi)

- '''pdb_other''': all pdb codes that do NOT have a bound crystal ligand (as defined by the be_blasti.csh script from DOCKBlaster)
with the corresponding actives from ChEMBL (actives.smi)

To get some statistics, such as how many pdb codes or how many targets have ChEMBL ligands, you can simply count the number of subfolders in each of these folders.

eg: How many UniProt targets have ChEMBL ligands?
% cd uniprot
% wc -l uniprot

eg: How many pdb structures have ChEMBL actives and a bound crystal ligand?
% cd pdb_ligand/
% ls -d ????| wc -l

eg: How many pdb structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?
% cd pdb_other/
% ls -d ???? | wc -l

== GENERATION PROCEDURE ==

In the future, if you want to generate the data again, you need to do the following:

*Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release)
*Step II: Make a new directory, run the script, and wait a day or two for it to finish
mkdir chembl10
cd chembl10
/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10

[[Category:Tutorials]]

Chembl2pdb

2011-03-18T21:41:50Z

Mysinger: /* GENERATION PROCEDURE */

== CURRENT DATA ==

__ Updated 02/24/2011 __

The current data relating the ChEMBL09 protein targets to structures on the PDB can be found at:

'''/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09'''

There are 3 subfolders:

- '''uniprot''': categorized by target uniprot id

- '''pdb_ligand''': all pdb codes that have a bound ligand (as defined by the be_blasti.csh script from DOCKBlaster)
with the corresponding activity data from ChEMBL (actives.smi)

- '''pdb_other''': all pdb codes that do NOT have a bound crystal ligand (as defined by the be_blasti.csh script from DOCKBlaster)
with the corresponding actives from ChEMBL (actives.smi)

To get some statistics, such as how many pdb codes or how many targets have ChEMBL ligands, you can simply count the number of subfolders in each of these folders.

eg: How many UniProt targets have ChEMBL ligands?
% cd uniprot
% wc -l uniprot

eg: How many pdb structures have ChEMBL actives and a bound crystal ligand?
% cd pdb_ligand/
% ls -d ????| wc -l

eg: How many pdb structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?
% cd pdb_other/
% ls -d ???? | wc -l

== GENERATION PROCEDURE ==

In the future, if you want to generate the data again, you need to do the following:

*Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release)
*Step II: Make a new directory, run the script, and wait a day or two for it to finish
```mkdir chembl10```
```cd chembl10```
```/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10```

[[Category:Tutorials]]

Chembl2pdb

2011-03-18T21:41:30Z

Mysinger: /* GENERATION PROCEDURE */

== CURRENT DATA ==

__ Updated 02/24/2011 __

The current data relating the ChEMBL09 protein targets to structures on the PDB can be found at:

'''/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09'''

There are 3 subfolders:

- '''uniprot''': categorized by target uniprot id

- '''pdb_ligand''': all pdb codes that have a bound ligand (as defined by the be_blasti.csh script from DOCKBlaster)
with the corresponding activity data from ChEMBL (actives.smi)

- '''pdb_other''': all pdb codes that do NOT have a bound crystal ligand (as defined by the be_blasti.csh script from DOCKBlaster)
with the corresponding actives from ChEMBL (actives.smi)

To get some statistics, such as how many pdb codes or how many targets have ChEMBL ligands, you can simply count the number of subfolders in each of these folders.

eg: How many UniProt targets have ChEMBL ligands?
% cd uniprot
% wc -l uniprot

eg: How many pdb structures have ChEMBL actives and a bound crystal ligand?
% cd pdb_ligand/
% ls -d ????| wc -l

eg: How many pdb structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?
% cd pdb_other/
% ls -d ???? | wc -l

== GENERATION PROCEDURE ==

In the future, if you want to generate the data again, you need to do the following:

Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release)
Step II: Make a new directory, run the script, and wait a day or two for it to finish
```mkdir chembl10```
```cd chembl10```
```/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10```

[[Category:Tutorials]]

Chembl2pdb

2011-03-18T21:41:04Z

Mysinger: New simplified generation procedure

== CURRENT DATA ==

__ Updated 02/24/2011 __

The current data relating the ChEMBL09 protein targets to structures on the PDB can be found at:

'''/raid3/people/mysinger/pxc/pdb_to_chembl/chembl09'''

There are 3 subfolders:

- '''uniprot''': categorized by target uniprot id

- '''pdb_ligand''': all pdb codes that have a bound ligand (as defined by the be_blasti.csh script from DOCKBlaster)
with the corresponding activity data from ChEMBL (actives.smi)

- '''pdb_other''': all pdb codes that do NOT have a bound crystal ligand (as defined by the be_blasti.csh script from DOCKBlaster)
with the corresponding actives from ChEMBL (actives.smi)

To get some statistics, such as how many pdb codes or how many targets have ChEMBL ligands, you can simply count the number of subfolders in each of these folders.

eg: How many UniProt targets have ChEMBL ligands?
% cd uniprot
% wc -l uniprot

eg: How many pdb structures have ChEMBL actives and a bound crystal ligand?
% cd pdb_ligand/
% ls -d ????| wc -l

eg: How many pdb structures have ChEMBL actives BUT WITHOUT a bound crystal ligand?
% cd pdb_other/
% ls -d ???? | wc -l

== GENERATION PROCEDURE ==

In the future, if you want to generate the data again, you need to do the following:

Step I: Load the new ChEMBL SQL database into zincdb1 (do this only if there is a new ChEMBL release)
Step II: Make a new directory, run the script, and wait a day or two for it to finish
```mkdir chembl10```
```cd chembl10```
```/raid3/people/mysinger/pxc/pdb_to_chembl/generate_chembl_map.csh chembl10```

[[Category:Tutorials]]

MUD - Michael's Utilities for Docking

2010-01-13T00:39:31Z

Mysinger: /* Computing Enrichments */

==What's in MUD?==

*Tools to start, check, and restart dock jobs
*Tools to combine, enrich, plot, and view docking results

==Setting up MUD==

*For convenience, point a shell variable to the base mud directory to save typing
set mud=~mysinger/code/mud/trunk
*If you use MUD a lot, you can add this to your ~/.login
*Then simply run commands like this:
$mud/submit.csh
$mud/check.py -h
*Use -h or --help to get full help information for the .py (python) scripts
*The .csh scripts will automatically print usage information if mis-used
*The scripts automatically use their invocation path to find other scripts and libraries they depend on.

==Job Control==

===Main Workflow===

For a quick summary of what to do first see [[SGE_Cluster_Docking]]. For a detailed look at how to get the details right see [[How to run and analyze a DOCK run by hand]].

*Submit a parallel job to the cluster
$mud/submit.csh
Uses 'dirlist' to determine which directories to run. Similar to startdockbksX, but also indicates job submission by touching a submitted file in each directory.
*Check parallel job status
$mud/check.py
Indicates the status of unfinished (or unsubmitted) jobs. Note that it simply returns nothing if everything is finished.
*Restart all failed subjobs
$mud/restart.py
This works even if some subjobs are still running. Occasionally, however, jobs can fail with no detectable remnants. To force those jobs to restart you can use the -f option, but beware that this will also restart all subjobs that are still running.

===Specialized Commands===
*Submit job to the local machine
$mud/sublocal.csh
*Submit a single directory to the cluster
qsub $mud/runsge.csh
*Submit a single directory to the local machine
$mud/runsubdir.csh
*Remove docking output leaving only input - will DELETE even completed jobs
$mud/clean.py
*Restart single directory
$mud/restartdir.py

==Job Analysis==

*Enrichment plots are sensitive to consistent treatment and proper accounting for all docked molecules. The combine script properly accounts for all docked molecules by detecting bumped out, no matched, and timed out molecules.

To achieve consistency, you have two options:
1. Write coordinates for all molecules (what I use)
In INDOCK, set number_save to 50000 or something high enough to capture all dockable hierarchies. DOCK output is now gzipped so this is cheaper in disk space than it used to be.
2. Do not check for broken molecules
Use the -b option when running combine.py

===Combining Parallel Jobs===
*Merge all parallel jobs into a single set of unique scores.
$mud/combine.py
This script carefully accounts for all docked molecules, which makes the enrichment plots more informative.

*Options:
Use -b or --broken to skip finding broken molecules. Use -d or --done to indicate that all subjobs are complete, for the case where you did not submit with a MUD submission script. Use -p or --prefix if your output files are named something other than test. Use --box if your box file is not at ../../grids/box relative to your subjob directories.

*Creates:
#combine.scores - fully processed scores, using the best one for each id
#combine.raw - contains all scores as scraped from DOCK output
#combine.broken - broken molecules and the reason they failed
#combine.zeroes - important sanity check

format of combine.scores:
<id> <shape> <elect> <VdW> <polar solv> <apolar solv> <total> <subdir>

The .zeroes file is a sanity check because it lists the number of molecules followed by the number of zeroes in each scoring column. Past experience has shown that when DOCK fails randomly and silently, it often generates a large number of zero scores. If this happens, simply re-running the job will give better results.
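
To make the <tt>combine.scores</tt> layout and the <tt>.zeroes</tt> sanity check concrete, here is a minimal Python sketch. It assumes the eight fields are whitespace-separated and that the six score fields are numeric; the names <tt>Score</tt>, <tt>parse_scores</tt> and <tt>count_zeroes</tt> are illustrative and are not part of MUD.

```python
from collections import namedtuple

# Field order follows the combine.scores format line given above.
Score = namedtuple("Score", "id shape elect vdw polar_solv apolar_solv total subdir")


def parse_scores(lines):
    """Parse combine.scores-style lines into Score records."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        records.append(Score(fields[0], *map(float, fields[1:7]), fields[7]))
    return records


def count_zeroes(records):
    """Number of exact zeroes in each scoring column, as a sanity check."""
    columns = Score._fields[1:7]
    return {name: sum(1 for r in records if getattr(r, name) == 0.0) for name in columns}
```

A large zero count relative to <tt>len(records)</tt> in several columns would be the warning sign described above.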

===Computing Enrichments===
*Compute enrichment starting from the combined scores.
$mud/enrich.py -s -l LIGAND_FILE
< or >
$mud/enrich.py -l LIGAND_FILE -d DECOY_FILE
Generates both enrichment and roc curves, both for the ligands against all molecules and for the ligands versus just the decoys. It will try to run combine if it has not been run yet, but will do so only with defaults for every option.

*Input:
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.

The identifier files simply contain an id for each known ligand that matches the one in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.
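
One way such id matching could work is to reduce both forms to the short "C" form before comparing. The function below is a guess at the idea for illustration only; it is not taken from <tt>enrich.py</tt>, and <tt>normalize_id</tt> is a hypothetical name.

```python
def normalize_id(identifier):
    """Map 'ZINC12345678' to 'C12345678'; leave other ids unchanged."""
    identifier = identifier.strip()
    if identifier.startswith("ZINC"):
        # Drop the leading 'ZIN' so that only the short 'C...' form remains.
        return identifier[3:]
    return identifier
```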

*Options:
Use -s or --skip-own-curves to skip consideration of decoys and thus generation of _own curves. Use -f to force combine to run again.


*Creates:
#enrich.txt - Enrichment curve for ligands versus all molecules
#roc.txt - ROC curve for ligands versus all molecules
#enrich_own.txt - Enrichment curve for ligands versus only the decoys
#roc_own.txt - ROC curve for ligands versus only the decoys
_own files are not generated if the -s option is used.

format for output files:
#AUC 50.00 LogAUC 0.00
<x> <y>
<x> <y>
...
AUC is area under the curve and the random expectation value is 50%. [[LogAUC]] is the area between the log curve and the log random curve, so the random expectation value is 0%. <y> is always "% ligands found", and <x> is either "% database searched" for enrichment plots or "% non-ligands found" for ROC plots.
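
The output format above is simple enough to read back with a few lines of Python, for instance to feed the curves into your own plotting code. This sketch assumes a single header line of the form shown and whitespace-separated <x> <y> pairs; <tt>parse_curve</tt> is an illustrative name, not a MUD function.

```python
def parse_curve(lines):
    """Return (auc, logauc, points) from an enrich.txt or roc.txt style file."""
    auc = logauc = None
    points = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("#"):
            # Header looks like "#AUC 50.00 LogAUC 0.00": name/value pairs.
            tokens = line.lstrip("#").split()
            values = dict(zip(tokens[0::2], tokens[1::2]))
            if "AUC" in values and "LogAUC" in values:
                auc = float(values["AUC"])
                logauc = float(values["LogAUC"])
        else:
            points.append((float(fields[0]), float(fields[1])))
    return auc, logauc, points
```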

===Plotting Enrichments===
Easily plot enrichment and roc curves from one or more jobs.
$mud/plots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC
< or >
$mud/plots.py -i .
Generates plots with one curve for each -i input_directory.

*Options:
Use -s or --skip-own-curves to skip _own curves, especially if they don't exist because enrich.py was run with -s. You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory. Use -n to get normal instead of semi-log plots (and AUC in place of LogAUC).

*Creates:
#[title_]enrich.png
#[title_]roc.png
#[title_]enrich_own.png
#[title_]roc_own.png

The various graphs have the same meaning as their respective curves from [[#Computing Enrichments]]. [title_] is optional and exists when a custom title is given with the -t option.

===Computing Energy Histograms===
*Compute energy distributions starting from the combined scores.
$mud/energies.py -s -l LIGAND_FILE
< or >
$mud/energies.py -l LIGAND_FILE -d DECOY_FILE
Generates the energy distributions for the ligands, decoys, and all the other molecules.

*Input:
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.

The identifier files simply contain an id for each known ligand that matches the one in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.

*Options:
Use -s or --skip-own-curves to skip consideration of decoys.

*Creates:
#counts.txt - Energy distributions

format for output:
number_of_sections number_of_bins min_energy_threshold max_energy_threshold
##### section_name
bin_upper_edge1 count_below_edge1
...
bin_upper_edgeN count_below_edgeN
ABOVE count_above_last_edge
The sections are for ligands, decoys (optional), and others. The bins and counts define the energy histogram. The bins are finely spaced here in order to have more resolution when combined with other runs, whose energy ranges may be different.

===Plotting Energy Histograms===
Easily plot energy histograms from one or more jobs.
$mud/eplots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC
< or >
$mud/eplots.py -i .
Generates plots with energy distributions for each -i input_directory.

*Options:
You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory.

*Creates:
#[title_]counts.png

===Visualizing Molecule by Molecule Results===
Create a DOCK 4,5,6 type pdb file for use in Chimera's ViewDOCK.
$mud/topdock.py -o topdock.pdb

*Options:
Use -o to specify an output file instead of stdout. Use -t NUMBER to choose how many of the top-scoring molecules to output.

→ Back to [[Tutorials]]
[[Category:Tutorials]]

LogAUC

2010-01-13T00:37:18Z

Mysinger:

==What is LogAUC?==

LogAUC is a metric to evaluate virtual screening performance that has many of the same advantages as area under the curve (AUC), but is based on a plot where the x-axis is semilog in order to focus on early enrichment.

==Motivation==

When we look at virtual screening performance, we plot an ROC curve (or enrichment curve) with a base 10 semilog x-axis, because this has the advantage of focusing the graph on "early enrichment", where molecules are most likely to be selected for further testing. If we had instead plotted the curve with the usual linear x-axis, then the area under the curve (AUC) is a well-regarded metric to summarize the overall performance of a virtual screening campaign as a single number [1]. While AUC can be formulated alternate ways [2,3], it can be mechanically constructed by simply integrating under the curve, and interpreted as the fraction of the area under the curve over the area under the best possible ROC curve. It just happens that in a linear ROC plot, the AUC of the best possible curve is the entire unit square, with an area of 1. By analogy, in our typical semilog plots, we can construct the same fraction of the area under the log curve, over the area under the perfect log curve, and define that fraction as the logAUC. The lone nuisance is that the area under the log curve is infinite in general. However, if we are practical and limit our focus to a region of log space that we can actually measure, say above a certain threshold <math>\lambda</math>, then the perfect log area is finite.

==Definition==

Formally, we define <math>logAUC_\lambda</math>, where the log area computations run from <math>\lambda</math> to 1.0, and we typically refer to <math>logAUC_{0.001}</math> as simply <math>logAUC</math>, where the area is integrated from 0.1 percent (0.001) to 100 percent (1.0) of decoys found. For integrating the area under the curve, we use the trapezoidal rule as follows:

<math>LogAUC_\lambda=\frac{\displaystyle \sum_{i}^{where~x_i\ge\lambda} (\log_{10} x_{i+1} - \log_{10} x_i)(\frac{y_{i+1}+y_i}{2})}{\log_{10}\frac{1}{\lambda}}</math>
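
The trapezoidal sum above translates almost term by term into Python. The sketch below is not the code used in <tt>enrich.py</tt>; it assumes <tt>xs</tt> and <tt>ys</tt> are fractions between 0 and 1 (not percentages), sorted by increasing x, with the first point to be counted sitting at or above λ.

```python
import math


def log_auc(xs, ys, lam=0.001):
    """LogAUC_lambda by the trapezoidal rule on a log10 x-axis."""
    area = 0.0
    for i in range(len(xs) - 1):
        if xs[i] >= lam:  # only trapezoids that start at or above lambda
            width = math.log10(xs[i + 1]) - math.log10(xs[i])
            area += width * (ys[i + 1] + ys[i]) / 2.0
    # Normalize by the area under the perfect curve, log10(1/lambda).
    return area / math.log10(1.0 / lam)
```

A perfect curve (y = 1 everywhere from λ to 1) gives a value of 1, and a curve that never finds a ligand gives 0. No interpolation is done at λ itself, so a curve without a point near λ will lose the part of its area that lies before its first counted point.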

==Discussion==

From similar reasoning based on semilog ROC plots, Clark and Webster-Clark construct the pROC AUC metric [2], which is similar to the numerator of logAUC except that the integration is done over horizontal bars instead of vertical trapezoids. The advantage of constructing logAUC as a fraction over the ideal area is that the choice of base for the logarithm is irrelevant, because changing base simply results in a constant that cancels between numerator and denominator. Also, by explicitly defining the area of interest using λ and integrating vertically, we are able to avoid the singularity at <math>x_i=0</math> encountered in pROC. More importantly, the fixed integration area means we can more directly compare <math>logAUC_\lambda</math> values across databases of different sizes and across targets with different ratios of actives to inactives. The final advantage of logAUC is that if you are used to looking at semilog ROC plots plotted from λ to 1, and understand that logAUC is just the percentage of the total area below the curve, then you can at some point gain the same intuitive feel as AUC has for linear ROC plots. In a semilog ROC plot the random line occupies only a sliver of the total area, and indeed its logAUC is just 14.462%. In order to more easily compare a given logAUC to this random value, we instead report the “adjusted logAUC” as the calculated value minus 14.462%, so that positive values mean overall enrichments better than random.

<math>Adjusted~LogAUC=LogAUC_{0.001}-0.14462</math>
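
The 14.462% figure for the random line can be checked by hand. Taking the random ROC curve as y = x and replacing the trapezoidal sum by the integral it approximates (a continuous idealization, not a formula from the text above) gives:

```latex
\begin{aligned}
LogAUC_\lambda^{random}
  &= \frac{1}{\log_{10}\frac{1}{\lambda}} \int_{\lambda}^{1} x \, d(\log_{10} x)
   = \frac{1}{\log_{10}\frac{1}{\lambda}} \int_{\lambda}^{1} \frac{dx}{\ln 10}
   = \frac{1-\lambda}{\ln\frac{1}{\lambda}} \\
LogAUC_{0.001}^{random}
  &= \frac{0.999}{\ln 1000} \approx \frac{0.999}{6.9078} \approx 0.14462
\end{aligned}
```

This closed form also shows that the random baseline depends on λ, so the 0.14462 offset applies only to <math>logAUC_{0.001}</math>.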

==References==
## Nicholls, A., What do we know and when do we know it? J Comput Aided Mol Des 2008, 22, (3-4), 239-55.
## Clark, R. D.; Webster-Clark, D. J., Managing bias in ROC curves. J Comput Aided Mol Des 2008, 22, (3-4), 141-6.
## Truchon, J. F.; Bayly, C. I., Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model 2007, 47, (2), 488-508.

==Citation==
Michael Mysinger, Brian Shoichet. "Rapid Context-Dependent Ligand Desolvation in Molecular Docking". 2010. (in preparation for J Chem Inf Model)

LogAUC

2010-01-13T00:32:03Z

Mysinger:

==What is LogAUC?==

LogAUC is a metric to evaluate virtual screening performance that has many of the same advantages as area under the curve (AUC), but is based on a plot where the x-axis is semilog in order to focus on early enrichment.

==Motivation==

When we look at virtual screening performance, we plot an ROC curve (or enrichment curve) with a base 10 semilog x-axis, because this has the advantage of focusing the graph on "early enrichment", where molecules are most likely to be selected for further testing. If we had instead plotted the curve with the usual linear x-axis, then the area under the curve (AUC) is a well-regarded metric to summarize the overall performance of a virtual screening campaign as a single number [1]. While AUC can be formulated alternate ways [2,3], it can be mechanically constructed by simply integrating under the curve, and interpreted as the fraction of the area under the curve over the area under the best possible ROC curve. It just happens that in a linear ROC plot, the AUC of the best possible curve is the entire unit square, with an area of 1. By analogy, in our typical semilog plots, we can construct the same fraction of the area under the log curve, over the area under the perfect log curve, and define that fraction as the logAUC. The lone nuisance is that the area under the log curve is infinite in general. However, if we are practical and limit our focus to a region of log space that we can actually measure, say above a certain threshold <math>\lambda</math>, then the perfect log area is finite.

==Definition==

Formally, we define <math>logAUC_\lambda</math>, where the log area computations run from <math>\lambda</math> to 1.0, and we typically refer to <math>logAUC_{0.001}</math> as simply <math>logAUC</math>, where the area is integrated from 0.1 percent (0.001) to 100 percent (1.0) of decoys found. For integrating the area under the curve, we use the trapezoidal rule as follows:

<math>LogAUC_\lambda=\frac{\displaystyle \sum_{i}^{where~x_i\ge\lambda} (\log_{10} x_{i+1} - \log_{10} x_i)(\frac{y_{i+1}+y_i}{2})}{\log_{10}\frac{1}{\lambda}}</math>

==Discussion==

From similar reasoning based on semilog ROC plots, Clark and Webster-Clark construct the pROC AUC metric [2], which is similar to the numerator of logAUC except that the integration is done over horizontal bars instead of vertical trapezoids. The advantage of constructing logAUC as a fraction over the ideal area is that the choice of base for the logarithm is irrelevant, because changing base simply results in a constant that cancels between numerator and denominator. Also, by explicitly defining the area of interest using λ and integrating vertically, we are able to avoid the singularity at <math>x_i=0</math> encountered in pROC. More importantly, the fixed integration area means we can more directly compare <math>logAUC_\lambda</math> values across databases of different sizes and across targets with different ratios of actives to inactives. The final advantage of logAUC is that if you are used to looking at semilog ROC plots plotted from λ to 1, and understand that logAUC is just the percentage of the total area below the curve, then you can at some point gain the same intuitive feel as AUC has for linear ROC plots. In a semilog ROC plot the random line occupies only a sliver of the total area, and indeed its logAUC is just 14.462%. In order to more easily compare a given logAUC to this random value, we instead report the “adjusted logAUC” as the calculated value minus 14.462%, so that positive values mean overall enrichments better than random.

<math>Adjusted~LogAUC=LogAUC_{0.001}-.14462</math>

==References==
## Nicholls, A., What do we know and when do we know it? J Comput Aided Mol Des 2008, 22, (3-4), 239-55.
## Clark, R. D.; Webster-Clark, D. J., Managing bias in ROC curves. J Comput Aided Mol Des 2008, 22, (3-4), 141-6.
## Truchon, J. F.; Bayly, C. I., Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model 2007, 47, (2), 488-508.

LogAUC

2010-01-13T00:15:46Z

Mysinger:

==What is LogAUC?==

LogAUC is a metric to evaluate virtual screening performance that has some nice characteristics. It is intuitive to use.

==Motivation==

When we look at virtual screening performance, we plot an ROC curve (or enrichment curve) with a base 10 semilog x-axis, because this has the advantage of focusing the graph on "early enrichment", where molecules are most likely to be selected for further testing. If we had instead plotted the curve with the usual linear x-axis, then the area under the curve (AUC) is a well-regarded metric to summarize the overall performance of a virtual screening campaign as a single number [1]. While AUC can be formulated alternate ways [2,3], it can be mechanically constructed by simply integrating under the curve, and interpreted as the fraction of the area under the curve over the area under the best possible ROC curve. It just happens that in a linear ROC plot, the AUC of the best possible curve is the entire unit square, with an area of 1. By analogy, in our typical semilog plots, we can construct the same fraction of the area under the log curve, over the area under the perfect log curve, and define that fraction as the logAUC. The lone nuisance is that the area under the log curve is infinite in general. However, if we are practical and limit our focus to a region of log space that we can actually measure, say above a certain threshold <math>\lambda</math>, then the perfect log area is finite.

==Definition==

Formally, we define <math>logAUC_\lambda</math>, where the log area computations run from <math>\lambda</math> to 1.0, and we typically refer to <math>logAUC_{0.001}</math> as simply logAUC, where the area is integrated from 0.1 percent (0.001) to 100 percent (1.0) of decoys found. For integrating the area under the curve, we use the trapezoidal rule as follows:

<math>LogAUC_\lambda=\frac{\displaystyle \sum_{i}^{where~x_i\ge\lambda} (\log_{10} x_{i+1} - \log_{10} x_i)(\frac{y_{i+1}+y_i}{2})}{\log_{10}\frac{1}{\lambda}}</math>

==References==
1. Nicholls, A., What do we know and when do we know it? J Comput Aided Mol Des 2008, 22, (3-4), 239-55.
2.

LogAUC

2010-01-13T00:00:40Z

Mysinger:

2009-12-05T02:36:05Z

Mysinger: /* Combining the results of all subdirectories */

=Some analyses that can be performed=
==Combining the results of all subdirectories==

*in the subdirectory that contains all the individual directories for each chunk of the library, run <tt>$mud/combine.py</tt>. Then generate a file containing the top 500 molecules using <tt>$mud/topdock.py -o top500.pdb</tt>, which you can read into ViewDOCK in chimera as a DOCK 4,5, or 6 style file.
*to create an <tt>.eel1</tt> file containing the top 500 molecules just run <tt>$mud/topdock.py -e</tt>. If one wants to create an <tt>.eel1</tt> file for a different subset of the molecules, first create the list of molecule names plus their energies (on one line) and then feed it to <tt>getxpdb.pl name_energy.list < FF.test.eel1 > subset_name.eel1</tt>.

==Getting individual atom contributions with scoreopt_so==

===Converting a <tt>[http://www.tripos.com/index.php?family=modules,SimplePage,,,&page=sup_mol2&s=0 .mol2]</tt> file into an <tt>.eel1</tt> file===

*run <tt>amsol</tt> as described [[Preparing_the_ligand#Running amsol|here]] to calculate atomic solvation energies.
*run <tt>file2file.py -s path/to/amsol.solv path/to/amsol.nmol2 ligand.eel1</tt>.

===Individual contributions to the coulombic energy===

*start <tt>scoreopt_so</tt> and choose option '2' in the first menu.
*enter the name of the DelPhi potential file, presumably <tt>grids/rec+sph.phi</tt>.
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> .
*enter the name of the output file, e.g. <tt>ligand.elec</tt> .
*in every ATOM line, columns 9, 10 and 11 are the partial charge, the electrostatic field and the energy in kT (i.e., 9 × 10) of the atom, respectively.
*the DelPhi electrostatic score is the sum over the entries in column 11 times 0.5924 (conversion from kT to kcal/mol) and can be compared to the elect column in OUTDOCK.

===Individual contributions to the van der Waals energy===

*start <tt>scoreopt_so</tt> and choose option '3' in the first menu.
*enter the prefix name of grids for ff scoring as a full path, i.e., <tt>grids/chem</tt> .
*enter the name of the van der Waals parameter file, presumably <tt>grids/vdw.parms.amb.mindock</tt> .
*answer the question about interpolation with 'yes'.
*enter a sufficiently large number as maximal van der Waals energy, e.g. 10000.
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> .
*enter the name of the output file, e.g. <tt>ligand.vdw</tt> .
*be adequately [http://www.merriam-webster.com/dictionary/scared scared].
*the van der Waals interaction energy is calculated as <math>{vdW}_{(r)}=\frac{A}{r^{12}}-\frac{B}{r^6}=a-b</math>. In every ATOM line, columns 9, 10 and 11 are <math>a</math>, <math>b</math> and <math>a-b</math>, respectively.
* DO NOT use the interaction energy, as we only use the vdw component now. Instead, use the vdwsum to compare with the vdW column in OUTDOCK.
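
As a small worked example of the formula above, the following Python sketch computes <math>a</math>, <math>b</math> and <math>a-b</math> for a single atom pair. The parameter values are invented for illustration and are not taken from <tt>vdw.parms.amb.mindock</tt>; the function name <tt>vdw_terms</tt> is likewise hypothetical.

```python
def vdw_terms(A, B, r):
    """Return (a, b, a - b) with a = A / r**12 and b = B / r**6."""
    a = A / r ** 12  # repulsive term
    b = B / r ** 6   # attractive term
    return a, b, a - b


if __name__ == "__main__":
    # Illustrative parameters only: A = B = 4 gives a well depth of 1
    # at the distance r = 2**(1/6).
    a, b, energy = vdw_terms(4.0, 4.0, 2.0 ** (1.0 / 6.0))
    print(a, b, energy)
```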

===Individual contributions to the desolvation===

*start <tt>scoreopt_so</tt> and choose option '4' in the first menu.
*enter the name of the grid for partial desolvation, presumably <tt>grids/solvmap</tt> .
*enter the name of the ligand file, i.e., <tt>ligand.eel1</tt> .
*enter the name of the output file, e.g. <tt>ligand.solv</tt> .
*in every ATOM line, columns 9, 10, and 11 are the total atomic solvation energy (polar + apolar), percentage desolvation, and atomic desolvation energy (i.e. - 9 × 10) of the atom, respectively.
*the total desolvation is the sum over the entries in column 11 and can be compared to the sum of the polsol and apolsol columns in OUTDOCK.

==Other small useful things==
===Obtaining the net charge of a docked molecule===

*take the output <tt>.eel1</tt> file and run <tt>molcharge_pdb.pl < output.eel1</tt>. This will output the sequential number of the molecule, the [http://zinc.docking.org/ ZINC] identifier, the total charge and the number of atoms for every molecule in the file. This script is called by <tt>combine10.csh</tt> and the output is called <tt>FF.new.chg</tt> (cf. section [[#Combining the results of all subdirectories|5.1]]).

[[Category:Manual_DOCK]]

Running DOCK

2009-12-05T02:28:43Z

Mysinger: /* Running DOCK */

MUD - Michael's Utilities for Docking

2009-12-05T01:59:28Z

Mysinger:

==What's in MUD?==

*Tools to start, check, and restart dock jobs
*Tools to combine, enrich, plot, and view docking results

==Setting up MUD==

*For convenience, point a shell variable to the base mud directory to save typing
set mud=~mysinger/code/mud/trunk
*If you use MUD a lot, you can add this to your ~/.login
*Then simply run commands like this:
$mud/submit.csh
$mud/check.py -h
*Use -h or --help to get full help information for the .py (python) scripts
*The .csh scripts will automatically print usage information if mis-used
*The scripts automatically use their invocation path to find other scripts and libraries they depend on.

==Job Control==

===Main Workflow===

For a quick summary of what to do first see [[SGE_Cluster_Docking]]. For a detailed look at how to get the details right see [[How to run and analyze a DOCK run by hand]].

*Submit a parallel job to the cluster
$mud/submit.csh
Uses 'dirlist' to determine which directories to run. Similar to startdockbksX, but also indicates job submission by touching a submitted file in each directory.
*Check parallel job status
$mud/check.py
Indicates the status of unfinished (or unsubmitted) jobs. Note that it simply returns nothing if everything is finished.
*Restart all failed subjobs
$mud/restart.py
This works even if some subjobs are still running. Occasionally, however, jobs can fail with no detectable remnants. To force those jobs to restart you can use the -f option, but beware that this will also restart all subjobs that are still running.

===Specialized Commands===
*Submit job to the local machine
$mud/sublocal.csh
*Submit a single directory to the cluster
qsub $mud/runsge.csh
*Submit a single directory to the local machine
$mud/runsubdir.csh
*Remove docking output leaving only input - will DELETE even completed jobs
$mud/clean.py
*Restart single directory
$mud/restartdir.py

==Job Analysis==

*Enrichment plots are sensitive to consistent treatment and proper accounting for all docked molecules. The combine script properly accounts for all docked molecules by detecting bumped out, no matched, and timed out molecules.

To achieve consistency, you have two options:
1. Write coordinates for all molecules (what I use)
In INDOCK, set number_save to 50000 or something high enough to capture all dockable hierarchies. DOCK output is now gzipped so this is cheaper in disk space than it used to be.
2. Do not check for broken molecules
Use the -b option when running combine.py

===Combining Parallel Jobs===
*Merge all parallel jobs into a single set of unique scores.
$mud/combine.py
This combine carefully accounts for all docked molecules, for more informative enrichment plots.

*Options:
Use -b or --broken to skip finding broken molecules. Use -d or --done to indicate that all subjobs are complete, for the case where you did not submit with a MUD submission script. Use -p or --prefix if your output files are named something other than test. Use --box if your box file is not at ../../grids/box relative to your subjob directories.

*Creates:
#combine.scores - fully processed scores, using the best one for each id
#combine.raw - contains all scores as scraped from DOCK output
#combine.broken - broken molecules and the reason they failed
#combine.zeroes - important sanity check

format of combine.scores:
<id> <shape> <elect> <VdW> <polar solv> <apolar solv> <total> <subdir>

The .zeroes file is a sanity check because it lists the number of molecules followed by the number of zeroes in each scoring column. Past experience has shown that when DOCK fails randomly and silently, it often generates a large number of zero scores. If this happens, simply re-running the job will give better results.
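
The same kind of sanity check can be repeated by hand on combine.scores. The sketch below is not the code used by combine.py; it assumes that every line of combine.scores holds the eight whitespace-separated fields listed above, and the names <tt>COLUMNS</tt> and <tt>count_zeroes</tt> are made up for this example.

```python
# Score columns between <id> and <subdir>, in the order given above
COLUMNS = ["shape", "elect", "vdw", "polar_solv", "apolar_solv", "total"]


def count_zeroes(lines):
    """Return (number of molecules, {column name: number of zero scores})."""
    n_molecules = 0
    zeroes = {name: 0 for name in COLUMNS}
    for line in lines:
        fields = line.split()
        if len(fields) != 8:
            continue  # assumption: valid lines have exactly eight fields
        n_molecules += 1
        for name, value in zip(COLUMNS, fields[1:7]):
            if float(value) == 0.0:
                zeroes[name] += 1
    return n_molecules, zeroes


if __name__ == "__main__":
    with open("combine.scores") as handle:
        print(count_zeroes(handle))
```

A column in which a large fraction of the molecules scores exactly zero is the warning sign described above.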

===Computing Enrichments===
*Compute enrichment starting from the combined scores.
$mud/enrich.py -s -l LIGAND_FILE
< or >
$mud/enrich.py -l LIGAND_FILE -d DECOY_FILE
Generates both enrichment and roc curves, both for the ligands against all molecules and for the ligands versus just the decoys. It will try to run combine if it has not been run yet, but will do so only with defaults for every option.

*Input:
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.

The identifier files simply contain one id for each known ligand, matching the id used in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.
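
One way such matching could work is to reduce both spellings to a common form before comparing. The sketch below only illustrates the idea and is not the code used by enrich.py; <tt>normalize_id</tt> is a hypothetical helper.

```python
def normalize_id(mol_id):
    """Map 'ZINC12345678' and 'C12345678' to the same key, 'C12345678'."""
    mol_id = mol_id.strip()
    if mol_id.startswith("ZINC"):
        return mol_id[3:]  # drop 'ZIN' and keep the 'C' plus the digits
    return mol_id


if __name__ == "__main__":
    ligands = {normalize_id(name) for name in ["ZINC12345678", "C00000042"]}
    print(normalize_id("C12345678") in ligands)
```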

*Options:
Use -s or --skip-own-curves to skip consideration of decoys and thus generation of _own curves. Use -f to force combine to run again.


*Creates:
#enrich.txt - Enrichment curve for ligands versus all molecules
#roc.txt - ROC curve for ligands versus all molecules
#enrich_own.txt - Enrichment curve for ligands versus only the decoys
#roc_own.txt - ROC curve for ligands versus only the decoys
_own files are not generated if the -s option is used.

format for output files:
#AUC 50.00 LogAUC 0.00
<x> <y>
<x> <y>
...
AUC is area under the curve and the random expectation value is 50%. LogAUC is the area between the log curve and the log random curve, so the random expectation value is 0%. <y> is always "% ligands found", and <x> is either "% database searched" for enrichment plots or "% non-ligands found" for ROC plots.
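
To check an output file by hand, the header and the points can be read back and the AUC recomputed with the trapezoid rule. This is a sketch under the assumption that <x> and <y> are both percentages between 0 and 100; it is not the code used by enrich.py, it does not reproduce LogAUC, and the function names are made up.

```python
def read_curve(lines):
    """Return (header, points) from an enrich.txt or roc.txt style file."""
    header = {}
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # '#AUC 50.00 LogAUC 0.00' -> {'AUC': 50.0, 'LogAUC': 0.0}
            tokens = line[1:].split()
            header = {tokens[i]: float(tokens[i + 1])
                      for i in range(0, len(tokens) - 1, 2)}
            continue
        x, y = line.split()[:2]
        points.append((float(x), float(y)))
    return header, points


def trapezoid_auc(points):
    """Area under the curve in percent, assuming x and y both run 0..100."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / 100.0


if __name__ == "__main__":
    with open("roc.txt") as handle:
        header, points = read_curve(handle)
    print(header.get("AUC"), trapezoid_auc(points))
```

For a perfectly random curve, i.e. the diagonal from (0, 0) to (100, 100), <tt>trapezoid_auc</tt> gives 50, in line with the random expectation value stated above.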

===Plotting Enrichments===
Easily plot enrichment and roc curves from one or more jobs.
$mud/plots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC
< or >
$mud/plots.py -i .
Generates plots with one curve for each -i input_directory.

*Options:
Use -s or --skip-own-curves to skip _own curves, especially if they don't exist because enrich.py was run with -s. You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory. Use -n to get normal instead of semi-log plots (and AUC in place of LogAUC).

*Creates:
#[title_]enrich.png
#[title_]roc.png
#[title_]enrich_own.png
#[title_]roc_own.png

The various graphs have the same meaning as their respective curves from [[#Computing Enrichments]]. [title_] is optional and exists when a custom title is given with the -t option.

===Computing Energy Histograms===
*Compute energy distributions starting from the combined scores.
$mud/energies.py -s -l LIGAND_FILE
< or >
$mud/energies.py -l LIGAND_FILE -d DECOY_FILE
Generates the energy distributions for the ligands, decoys, and all the other molecules.

*Input:
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.

The identifier files simply contain one id for each known ligand, matching the id used in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.

*Options:
Use -s or --skip-own-curves to skip consideration of decoys.

*Creates:
#counts.txt - Energy distributions

format for output:
number_of_sections number_of_bins min_energy_threshold max_energy_threshold
##### section_name
bin_upper_edge1 count_below_edge1
...
bin_upper_edgeN count_below_edgeN
ABOVE count_above_last_edge
The sections are for ligands, decoys (optional), and others. The bins and counts define the energy histogram. The bins are finely spaced here in order to have more resolution when combined with other runs, whose energy ranges may be different.
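
A reader for this layout might look like the sketch below. It rests on several assumptions that are not confirmed here: fields are whitespace-separated, every section header line starts with '#' and ends with the section name, and the counts are integers. <tt>read_counts</tt> is a made-up name and not part of MUD.

```python
def read_counts(lines):
    """Return ((n_sections, n_bins, e_min, e_max), sections) from counts.txt."""
    rows = [line.split() for line in lines if line.strip()]
    first = rows[0]
    summary = (int(first[0]), int(first[1]), float(first[2]), float(first[3]))
    sections = {}
    current = None
    for fields in rows[1:]:
        if fields[0].startswith("#"):
            # assumption: the section name is the last token of the header line
            current = fields[-1].lstrip("#")
            sections[current] = {"bins": [], "above": 0}
        elif fields[0] == "ABOVE":
            sections[current]["above"] = int(fields[1])
        else:
            # (bin upper edge, count) pair
            sections[current]["bins"].append((float(fields[0]), int(fields[1])))
    return summary, sections


if __name__ == "__main__":
    with open("counts.txt") as handle:
        summary, sections = read_counts(handle)
    print(summary, sorted(sections))
```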

===Plotting Energy Histograms===
Easily plot energy histograms from one or more jobs.
$mud/eplots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC
< or >
$mud/eplots.py -i .
Generates plots with energy distributions for each -i input_directory.

*Options:
You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory.

*Creates:
#[title_]counts.png

===Visualizing Molecule by Molecule Results===
Create a DOCK 4,5,6 type pdb file for use in Chimera's ViewDOCK.
$mud/topdock.py -o topdock.pdb

*Options:
Use -o to specify an output file besides stdout. Use -t NUMBER to set how many top-scoring molecules are written.

→ Back to [[Tutorials]]
[[Category:Tutorials]]

SGE Cluster Docking

2009-12-05T01:58:23Z

Mysinger:

== SGE Cluster Information ==

*'sgehead.compbio.ucsf.edu' is the submit machine for the Sun Grid Engine (SGE) cluster. wilco is also authorized to submit jobs.
*There are around 250 cluster nodes providing 600 total cores to run jobs in the sge queue as of May, 2009, named like 'node-1-1' through 'node-3-36' where the first number is the rack # and the second is the slot # in that rack.

== SGE Commands ==
*sgestat: high level overview of cluster status
*qsub: submit jobs 
*qstat: check job status 
*qdel: remove jobs 
*qhost: check cluster status 
*man sge_intro: start of manpage documentation 

== Typical Docking Workflow ==

*Generate spheres and grids - See [[Using MakeDOCK]] for more information, including how to prepare the receptor and ligand
ssh sgehead.compbio.ucsf.edu # ssh to SGE submit machine
mkdir example # make docking directory
cd example # change to docking directory
cp <somedir>/rec.pdb . # copy or create rec.pdb
cp <somedir>/xtal-lig.mol2 . # copy or create xtal-lig.mol2 (or even xtal-lig.pdb)
startdockblaster5 # create spheres and grids
# Check output for WARNING messages, correct as needed

* Setting up a docking run
cp calibrate/INDOCK.1.A INDOCK # copy or create INDOCK
md4db.csh bysubset 2 100 # create directories for docking run with 100 chunks
# 2 indicates we want the fragment-like subset of ZINC (See http://zinc.docking.org/subset1)
cd run.2 # chdir into run.2 directory

* Everything else
See [[MUD - Michael's Utilities for Docking]] for how to submit, check, and analyse the docking run.

For information on which ZINC

[[Category:Internal]]
[[Category:Tutorials]]
[[Category:Cluster]]
[[Category:Unix]]

SGE Cluster Docking

2009-12-05T01:54:22Z

Mysinger: Update to modern workflow

MUD - Michael's Utilities for Docking

2009-12-05T01:44:35Z

Mysinger: Add energy histogram programs

==What's in MUD?==

*Tools to start, check, and restart dock jobs
*Tools to combine, enrich, plot, and view docking results

==Setting up MUD==

*For convenience, point a shell variable to the base mud directory to save typing
set mud=~mysinger/code/mud/trunk
*If you use MUD a lot, you can add this to your ~/.login
*Then simply run commands like this:
$mud/submit.csh
$mud/check.py -h
*Use -h or --help to get full help information for the .py (python) scripts
*The .csh scripts will automatically print usage information if mis-used
*The scripts automatically use their invocation path to find other scripts and libraries they depend on.

==Job Control==

===Main Workflow===
*Submit a parallel job to the cluster
$mud/submit.csh
Uses 'dirlist' to determine which directories to run. Similar to startdockbksX, but also indicates job submission by touching a submitted file in each directory.
*Check parallel job status
$mud/check.py
Indicates the status of unfinished (or unsubmitted) jobs. Note that it simply returns nothing if everything is finished.
*Restart all failed subjobs
$mud/restart.py
This works even if some subjobs are still running. Occasionally, however, jobs can fail with no detectable remnants. To force those jobs to restart you can use the -f option, but beware that this will also restart all subjobs that are still running.

===Specialized Commands===
*Submit job to the local machine
$mud/sublocal.csh
*Submit a single directory to the cluster
qsub $mud/runsge.csh
*Submit a single directory to the local machine
$mud/runsubdir.csh
*Remove docking output leaving only input - will DELETE even completed jobs
$mud/clean.py
*Restart single directory
$mud/restartdir.py

==Job Analysis==

*Enrichment plots are sensitive to consistent treatment and proper accounting for all docked molecules. The combine script properly accounts for all docked molecules by detecting bumped out, no matched, and timed out molecules.

To achieve consistency, you have two options:
1. Write coordinates for all molecules (what I use)
In INDOCK, set number_save to 50000 or something high enough to capture all dockable hierarchies. DOCK output is now gzipped so this is cheaper in disk space than it used to be.
2. Do not check for broken molecules
Use the -b option when running combine.py

===Combining Parallel Jobs===
*Merge all parallel jobs into a single set of unique scores.
$mud/combine.py
This combine carefully accounts for all docked molecules, for more informative enrichment plots.

*Options:
Use -b or --broken to skip finding broken molecules. Use -d or --done to indicate that all subjobs are complete, for the case where you did not submit with a MUD submission script. Use -p or --prefix if your output files are named something other than test. Use --box if your box file is not at ../../grids/box relative to your subjob directories.

*Creates:
#combine.scores - fully processed scores, using the best one for each id
#combine.raw - contains all scores as scraped from DOCK output
#combine.broken - broken molecules and the reason they failed
#combine.zeroes - important sanity check

format of combine.scores:
<id> <shape> <elect> <VdW> <polar solv> <apolar solv> <total> <subdir>

The .zeroes file is a sanity check because it lists the number of molecules followed by the number of zeroes in each scoring column. Past experience has shown that when DOCK fails randomly and silently, it often generates a large number of zero scores. If this happens, simply re-running the job will give better results.

===Computing Enrichments===
*Compute enrichment starting from the combined scores.
$mud/enrich.py -s -l LIGAND_FILE
< or >
$mud/enrich.py -l LIGAND_FILE -d DECOY_FILE
Generates both enrichment and roc curves, both for the ligands against all molecules and for the ligands versus just the decoys. It will try to run combine if it has not been run yet, but will do so only with defaults for every option.

*Input:
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.

The identifier files simply contain one id for each known ligand, matching the id used in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.

*Options:
Use -s or --skip-own-curves to skip consideration of decoys and thus generation of _own curves. Use -f to force combine to run again.


*Creates:
#enrich.txt - Enrichment curve for ligands versus all molecules
#roc.txt - ROC curve for ligands versus all molecules
#enrich_own.txt - Enrichment curve for ligands versus only the decoys
#roc_own.txt - ROC curve for ligands versus only the decoys
_own files are not generated if the -s option is used.

format for output files:
#AUC 50.00 LogAUC 0.00
<x> <y>
<x> <y>
...
AUC is area under the curve and the random expectation value is 50%. LogAUC is the area between the log curve and the log random curve, so the random expectation value is 0%. <y> is always "% ligands found", and <x> is either "% database searched" for enrichment plots or "% non-ligands found" for ROC plots.

===Plotting Enrichments===
Easily plot enrichment and roc curves from one or more jobs.
$mud/plots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC
< or >
$mud/plots.py -i .
Generates plots with one curve for each -i input_directory.

*Options:
Use -s or --skip-own-curves to skip _own curves, especially if they don't exist because enrich.py was run with -s. You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory. Use -n to get normal instead of semi-log plots (and AUC in place of LogAUC).

*Creates:
#[title_]enrich.png
#[title_]roc.png
#[title_]enrich_own.png
#[title_]roc_own.png

The various graphs have the same meaning as their respective curves from [[#Computing Enrichments]]. [title_] is optional and exists when a custom title is given with the -t option.

===Computing Energy Histograms===
*Compute energy distributions starting from the combined scores.
$mud/energies.py -s -l LIGAND_FILE
< or >
$mud/energies.py -l LIGAND_FILE -d DECOY_FILE
Generates the energy distributions for the ligands, decoys, and all the other molecules.

*Input:
Use -l to specify the ligand identifier file and -d to specify the decoy identifier file.

The identifier files simply contain one id for each known ligand, matching the id used in the docking databases. The script is smart enough to match "ZINC12345678" to "C12345678", so either form is acceptable.

*Options:
Use -s or --skip-own-curves to skip consideration of decoys.

*Creates:
#counts.txt - Energy distributions

format for output:
number_of_sections number_of_bins min_energy_threshold max_energy_threshold
##### section_name
bin_upper_edge1 count_below_edge1
...
bin_upper_edgeN count_below_edgeN
ABOVE count_above_last_edge
The sections are for ligands, decoys (optional), and others. The bins and counts define the energy histogram. The bins are finely spaced here in order to have more resolution when combined with other runs, whose energy ranges may be different.

===Plotting Energy Histograms===
Easily plot energy histograms from one or more jobs.
$mud/eplots.py -i . -l New_Run -i ../old_run_dir -l Old_Run -t AmpC
< or >
$mud/eplots.py -i .
Generates plots with energy distributions for each -i input_directory.

*Options:
You can either label each -i INDIR with a -l LABEL, or use no -l options to get the default labels based on parent directory names. Use -t TITLE to change the plot title and filename. Use -o to specify a different output directory.

*Creates:
#[title_]counts.png

===Visualizing Molecule by Molecule Results===
Create a DOCK 4,5,6 type pdb file for use in Chimera's ViewDOCK.
$mud/topdock.py -o topdock.pdb

*Options:
Use -o to specify an output file besides stdout. Use -t NUMBER to set how many top-scoring molecules are written.

→ Back to [[Tutorials]]
[[Category:Tutorials]]

How to compile DOCK

2009-12-05T01:19:03Z

Mysinger:

This is for the Shoichet Lab local version of DOCK 3.5.54 trunk.

'''Checking out the source files'''

Commands:
csh
mkdir /where/to/put
cd /where/to/put
svn checkout file:///raid4/svn/dock
svn checkout file:///raid4/svn/libfgz

'''Compiling the program on our cluster'''

Commands:
ssh sgehead
# You should see "Enabling pgf compiler" when you login, otherwise seek help
cd /where/to/put/libfgz/trunk
make
cd ../../dock/trunk/i386
make

'''Compiling the program on the shared QB3 cluster'''

On one of the compilation nodes on the shared QB3 cluster (optint1 or optint2):

ssh optint2
cd /where/to/put/libfgz/trunk
cp Makefile Makefile.old
modify Makefile:
uncomment the following:
FC = ifort -O3
CC = icc -O3
make
cd ../../dock/trunk/i386
cp Makefile Makefile.old
modify Makefile
uncomment the following:
F77 = ifort
FFLAGS = -O3 -convert big_endian
make dock

[[Category:Tutorials]]

How to compile DOCK

2009-12-05T01:14:15Z

Mysinger: Change to subversion

'''Checking out the source files'''

Commands:
csh
mkdir /where/to/put
cd /where/to/put
svn checkout file:///raid4/svn/dock
svn checkout file:///raid4/svn/libfgz

'''Compiling the program on our cluster'''

Commands:
cd /where/to/put/libfgz/trunk
make
cd ../../dock/trunk/i386
make

'''Compiling the program on the shared QB3 cluster'''

On one of the compilation nodes on the shared QB3 cluster (optint1 or optint2):

ssh optint2
cd /where/to/put/libfgz/trunk
cp Makefile Makefile.old
modify Makefile:
uncomment the following:
FC = ifort -O3
CC = icc -O3
make
cd ../../dock/trunk/i386
cp Makefile Makefile.old
modify Makefile
uncomment the following:
F77 = ifort
FFLAGS = -O3 -convert big_endian
make dock

[[Category:Tutorials]]

SGE Cluster Docking

2009-09-25T21:15:23Z

Mysinger: /* Typical Docking Workflow */

== SGE Cluster Information ==

*'sgehead.compbio.ucsf.edu' is the submit machine for the Sun Grid Engine (SGE) cluster. wilco is also authorized to submit jobs.
*'sgemaster.compbio.ucsf.edu' is the admin machine for the SGE cluster.
*There are around 250 cluster nodes providing 600 total cores to run jobs in the sge queue as of May, 2009, named like 'node-1-1' through 'node-3-36' where the first number is the rack # and the second is the slot # in that rack.

== SGE Commands ==
*qsub: submit jobs 
*qstat: check job status 
*qdel: remove jobs 
*qhost: check cluster status 
*man sge_intro: start of manpage documentation 

== Typical Docking Workflow ==

*Generate spheres and grids - See [[Using MakeDOCK]] for more information, including how to prepare the receptor and ligand
ssh sgehead.compbio.ucsf.edu # ssh to SGE submit machine
mkdir example # make docking directory
cd example # change to docking directory
cp <somedir>/rec.pdb . # copy or create rec.pdb
cp <somedir>/xtal-lig.mol2 . # copy or create xtal-lig.mol2 (or even xtal-lig.pdb)
startdockblaster4 # create spheres and grids
# Check output for WARNING messages, correct as needed

* Submit docking run
cp calibrate/INDOCK.1.A INDOCK # copy or create INDOCK
md4db.csh bysubset 2 50 # create directories for docking run with 50 chunks
# 2 indicates we want the fragment-like subset of ZINC (See http://zinc.docking.org/subset1)
cd run.2 # chdir into run.2 directory
startdockbks3 . # submit database chunks to SGE cluster

For information on which ZINC

[[Category:Internal]]
[[Category:Cluster]]
[[Category:Unix]]

How to compile DOCK

2009-08-28T00:30:52Z

Mysinger: bugfix

'''Checking out the source files'''
* change to cshell.
* create a directory for the source files.
* change to this directory.
* set the environment variable for CVS.
* check out the dock sources.
* check out the auxiliary libraries.

As commands:
csh
mkdir /where/to/put/dock35
cd /where/to/put/dock35
setenv CVSROOT /raid1/cvs
cvs co dock
cvs co libfgz

'''Compiling the program'''

On a 64-bit machine, e.g. one of the compilation nodes on the shared QB3 cluster (optint1 or optint2):

ssh optint2
cd /where/to/put/dock35
cd libfgz/
cp Makefile Makefile.old
modify Makefile:
comment out the following:
#FC = gfortran -O3
#CC = gcc -O3
uncomment the following:
FC = ifort -O3
CC = icc -O3
make
cd ../dock/i386/
cp Makefile Makefile.old
modify Makefile
comment out the following:
#F77 = pgf77
#FFLIBS = -lc -lgcc_eh -lgfortran
#FFLAGS = -byteswapio ...
uncomment the following:
F77 = ifort
FFLAGS = -O3 -convert big_endian
make

[[Category:Tutorials]]