Protein Target Preparation Updated

Revision as of 18:59, 10 October 2019 by Rstein (talk | contribs) (→‎Running Blastermaster)

Running Blastermaster

To run blastermaster with default settings, you need a directory containing these files:

  rec.pdb
  xtal-lig.pdb

A useful way to get these files is to run Trent's be_balsti wrapper on your PDB. If you have the file, 1YPE.pdb, in your current directory, run the following command:

  csh /mnt/nfs/home/rstein/zzz.scripts/DOCK_prep_scripts/0001.be_balsti_py.csh 1YPE

This will create a new directory "1YPE" with rec.pdb and xtal-lig.pdb files.

Before running blastermaster, consider the following:

  Are there any mutations in your structure? Consider mutating these to WT before running blastermaster.
  Are all residues built in your structure? Blastermaster will not give you an error if they are not.
  Are there multiple conformations for any residues? Running 0001.be_balsti_py.csh should keep conformation A when a residue has several.
  Does your protein have disulfide bonds? Blastermaster should recognize these, but double check that they are renamed to "CYX" in your rec.crg.pdb file after blastermaster runs.
  Are there ions in your structure? Should they be?
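
Some of the questions above can be screened automatically before running blastermaster. The following is a minimal Python sketch, not part of DOCK: the function name check_receptor is hypothetical, and it assumes rec.pdb follows the standard fixed-column PDB layout. It reports residues with alternate-location flags, jumps in residue numbering (which may indicate unbuilt residues), and HETATM residue names such as ions.

```python
def check_receptor(pdb_text):
    """Screen PDB text for alternate conformations, numbering gaps and HETATM groups.

    Assumes standard fixed-column PDB records (an assumption, not a DOCK requirement).
    """
    altlocs = set()
    hetero = set()
    gaps = []
    last_seen = {}  # chain ID -> last residue number seen in ATOM records
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM") or len(line) < 26:
            continue
        try:
            resnum = int(line[22:26])
        except ValueError:
            continue  # skip records without a readable residue number
        chain = line[21]
        resname = line[17:20].strip()
        if line[16] != " ":
            # Column 17 holds the alternate-location indicator (A, B, ...)
            altlocs.add((chain, resnum, resname))
        if record == "HETATM":
            hetero.add(resname)
            continue
        previous = last_seen.get(chain)
        if previous is not None and resnum - previous > 1:
            # A jump in numbering may mean residues are missing from the model
            gaps.append((chain, previous, resnum))
        last_seen[chain] = resnum
    return {"altlocs": sorted(altlocs), "gaps": gaps, "hetero": sorted(hetero)}
```

A numbering gap is only a hint: some structures skip residue numbers legitimately, so any reported gap should be confirmed by looking at the structure.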

If you have missing residues, Schrodinger can be used to build them:

  source /nfs/soft/schrodinger/current/env.csh
  $SCHRODINGER/utilities/prepwizard -fillsidechains -noprotassign -noepik -noimpref rec.pdb outrec.pdb

Alternatively, Chimera can be used to build them:

   /nfs/soft/chimera/current/bin/chimera --nogui --script "/mnt/nfs/home/tbalius/zzz.scripts/chimera_dockprep.py rec.pdb out_rec"

From Chimera, use the output file "out_rec_polarH.pdb"

Check that the newly built residues are in the orientation you want. If they are fine, rename the output file to "rec.pdb", and run blastermaster.

Then in that directory, run the command:

  $DOCKBASE/proteins/blastermaster/blastermaster.py --addhOptions=" -HIS -FLIPs " -v

For using thin spheres, see this wiki:

  http://wiki.docking.org/index.php/Using_thin_spheres_in_DOCK3.7

For tarting residues, see this wiki:

  http://wiki.docking.org/index.php/DOCK_3.7_tart

Checking Your Protein Preparation

written by Reed Stein, 4/3/2019

Electrostatics

The electrostatics grid used in docking is called "trim.electrostatics.phi". This file contains the electrostatic potentials (in kT/e) in and around your protein structure, calculated by solving the Poisson-Boltzmann equation with the program QNIFFT. The grid is trimmed to fit the DOCK box (called "box" in the working directory), which is overlaid onto your binding site. The input file for QNIFFT is called "qnifft.parm". It reads in "receptor.crg.lowdielectric.pdb", which contains your protein and the low dielectric spheres, along with the charge file "amb.crg.oxt" and the radius file "vdw.siz". Your "receptor.crg.lowdielectric.pdb" file should have spheres that look like this:

  ATOM   9008  C   SPH  9008      87.491 136.887 124.980
  TER
  ATOM   9009  C   SPH  9009      87.900 138.214 123.837
  TER
  ATOM   9010  C   SPH  9010      88.222 138.764 124.234
  TER
  ATOM   9011  C   SPH  9011      88.080 138.630 124.390
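
To confirm that spheres like these made it into "receptor.crg.lowdielectric.pdb", the SPH records can be pulled out and counted. This is a small Python sketch; the function name sphere_centers is hypothetical, and it assumes the whitespace-separated layout shown in the excerpt above (record, serial, atom name, residue name, residue number, x, y, z).

```python
def sphere_centers(pdb_text):
    """Return the (x, y, z) centers of all SPH pseudo-atoms in PDB text.

    Assumes whitespace-separated fields laid out as in the excerpt:
    ATOM, serial, atom name, residue name, residue number, x, y, z.
    """
    centers = []
    for line in pdb_text.splitlines():
        fields = line.split()
        if len(fields) >= 8 and fields[0] == "ATOM" and fields[3] == "SPH":
            centers.append(tuple(float(value) for value in fields[5:8]))
    return centers
```

An empty result would suggest that no low dielectric spheres were appended to the receptor.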

For the QNIFFT calculation, the dielectric of the protein is set to 2, while the dielectric of anything outside the protein is 80, representing water. To check whether the atoms in your protein have been assigned the correct radii/charges after running QNIFFT, open the "qnifft.atm" output file. An example line from this file looks like this:

  ATOM      1  N   MET     1      84.419 139.350 124.664  1.65 -0.5200         N

where you have "ATOM", atom number, atom name, residue name, residue number, x coordinate, y coordinate, z coordinate, atomic radius, and atomic charge. The radius and charge values are taken from the "vdw.siz" and "amb.crg.oxt" files.
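
The radius and charge columns can also be read programmatically, for example to sum the charges. The sketch below is illustrative only: the function names are hypothetical, and it assumes the fields are whitespace-separated as in the example line, with the radius and charge in the ninth and tenth fields.

```python
def read_qnifft_atm(text):
    """Parse ATOM lines of qnifft.atm text into a list of dicts.

    Assumes whitespace-separated fields as in the example line:
    ATOM, serial, atom name, residue name, residue number, x, y, z, radius, charge.
    """
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10 or fields[0] != "ATOM":
            continue
        atoms.append({
            "name": fields[2],
            "resname": fields[3],
            "resnum": fields[4],
            "radius": float(fields[8]),
            "charge": float(fields[9]),
        })
    return atoms


def net_charge(atoms):
    """Sum the partial charges, rounded to four decimals."""
    return round(sum(atom["charge"] for atom in atoms), 4)
```

For a complete receptor, a net charge far from a whole number may hint at a missing or mis-assigned charge and is worth a closer look.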

If you would like to manually run QNIFFT, run the following commands:

  $DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm
  $DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi

The first command will generate the electrostatic potentials of the full protein. The second command requires the "box" file to trim the "qnifft.electrostatics.phi" to only fit inside the binding site box. This new output will be called "trim.electrostatics.phi".

To visualize your low dielectric sphere setup, open "receptor.crg.lowdielectric.pdb" in Chimera. Select all "SPH" residues and display/represent as spheres. Change the van der Waals radii of these spheres to the van der Waals "SPH" radius found in the "vdw.siz" file in your working directory. This line in the "vdw.siz" file should look like this:

   c     sph   1.90

To do this on the command line in Chimera, run the following commands:

   sel #0:SPH
   display sel
   represent sphere sel
   vdwdefine 1.9 sel
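
Rather than typing the radius by hand, it can be read from "vdw.siz" so that the Chimera commands always match the file. This is a sketch under assumptions: the function names are hypothetical, and the parser assumes whitespace-separated atom, residue and radius fields as in the "c sph 1.90" line shown above.

```python
def sph_radius(vdw_siz_text):
    """Return the radius on the 'c sph' line of vdw.siz text, or None if absent.

    Assumes whitespace-separated atom, residue and radius fields.
    """
    for line in vdw_siz_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].lower() == "c" and fields[1].lower() == "sph":
            return float(fields[2])
    return None


def chimera_commands(radius, model="#0"):
    """Build the Chimera commands listed above for the given sphere radius."""
    return [
        "sel %s:SPH" % model,
        "display sel",
        "represent sphere sel",
        "vdwdefine %g sel" % radius,
    ]
```

The returned lines can be pasted into the Chimera command line or written to a command file.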

The default radius is 1.90, but can be changed when scanning low dielectric sphere radii. To do this, change the radius in the "vdw.siz" file and then run QNIFFT again - see the tutorial on Parameter Scanning:

   http://wiki.docking.org/index.php/How_to_do_parameter_scanning

The actual electrostatic grids can be visualized by converting the "qnifft.electrostatics.phi" or "trim.electrostatics.phi" files into DX files for opening in Chimera. See the following tutorial for visualizing electrostatics grids:

   http://wiki.docking.org/index.php/Visualize_docking_grids

If you are happy with the way your spheres look, you can continue on to docking with them.

Ligand Desolvation

Heavy (radius = 1.8 Angstroms) and hydrogen (radius = 1.0 Angstrom) ligand desolvation grids are generated in your working directory in "heavy/" and "hydrogen/", respectively. The input file for these two separate calculations is "INSEV", which looks like this:

   rec.crg.lds.pdb  ### receptor input file
   ligand.desolv.heavy ### grid you want to generate
   1.60,1.65,1.90,1.90,1.90,1.00  ## radii for O, N, C, S, P, X (other atom type)
   1.4 ### probe radius
   2 ### grid resolution
   box ### box file - determines the extent of grids to be calculated
   1.8 ### Born radius of atom - 1.8 for heavy, 1.0 for hydrogen

By default, the "rec.crg.lds.pdb" file does not have any spheres, i.e. "ligand desolvation spheres". However, if you include ligand desolvation spheres, e.g. when parameter scanning, spheres can be included with the atom name as "X" (different from "C" as in the low dielectric spheres), as shown below:

   ATOM   9008  X   SPH  9008      87.491 136.887 124.980
   TER
   ATOM   9009  X   SPH  9009      87.900 138.214 123.837
   TER
   ATOM   9010  X   SPH  9010      88.222 138.764 124.234
   TER
   ATOM   9011  X   SPH  9011      88.080 138.630 124.390
   

The "X" radius in the "INSEV" file can be changed so that different ligand desolvation grids with different sphere radii can be generated. To do this, change the radius in the "INSEV" files for both hydrogen and heavy ligand desolvation grids, then run Solvmap for both. These spheres can be visualized (same as above with low dielectric spheres) by opening the "rec.crg.lds.pdb" file in Chimera, selecting all "SPH" residues, representing them as spheres and setting the vdW radius to the value that corresponds to "X" in the INSEV file.
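
When scanning several "X" radii, editing each "INSEV" by hand is error-prone. The Python sketch below rewrites only the sixth value on the radii line. The function name set_x_radius is hypothetical, and it assumes the line order shown above, with the radii on the third line. It keeps any trailing "#" annotation as-is, although whether the real file carries such annotations is not confirmed here.

```python
def set_x_radius(insev_text, new_radius):
    """Return INSEV text with the sixth radius (atom type X) replaced.

    Assumes the radii are comma-separated on the third line, in the order
    O, N, C, S, P, X, optionally followed by a '#' annotation.
    """
    lines = insev_text.splitlines()
    values, marker, comment = lines[2].partition("#")
    radii = [value.strip() for value in values.split(",")]
    if len(radii) != 6:
        raise ValueError("expected six radii on line 3, found %d" % len(radii))
    radii[5] = "%.2f" % new_radius
    new_line = ",".join(radii)
    if marker:
        new_line += "  " + marker + comment
    lines[2] = new_line
    return "\n".join(lines) + "\n"
```

The same function would be applied to the "INSEV" files in both "heavy/" and "hydrogen/" before rerunning Solvmap.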


Solvmap needs to be run twice, once for the heavy grid and once for the hydrogen grid. Each run needs the "rec.crg.lds.pdb" and "box" files in addition to "INSEV". To run Solvmap, use the command:

   $DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log


More advanced methods for altering your spheres, and for increasing or decreasing the charges on specific atoms, can be found in these tutorials:

   http://wiki.docking.org/index.php/Using_thin_spheres_in_DOCK3.7
   http://wiki.docking.org/index.php/DOCK_3.7_tart


How Blastermaster works

written by Reed Stein 8/28/2019

This is a simplified explanation of what happens during blastermaster.

   DEFAULT PARAMETERS
   - No ligand desolvation spheres are used
   - Electrostatic low dielectric spheres fill the whole binding site and have radius of 1.9 Angstroms
   - All spheres within 2 Angstroms from xtal-lig.pdb are included in electrostatics calculation


   Parameters for tweaking blastermaster (from TEB's thin spheres wiki page):
    --mstsDensity=MSTSDENSITY
                      molecular surface denisty for thinspheres (default:
                      1.0)
    --useThinSphEleflag   if flag is given, use thinspheres in qnifft
                      calculation   (default: False)  Not tested with
                      multigrid code.
    --useThinSphLdsflag   if flag is given, use thinspheres in ligand
                      desolvation calculation   (default: False)  Not tested
                      with multigrid code.
    --ts_dist_ele=TS_DIST_ELE
                      for low dielectric thin spheres, distance to protein
                      surface (default: 1.0)
    --ts_radius_ele=TS_RADIUS_ELE
                      for low dielectric thin spheres, radius of spheres
                      (default: 1.0)
    --ts_dist_lds=TS_DIST_LDS
                      for ligand desolvation thin spheres, distance to
                      protein surface (default: 1.0)
    --ts_radius_lds=TS_RADIUS_LDS
                      for ligand desolvation thin spheres, radius of spheres
                      (default: 1.0)
    --ts_dist_to_lig=TS_DIST_TO_LIG
                      for both low dielectric thin spheres and ligand
                      desolovation, distance from ligand to keep spheres
                      (default: 2.0)



REDUCE is run on your rec.pdb to protonate the receptor:

   $DOCKBASE/proteins/Reduce/reduce -db reduce_wwPDB_het_dict.txt -HIS -FLIPs rec.pdb > rec.crg.pdb.fullh

Nonpolar hydrogens are removed (we are using a United Atom AMBER force field) and histidines are renamed according to protonation state. If your protein has disulfide bonds, these CYS residues should be renamed to CYX as well. rec.crg.pdb.fullh is renamed to:

   rec.crg.pdb
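
Because the CYX renaming is easy to miss, it can be checked with a short script. The following Python sketch (requires Python 3.8 or later for math.dist) is not part of blastermaster: the function name is hypothetical, it assumes standard fixed-column PDB records, and it treats two SG atoms within 2.5 Angstroms as a disulfide, which is a heuristic cutoff chosen for illustration.

```python
import math


def unrenamed_disulfides(pdb_text, cutoff=2.5):
    """List (chain, resnum, resname) for disulfide-bonded residues not named CYX.

    Two SG atoms closer than `cutoff` Angstroms are treated as bonded; the
    2.5 Angstrom default is a heuristic, not a blastermaster parameter.
    """
    sg_atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and len(line) >= 54 and line[12:16].strip() == "SG":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            sg_atoms.append((line[17:20], line[21], int(line[22:26]), xyz))
    problems = []
    for i in range(len(sg_atoms)):
        for j in range(i + 1, len(sg_atoms)):
            if math.dist(sg_atoms[i][3], sg_atoms[j][3]) <= cutoff:
                for resname, chain, resnum, _ in (sg_atoms[i], sg_atoms[j]):
                    if resname != "CYX":
                        problems.append((chain, resnum, resname))
    return problems
```

An empty list means every close SG pair found already carries the CYX residue name.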

A list of binding site residues (based on your xtal-lig.pdb position) is determined using the filt program:

   $DOCKBASE/proteins/filt/bin/filt < $DOCKBASE/proteins/defaults/filt.params > filter.log

A file that lists the binding site residues is generated:

   rec.site.dms    

Using DMS, this file and a copy of rec.crg.pdb (rec.crg.pdb.dms) are used as input for generating the molecular surface of the binding site:

   $DOCKBASE/proteins/dms/bin/dms rec.crg.pdb.dms -a -d 1.0 -i rec.site.dms -g dms.log -p -n -o rec.ms

If the thin spheres flag is on, then "rec.ts.ms" is needed:

   $DOCKBASE/proteins/dms/bin/dms rec.crg.pdb.dms -a -d 1.0 -i rec.site.dms -g dms.ts.log -p -n -o rec.ts.ms

Note that the "radii" file is used to determine the radii of atoms in your protein for the DMS calculation. If some of your atoms do not have radii in that file, they need to be manually added.

The molecular surface file, rec.ms, is used as an input for SPHGEN:

   $DOCKBASE/proteins/sphgen/bin/sphgen


SPHGEN generates a file called "all_spheres.sph", which contains clusters of spheres on the surface of the receptor.

The thin spheres molecular surface file, "rec.ts.ms", is also used to generate thin spheres (if this flag is used):

   $DOCKBASE/proteins/thinspheres/thin_spheres.py -i rec.ts.ms -o low_die_thinspheres.sph -d 1.000000 -s 1.000000

As above, the default distance from the surface and the default radius for thin spheres are both 1.0 Angstrom.

Both "electrostatic thin spheres" and "ligand desolvation spheres" are generated from this "low_die_thinspheres.sph" file. The spheres are trimmed by proximity to the xtal-lig.pdb (default distance to xtal-lig.pdb is 2 Angstroms):

   python $DOCKBASE/proteins/thinspheres/close_sph.py low_die_thinspheres.sph xtal-lig.pdb low_die_thinspheres.sph.close 2.000000 1.000000
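
The idea behind this trimming step can be illustrated in a few lines of Python (3.8 or later for math.dist). This is a conceptual sketch, not the code in close_sph.py: the function name is hypothetical, distances are measured from sphere centers to ligand atom centers, and the sphere radius (presumably the last argument of the command above) is ignored.

```python
import math


def spheres_near_ligand(sphere_centers, ligand_atoms, cutoff=2.0):
    """Keep sphere centers lying within `cutoff` Angstroms of any ligand atom.

    Center-to-center distances are an assumption; close_sph.py may measure
    the distance differently.
    """
    return [
        center
        for center in sphere_centers
        if any(math.dist(center, atom) <= cutoff for atom in ligand_atoms)
    ]
```

Raising the cutoff keeps more spheres around the ligand, while lowering it restricts them to the immediate vicinity of the crystallographic pose.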

To generate matching spheres, the xtal-lig.pdb is converted to spheres:

    $DOCKBASE/proteins/pdbtosph/bin/pdbtosph xtal-lig.pdb xtal-lig.match.sph

Low dielectric spheres (when the thin sphere flag is not used) are generated using "xtal-lig.match.sph" and the output of SPHGEN, "all_spheres.sph":

    $DOCKBASE/proteins/makespheres1/makespheres1.cli.pl xtal-lig.match.sph all_spheres.sph rec.crg.pdb lowdielectric.sph {MINIMUM NUMBER OF SPHERES TO GENERATE}

More matching spheres (in addition to those from the ligand, "xtal-lig.match.sph") are taken from the SPHGEN output, "all_spheres.sph":

    $DOCKBASE/proteins/makespheres3/makespheres3.cli.pl 1.5 0.8 45 xtal-lig.match.sph all_spheres.sph rec.crg.pdb matching_spheres.sph

This gives you your matching spheres file, "matching_spheres.sph", which should include 45 matching spheres.
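
The sphere count can be verified from the file header. The sketch below is an assumption-laden example: the function name is hypothetical, and it assumes the file contains a header line of the form "cluster N number of spheres in cluster M", from which it reads the cluster number and the sphere count.

```python
def spheres_per_cluster(sph_text):
    """Map cluster number to the sphere count declared on its header line.

    Assumes header lines of the form
    'cluster <n> number of spheres in cluster <count>'.
    """
    counts = {}
    for line in sph_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "cluster":
            counts[int(fields[1])] = int(fields[-1])
    return counts
```

For a default run, the single cluster in "matching_spheres.sph" would be expected to report 45 spheres.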

The DOCK box is generated using the xtal-lig matching spheres and the receptor, as well as a distance cutoff:

    $DOCKBASE/proteins/makebox/makebox.smallokay.pl xtal-lig.match.sph rec.crg.pdb box 10.0
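
As a rough picture of what such a box looks like, the Python sketch below builds an axis-aligned box around a set of points and pads it by a margin. It is a simplification and not the makebox.smallokay.pl algorithm: the function name is hypothetical, the receptor is ignored, and treating the 10.0 argument as a padding margin in Angstroms is an assumption.

```python
def bounding_box(points, margin=10.0):
    """Return (center, size) of an axis-aligned box around `points`, padded by `margin`.

    Simplified illustration only; the receptor is not taken into account.
    """
    xs, ys, zs = zip(*points)
    low = (min(xs) - margin, min(ys) - margin, min(zs) - margin)
    high = (max(xs) + margin, max(ys) + margin, max(zs) + margin)
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(low, high))
    size = tuple(hi - lo for lo, hi in zip(low, high))
    return center, size
```

Comparing such an estimate with the dimensions in the generated "box" file can be a quick sanity check that the box really covers the ligand.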

If the thin spheres flag is not used, then the input for the QNIFFT electrostatics calculation uses the "lowdielectric.sph" file:

    cat rec.crg.pdb lowdielectric.sph.pdb > receptor.crg.lowdielectric.pdb

However, if the thin spheres flag is used, then the input for QNIFFT uses "low_die_thinspheres.sph.close.pdb", the PDB-converted version of "low_die_thinspheres.sph.close":

    cat rec.crg.pdb low_die_thinspheres.sph.close.pdb > receptor.crg.lowdielectric.pdb

QNIFFT is run on receptor.crg.lowdielectric.pdb:

    $DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm

The input of QNIFFT is:

    - qnifft.parm ### input parameter file
    - amb.crg.oxt ### charge file
    - vdw.siz     ### radius file
    - receptor.crg.lowdielectric.pdb ### receptor with spheres


The vdW program, CHEMGRID, is run using the input file, INCHEM:

   $DOCKBASE/proteins/chemgrid/bin/chemgrid

This calculation requires:

   - rec.crg.pdb ### protonated receptor (United Atom AMBER Force field, no nonpolar hydrogens)
   - prot.table.ambcrg.ambH ### mapping of each atom to its corresponding van der Waals parameters
   - vdw.parms.amb.mindock ### van der Waals parameters that correspond to atoms in prot.table.ambcrg.ambH
   - box ### box that overlays the binding site

Then SOLVMAP, which calculates ligand desolvation grids, is run twice for heavy (1.8 Angstrom radius) and hydrogen (1.0 Angstrom radius) grids:

   $DOCKBASE/proteins/solvmap/bin/solvmap

These calculations require:

  - rec.crg.lds.pdb
  - ligand.desolv.heavy OR ligand.desolv.hydrogen ### name of output file
  - box ### box that overlays the binding site

The "INSEV" file specifies the radius of the probe that is used to calculate the ligand desolvation grids.

A new directory called "dockfiles" is created, and grids and matching spheres are copied into it:

 copying matching_spheres.sph into dockfiles
 copying trim.electrostatics.phi into dockfiles
 copying ligand.desolv.hydrogen into dockfiles
 copying ligand.desolv.heavy into dockfiles
 copying vdw.bmp into dockfiles
 copying vdw.vdw into dockfiles
 copying vdw.parms.amb.mindock into dockfiles

Lastly, the INDOCK file is written.
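
After blastermaster finishes, it is worth confirming that every file from the copy list above actually landed in "dockfiles". This is a minimal Python sketch: the function and constant names are hypothetical, and the file list is taken directly from the copy messages shown above.

```python
import os

# File names taken from the "copying ... into dockfiles" messages above
EXPECTED_DOCKFILES = [
    "matching_spheres.sph",
    "trim.electrostatics.phi",
    "ligand.desolv.hydrogen",
    "ligand.desolv.heavy",
    "vdw.bmp",
    "vdw.vdw",
    "vdw.parms.amb.mindock",
]


def missing_dockfiles(directory="dockfiles"):
    """Return the expected file names that are not present in `directory`."""
    return [
        name
        for name in EXPECTED_DOCKFILES
        if not os.path.isfile(os.path.join(directory, name))
    ]
```

An empty list means all seven files are present; any name returned points to the step of the pipeline that should be inspected.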


Useful References

DOCK

Kuntz, I. D.; Blaney, J. M.; Oatley, S. J.; Langridge, R.; Ferrin, T. E. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 1982, 161, 269-288.

DOCK3.7

R. G. Coleman, M. Carchia, T. Sterling, J. J. Irwin, B. K. Shoichet, Ligand pose and orientational sampling in molecular docking. PLOS ONE 8, e75992 (2013)

CHEMGRID

Meng, E. C.; Shoichet, B. K.; Kuntz, I. D. Automated docking with grid-based energy evaluation. J. Comput. Chem. 1992, 13, 505-524.

REDUCE

J. M. Word, S. C. Lovell, J. S. Richardson, D. C. Richardson, Asparagine and glutamine: Using hydrogen atom contacts in the choice of side-chain amide orientation. J. Mol. Biol. 285, 1735–1747 (1999). doi:10.1006/jmbi.1998.2401

SOLVMAP

Shoichet, B. K.; Leach, A. R.; Kuntz, I. D. Ligand solvation in molecular docking. Proteins 1999, 34, 4.

M. M. Mysinger, B. K. Shoichet, Rapid context-dependent ligand desolvation in molecular docking. J. Chem. Inf. Model. 50, 1561–1573 (2010). doi:10.1021/ci100214a

QNIFFT/DelPhi

Gilson, M., Sharp, K. A., Honig, B. (1988). "Calculating the Electrostatic Potential of Molecules in Solution: Method and Error Assessment". J. Comp. Chem. 9:327-335

Sharp, K., Honig, B. (1990). "Electrostatic Interactions in Macromolecules: Theory and Applications". Ann. Rev. Biophys. Biophys. Chem 19:301-332.

Sitkoff et al. (1994). "Accurate Calculation of Hydration Free-Energies Using Macroscopic Solvent Models". J. Phys. Chem. 98:1978-88.

Sharp, K. A. (1995). "Polyelectrolyte electrostatics: Salt dependence, entropic and enthalpic contributions to free energy in the nonlinear Poisson-Boltzmann model". Biopolymers 36:227-243.