Dock3.7

From DISI
Revision as of 15:28, 12 June 2017 by TBalius


See DOCK_3.7 for more up to date information.

DOCK3.7

DOCK3.7 is a new version of DOCK that also comes with new accessory tools for protein and ligand preparation. The download website will eventually be: http://dock.compbio.ucsf.edu/DOCK3.7/

The paper to cite is Coleman et al., PLOS ONE 2013.

The citation for flexible docking with DOCK3.7 will be Fischer, Coleman, Fraser & Shoichet 2013; this will be updated upon acceptance.

Ligand Preparation

Ligand preparation has been modified to use mol2db2 instead of mol2db for database generation. Many other features have also been integrated. To build a set of ligands from SMILES on the cluster, use:

db2start.e.csh input.smi ref

To build on a standalone machine instead, use:

db2gen.e.csh input.smi ref

Note that many programs must be properly installed and available, or this script will fail; the most troublesome is EPIK. For this reason, among others, Dahlia Weiss has helped get ChemAxon's Marvin cxcalc running in place of EPIK. This is probably the preferred way to build molecules. Run it on the cluster with:

db2start.e.cxcalc.csh input.smi

The input file is a two-column file: one column is a SMILES string and the other is an ID. IDs of any length are valid, but only 16 characters are carried into the docking phase; see Mol2db2_Format_2 for more details.
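If you generate the input file programmatically, a small helper can also flag IDs that exceed the 16 characters carried into docking. The Python sketch below assumes a space-separated layout with the SMILES first and the ID second; `write_smi` and `truncation_collisions` are hypothetical names, and the collision check further assumes that it is the leading 16 characters that are kept.

```python
import os
import tempfile
from collections import defaultdict

# Only 16 characters of each ID are carried into the docking phase.
MAX_DOCK_ID_LEN = 16


def write_smi(entries, path):
    """Write (smiles, mol_id) pairs as a two-column, space-separated file.

    Assumption: SMILES goes in the first column and the ID in the second.
    Returns the IDs longer than MAX_DOCK_ID_LEN, which will be shortened.
    """
    too_long = []
    with open(path, "w") as fh:
        for smiles, mol_id in entries:
            # Whitespace inside a field would break the two-column layout.
            if any(ch.isspace() for ch in smiles + mol_id):
                raise ValueError(f"whitespace in entry: {smiles!r} {mol_id!r}")
            fh.write(f"{smiles} {mol_id}\n")
            if len(mol_id) > MAX_DOCK_ID_LEN:
                too_long.append(mol_id)
    return too_long


def truncation_collisions(ids, max_len=MAX_DOCK_ID_LEN):
    """Group IDs that become identical when cut to their first max_len
    characters (assumes the leading characters are the ones kept)."""
    groups = defaultdict(list)
    for mol_id in ids:
        groups[mol_id[:max_len]].append(mol_id)
    return {prefix: members for prefix, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    entries = [
        ("CCO", "ethanol"),
        ("c1ccccc1", "aromatic_series_0001"),
        ("Cc1ccccc1", "aromatic_series_0002"),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        too_long = write_smi(entries, os.path.join(tmp, "input.smi"))
    print("IDs longer than 16 characters:", too_long)
    print("collisions:", truncation_collisions(mol_id for _, mol_id in entries))
```

Two distinct IDs that collide after truncation would be indistinguishable in the docking results, so it may be worth renaming them before building.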

Once the jobs have finished, run the following to build dockable name-XXXXXX.db2.gz files:

db2end-prefix.py name

Protein Target Preparation

be-blasti is still the preferred method for downloading PDB files and splitting them into rec.pdb and xtal-lig.pdb files. Run it with:

be_blasti.csh filename

The filename argument should be a file containing the PDB codes you want to download.
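The exact layout be_blasti.csh expects is not spelled out here. The sketch below assumes one PDB code per line and checks each code against the classic four-character PDB ID pattern (a digit 1-9 followed by three letters or digits); `write_pdb_code_file` is a hypothetical helper, not part of the DOCK tools.

```python
import os
import re
import tempfile

# Classic 4-character PDB ID: a digit 1-9 followed by three letters or digits.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def write_pdb_code_file(codes, path):
    """Validate PDB codes and write them one per line (assumed layout).

    Surrounding whitespace is stripped and duplicates are dropped while
    keeping the original order. Returns the codes that were written.
    """
    codes = list(dict.fromkeys(code.strip() for code in codes))
    bad = [code for code in codes if not PDB_ID.fullmatch(code)]
    if bad:
        raise ValueError(f"not 4-character PDB IDs: {bad}")
    with open(path, "w") as fh:
        fh.writelines(code + "\n" for code in codes)
    return codes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pdb_codes.txt")
        write_pdb_code_file(["1abc", "2xyz", "1abc"], path)
        with open(path) as fh:
            print(fh.read(), end="")
```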

Once you have rec.pdb (the protein) and xtal-lig.pdb (the ligand, or a set of atoms in the protein's binding site), you can run blastermaster.py, the new version of DOCK Blaster. These file names can be changed; see the options. Run the help to see the extensive options:

$DOCK_BASE/src/blastermaster_1.0/blastermaster.py  -h

Typically, you just run it as:

$DOCK_BASE/src/blastermaster_1.0/blastermaster.py -v

The -v flag gives verbose output, which can be helpful if something goes wrong. If everything is successful, you'll see this at the end of the output:

copying matching_spheres.sph into dockfiles
copying trim.electrostatics.phi into dockfiles
copying ligand.desolv.hydrogen into dockfiles
copying ligand.desolv.heavy into dockfiles
copying vdw.bmp into dockfiles
copying vdw.vdw into dockfiles
copying vdw.parms.amb.mindock into dockfiles
	writing INDOCK file:  INDOCK

If you don't see this, something went wrong. Note that an INDOCK file has been written for you, with many defaults that you may want to change. Also, old INDOCK files are slightly incompatible with the new format, so consult the version written for you, or look closely at the page on the DOCK3.7 INDOCK file.
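To check programmatically that a blastermaster.py run produced every file listed in its success messages, a sketch like the following can be used. It assumes dockfiles/ and INDOCK are written into the directory where blastermaster.py was run; `missing_outputs` is a hypothetical helper, and "non-empty" is only a rough proxy for "correct".

```python
import sys
from pathlib import Path

# File names taken from the blastermaster.py success messages.
EXPECTED_DOCKFILES = [
    "matching_spheres.sph",
    "trim.electrostatics.phi",
    "ligand.desolv.hydrogen",
    "ligand.desolv.heavy",
    "vdw.bmp",
    "vdw.vdw",
    "vdw.parms.amb.mindock",
]


def missing_outputs(workdir="."):
    """Return expected outputs that are absent or empty, relative to workdir.

    Assumes dockfiles/ and INDOCK sit in the directory where
    blastermaster.py was run.
    """
    root = Path(workdir)
    expected = [root / "dockfiles" / name for name in EXPECTED_DOCKFILES]
    expected.append(root / "INDOCK")
    return [
        str(path.relative_to(root))
        for path in expected
        if not path.is_file() or path.stat().st_size == 0
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    missing = missing_outputs(sys.argv[1])
    print("all outputs present" if not missing else f"missing: {missing}")
```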

Running DOCK

Setting up an environment variable for the dock37tools directory, $DOCK_BASE/src/dock37tools/, is highly recommended, though not required. From here on, we assume you have set it as $d37 with a command like this:

setenv d37 $DOCK_BASE/src/dock37tools/

That directory contains three scripts for setting up a docking run. The first sets up a run in which each file is sent to a separate node, which is fine for quick jobs:

$d37/setup_db2.csh /full/path/to/db2/files/

Another script, useful for testing on DUD-E targets, is:

$d37/setup_db2_own.csh /full/path/to/db2/files/

A third script, useful for running many db2.gz files, sometimes with a few files grouped together, as in prospective screening against lead-like compounds, is:

$d37/setup_db2_lots.py desiredDirectoryCount prefixName  /full/path/to/db2/files/
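To make the idea of desiredDirectoryCount concrete, here is a sketch of splitting a directory of db2.gz files into a fixed number of groups. It is only an illustration: `group_db2_files` is hypothetical, the round-robin assignment is a choice made here, setup_db2_lots.py may group files differently, and the prefixName argument is not modeled at all.

```python
import sys
from pathlib import Path


def group_db2_files(db2_dir, desired_count):
    """Split the *.db2.gz files in db2_dir into at most desired_count groups.

    Files are dealt out round-robin in sorted order, so group sizes differ by
    at most one. Paths are made absolute because the setup scripts take full
    paths. Hypothetical helper; setup_db2_lots.py may group differently.
    """
    if desired_count < 1:
        raise ValueError("desired_count must be at least 1")
    files = sorted(Path(db2_dir).resolve().glob("*.db2.gz"))
    groups = [[] for _ in range(min(desired_count, len(files)))]
    for i, path in enumerate(files):
        groups[i % len(groups)].append(str(path))
    return groups


if __name__ == "__main__" and len(sys.argv) == 3:
    for n, group in enumerate(group_db2_files(sys.argv[1], int(sys.argv[2]))):
        print(f"group {n}: {len(group)} files")
```

Round-robin keeps the groups balanced, so no single directory ends up with far more work than the others.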

After setting up any of these runs, you can run them with the following:

$d37/submit.csh

Or, if you have compiled your own version of DOCK3.7, use:

$d37/subdock.csh /path/to/dock.csh

Runs should proceed on the cluster. Problems will show up in the stderr files, and you can diagnose them further by looking at the various OUTDOCK files produced. If you've used a lot of sampling, expect slower runs. If you've asked for hundreds of poses, expect large files. Avoid combining prospective screening with hundreds of poses and high sampling.
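When many jobs run at once, it helps to list only the stderr files that actually contain something. The sketch below walks a run directory and reports non-empty files whose name contains "stderr"; that naming convention is an assumption, so adjust `marker` to whatever your queueing system writes. `nonempty_stderr_files` is a hypothetical helper.

```python
import os
import sys


def nonempty_stderr_files(run_dir, marker="stderr"):
    """Return sorted paths of non-empty files under run_dir whose name
    contains `marker`. The naming convention is an assumption: set marker
    to whatever your queueing system uses for its error files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(run_dir):
        for name in filenames:
            if marker in name:
                path = os.path.join(dirpath, name)
                if os.path.getsize(path) > 0:
                    hits.append(path)
    return sorted(hits)


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in nonempty_stderr_files(sys.argv[1]):
        print(path)
```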

If a few jobs crash (this shouldn't happen, but anything is possible) and you need to complete them, run:

$d37/restart.py -f

Analyzing DOCK results

The dock37tools in $d37 contain various analysis programs. Once your jobs are done, you can run:

$d37/extract_all.py

This may take a while, but it pulls all your results into a single file. To calculate enrichment:

$d37/enrich.py -l ligand-file -d decoy-file 

Here, ligand-file and decoy-file are single-column files with one ligand or decoy ID per line. Plotting is also possible:

$d37/plots.py -i . -l label --l=ligand-file -d decoy-file 

Common usage is to plot several different runs on a single plot like so:

$d37/plots.py -i run1 -l label1 -i run2 -l label2 --l=ligand-file -d decoy-file 

If you want to compare the scores from two runs, try:

$d37/two_run_plot.py run1 run2

The plotting scripts must be run on a machine with the required libraries installed, such as sgehead.
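The enrichment calculation from ligand and decoy ID files can be illustrated generically. The sketch below computes two common measures, ROC AUC and the enrichment factor at a chosen fraction of the ranking. It assumes you already have a best-first list of IDs (for example, parsed from the extract_all.py output, whose format is not described here); these are not necessarily the metrics enrich.py reports, and all function names are hypothetical.

```python
import math


def read_ids(path):
    """Read a single-column file with one ligand or decoy ID per line."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def labeled_order(ranked_ids, ligands, decoys):
    """Keep the first occurrence of each ligand or decoy ID, best first."""
    seen, order = set(), []
    for mol_id in ranked_ids:
        if mol_id not in seen and (mol_id in ligands or mol_id in decoys):
            seen.add(mol_id)
            order.append(mol_id)
    return order


def roc_auc(ranked_ids, ligands, decoys):
    """Fraction of (ligand, decoy) pairs in which the ligand ranks higher."""
    ligands_seen = decoys_seen = pairs = 0
    for mol_id in labeled_order(ranked_ids, ligands, decoys):
        if mol_id in ligands:
            ligands_seen += 1
        else:
            decoys_seen += 1
            pairs += ligands_seen  # every ligand so far outranks this decoy
    if ligands_seen == 0 or decoys_seen == 0:
        raise ValueError("need at least one ranked ligand and one ranked decoy")
    return pairs / (ligands_seen * decoys_seen)


def enrichment_factor(ranked_ids, ligands, decoys, fraction=0.01):
    """Ligand hit rate in the top `fraction` of the ranking over the overall rate."""
    order = labeled_order(ranked_ids, ligands, decoys)
    total_hits = sum(mol_id in ligands for mol_id in order)
    if total_hits == 0:
        raise ValueError("no ranked ligands")
    n_top = max(1, math.ceil(fraction * len(order)))
    top_hits = sum(mol_id in ligands for mol_id in order[:n_top])
    return (top_hits / n_top) / (total_hits / len(order))


if __name__ == "__main__":
    ranking = ["L1", "D1", "L2", "D2", "L1"]  # best first; repeated IDs are ignored
    ligands, decoys = {"L1", "L2"}, {"D1", "D2"}
    print(roc_auc(ranking, ligands, decoys))                  # 0.75
    print(enrichment_factor(ranking, ligands, decoys, 0.25))  # 2.0
```

Only the first occurrence of each ID counts, so multiple poses of the same molecule do not inflate the statistics.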

Another common use is to look at top poses in the ViewDock module of UCSF Chimera or with PyMOL. You can make a mol2 output file that can be read by these programs with the following command:

$d37/getposes.py

By default, this script writes a poses.mol2 file with the top 500 poses from the entire run, one pose per molecule ID. Run it with the '-h' flag to see its many options. A more complex example is:

$d37/getposes.py -z -l 1000 -x 2 -f ligands.txt -o ligands.1000.mol2

In order: '-z' connects to ZINC for vendor information, '-l 1000' retrieves only the first 1000 ligands in the file, '-x 2' gets the top 2 poses, '-f ligands.txt' designates the ligand file to use, and '-o ligands.1000.mol2' designates the output filename.
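To sanity-check a poses file before opening it in ViewDock or PyMOL, you can count how many poses each molecule has. This sketch relies only on the Tripos mol2 convention that the line after each @<TRIPOS>MOLECULE record holds the molecule name, and assumes getposes.py writes the molecule ID there; `mol2_names` and `poses_per_molecule` are hypothetical helpers.

```python
import sys
from collections import Counter


def mol2_names(path):
    """Return molecule names from a multi-molecule mol2 file, in file order.

    In the Tripos mol2 format, the line right after each @<TRIPOS>MOLECULE
    record holds the molecule name; comment lines elsewhere are ignored.
    """
    names = []
    take_next = False
    with open(path) as fh:
        for line in fh:
            if take_next:
                names.append(line.strip())
                take_next = False
            elif line.startswith("@<TRIPOS>MOLECULE"):
                take_next = True
    return names


def poses_per_molecule(path):
    """Count how many poses each molecule name has in the file."""
    return Counter(mol2_names(path))


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = poses_per_molecule(sys.argv[1])
    print(f"{sum(counts.values())} poses for {len(counts)} molecules")
```

For example, after a run with '-x 2' you could use this to see which molecules appear more than once in the output file.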