How to dock in DOCK3.8
How to dock in DOCK 3.8.0
Differences from DOCK 3.7
DOCK 3.8.0 can be interrupted safely and restarted, which allows more flexibility when submitting docking jobs.
For example, you could set QSUB_ARGS="-l s_rt=00:05:00 -l h_rt=00:07:00" (or SBATCH_ARGS="--time=00:07:00") so that each docking job runs for only 5 minutes before being interrupted. The new subdock.bash script lets you submit the same set of jobs repeatedly until they are all complete. A more pragmatic choice might be "-l s_rt=00:28:00 -l h_rt=00:30:00", which gets the benefit of faster scheduling in the short.q queue on Wynton. Another advantage is that a job can be interrupted at any time on AWS; it will checkpoint and can be restarted.
Running the Script
New subdock scripts are here:
$DOCKBASE/docking/submit/sge/subdock.bash
$DOCKBASE/docking/submit/slurm/subdock.bash
subdock.bash takes its arguments as environment variables, which must be exported before the script is run.
Required Arguments
INPUT_SOURCE
INPUT_SOURCE should be either:
a) A directory containing one or more db2.tgz files OR
b) A text file containing a list of paths to db2.tgz files
A db2.tgz file should be a tarred + gzipped archive (tar -czf archive.tgz) that contains one or more db2 or db2.gz files.
A job will be launched for each db2.tgz file in INPUT_SOURCE.
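As a rough illustration of preparing INPUT_SOURCE, the sketch below bundles db2 and db2.gz files into a db2.tgz archive with tar -czf and then writes a list file of archive paths. The helper names make_db2_tgz and write_input_list are hypothetical and are not part of DOCK; the directory layout is an assumption.

```shell
#!/bin/sh
# Sketch only: make_db2_tgz and write_input_list are hypothetical helpers,
# not part of DOCK or subdock.bash.

# Pack every *.db2 and *.db2.gz file found directly in SRC_DIR into OUT_TGZ.
make_db2_tgz() (
    src_dir=$1
    out_tgz=$2
    # Make the archive path absolute, because we cd into SRC_DIR below.
    case $out_tgz in
        /*) ;;
        *) out_tgz=$PWD/$out_tgz ;;
    esac
    cd "$src_dir" || exit 1
    # Collect the matching files as positional parameters.
    set --
    for f in *.db2 *.db2.gz; do
        if [ -f "$f" ]; then
            set -- "$@" "$f"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "no db2 or db2.gz files in $src_dir" >&2
        exit 1
    fi
    tar -czf "$out_tgz" "$@"
)

# Write the absolute path of every *.db2.tgz under TGZ_DIR to LIST_FILE,
# one per line, so that LIST_FILE can be used as INPUT_SOURCE.
write_input_list() {
    tgz_dir=$(cd "$1" && pwd) || return 1
    find "$tgz_dir" -type f -name '*.db2.tgz' | sort > "$2"
}
```

For instance, make_db2_tgz ligands tgz/chunk1.db2.tgz followed by write_input_list tgz example.in would produce a list file with one line per archive, and therefore one docking job per archive.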
EXPORT_DEST
A directory on the NFS where you would like your docking output to end up. If the directory does not exist, the script will try to create it.
DOCKEXEC
An NFS path to a DOCK binary executable (NOT a wrapper script).
IMPORTANT: You should append the executable's compile time stamp to the end of its name, e.g. dock64.20210302. This avoids confusing this executable with other versions of DOCK floating around.
DOCKFILES
An NFS path to the dockfiles (INDOCK, spheres, receptor files, grids, etc.) being used for this docking run. The dockfiles directory should have a unique name, to avoid confusion with dockfiles that other users may be running.
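One way to get a unique dockfiles name, used in the larger example later on this page, is to suffix the directory with the first 12 characters of the SHA-1 hash of INDOCK. The sketch below wraps that idea and falls back from sha1sum to shasum, since the command name differs between nodes. The function names indock_hash and rename_dockfiles are hypothetical, and the sketch assumes INDOCK sits inside the dockfiles directory.

```shell
#!/bin/sh
# Sketch only: indock_hash and rename_dockfiles are hypothetical helpers,
# not part of DOCK. Assumes INDOCK lives inside the dockfiles directory.

# Print the first 12 hex characters of the SHA-1 hash of the given file.
indock_hash() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha_cmd=sha1sum
    elif command -v shasum >/dev/null 2>&1; then
        sha_cmd=shasum          # shasum defaults to SHA-1
    else
        echo "neither sha1sum nor shasum was found" >&2
        return 1
    fi
    "$sha_cmd" < "$1" | awk '{print substr($1, 1, 12)}'
}

# Rename DIR to DIR.<hash of DIR/INDOCK> and print the new path.
rename_dockfiles() {
    dockfiles_dir=${1%/}        # drop a trailing slash, if any
    hash=$(indock_hash "$dockfiles_dir/INDOCK") || return 1
    mv "$dockfiles_dir" "$dockfiles_dir.$hash" || return 1
    echo "$dockfiles_dir.$hash"
}
```

Calling rename_dockfiles dockfiles would then print a name of the form dockfiles.<12 hex characters>, which can be exported as DOCKFILES.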
Optional Arguments
SHRTCACHE
The directory DOCK will perform its work in. Files saved to this directory will be deleted once the docking job has concluded. By default this is /dev/shm.
LONGCACHE
The directory where DOCK stores files that are shared between multiple docking jobs. Files saved to this directory (dockexec and dockfiles) persist until they are deleted. By default this directory is /tmp.
Beware of using the default SHRTCACHE or LONGCACHE settings on large clusters.
SBATCH_ARGS
Additional arguments to provide to Slurm's sbatch, if using the Slurm version of subdock.bash.
QSUB_ARGS
Additional arguments to provide to SGE's qsub, if using the SGE version of subdock.bash.
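Because the arguments are plain environment variables, a typo in a name or path is easy to miss. The sketch below is a pre-flight check based only on the descriptions above: it confirms that the four required variables are set, that INPUT_SOURCE is a directory or a file, that DOCKEXEC is an executable file, and that DOCKFILES is a directory. The function check_subdock_env is hypothetical; subdock.bash may perform its own, different checks.

```shell
#!/bin/sh
# Sketch only: check_subdock_env is a hypothetical helper, not part of DOCK.
# Returns 0 if the required variables look usable, 1 otherwise.
check_subdock_env() {
    status=0
    for name in INPUT_SOURCE EXPORT_DEST DOCKEXEC DOCKFILES; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "required variable $name is not set" >&2
            status=1
        fi
    done
    [ "$status" -eq 0 ] || return 1

    # INPUT_SOURCE: a directory of db2.tgz files or a text file listing them.
    if [ ! -d "$INPUT_SOURCE" ] && [ ! -f "$INPUT_SOURCE" ]; then
        echo "INPUT_SOURCE is neither a directory nor a file: $INPUT_SOURCE" >&2
        status=1
    fi
    # DOCKEXEC: a binary executable, not a directory.
    if [ ! -f "$DOCKEXEC" ] || [ ! -x "$DOCKEXEC" ]; then
        echo "DOCKEXEC is not an executable file: $DOCKEXEC" >&2
        status=1
    fi
    # DOCKFILES: the directory holding INDOCK, spheres, grids, etc.
    if [ ! -d "$DOCKFILES" ]; then
        echo "DOCKFILES is not a directory: $DOCKFILES" >&2
        status=1
    fi
    # EXPORT_DEST is not tested further, since it may be created later.
    return "$status"
}
```

A typical use would be check_subdock_env && $DOCKBASE/docking/submit/sge/subdock.bash, so that nothing is submitted when a variable is missing.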
Examples
BKS Example
export INPUT_SOURCE=example.in
export EXPORT_DEST=output
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64
export DOCKFILES=dockfiles.example
export SHRTCACHE=/dev/shm
export LONGCACHE=/tmp
export SBATCH_ARGS="--time=02:00:00"
$DOCKBASE/docking/submit/slurm/subdock.bash
Wynton Example
export INPUT_SOURCE=example.in
export EXPORT_DEST=output
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64
export DOCKFILES=dockfiles.example
export SHRTCACHE=/scratch
export LONGCACHE=/scratch
export QSUB_ARGS="-l s_rt=00:28:00 -l h_rt=00:30:00"
$DOCKBASE/docking/submit/sge/subdock.bash
Example: Running a lot of docking jobs
- see ZINC22:Current status for more info about where ZINC can be found.
- 1. set up sdi files
mkdir sdi
export sdi=sdi
ls /wynton/group/bks/zinc-22/H19/H19P0??/*.db2.tgz > $sdi/h19p0.in
ls /wynton/group/bks/zinc-22/H19/H19P1??/*.db2.tgz > $sdi/h19p1.in
ls /wynton/group/bks/zinc-22/H19/H19P2??/*.db2.tgz > $sdi/h19p2.in
ls /wynton/group/bks/zinc-22/H19/H19P3??/*.db2.tgz > $sdi/h19p3.in
and so on
- 2. set up INDOCK and dockfiles, then rename dockfiles to dockfiles.$indockhash. On some nodes the shasum command is named sha1sum instead. Whichever command you use, the key point is that the dockfiles directory ends up with a unique name.
indockhash=$(cat INDOCK | shasum | awk '{print substr($1, 1, 12)}')
mv dockfiles dockfiles.$indockhash
- 3. write a super script (saved here in a file named super):
export DOCKBASE=/wynton/group/bks/work/jji/DOCK
export DOCKFILES=$WORKDIR/dockfiles.21751f1bb16b
export DOCKEXEC=$DOCKBASE/docking/DOCK/bin/dock64
#export SHRTCACHE=/dev/shm # default
export SHRTCACHE=/scratch
export LONGCACHE=/scratch
export QSUB_ARGS="-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G"
for i in sdi/*.in ; do
    export k=$(basename $i .in)
    echo k $k
    export INPUT_SOURCE=$PWD/$i
    export EXPORT_DEST=$PWD/output/$k
    $DOCKBASE/docking/submit/sge/subdock.bash
done
- 3a. to run for first time
sh super
- 4. to restart, run the same command again; repeat until all jobs are complete
sh super
- 5. check which output is valid (and broken or incomplete output)
- 6. extract all blazing fast
- 7. extract mol2
more soon, under active development, Jan 28.
Appendix: Docking mono-cations of ZINC22 with DOCK3.8 on Wynton
Added by Ying 3/10/2021
- set up the folder to run docking
mkdir zinc22_3d_build_3-10-2021
cd zinc22_3d_build_3-10-2021
- copy INDOCK into dockfiles folder, and transfer to the created folder
cp INDOCK dockfiles
scp -r dockfiles dt2.wynton.ucsf.edu:/path_to_created_folder
- get sdi of monocations of already built ZINC22 (<= H26 heavy atom count)
mkdir sdi
foreach i (`seq 4 1 26`)
    set hac = `printf "H%02d" $i `
    echo $i $hac
    touch sdi/${hac}.sdi
    foreach tgz (`ls /wynton/group/bks/zinc-22*/${hac}/${hac}[PM]???/*-O*.db2.tgz`)
        ls $tgz
        echo $tgz >> sdi/${hac}.sdi
    end
end
- rename the dockfiles directory
indockhash=$(cat INDOCK | sha1sum | awk '{print substr($1, 1, 12)}')
mv dockfiles dockfiles.${indockhash}
- write and run the super_run.sh
# Only ${indockhash} is expanded while the file is written. Every other $ is
# escaped so that it is evaluated later, when super_run.sh itself runs.
cat <<EOF > super_run.sh
export DOCKBASE=/wynton/group/bks/soft/DOCK-3.8.0.1
export DOCKEXEC=\$DOCKBASE/docking/DOCK/bin/dock64
# CHANGE here: path to the dockfiles.${indockhash}
export DOCKFILES=/wynton/group/bks/work/yingyang/5HT-5a/10_AL-dock/zinc22_3d_build_3-10-2021/dockfiles.${indockhash}
export SHRTCACHE=/scratch
export LONGCACHE=/scratch
export QSUB_ARGS="-l s_rt=00:28:00 -l h_rt=00:30:00 -l mem_free=2G"
for i in sdi/*.sdi ; do
    export k=\$(basename \$i .sdi)
    echo k \$k
    export INPUT_SOURCE=\$PWD/\$i
    export EXPORT_DEST=\$PWD/output/\$k
    \$DOCKBASE/docking/submit/sge/subdock.bash
done
EOF
bash super_run.sh
- extract the output
ls -d output/*/*/ > dirlist
python $DOCKBASE/analysis/extract_all_blazing_fast.py dirlist extract_all.txt 0
- get poses.mol2
/wynton/home/shoichetlab/yingyang/programs/miniconda3/envs/opencadd/bin/python \
    /wynton/home/shoichetlab/yingyang/scripts/get_poses.py -z test.mol2.gz.0 -n 1000 -p poses_top1k.mol2
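The loop that builds the sdi lists earlier in this appendix uses csh syntax (foreach ... end), while the other steps use bash. For anyone working in bash or sh throughout, the following is a sketch of an equivalent loop. The function name build_sdi and its two parameters are hypothetical; the glob pattern is copied from the csh version. One deliberate difference: each .sdi file is truncated before it is filled, so rerunning the function does not append duplicate entries.

```shell
#!/bin/sh
# Sketch only: build_sdi is a hypothetical helper, not part of DOCK.
# Usage: build_sdi ZINC_PREFIX SDI_DIR
#   ZINC_PREFIX  path prefix of the ZINC22 trees, e.g. /wynton/group/bks/zinc-22
#   SDI_DIR      directory that receives H04.sdi ... H26.sdi
build_sdi() {
    zinc_prefix=$1
    sdi_dir=$2
    mkdir -p "$sdi_dir"
    for i in $(seq 4 26); do
        hac=$(printf 'H%02d' "$i")
        # Start from an empty list so reruns do not duplicate entries.
        : > "$sdi_dir/$hac.sdi"
        # Same pattern as the csh loop: zinc-22*/Hnn/Hnn[PM]???/*-O*.db2.tgz
        for tgz in "$zinc_prefix"*/"$hac"/"$hac"[PM]???/*-O*.db2.tgz; do
            # An unmatched glob stays literal; skip it.
            [ -e "$tgz" ] || continue
            echo "$tgz" >> "$sdi_dir/$hac.sdi"
        done
    done
}
```

Running build_sdi /wynton/group/bks/zinc-22 sdi would then fill the same sdi/H04.sdi through sdi/H26.sdi files that super_run.sh iterates over.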