SUBDOCK DOCK3.8
Important note: although DOCK 3.8 appears in the title of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads; some DOCK 3.8 features will simply go unused.
installing
git clone https://github.com/docking-org/DOCK.git
subdock.bash is located at ucsfdock/docking/submit/subdock.bash relative to the repository root.
subdock.bash can be called directly from any location; it is not sensitive to the current working directory.
what's new?
For those of you who have used a subdock utility before, here's what is new in this release:
1. All job platforms (e.g. SLURM, SGE) are supported by the same script.
2. GNU Parallel is now supported as a job platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/
3. Subdock can now be run on both individual db2.gz files and db2.tgz packages. A batch size can be set for each type, allowing for more flexibility.
4. Arguments can be provided through the environment, e.g. "export KEY=VALUE", or on the command line, e.g. "--key=value".
5. On success, Subdock now prints a "superscript" to copy and paste, convenient for re-submission.
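The help splash at the end of this article pairs each environment variable with a command line flag (EXPORT_DEST with --export-dest, USE_DB2_TGZ_BATCH_SIZE with --use-db2-tgz-batch-size, and so on). The sketch below shows that naming pattern. flag_to_env is a hypothetical helper written for this illustration only; it is not part of subdock.bash and says nothing about how subdock parses its arguments internally.

```shell
# Hypothetical helper (not part of subdock.bash): derive the environment
# variable name that corresponds to a long command line flag.
flag_to_env() {
    name=${1#--}        # strip the leading "--"
    name=${name%%=*}    # drop an "=value" suffix, if present
    # Upper-case the letters and turn "-" into "_".
    printf '%s\n' "$name" | tr 'a-z-' 'A-Z_'
}

flag_to_env --export-dest                    # prints EXPORT_DEST
flag_to_env --use-db2-tgz-batch-size=10      # prints USE_DB2_TGZ_BATCH_SIZE
```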
supported platforms
There are three platforms currently supported:
1. SLURM
2. SGE (Sun Grid Engine)
3. GNU Parallel (for local runs- ideal for testing)
One of these platforms must be specified; SLURM is the default. The platforms are selected with the following arguments, respectively:
--use-slurm=true --use-sge=true --use-parallel=true
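Since only one platform may be selected, and only the exact value "true" counts as enabled, a quick check before submitting can catch a mixed-up configuration. The following is a minimal sketch of such a check. count_enabled_platforms is a hypothetical helper, not something subdock.bash provides, and it does not claim to reproduce subdock's own validation.

```shell
# Hypothetical pre-flight helper (not part of subdock.bash): count how many
# of the given platform settings are exactly the string "true".
count_enabled_platforms() {
    count=0
    for value in "$@"; do
        # Anything other than exactly "true" is treated as false.
        if [ "$value" = "true" ]; then
            count=$((count + 1))
        fi
    done
    printf '%s\n' "$count"
}

count_enabled_platforms true false false    # prints 1: one platform selected
count_enabled_platforms true TRUE yes       # prints 1: "TRUE" and "yes" do not count
count_enabled_platforms true true false     # prints 2: two platforms selected
```

In practice the three arguments would be the values you intend to pass for USE_SLURM, USE_SGE and USE_PARALLEL, in that order; any result other than 1 means the selection needs another look.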
supported file types
DOCK can be run on individual db2.gz files or db2.tgz tar packages.
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments. db2.tgz is the default.
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.
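To estimate how many jobs a submission will create, the ceil(N / BATCH_SIZE) formula can be evaluated with integer arithmetic. This is only a sketch for planning purposes: job_count is a hypothetical helper, and the file counts in the examples are made up.

```shell
# Hypothetical helper: number of jobs for n input files at a given batch size,
# i.e. ceil(n / batch_size) computed with integer arithmetic.
job_count() {
    n=$1
    batch_size=$2
    echo $(( (n + batch_size - 1) / batch_size ))
}

job_count 3 1        # 3 db2.tgz files, batch size 1 -> prints 3
job_count 250 100    # 250 db2.gz files, batch size 100 -> prints 3
job_count 200 100    # exact multiple -> prints 2
```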
full example - all steps
1. Fetch the subdock code from GitHub
git clone https://github.com/docking-org/DOCK.git
2. Fetch dockfiles from DUDE-Z. We will use DRD4 for this example.
# note: you may need to change the INDOCK header to say "DOCK 3.8 parameter" instead of "DOCK 3.7 parameter" if you are using DOCK 3.8
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/
3a. Get db2 database subset sample via ZINC-22. Example provided below:
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz
You can select a db2 database subset at cartblanche22.docking.org. For wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Other download types are supported; for example, if you are on Wynton you can download Wynton file paths, which removes the need to download the files yourself.
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:
find "$PWD" -type f -name '*.db2.tgz' > sdi.in
4. Export the parameters we just prepared as environment variables
export INPUT_SOURCE=$PWD/sdi.in
export EXPORT_DEST=$PWD/output
export DOCKFILES=$PWD/dockfiles
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64
5. Choose a platform. You must select only one platform - mixing and matching is not supported.
export USE_SLURM=true|...
export USE_SGE=true|...
export USE_PARALLEL=true|...
Any value other than exactly "true" will be interpreted as false.
6a. Run docking!
bash ~/DOCK/ucsfdock/docking/submit/subdock.bash
6b. You can also use command line arguments instead of environment export, if desired. These can be mixed and matched.
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64
bash ~/DOCK/ucsfdock/docking/submit/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true
7. After it runs, subdock prints a convenient "superscript" that you can copy and paste for any future re-submissions.
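Before running step 6, it can be worth confirming that the four required values from step 4 are actually set in the current shell. The sketch below does that. missing_required is a hypothetical helper, not part of subdock.bash, and it only covers the environment variable route; values you plan to pass on the command line instead will show up here as missing.

```shell
# Hypothetical pre-flight helper (not part of subdock.bash): print the name
# of every required variable that is unset or empty in the current shell.
missing_required() {
    for var in INPUT_SOURCE EXPORT_DEST DOCKFILES DOCKEXEC; do
        # Indirect lookup of the variable named in $var (the names are fixed above).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            printf '%s\n' "$var"
        fi
    done
}

missing=$(missing_required)
if [ -n "$missing" ]; then
    # Word splitting is intended: list the names on one line.
    echo "not set in the environment:" $missing >&2
fi
```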
subdock help splash - all argument descriptions & defaults
SUBDOCK! Run docking workloads via job controller of your choice
=================required arguments=================
expected env arg: EXPORT_DEST, --export-dest
arg description: nfs output destination for OUTDOCK and test.mol2.gz files
expected env arg: INPUT_SOURCE, --input-source
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files
expected env arg: DOCKFILES, --dockfiles
arg description: nfs directory containing dock related files and INDOCK configuration for docking run
expected env arg: DOCKEXEC, --dockexec
arg description: nfs path to dock executable
=================job controller settings=================
optional env arg missing: USE_SLURM, --use-slurm
arg description: use slurm
defaulting to true
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args
arg description: addtl arguments for SLURM sbatch command
defaulting to
optional env arg missing: USE_SGE, --use-sge
arg description: use sge
defaulting to false
optional env arg missing: USE_SGE_ARGS, --use-sge-args
arg description: addtl arguments for SGE qsub command
defaulting to
optional env arg missing: USE_PARALLEL, --use-parallel
arg description: use GNU parallel
defaulting to false
=================input settings=================
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz
arg description: dock db2.tgz tar files
defaulting to true
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size
arg description: how many db2.tgz to evaluate per batch
defaulting to 1
optional env arg missing: USE_DB2, --use-db2
arg description: dock db2.gz individual files
defaulting to false
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size
arg description: how many db2.gz to evaluate per batch
defaulting to 100
=================addtl job configuration=================
optional env arg missing: MAX_PARALLEL, --max-parallel
arg description: max jobs allowed to run in parallel
defaulting to -1
optional env arg missing: SHRTCACHE, --shrtcache
arg description: temporary local storage for job files
defaulting to /scratch
optional env arg missing: LONGCACHE, --longcache
arg description: longer term storage for files shared between jobs
defaulting to /scratch