SUBDOCK DOCK3.8
Revision as of 22:20, 1 December 2022
installing
git clone https://github.com/docking-org/DOCK.git
subdock.bash is located at ucsfdock/docking/submit/subdock.bash, relative to the repository root.
subdock.bash can be called directly from any location; it is not sensitive to the current working directory.
what's new?
For those of you who have used a subdock utility before, here is what is new in this release:
1. All platforms are supported on the same script.
2. Subdock can now be run on both db2.gz files & db2.tgz packages. A batch_size can be set for both types, allowing for more flexibility.
3. Arguments can be provided through the environment, e.g. "export KEY=VALUE", or on the command line, e.g. "--key=value".
4. Subdock now prints out a superscript to copy-paste on success, convenient for re-submission.
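Judging by the pairs listed in the help splash (for example USE_DB2_TGZ_BATCH_SIZE and --use-db2-tgz-batch-size), the two argument styles in item 3 appear to map onto each other mechanically. A minimal sketch of that mapping is given here; flag_to_env is a hypothetical helper written for illustration and is not part of subdock.bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of subdock.bash): convert a "--key=value"
# flag into the name of the matching environment variable.
flag_to_env() {
    local flag="${1%%=*}"      # drop "=value" if present
    flag="${flag#--}"          # drop the leading "--"
    # dashes become underscores, lowercase becomes uppercase
    printf '%s\n' "$flag" | tr 'a-z-' 'A-Z_'
}

flag_to_env --use-db2-tgz-batch-size=5   # prints USE_DB2_TGZ_BATCH_SIZE
flag_to_env --export-dest                # prints EXPORT_DEST
```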
supported platforms
There are three platforms currently supported:
1. SLURM
2. SGE (Sun Grid Engine)
3. GNU Parallel (for local runs- ideal for testing)
One of these platforms must be selected; SLURM is the default. The platforms are selected with the following arguments, respectively:
--use-slurm=true --use-sge=true --use-parallel=true
supported file types
DOCK can be run on individual db2.gz files or db2.tgz tar packages.
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments. db2.tgz is the default.
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.
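To estimate how many jobs a submission will create, the ceil(N / BATCH_SIZE) formula can be evaluated with integer arithmetic in the shell. The sketch below is illustrative only; job_count is a hypothetical helper, not something subdock.bash provides, and the file counts are made-up examples.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of jobs = ceil(N / BATCH_SIZE),
# computed with integer arithmetic only.
job_count() {
    local n="$1" batch_size="$2"
    echo $(( (n + batch_size - 1) / batch_size ))
}

# 250 db2.gz files with the default db2.gz batch size of 100
job_count 250 100   # prints 3
# 3 db2.tgz packages with the default db2.tgz batch size of 1
job_count 3 1       # prints 3
```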
full example - all steps
1. Get the subdock code from GitHub
git clone https://github.com/docking-org/DOCK.git
2. Fetch dockfiles from DUDE-Z. We will use DRD4 for this example.
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/
3a. Get db2 database subset sample via ZINC-22. Example provided below:
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz
You can select a db2 database subset via cartblanche22.docking.org. For wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Multiple download types are supported; for example, if you are on Wynton you can download Wynton file paths, which removes the need to download the files yourself.
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:
find $PWD -type f -name '*.db2.tgz' > sdi.in
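Before submitting, it can be worth confirming that the list contains the files you expect. The sketch below repeats the idea of step 3b in a throwaway directory and counts the entries; the empty placeholder files and their names are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Demonstration in a scratch directory; the placeholder file names are invented.
workdir="$(mktemp -d)"
touch "$workdir/a.db2.tgz" "$workdir/b.db2.tgz" "$workdir/notes.txt"

# List only regular files ending in .db2.tgz, as absolute paths
find "$workdir" -type f -name '*.db2.tgz' > "$workdir/sdi.in"

# Count the entries; notes.txt and sdi.in itself are not matched
n_files="$(wc -l < "$workdir/sdi.in" | tr -d ' ')"
echo "sdi.in lists $n_files files"

rm -rf "$workdir"
```

On a real run, the same wc -l count can be compared against the number of packages you downloaded.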
4. Export the parameters we just prepared as environment variables
export INPUT_SOURCE=$PWD/sdi.in
export EXPORT_DEST=$PWD/output
export DOCKFILES=$PWD/dockfiles
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64
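Because subdock.bash runs as a child process, it only sees variables that were exported, not ones that were merely assigned. A small pre-flight check along these lines can catch a forgotten export; missing_required is a hypothetical helper for illustration and is not part of subdock.bash.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of subdock.bash): print the name of
# each required variable that is not exported or is empty.
missing_required() {
    local name
    for name in INPUT_SOURCE EXPORT_DEST DOCKFILES DOCKEXEC; do
        # printenv only reports exported variables, which is what a child
        # process such as subdock.bash would see
        if [ -z "$(printenv "$name")" ]; then
            echo "$name"
        fi
    done
}

missing="$(missing_required)"
if [ -n "$missing" ]; then
    echo "not exported or empty:" $missing
else
    echo "all required variables are exported"
fi
```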
5. Choose a platform. You must select only one platform - mixing and matching is not supported.
export USE_SLURM=true|...
export USE_SGE=true|...
export USE_PARALLEL=true|...
Any value other than exactly "true" will be interpreted as false.
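The exact-match rule is easy to trip over with values such as "True" or "1". The sketch below counts how many switches are set to exactly "true"; count_enabled is a hypothetical helper for illustration, and how subdock.bash itself reacts when more than one platform is enabled is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of subdock.bash): count how many of the given
# values are exactly the string "true".
count_enabled() {
    local count=0 value
    for value in "$@"; do
        # only the exact string "true" counts; "True", "1", "yes" do not
        if [ "$value" = "true" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

count_enabled "true" "" ""        # prints 1: one platform selected
count_enabled "True" "yes" "1"    # prints 0: none of these count as true
count_enabled "true" "true" ""    # prints 2: more than one selected
```

In practice the helper could be called as count_enabled "${USE_SLURM:-}" "${USE_SGE:-}" "${USE_PARALLEL:-}", keeping in mind that, per the help splash, USE_SLURM defaults to true when it is not set.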
6a. Run docking!
bash ~/DOCK/ucsfdock/docking/submit/subdock.bash
6b. You can also use command line arguments instead of environment export, if desired. These can be mixed and matched.
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64
bash ~/DOCK/ucsfdock/docking/submit/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true
7. After it runs, subdock prints a convenient "superscript" that you can copy and paste for any future re-submissions.
subdock help splash - all argument descriptions & defaults
SUBDOCK! Run docking workloads via job controller of your choice
=================required arguments=================
expected env arg: EXPORT_DEST, --export-dest
arg description: nfs output destination for OUTDOCK and test.mol2.gz files
expected env arg: INPUT_SOURCE, --input-source
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files
expected env arg: DOCKFILES, --dockfiles
arg description: nfs directory containing dock related files and INDOCK configuration for docking run
expected env arg: DOCKEXEC, --dockexec
arg description: nfs path to dock executable
=================job controller settings=================
optional env arg missing: USE_SLURM, --use-slurm
arg description: use slurm
defaulting to true
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args
arg description: addtl arguments for SLURM sbatch command
defaulting to
optional env arg missing: USE_SGE, --use-sge
arg description: use sge
defaulting to false
optional env arg missing: USE_SGE_ARGS, --use-sge-args
arg description: addtl arguments for SGE qsub command
defaulting to
optional env arg missing: USE_PARALLEL, --use-parallel
arg description: use GNU parallel
defaulting to false
=================input settings=================
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz
arg description: dock db2.tgz tar files
defaulting to true
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size
arg description: how many db2.tgz to evaluate per batch
defaulting to 1
optional env arg missing: USE_DB2, --use-db2
arg description: dock db2.gz individual files
defaulting to false
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size
arg description: how many db2.gz to evaluate per batch
defaulting to 100
=================addtl job configuration=================
optional env arg missing: MAX_PARALLEL, --max-parallel
arg description: max jobs allowed to run in parallel
defaulting to -1
optional env arg missing: SHRTCACHE, --shrtcache
arg description: temporary local storage for job files
defaulting to /scratch
optional env arg missing: LONGCACHE, --longcache
arg description: longer term storage for files shared between jobs
defaulting to /scratch