SUBDOCK DOCK3.8

From DISI
== minimal examples ==
Using environmental arguments:
<nowiki>
export INPUT_SOURCE=$PWD/sdi.txt
export EXPORT_DEST=$PWD/output
export DOCKFILES=$PWD/dockfiles
export DOCKEXEC=$PWD/dock64
export USE_SLURM=true
bash subdock.bash</nowiki>
Using command line arguments:
<nowiki>
bash subdock.bash --input-source=$PWD/sdi.txt --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --dockexec=$PWD/dock64 --use-slurm=true</nowiki>



Revision as of 21:48, 1 December 2022

installing

git clone https://github.com/docking-org/DOCK.git

subdock.bash is located at ucsfdock/docking/submit/subdock.bash, relative to the repository root.

subdock.bash can be called directly from any location; it is not sensitive to the current working directory.

how to use subdock

Subdock is a utility for running UCSF docking workloads across multiple platforms.

There are four essential ingredients that go into every docking run:

1. ligand files (db2, set by INPUT_SOURCE)

2. receptor files (dockfiles, set by DOCKFILES)

3. dock executable (set by DOCKEXEC)

4. output destination (set by EXPORT_DEST)

Traditionally, arguments are passed to SUBDOCK via environment variables, e.g.:

export INPUT_SOURCE=/some/path
export EXPORT_DEST=/some/path/2
...
bash subdock.bash

In the latest version of subdock, these arguments can also be specified on the command line, e.g.:

bash subdock.bash --input-source=/some/path --export-dest=/some/path/2

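Command line arguments override environment variables. To get the name of a command line argument, convert the environment variable name to lowercase, replace each '_' with '-', and prefix it with '--' (for example, INPUT_SOURCE becomes --input-source). The reverse conversion gives the environment variable name for a command line argument.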
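The naming rule can be sketched as a small shell function. The name env_to_flag is hypothetical and is not part of subdock; it only illustrates the conversion described above.

```shell
# Hypothetical helper, not part of subdock: map an environment variable
# name to the matching command line argument name.
env_to_flag() {
    # lowercase the name, turn each '_' into '-', then prefix with '--'
    printf -- '--%s\n' "$(printf '%s' "$1" | tr 'A-Z_' 'a-z-')"
}

env_to_flag INPUT_SOURCE             # prints --input-source
env_to_flag USE_DB2_TGZ_BATCH_SIZE   # prints --use-db2-tgz-batch-size
```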

supported platforms

There are three platforms currently supported:

1. SLURM

2. SGE (Sun Grid Engine)

3. GNU Parallel (for local runs; ideal for testing)

One of these platforms must be specified; SLURM is the default. Select a platform with the corresponding argument:

--use-slurm=true
--use-sge=true
--use-parallel=true

supported file types

DOCK can be run on individual db2.gz files or db2.tgz tar packages.

The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments. db2.tgz is the default.

Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.

The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.
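As a quick way to estimate the job count before submitting, the ceiling division can be done with shell integer arithmetic. The function name job_count is hypothetical; the batch sizes used in the calls are the defaults listed in the help splash.

```shell
# Hypothetical helper, not part of subdock: number of jobs for N input
# files at a given batch size, i.e. ceil(N / BATCH_SIZE).
job_count() {
    # $1 = N (number of input files), $2 = BATCH_SIZE
    # adding (BATCH_SIZE - 1) before dividing rounds the quotient up
    echo $(( ($1 + $2 - 1) / $2 ))
}

job_count 250 100   # 250 db2.gz files, batch size 100: prints 3
job_count 3 1       # 3 db2.tgz files, batch size 1: prints 3
```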

full example - all steps

1. Source subdock code from github

git clone https://github.com/docking-org/DOCK.git

2. Fetch dockfiles from DUDE-Z. We will use DRD4 for this example.

wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/

3a. Get db2 database subset sample via ZINC-22. Example provided below:

wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz

You can select a db2 database subset via cartblanche22.docking.org. For wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Multiple download types are supported; for example, if you are on Wynton you can download Wynton file paths, which removes the need to download the files yourself.

3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:

find $PWD -type f -name '*.db2.tgz' > sdi.in
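Before submitting, it can be worth confirming that every path in the list points at a real file. The following is a minimal sketch; missing_inputs is a hypothetical helper and is not part of subdock.

```shell
# Hypothetical helper, not part of subdock: print every path in a list
# file (one path per line) that is not an existing regular file.
missing_inputs() {
    while IFS= read -r path; do
        [ -f "$path" ] || echo "$path"
    done < "$1"
}

# Check the list created above, if it exists in the current directory.
# No output means every listed file was found.
if [ -f sdi.in ]; then
    missing_inputs sdi.in
fi
```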

4. Export the parameters we just prepared as environment variables

export INPUT_SOURCE=$PWD/sdi.in
export EXPORT_DEST=$PWD/output
export DOCKFILES=$PWD/dockfiles
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64
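A small check can confirm that all four required variables are set before calling subdock. This is a sketch only: missing_required is a hypothetical helper, and it looks at environment variables alone, so it does not account for values supplied on the command line.

```shell
# Hypothetical helper, not part of subdock: print the name of each
# required variable that is unset or empty.
missing_required() {
    for name in INPUT_SOURCE EXPORT_DEST DOCKFILES DOCKEXEC; do
        # look up the value of the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then echo "$name"; fi
    done
}

missing=$(missing_required)
if [ -n "$missing" ]; then
    echo "unset required variables: $missing" >&2
fi
```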

5. Choose a platform. Set exactly one of the following to true; mixing and matching platforms is not supported. For example, to run on SLURM:

export USE_SLURM=true
export USE_SGE=false
export USE_PARALLEL=false
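The "only one platform" rule can be checked with a short function. This is a sketch under stated assumptions: exactly_one_platform is a hypothetical helper, not part of subdock, and it treats an unset variable as false, whereas subdock itself defaults USE_SLURM to true.

```shell
# Hypothetical pre-flight check, not part of subdock: succeed only when
# exactly one of the three platform variables is set to "true".
# Assumption of this sketch: an unset variable counts as false.
exactly_one_platform() {
    count=0
    for value in "${USE_SLURM:-false}" "${USE_SGE:-false}" "${USE_PARALLEL:-false}"; do
        if [ "$value" = "true" ]; then count=$((count + 1)); fi
    done
    [ "$count" -eq 1 ]
}

# Example: a local test run with GNU Parallel only.
USE_SLURM=false USE_SGE=false USE_PARALLEL=true
if exactly_one_platform; then
    echo "platform selection ok"
else
    echo "select exactly one platform" >&2
fi
```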

6a. Run docking!

bash $PWD/DOCK/ucsfdock/docking/submit/subdock.bash

6b. You can also use command line arguments instead of exported environment variables, if desired. The two can be mixed and matched.

export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64
bash ~/DOCK/ucsfdock/docking/submit/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true

7. After it runs, subdock prints a convenient "superscript" that you can copy and paste for any future re-submissions.

subdock help splash - all argument descriptions & defaults

SUBDOCK! Run docking workloads via job controller of your choice
=================required arguments=================
expected env arg: EXPORT_DEST, --export-dest
arg description: nfs output destination for OUTDOCK and test.mol2.gz files

expected env arg: INPUT_SOURCE, --input-source
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files

expected env arg: DOCKFILES, --dockfiles
arg description: nfs directory containing dock related files and INDOCK configuration for docking run

expected env arg: DOCKEXEC, --dockexec
arg description: nfs path to dock executable

=================job controller settings=================
optional env arg missing: USE_SLURM, --use-slurm
arg description: use slurm
defaulting to true

optional env arg missing: USE_SLURM_ARGS, --use-slurm-args
arg description: addtl arguments for SLURM sbatch command
defaulting to

optional env arg missing: USE_SGE, --use-sge
arg description: use sge
defaulting to false

optional env arg missing: USE_SGE_ARGS, --use-sge-args
arg description: addtl arguments for SGE qsub command
defaulting to

optional env arg missing: USE_PARALLEL, --use-parallel
arg description: use GNU parallel
defaulting to false

=================input settings=================
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz
arg description: dock db2.tgz tar files
defaulting to true

optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size
arg description: how many db2.tgz to evaluate per batch
defaulting to 1

optional env arg missing: USE_DB2, --use-db2
arg description: dock db2.gz individual files
defaulting to false

optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size
arg description: how many db2.gz to evaluate per batch
defaulting to 100

=================addtl job configuration=================
optional env arg missing: MAX_PARALLEL, --max-parallel
arg description: max jobs allowed to run in parallel
defaulting to -1

optional env arg missing: SHRTCACHE, --shrtcache
arg description: temporary local storage for job files
defaulting to /scratch

optional env arg missing: LONGCACHE, --longcache
arg description: longer term storage for files shared between jobs
defaulting to /scratch