SUBDOCK DOCK3.8


Revision as of 20:57, 7 March 2023

Important note: although DOCK 3.8 is in the title of this article, SUBDOCK can also run DOCK 3.7 workloads, though those runs will not take advantage of some DOCK 3.8 features.

Installing

git clone https://github.com/docking-org/SUBDOCK.git

IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!

subdock.bash sits at the top level of the repository, i.e. at subdock.bash relative to the repository root.

subdock.bash can be called directly from any location; it is not sensitive to the current working directory.
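
Because subdock.bash expects rundock.bash beside it, a quick check of the clone before submitting can save a failed run. The following is only a sketch: the function name check_subdock_layout is hypothetical and is not part of SUBDOCK, and it relies on nothing beyond the two file names mentioned above.

```shell
# Hypothetical helper (not part of SUBDOCK): confirm that subdock.bash and
# rundock.bash sit in the same directory.
# Usage: check_subdock_layout /path/to/SUBDOCK
check_subdock_layout() {
    for layout_file in subdock.bash rundock.bash; do
        if [ ! -f "$1/$layout_file" ]; then
            echo "missing: $1/$layout_file" >&2
            return 1
        fi
    done
    echo "ok: $1"
}
```

For example, check_subdock_layout ~/SUBDOCK prints an "ok" line when both scripts are present and returns a non-zero status otherwise.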

What's New?

For those of you that have used a subdock utility before, here's what is new in this release:

1. All job platforms (e.g. SLURM, SGE) are supported by the same script.

2. GNU Parallel is now supported as a job platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/

3. SUBDOCK can now be run on both individual db2.gz files and db2.tgz packages. A batch size can be set for each type, allowing for more flexibility.

4. Arguments can be provided through the environment, e.g. "export KEY=VALUE", or on the command line, e.g. "--key=value".

5. On success, SUBDOCK now prints a "superscript" that you can copy and paste, which is convenient for re-submission.

Supported Platforms

There are three platforms currently supported:

1. SLURM

2. SGE (Sun Grid Engine)

3. GNU Parallel (for local runs- ideal for testing)

One of these platforms must be specified; SLURM is the default. Each platform is selected with its own argument, listed here in the same order as above:

--use-slurm=true
--use-sge=true
--use-parallel=true

Supported File Types

DOCK can be run on individual db2.gz files or db2.tgz tar packages.

The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments; db2.tgz is the default.

Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.

The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.
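
As a worked example of the formula above, the ceiling can be computed with integer arithmetic alone. This is a sketch for estimating job counts ahead of time; job_count is a hypothetical helper, not something SUBDOCK provides, and the file counts and batch sizes are made-up example values.

```shell
# ceil(N / BATCH_SIZE) using integer arithmetic only.
# Usage: job_count N BATCH_SIZE
job_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

job_count 3 1      # 3 files, batch size 1     -> prints 3
job_count 250 100  # 250 files, batch size 100 -> prints 3
job_count 200 100  # exact multiple            -> prints 2
```

Adding BATCH_SIZE - 1 before the integer division rounds any partial batch up to a full job, so the last job simply receives fewer files.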

Full Example - All Steps

This example assumes you have access to a DOCK executable, but nothing else.

1. Get the SUBDOCK code from GitHub

git clone https://github.com/docking-org/SUBDOCK.git

2. Fetch dockfiles from DUDE-Z. We will use DRD4 for this example.

# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/

3a. Get a sample db2 database subset from ZINC-22. Example provided below:

wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz

You can select a db2 database subset at cartblanche22.docking.org. For wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Other download types are supported; for example, if you are on Wynton you can download Wynton file paths, which removes the need to download the files yourself.

3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:

find $PWD -type f -name '*.db2.tgz' > sdi.in
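
Before submitting, it can be worth confirming that every path in the list actually exists. The sketch below assumes only that the list holds one file path per line, as the find command above produces; check_file_list is a hypothetical helper, not part of SUBDOCK.

```shell
# Hypothetical helper (not part of SUBDOCK): report how many listed files
# exist and name any that do not.
# Usage: check_file_list sdi.in
check_file_list() {
    list_found=0
    list_missing=0
    while IFS= read -r list_path; do
        [ -n "$list_path" ] || continue      # skip blank lines
        if [ -f "$list_path" ]; then
            list_found=$((list_found + 1))
        else
            echo "not found: $list_path" >&2
            list_missing=$((list_missing + 1))
        fi
    done < "$1"
    echo "$list_found found, $list_missing missing"
    [ "$list_missing" -eq 0 ]                # non-zero status if anything is missing
}
```

The "found" count is also the N used to estimate how many jobs a run will dispatch.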

4. Export the parameters we just prepared as environment variables. You also need a DOCK executable! If you have a license, one can be found via our download server; otherwise, lab members can pull https://github.com/docking-org/dock3.git directly. On the BKS cluster, some curated, labeled executables have been prepared at /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found there as well!

export INPUT_SOURCE=$PWD/sdi.in
export EXPORT_DEST=$PWD/output
export DOCKFILES=$PWD/dockfiles
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64
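
A small pre-flight check can catch a forgotten export before anything is submitted. The sketch below covers only the four required variables listed in the help splash and only looks at the environment, so it does not apply if you pass the same values as command-line arguments instead; missing_required is a hypothetical helper, not part of SUBDOCK.

```shell
# Hypothetical helper (not part of SUBDOCK): print the name of each required
# variable that is unset or empty in the current environment.
missing_required() {
    for req_name in INPUT_SOURCE EXPORT_DEST DOCKFILES DOCKEXEC; do
        eval "req_value=\${$req_name:-}"     # indirect lookup of the variable by name
        [ -n "$req_value" ] || echo "$req_name"
    done
}

# Example use:
#   missing=$(missing_required)
#   [ -z "$missing" ] || echo "still unset: $missing" >&2
```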

5. Choose a platform. You must select only one platform - mixing and matching is not supported.

export USE_SLURM=true|...
export USE_SGE=true|...
export USE_PARALLEL=true|...

Any value other than exactly "true" will be interpreted as false.
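
The strict "true" rule is easy to trip over with values such as "True" or "yes". The sketch below mirrors that rule and counts how many platform variables are explicitly enabled. Both function names are hypothetical, and the count treats an unset variable as not enabled; because SLURM is the default, SUBDOCK itself may resolve an unset USE_SLURM differently, which this sketch does not attempt to model.

```shell
# Mirrors the rule stated above: only the exact string "true" counts as true.
is_true() {
    [ "$1" = "true" ]
}

# Hypothetical helper (not part of SUBDOCK): count the platform variables
# that are explicitly set to exactly "true".
platforms_enabled() {
    plat_count=0
    for plat_value in "${USE_SLURM:-}" "${USE_SGE:-}" "${USE_PARALLEL:-}"; do
        if is_true "$plat_value"; then
            plat_count=$((plat_count + 1))
        fi
    done
    echo "$plat_count"
}
```

A result greater than 1 means more than one platform has been explicitly enabled, which is the mixing that step 5 warns against.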

6a. Run docking!

bash ~/SUBDOCK/subdock.bash

6b. You can also use command line arguments instead of environment export, if desired. These can be mixed and matched.

export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true
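
When mixing the two styles, it helps to know which environment variable pairs with which flag. In the help splash at the end of this article, every flag is the lowercase, hyphenated form of its variable name (for example EXPORT_DEST and --export-dest). The sketch below applies that naming pattern; flag_to_env is a hypothetical helper, and it says nothing about which source SUBDOCK prefers when both are given.

```shell
# Hypothetical helper (not part of SUBDOCK): derive the environment variable
# name that pairs with a command-line flag, following the naming pattern in
# the help splash (--export-dest <-> EXPORT_DEST).
flag_to_env() {
    flag_name="${1%%=*}"                     # drop any =value suffix
    printf '%s\n' "${flag_name#--}" | tr 'a-z-' 'A-Z_'
}

flag_to_env --input-source                   # prints INPUT_SOURCE
flag_to_env --use-db2-tgz-batch-size=5       # prints USE_DB2_TGZ_BATCH_SIZE
```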

7. After it runs, subdock prints a convenient "superscript" that you can copy and paste for any future re-submissions.

Error Messages in my OUTDOCK!

If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:

       1      2 bonds with error
Error. newlist is not big enough

If these messages bother you, use the dock38_nogist executable described in How_to_install_DOCK_3.8#Prebuilt_Executable.

This version disables the code related to the GIST scoring function, which is responsible for these errors.
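
To judge how much of this noise a given output file contains, the two messages can be counted with grep. This is a sketch only: count_outdock_errors is a hypothetical helper, it expects the path to a single uncompressed OUTDOCK file, and the two search strings are taken directly from the example messages above.

```shell
# Hypothetical helper (not part of SUBDOCK): count the lines in one OUTDOCK
# file that match either of the messages shown above.
# Usage: count_outdock_errors /path/to/OUTDOCK
count_outdock_errors() {
    # grep exits with status 1 when nothing matches; the count (0) is still printed.
    grep -c -e 'bonds with error' -e 'newlist is not big enough' "$1" || true
}
```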

Note on Backwards Compatibility With DOCK 3.7

As noted at the top of this article, SUBDOCK is compatible with DOCK 3.7. This is true, but with a caveat.

DB2 files generated for DOCK 3.8 will work in 3.7 via SUBDOCK; however, they are known to produce spurious error messages. These can be ignored, for the most part, but may add some unwanted noise to your OUTDOCK file.

SUBDOCK help splash - all argument descriptions & defaults

[user@machine SUBDOCK]$ ./subdock.bash --help
SUBDOCK! Run docking workloads via job controller of your choice
=================required arguments=================
EXPORT_DEST, --export-dest
arg description: nfs output destination for OUTDOCK and test.mol2.gz files

INPUT_SOURCE, --input-source
arg description: nfs directory containing one or more .db2* files OR a file containing a list of db2* files

DOCKFILES, --dockfiles
arg description: nfs directory containing dock related files and INDOCK configuration for docking run

DOCKEXEC, --dockexec
arg description: nfs path to dock executable

=================job controller settings=================
USE_SLURM, --use-slurm
arg description: use slurm

USE_SLURM_ARGS, --use-slurm-args
arg description: addtl arguments for SLURM sbatch command

USE_SGE, --use-sge
arg description: use sge

USE_SGE_ARGS, --use-sge-args
arg description: addtl arguments for SGE qsub command

USE_PARALLEL, --use-parallel
arg description: use GNU parallel

=================input settings=================
USE_DB2_TGZ, --use-db2-tgz
arg description: dock db2.tgz tar files

USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size
arg description: how many db2.tgz to evaluate per batch

USE_DB2, --use-db2
arg description: dock db2.gz individual files

USE_DB2_BATCH_SIZE, --use-db2-batch-size
arg description: how many db2.gz to evaluate per batch

=================addtl job configuration=================
MAX_PARALLEL, --max-parallel
arg description: max jobs allowed to run in parallel

SHRTCACHE, --shrtcache
arg description: temporary local storage for job files

LONGCACHE, --longcache
arg description: longer term storage for files shared between jobs

=================miscellaneous=================
SUBMIT_WAIT_TIME, --submit-wait-time
arg description: how many seconds to wait before submitting