Dockopt (pydock3 script)

From DISI
Revision as of 21:48, 16 February 2023 by Frodo

dockopt generates many different docking configurations, which are then evaluated and analyzed in parallel using a specified job scheduler (e.g. Slurm). If you are a Shoichet Lab user, please see the special section for you below.

To use DOCK 3.8, you must first license and install it (see DOCK 3.8:How to install pydock3).

init

Prepare rec.pdb and xtal-lig.pdb as described in Bender et al., 2021 (https://pubmed.ncbi.nlm.nih.gov/34561691/), or download pre-prepared sample files from dudez2022.docking.org.

Be sure that you are in the directory containing the required input files:

  • rec.pdb
  • xtal-lig.pdb
  • actives.tgz
  • decoys.tgz
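Before running init, it can help to confirm that these inputs are in place. The following is a minimal bash sketch (the function name check_dockopt_inputs is hypothetical, not part of pydock3) that reports any missing file and also checks that the two tarballs can be listed with tar -tzf, assuming they are gzip-compressed as the .tgz extension suggests.

```shell
# Hypothetical helper: check that the dockopt init inputs exist in a directory
# (default: current directory). Returns non-zero if anything is wrong.
check_dockopt_inputs() {
    local dir="${1:-.}"
    local status=0
    local f
    for f in rec.pdb xtal-lig.pdb actives.tgz decoys.tgz; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    # Assumption: the tarballs are gzip-compressed, so `tar -tzf` can list them.
    for f in actives.tgz decoys.tgz; do
        if [ -f "$dir/$f" ] && ! tar -tzf "$dir/$f" >/dev/null 2>&1; then
            echo "not a readable .tgz archive: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_dockopt_inputs && pydock3 dockopt - init` only runs init if the check passes.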

Note the inclusion of actives.tgz and decoys.tgz. Each is a tarball of a directory containing .db2 files, so you must build these molecules yourself before running dockopt.
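If your built .db2 files sit in directories named, for example, actives/ and decoys/ (a hypothetical layout), a tarball of each directory can be created with tar. The helper below is an illustrative sketch, not a pydock3 command: it refuses to archive a directory with no .db2 (or .db2.gz) files and keeps the directory itself at the top level of the archive.

```shell
# Hypothetical helper: make_db2_tarball <dir_with_db2_files> <output.tgz>
make_db2_tarball() {
    local src_dir="$1"   # directory containing the .db2 / .db2.gz files
    local out="$2"       # output tarball, e.g. actives.tgz
    # compgen -G succeeds only if the glob matches at least one file.
    if ! compgen -G "$src_dir/*.db2*" >/dev/null; then
        echo "no .db2 files found in $src_dir" >&2
        return 1
    fi
    # -C makes the archive hold "<dirname>/<files>" rather than full paths.
    tar -czf "$out" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}
```

Usage: `make_db2_tarball actives actives.tgz` and `make_db2_tarball decoys decoys.tgz`.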

To create the file structure for your dockopt job, simply type

pydock3 dockopt - init

By default, the job directory is named dockopt_job. To specify a different name, use the "--job_dir_name" flag. E.g.:

pydock3 dockopt - init --job_dir_name=dockopt_job_2

The job directory contains two sub-directories:

  1. working: input files, intermediate blaster files, sub-directories for individual blastermaster subroutines
  2. retrodock_jobs: individual retrodock jobs for each docking configuration

The key difference between the working directories of blastermaster and dockopt is that the dockopt working directory may contain multiple variants of the blaster files, each suffixed by a number (e.g. "box_1"). These variants are used to create the different docking configurations specified by the multi-valued entries of dockopt_config.yaml. Each variant is created only once, even if it is used by multiple docking configurations.
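To see which variants have been generated, you can list the working-directory entries whose names end in an underscore followed by a number. This bash sketch assumes the "<name>_<N>" pattern from the example above; the exact naming in your pydock3 version may differ, and list_blaster_variants is a hypothetical helper.

```shell
# Hypothetical helper: print entries of a dockopt working directory whose
# names end in "_<number>" (e.g. box_1). Assumes that naming pattern.
list_blaster_variants() {
    local working_dir="${1:-dockopt_job/working}"
    local path name
    for path in "$working_dir"/*; do
        name="${path##*/}"              # strip the directory part
        if [[ "$name" =~ _[0-9]+$ ]]; then
            printf '%s\n' "$name"
        fi
    done
    return 0
}
```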

If your current working directory contains any of the following files, then they will be automatically copied into the working directory within the created job directory. This feature is intended to simplify the process of configuring the dockopt job.

  • rec.pdb
  • xtal-lig.pdb
  • rec.crg.pdb
  • reduce_wwPDB_het_dict.txt
  • filt.params
  • radii
  • amb.crg.oxt
  • vdw.siz
  • delphi.def
  • vdw.parms.amb.mindock
  • prot.table.ambcrg.ambH
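To preview which of these files init will pick up, a loop like the following prints found or absent for each name in the list above. It is a hypothetical convenience helper, not part of pydock3.

```shell
# Hypothetical helper: report which auto-copied files are present in a
# directory (default: current directory).
report_autocopy_files() {
    local dir="${1:-.}"
    local f
    for f in rec.pdb xtal-lig.pdb rec.crg.pdb reduce_wwPDB_het_dict.txt \
             filt.params radii amb.crg.oxt vdw.siz delphi.def \
             vdw.parms.amb.mindock prot.table.ambcrg.ambH; do
        if [ -f "$dir/$f" ]; then
            printf 'found    %s\n' "$f"
        else
            printf 'absent   %s\n' "$f"
        fi
    done
}
```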

Only the following files are required. For any of the others that are not detected, default or generated versions are used instead.

  • rec.pdb or rec.crg.pdb. Either is required, but not both. If both are present, rec.crg.pdb overrides.
  • xtal-lig.pdb
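The precedence rule above can be expressed as a small bash function. This only illustrates the stated rule and is not pydock3's own logic; receptor_input is a hypothetical name.

```shell
# Hypothetical helper: print which receptor file the stated rule selects.
# rec.crg.pdb overrides rec.pdb; xtal-lig.pdb is always required.
receptor_input() {
    local dir="${1:-.}"
    if [ ! -f "$dir/xtal-lig.pdb" ]; then
        echo "xtal-lig.pdb is required" >&2
        return 1
    fi
    if [ -f "$dir/rec.crg.pdb" ]; then
        echo rec.crg.pdb
    elif [ -f "$dir/rec.pdb" ]; then
        echo rec.pdb
    else
        echo "either rec.pdb or rec.crg.pdb is required" >&2
        return 1
    fi
}
```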

If you would like to use files not present in your current working directory, copy them into your job's working directory, e.g.:

cp <FILE_PATH> <JOB_DIR_NAME>/working/

Finally, configure the dockopt_config.yaml file in the job directory to your specifications. The parameters in this file govern the behavior of dockopt.

Note: The dockopt_config.yaml file differs from the blastermaster_config.yaml file in that every parameter of the former may accept either a single value or a list of comma-separated values, which indicates a pool of values to attempt for that parameter. Any number of parameters may be multi-valued, and every unique docking configuration resulting from combining their values will be attempted.

Single-valued YAML line format:

distance_to_surface: 1.0

Multi-valued YAML line format:

distance_to_surface: [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
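Because multi-valued parameters combine, the number of docking configurations grows quickly. Assuming every combination of values is tried, the count is at most the product of the list lengths; for example, the 10-value distance_to_surface line above combined with a hypothetical second parameter of 3 values gives up to 30 configurations. The sketch below (count_configurations is a hypothetical helper) computes that upper bound.

```shell
# Hypothetical helper: upper bound on the number of docking configurations,
# assuming every combination of multi-valued parameters is attempted.
# Arguments: the number of values given for each multi-valued parameter.
count_configurations() {
    local total=1 n
    for n in "$@"; do
        total=$(( total * n ))
    done
    echo "$total"
}
```

For example, `count_configurations 10 3` prints 30.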

Environment variables

TMPDIR

Designate where temporary job files should be placed. E.g.:

export TMPDIR=/scratch

Note for UCSF researchers

On the Wynton cluster, /scratch exists only on dev nodes (not login nodes). However, /wynton/scratch exists on both login nodes and dev nodes. Therefore, we recommend:

export TMPDIR=/wynton/scratch
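If you work on several clusters, you can choose TMPDIR from a list of candidates instead of hard-coding it. The sketch below (set_tmpdir is a hypothetical helper) exports the first candidate that exists and is writable. Define it in your interactive shell or source it from a file, since exporting from a separately executed script would not affect your session.

```shell
# Hypothetical helper: export TMPDIR as the first existing, writable candidate.
set_tmpdir() {
    local d
    for d in "$@"; do
        if [ -d "$d" ] && [ -w "$d" ]; then
            export TMPDIR="$d"
            echo "TMPDIR=$TMPDIR"
            return 0
        fi
    done
    echo "no writable candidate among: $*" >&2
    return 1
}
```

Usage on Wynton, preferring the path recommended above: `set_tmpdir /wynton/scratch /scratch`.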

job scheduler environment variables

So that dockopt knows which job scheduler to use, set the following environment variables for the scheduler available on your cluster.

Slurm

E.g., on the UCSF Shoichet Lab Gimel cluster (on any node other than 'gimel' itself, such as 'gimel5'):

export SBATCH_EXEC=/usr/bin/sbatch
export SQUEUE_EXEC=/usr/bin/squeue
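Scheduler paths differ between clusters. As an alternative to hard-coding /usr/bin, this sketch resolves sbatch and squeue from your PATH with command -v and exports the two variables only if both are found. It assumes the Slurm client tools are on PATH on the node where you run dockopt; export_slurm_execs is a hypothetical name.

```shell
# Hypothetical helper: set SBATCH_EXEC and SQUEUE_EXEC from PATH lookups.
export_slurm_execs() {
    local sbatch_path squeue_path
    sbatch_path="$(command -v sbatch)" || { echo "sbatch not found on PATH" >&2; return 1; }
    squeue_path="$(command -v squeue)" || { echo "squeue not found on PATH" >&2; return 1; }
    export SBATCH_EXEC="$sbatch_path"
    export SQUEUE_EXEC="$squeue_path"
}
```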

SGE

On most clusters using SGE, the following should be correct:

export QSTAT_EXEC=/opt/sge/bin/lx-amd64/qstat
export QSUB_EXEC=/opt/sge/bin/lx-amd64/qsub
export SGE_SETTINGS=/opt/sge/default/common/settings.sh

Note for UCSF researchers

The following is necessary on the UCSF Wynton cluster:

export QSTAT_EXEC=/opt/sge/bin/lx-amd64/qstat
export QSUB_EXEC=/opt/sge/bin/lx-amd64/qsub
export SGE_SETTINGS=/opt/sge/wynton/common/settings.sh
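Because these paths vary by cluster, a quick check before running dockopt can catch typos. The following sketch (check_sge_env is a hypothetical helper) verifies that QSTAT_EXEC and QSUB_EXEC point at executable files and that SGE_SETTINGS points at an existing file; it does not guess the paths for you.

```shell
# Hypothetical helper: sanity-check the three SGE variables set above.
check_sge_env() {
    local status=0 var path
    for var in QSTAT_EXEC QSUB_EXEC SGE_SETTINGS; do
        path="${!var:-}"                 # indirect expansion: value of $var
        if [ -z "$path" ]; then
            echo "$var is not set" >&2
            status=1
        elif [ "$var" = SGE_SETTINGS ] && [ ! -f "$path" ]; then
            echo "$var does not point to a file: $path" >&2
            status=1
        elif [ "$var" != SGE_SETTINGS ] && [ ! -x "$path" ]; then
            echo "$var is not executable: $path" >&2
            status=1
        fi
    done
    return "$status"
}
```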

run

Once your job is configured to your liking, navigate to the job directory and run dockopt:

cd <JOB_DIR_NAME>
pydock3 dockopt - run <JOB_SCHEDULER_NAME> [--retrodock_job_timeout_minutes=None] [--retrodock_job_max_reattempts=0]

where <JOB_SCHEDULER_NAME> is one of:

  • sge
  • slurm

This runs the dockopt subroutines in sequence, except for the retrodock jobs (one per docking configuration), which are submitted to the scheduler and run in parallel. The program's progress is printed to standard output as it runs.
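Because the run can take a long time and prints its progress to standard output, you may prefer to launch it in the background and log to a file. The wrapper below is a sketch under that assumption: run_dockopt and the log file name dockopt_run.log are hypothetical, and any extra arguments (such as the --retrodock_job_max_reattempts flag shown above) are passed through to pydock3 unchanged.

```shell
# Hypothetical wrapper: run_dockopt <job_dir> <sge|slurm> [extra pydock3 args...]
run_dockopt() {
    local job_dir="$1" scheduler="$2"
    case "$scheduler" in
        sge|slurm) ;;
        *) echo "scheduler must be sge or slurm, got '$scheduler'" >&2; return 1 ;;
    esac
    if [ ! -d "$job_dir" ]; then
        echo "no such job directory: $job_dir" >&2
        return 1
    fi
    shift 2  # remaining arguments are passed through to pydock3
    (
        cd "$job_dir" || exit 1
        # nohup keeps the run alive after logout; output goes to the log file.
        nohup pydock3 dockopt - run "$scheduler" "$@" > dockopt_run.log 2>&1 &
        echo "started dockopt (PID $!); log: $job_dir/dockopt_run.log"
    )
}
```

For example, `run_dockopt dockopt_job slurm --retrodock_job_max_reattempts=1`, then follow progress with `tail -f dockopt_job/dockopt_run.log`.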

Once the dockopt job is complete, the following files will be generated in the job directory:

  • report.pdf: contains (1) roc.png of the best retrodock job, (2) box plots of enrichment for every multi-valued config parameter, and (3) heatmaps of enrichment for every pair of multi-valued config parameters
  • results.csv: enrichment metrics for each docking configuration
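This page does not list the columns of results.csv, so the sketch below first prints the header with column numbers and then shows the top rows sorted in descending numeric order by a column you choose. It assumes simple comma-separated fields without quoted commas; show_csv_columns and top_rows_by_column are hypothetical helpers.

```shell
# Hypothetical helper: number the header fields of a CSV file.
show_csv_columns() {
    head -n 1 "${1:-results.csv}" | tr ',' '\n' | nl
}

# Hypothetical helper: top_rows_by_column <csv> <column_number> [n_rows]
# Prints the header, then the n rows with the largest values in that column.
top_rows_by_column() {
    local csv="$1" col="$2" n="${3:-5}"
    head -n 1 "$csv"
    tail -n +2 "$csv" | sort -t ',' -k "${col},${col}" -g -r | head -n "$n"
}
```

For example, run `show_csv_columns results.csv` to find the metric you care about, then `top_rows_by_column results.csv 3 10` to see the ten best rows by column 3.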

In addition, the best retrodock job will be copied to its own sub-directory best_retrodock_job/.

Within best_retrodock_job, there are the following files and sub-directories:

  • dockfiles/: parameter files and INDOCK for the given docking configuration
  • output/: contains:
    • joblist
    • sub-directories 1/ for actives and 2/ for decoys (the former containing OUTDOCK and test.mol2 files, the latter containing just OUTDOCK)
    • log files for the retrodock jobs
  • roc.png: the ROC enrichment curve (log-scaled x-axis) for the given docking configuration

Note: by default, a mol2 file is exported only for actives (output/1/), not for decoys (output/2/), in order to prevent disk space issues.
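To check what was actually written, the following sketch counts OUTDOCK and test.mol2 files under output/1/ and output/2/ and reports the total size of output/. The pattern test.mol2* also matches a compressed test.mol2.gz, in case your version writes one (an assumption); summarize_retrodock_output is a hypothetical helper.

```shell
# Hypothetical helper: summarize best_retrodock_job/output (or another path).
summarize_retrodock_output() {
    local out_dir="${1:-best_retrodock_job/output}"
    local sub n_outdock n_mol2
    for sub in 1 2; do   # 1/ = actives, 2/ = decoys
        n_outdock=$(find "$out_dir/$sub" -name OUTDOCK 2>/dev/null | wc -l)
        n_mol2=$(find "$out_dir/$sub" -name 'test.mol2*' 2>/dev/null | wc -l)
        printf '%s/: %d OUTDOCK, %d test.mol2\n' "$sub" "$n_outdock" "$n_mol2"
    done
    du -sh "$out_dir" 2>/dev/null   # total disk usage of the output directory
}
```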

Note for UCSF Shoichet Lab members

pydock3 is already installed on the following clusters. You can source the provided Python environment scripts to expose the pydock3 executable:

Wynton

 source /wynton/group/bks/soft/python_envs/python3.8.5.sh

Gimel

Only nodes other than gimel itself are supported, e.g., gimel5.

ssh gimel5
source /nfs/soft/ian/python3.8.5.sh
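After sourcing the script for your cluster, you can confirm that pydock3 is on your PATH. This check is a hypothetical convenience, not part of the provided scripts.

```shell
# Hypothetical helper: confirm the pydock3 executable is reachable.
check_pydock3() {
    if command -v pydock3 >/dev/null 2>&1; then
        echo "pydock3 found at: $(command -v pydock3)"
    else
        echo "pydock3 not on PATH; did you source the environment script?" >&2
        return 1
    fi
}
```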