Ucsfdock: Difference between revisions

Revision as of 21:12, 12 August 2022

ucsfdock is a Python package that wraps the DOCK Fortran program and provides tools to help standardize and automate the computational methods employed in molecular docking.

Programs:

  • blastermaster: generate a specific docking configuration for a given receptor and ligand
  • dockmaster: generate many different docking configurations and then evaluate & analyze them in parallel using a specified job scheduler (e.g. Slurm)

A docking configuration is a unique set of DOCK parameter files (e.g., matching_spheres.sph) and INDOCK parameter values.

Installation

Coming soon.

Instructions

Note for UCSF Shoichet Lab members

ucsfdock is already installed on the following clusters. You can source the provided Python environment scripts to expose the relevant executables:

Wynton

source /wynton/home/irwin/isknight/envs/python3.8.5.sh

Gimel

Only nodes other than gimel itself are supported, e.g., gimel5.

ssh gimel5
source /nfs/soft/ian/python3.8.5.sh

blastermaster

blastermaster generates a specific docking configuration for a given receptor and ligand.

Note: Invoking blastermaster commands below will produce a log file called blastermaster.log in your current working directory.

configure

First you need to create the file structure for your blastermaster job. To do so, simply type

ucsfdock blastermaster - configure

By default, the job directory is named blastermaster_job. To specify a different name, type

ucsfdock blastermaster - configure <JOB_DIR_NAME>

The job directory contains two sub-directories:

  1. working: input files, intermediate blaster files, sub-directories for individual blastermaster subroutines
  2. dockfiles: output files (DOCK parameter files & INDOCK)

If your current working directory contains any of the following files, then they will be automatically copied into the working directory within the created job directory. This feature is intended to simplify the process of configuring the blastermaster job.

  • rec.pdb
  • xtal-lig.pdb
  • rec.crg.pdb
  • reduce_wwPDB_het_dict.txt
  • filt.params
  • radii
  • amb.crg.oxt
  • vdw.siz
  • delphi.def
  • vdw.parms.amb.mindock
  • prot.table.ambcrg.ambH

Only the following files are required. If any of the others is not detected, a default or generated version will be used instead.

  • rec.pdb
  • xtal-lig.pdb
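
Before creating the job directory, it can help to see which of the files listed above are present in the current directory. The following is a minimal Python sketch and is not part of ucsfdock; the helper name check_input_files is hypothetical, and the file names are copied from the lists above.

```python
from pathlib import Path

# File names copied from the lists above.
REQUIRED_FILES = ["rec.pdb", "xtal-lig.pdb"]
OPTIONAL_FILES = [
    "rec.crg.pdb",
    "reduce_wwPDB_het_dict.txt",
    "filt.params",
    "radii",
    "amb.crg.oxt",
    "vdw.siz",
    "delphi.def",
    "vdw.parms.amb.mindock",
    "prot.table.ambcrg.ambH",
]


def check_input_files(directory="."):
    """Hypothetical helper: report which recognized input files exist in `directory`.

    Returns a tuple (missing_required, found_optional), both lists of file names.
    """
    directory = Path(directory)
    missing_required = [f for f in REQUIRED_FILES if not (directory / f).is_file()]
    found_optional = [f for f in OPTIONAL_FILES if (directory / f).is_file()]
    return missing_required, found_optional


if __name__ == "__main__":
    missing, optional = check_input_files()
    if missing:
        print("Missing required files:", ", ".join(missing))
    else:
        print("All required files are present.")
    print("Optional files found:", ", ".join(optional) or "none")
```

Optional files that are not found are not an error, since default or generated versions are used in their place.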

If you would like to use files not present in your current working directory, then copy them into your job's working directory, e.g.:

cp <FILE_PATH> <JOB_DIR_NAME>/working/

Finally, configure the blastermaster_config.yaml file in the job directory to your specifications. The parameters in this file govern the behavior of blastermaster.

run

Once your job has been configured to your liking, navigate to the job directory and run blastermaster:

cd <JOB_DIR_NAME>
ucsfdock blastermaster - run

This will execute the many blastermaster subroutines in sequence. The state of the program will be printed to standard output as it runs.

dockmaster

dockmaster generates many different docking configurations, which are then evaluated & analyzed in parallel using a specified job scheduler (e.g. Slurm).

The name "dockmaster", aside from being an uncreative rehash of the name "blastermaster", derives from the notion of a literal dockmaster, i.e., the person in charge of a dock who manages freight logistics and bosses around numerous dockworkers. In this analogy, a single dockworker corresponds to the processing of a single docking configuration.

Note: Invoking dockmaster commands will produce a log file called dockmaster.log in your current working directory.

configure

First you need to create the file structure for your dockmaster job. To do so, simply type

ucsfdock dockmaster - configure

By default, the job directory is named dockmaster_job. To specify a different name, type

ucsfdock dockmaster - configure <JOB_DIR_NAME>

The job directory contains two sub-directories:

  1. working: input files, intermediate blaster files, sub-directories for individual blastermaster subroutines
  2. retro_docking: individual retro docking jobs for each docking configuration

The key difference between the working directories of blastermaster and dockmaster is that the working directory of dockmaster may contain multiple variants of the blaster files (prefixed by a number, e.g. "1_box"). These variant files are used to create the different docking configurations specified by the multi-valued entries of dockmaster_config.yaml. They are created efficiently, such that the same variant used in multiple docking configurations is not created more than once.

If your current working directory contains any of the following files, then they will be automatically copied into the working directory within the created job directory. This feature is intended to simplify the process of configuring the dockmaster job.

  • rec.pdb
  • xtal-lig.pdb
  • rec.crg.pdb
  • reduce_wwPDB_het_dict.txt
  • filt.params
  • radii
  • amb.crg.oxt
  • vdw.siz
  • delphi.def
  • vdw.parms.amb.mindock
  • prot.table.ambcrg.ambH

Only the following files are required. If any of the others is not detected, a default or generated version will be used instead.

  • rec.pdb
  • xtal-lig.pdb

If you would like to use files not present in your current working directory, copy them into your job's working directory, e.g.:

cp <FILE_PATH> <JOB_DIR_NAME>/working/

Finally, configure the dockmaster_config.yaml file in the job directory to your specifications. The parameters in this file govern the behavior of dockmaster.

Note: The dockmaster_config.yaml file differs from the blastermaster_config.yaml file in that every parameter of the former may accept either a single value or a list of comma-separated values, which indicates a pool of values to attempt for that parameter. Multiple such multi-valued parameters may be provided, and all unique resultant docking configurations will be attempted.

Single-valued YAML line format:

distance_to_surface: 1.0

Multi-valued YAML line format:

distance_to_surface: [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
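
To get a feel for how multi-valued parameters multiply the number of docking configurations, here is a minimal Python sketch. It assumes that the configurations are formed as the Cartesian product of the value pools, which is one reading of "all unique resultant docking configurations"; dockmaster's actual enumeration may differ. The helper expand_configurations and the parameter names prefixed with example_ are hypothetical; distance_to_surface comes from the example above.

```python
from itertools import product


def expand_configurations(params):
    """Hypothetical helper: expand single or list values into all combinations.

    `params` maps parameter names to either a single value or a list of values.
    Returns a list of dicts, one per combination.
    """
    # Treat a single value as a pool of size one.
    pools = {k: (v if isinstance(v, list) else [v]) for k, v in params.items()}
    # Drop duplicate values within each pool while preserving order.
    pools = {k: list(dict.fromkeys(v)) for k, v in pools.items()}
    names = list(pools)
    return [dict(zip(names, combo)) for combo in product(*(pools[n] for n in names))]


if __name__ == "__main__":
    configs = expand_configurations(
        {
            "distance_to_surface": [1.0, 1.1, 1.2],
            "example_other_parameter": [10, 20],  # hypothetical parameter name
            "example_fixed_parameter": "a",  # hypothetical single-valued parameter
        }
    )
    print(len(configs))  # 3 * 2 * 1 combinations
```

Under this assumption the number of configurations is the product of the pool sizes, so the ten-valued example above combined with a second ten-valued parameter would already yield 100 retro docking jobs.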

Environment variables

Designate where the short cache and long cache should be located. E.g.:

export SHRTCACHE=/dev/shm  # temporary storage for job files
export LONGCACHE=/dev/shm  # long-term storage for files shared between jobs

In order for dockmaster to know which scheduler it should use, please configure the following environment variables according to the job scheduler you have.

Slurm

E.g., on the UCSF Shoichet Lab Gimel cluster (on any node other than 'gimel' itself, such as 'gimel5'):

export SBATCH_EXEC=/usr/bin/sbatch
export SQUEUE_EXEC=/usr/bin/squeue

SGE

E.g., on the UCSF Wynton cluster:

export QSTAT_EXEC=/opt/sge/bin/lx-amd64/qstat
export QSUB_EXEC=/opt/sge/bin/lx-amd64/qsub

The following is necessary on the UCSF Wynton cluster:

export SGE_SETTINGS=/opt/sge/wynton/common/settings.sh

On most clusters, this will probably be:

export SGE_SETTINGS=/opt/sge/default/common/settings.sh
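
A quick way to confirm that the variables above are set before launching a job is sketched below in Python. This is not part of ucsfdock; the helper name missing_env_vars is hypothetical, and the sketch assumes that both cache variables are needed for either scheduler and that SGE_SETTINGS is needed whenever SGE is used.

```python
import os

# Variable names taken from the examples above.
CACHE_ENV_VARS = ["SHRTCACHE", "LONGCACHE"]
SCHEDULER_ENV_VARS = {
    "slurm": ["SBATCH_EXEC", "SQUEUE_EXEC"],
    "sge": ["QSTAT_EXEC", "QSUB_EXEC", "SGE_SETTINGS"],  # SGE_SETTINGS assumed required
}


def missing_env_vars(scheduler, environ=None):
    """Hypothetical helper: list the expected variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    if scheduler not in SCHEDULER_ENV_VARS:
        raise ValueError(f"Unknown scheduler: {scheduler!r}")
    needed = CACHE_ENV_VARS + SCHEDULER_ENV_VARS[scheduler]
    return [name for name in needed if not environ.get(name)]


if __name__ == "__main__":
    for name in missing_env_vars("slurm"):
        print(f"{name} is not set")
```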

run

Once your job has been configured to your liking, navigate to the job directory and run dockmaster:

cd <JOB_DIR_NAME>
ucsfdock dockmaster - run <JOB_SCHEDULER_NAME>

where <JOB_SCHEDULER_NAME> is one of:

  • sge
  • slurm

This will execute the many dockmaster subroutines in sequence, except for the retro docking jobs run on each docking configuration, which are run in parallel via the scheduler. The state of the program will be printed to standard output as it runs.

You can also set the following flags to adjust retro docking job submission behavior. This example shows the default values:

ucsfdock dockmaster - run <JOB_SCHEDULER_NAME> --retro_docking_job_max_reattempts=0 --retro_docking_job_timeout_minutes=None
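
If you launch dockmaster from a script, the command line above can be assembled programmatically. The following Python sketch only builds and prints the command and does not execute it; the helper name build_run_command is hypothetical, while the flags and scheduler names are the ones shown in this section.

```python
import shlex

VALID_SCHEDULERS = ("sge", "slurm")


def build_run_command(scheduler, max_reattempts=0, timeout_minutes=None):
    """Hypothetical helper: build the dockmaster run command as an argument list."""
    if scheduler not in VALID_SCHEDULERS:
        raise ValueError(f"Scheduler must be one of {VALID_SCHEDULERS}")
    return [
        "ucsfdock",
        "dockmaster",
        "-",
        "run",
        scheduler,
        f"--retro_docking_job_max_reattempts={max_reattempts}",
        # None is rendered as the literal text "None", matching the default shown above.
        f"--retro_docking_job_timeout_minutes={timeout_minutes}",
    ]


if __name__ == "__main__":
    print(shlex.join(build_run_command("slurm", max_reattempts=2, timeout_minutes=60)))
```

The resulting list can be passed to subprocess.run from within the job directory.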

Once the dockmaster job is complete, the following files will be generated in the job directory:

  • dockmaster_job_report.pdf: contains (1) roc.png of best retro docking job, (2) box plots of enrichment for every multi-valued config parameter, and (3) heatmaps of enrichment for every pair of multi-valued config parameters
  • dockmaster_job_results.csv: enrichment metrics for each docking configuration

In addition, the best retro docking job will be copied to its own sub-directory best_retro_docking_job/.
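
The results CSV can also be inspected directly. The sketch below ranks rows by a numeric column using only the Python standard library. The column names in dockmaster_job_results.csv are not documented here, so metric_column is a placeholder you must replace with a column that actually appears in your file; the helper name rank_configurations is hypothetical, and the sketch assumes that higher values of the chosen metric are better.

```python
import csv


def rank_configurations(csv_path, metric_column, top_n=5):
    """Hypothetical helper: return the top_n rows sorted by a numeric column, highest first."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if rows and metric_column not in rows[0]:
        raise KeyError(
            f"Column {metric_column!r} not found; available columns: {list(rows[0])}"
        )
    # Values are read as strings, so convert to float for numeric sorting.
    return sorted(rows, key=lambda r: float(r[metric_column]), reverse=True)[:top_n]
```

For example, rank_configurations("dockmaster_job_results.csv", "<METRIC_COLUMN>") returns the five best rows as dictionaries keyed by column name.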

Within each retro docking job directory, there are the following files and sub-directories:

  • working/: intermediate files
  • dockfiles/: parameter files and INDOCK for the given docking configuration
  • output/: contains:
    • joblist
    • sub-directories 1/ for actives and 2/ for decoys (each containing OUTDOCK and test.mol2 files)
    • log files for the retro docking jobs
  • retro_docking_job_results.csv: data loaded from OUTDOCK files for both actives and decoys
  • roc.png: the ROC enrichment curve (log-scaled x-axis) for the given docking configuration