DOCK3.8:Pydock3: Difference between revisions

(Created page with "''pydock3'' is a Python package wrapping the DOCK Fortran program that provides tools to help standardize and automate the computational methods employed in molecular...")
 
No edit summary
 
(17 intermediate revisions by 2 users not shown)
''pydock3'' is a Python package wrapping the [[DOCK|DOCK Fortran program]] that provides tools to help standardize and automate the computational methods employed in molecular docking. It is a natural successor to DOCK Blaster, originally published in 2009, and blastermaster.py, part of the [[DOCK 3.7]] release in 2012.
 
[[File:Pydock3 logo.png|thumb]]


Scripts included in ''pydock3'':
* ''dockopt'': generate many different docking configurations, perform retrospective docking on them in parallel using a specified job scheduler (e.g. Slurm), and analyze the results.  
* ''blastermaster'': generate a specific docking configuration for a given receptor and ligand, intended for use by experts who wish to tune the docking model themselves.  This is a direct successor of blastermaster.py from DOCK 3.7.


A '''docking configuration''' is a unique set of DOCK parameter files (e.g., ''matching_spheres.sph'') and INDOCK parameter values.
A [[docking configuration|'''docking configuration''']] is a unique set of (1) DOCK parameter files (e.g., ''matching_spheres.sph''), (2) an INDOCK file, and (3) a DOCK executable.


= Installation =


[[How to install pydock3]]
See: [[DOCK 3.8:How to install pydock3]].
 


= Instructions =


== Note for UCSF Shoichet Lab members ==
''pydock3'' is already installed on the following clusters. You can source the provided Python environment scripts to expose the relevant executables:
=== Wynton ===
source /wynton/home/irwin/isknight/envs/python3.8.5.sh
=== Gimel ===
Only nodes other than ''gimel'' itself are supported, e.g., ''gimel5''.
ssh gimel5
source /nfs/soft/ian/python3.8.5.sh


== ''blastermaster'' ==


''blastermaster'' allows the generation of a specific docking configuration for a given receptor and ligand.
See: [[blastermaster (pydock3 script)]].
 
'''Note:''' Invoking ''blastermaster'' commands below will produce a log file called ''blastermaster.log'' in your current working directory.
 
=== ''init'' ===
 
First you need to create the file structure for your blastermaster job. To do so, simply type
 
pydock3 blastermaster - init
 
By default, the job directory is named ''blastermaster_job''. To specify a different name, type
 
pydock3 blastermaster - init <JOB_DIR_NAME>
 
The job directory contains two sub-directories:
# ''working'': input files, intermediate blaster files, sub-directories for individual blastermaster subroutines
# ''dockfiles'': output files (DOCK parameter files & INDOCK)
 
If your current working directory contains any of the following files, then they will be automatically copied into the working directory within the created job directory. This feature is intended to simplify the process of configuring the blastermaster job.
 
* ''rec.pdb''
* ''xtal-lig.pdb''
* ''rec.crg.pdb''
* ''reduce_wwPDB_het_dict.txt''
* ''filt.params''
* ''radii''
* ''amb.crg.oxt''
* ''vdw.siz''
* ''delphi.def''
* ''vdw.parms.amb.mindock''
* ''prot.table.ambcrg.ambH''
 
Only the following are required. Default versions / generated versions of the others will be used instead if they are not detected.
* ''rec.pdb''
* ''xtal-lig.pdb''
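The required-file rule above can be checked before running ''init''. The following is a minimal sketch, not part of ''pydock3'' itself; the function name <code>missing_required_files</code> is hypothetical, and it only assumes that the two required files are expected in the directory you run ''init'' from.

```python
from pathlib import Path

# The two files the text lists as required; all others fall back to defaults.
REQUIRED_FILES = ("rec.pdb", "xtal-lig.pdb")


def missing_required_files(directory="."):
    """Return the required input files that are absent from `directory`.

    Hypothetical helper for illustration; not a pydock3 function.
    """
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_required_files()
    if missing:
        print("Missing required input files:", ", ".join(missing))
    else:
        print("All required input files found.")
```

Running this in the directory where you plan to call ''init'' reports which of the two required files still need to be supplied.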
 
If you would like to use files not present in your current working directory, then copy them into your job's working directory, e.g.:
cp <FILE_PATH> <JOB_DIR_NAME>/working/
 
Finally, configure the ''blastermaster_config.yaml'' file in the job directory to your specifications. The parameters in this file govern the behavior of blastermaster.
 
=== ''run'' ===
 
Once your job has been configured to your liking, navigate to the job directory and run blastermaster:
cd <JOB_DIR_NAME>
pydock3 blastermaster - run
 
This will execute the many blastermaster subroutines in sequence. The state of the program will be printed to standard output as it runs.


== ''dockopt'' ==


''dockopt'' allows the generation of many different docking configurations which are then evaluated & analyzed in parallel using a specified job scheduler (e.g. Slurm).
See: [[dockopt (pydock3 script)]].


The name "dockopt", aside from being an uncreative rehash of the name "blastermaster", derives from the notion of a literal dockopt, i.e., the person in charge of a dock who manages freight logistics and bosses around numerous dockworkers. In this analogy, a single dockworker corresponds to the processing of a single docking configuration.


'''Note:''' Invoking ''dockopt'' commands will produce a log file called ''dockopt.log'' in your current working directory.

=== ''init'' ===

First you need to create the file structure for your dockopt job. To do so, simply type

pydock3 dockopt - init

By default, the job directory is named ''dockopt_job''. To specify a different name, type

pydock3 dockopt - init <JOB_DIR_NAME>

The job directory contains two sub-directories:
# ''working'': input files, intermediate blaster files, sub-directories for individual blastermaster subroutines
# ''retrodock_jobs'': individual retrodock jobs for each docking configuration

The key difference between the working directories of ''blastermaster'' and ''dockopt'' is that the working directory of ''dockopt'' may contain multiple variants of the blaster files (prefixed by a number, e.g. "1_box"). These variant files are used to create the different docking configurations specified by the multi-valued entries of ''dockopt_config.yaml''. They are created efficiently, such that the same variant used in multiple docking configurations is not created more than once.

If your current working directory contains any of the following files, then they will be automatically copied into the working directory within the created job directory. This feature is intended to simplify the process of configuring the dockopt job.
 
* ''rec.pdb''
* ''xtal-lig.pdb''
* ''rec.crg.pdb''
* ''reduce_wwPDB_het_dict.txt''
* ''filt.params''
* ''radii''
* ''amb.crg.oxt''
* ''vdw.siz''
* ''delphi.def''
* ''vdw.parms.amb.mindock''
* ''prot.table.ambcrg.ambH''
 
Only the following are required. Default versions / generated versions of the others will be used instead if they are not detected.
* ''rec.pdb''
* ''xtal-lig.pdb''
 
If you would like to use files not present in your current working directory, copy them into your job's working directory, e.g.:
cp <FILE_PATH> <JOB_DIR_NAME>/working/
 
Finally, configure the ''dockopt_config.yaml'' file in the job directory to your specifications. The parameters in this file govern the behavior of dockopt.
 
'''Note:''' The ''dockopt_config.yaml'' file differs from the ''blastermaster_config.yaml'' file in that every parameter of the former may accept either a single value or a ''list of comma-separated values'', which indicates a pool of values to attempt for that parameter. Multiple such multi-valued parameters may be provided, and all unique resultant docking configurations will be attempted.
 
Single-valued YAML line format:
 
distance_to_surface: 1.0
 
Multi-valued YAML line format:
 
distance_to_surface: [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
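To illustrate how multi-valued parameters multiply into docking configurations, here is a minimal sketch that expands a flat mapping into every unique combination of values. It is an assumption-laden illustration, not ''dockopt'''s actual implementation: the function name <code>expand_configurations</code> and the parameters <code>hypothetical_param_a</code> and <code>hypothetical_param_b</code> are invented for the example, and the real ''dockopt_config.yaml'' may be nested rather than flat.

```python
from itertools import product


def expand_configurations(params):
    """Expand {parameter: value or list of values} into all combinations.

    Hypothetical helper for illustration; not a pydock3 function.
    """
    names = list(params)
    pools = []
    for value in params.values():
        # A single value is treated as a pool of one.
        pool = value if isinstance(value, list) else [value]
        # Drop repeated values (keeping order) so each combination is unique.
        pools.append(list(dict.fromkeys(pool)))
    return [dict(zip(names, combo)) for combo in product(*pools)]


example = {
    "distance_to_surface": [1.0, 1.1, 1.2],  # multi-valued, as in the text
    "hypothetical_param_a": [0.0, 0.5],      # invented name, multi-valued
    "hypothetical_param_b": 40,              # invented name, single-valued
}
configs = expand_configurations(example)
print(len(configs))  # 3 * 2 * 1 = 6 combinations
```

The number of combinations is the product of the pool sizes, so adding values to several parameters at once increases the number of retrodock jobs quickly.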
 
=== Environment variables ===
 
Designate where the short cache and long cache should be located. E.g.:
 
export SHRTCACHE=/dev/shm  # temporary storage for job files
export LONGCACHE=/dev/shm # long-term storage for files shared between jobs
 
For ''dockopt'' to know which scheduler to use, configure the following environment variables according to the job scheduler available on your cluster.
 
==== Slurm ====
 
E.g., on the UCSF Shoichet Lab Gimel cluster (on any node other than 'gimel' itself, such as 'gimel5'):
 
  export SBATCH_EXEC=/usr/bin/sbatch
export SQUEUE_EXEC=/usr/bin/squeue
 
==== SGE ====
 
E.g., on the UCSF Wynton cluster:
 
export QSTAT_EXEC=/opt/sge/bin/lx-amd64/qstat
export QSUB_EXEC=/opt/sge/bin/lx-amd64/qsub
 
The following is necessary on the UCSF Wynton cluster:
 
export SGE_SETTINGS=/opt/sge/wynton/common/settings.sh
 
On most clusters, this will probably be:
export SGE_SETTINGS=/opt/sge/default/common/settings.sh
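Before launching a job, it can help to confirm that the variables described above are set. The sketch below is a hypothetical pre-flight check, not something ''pydock3'' provides; the function name <code>unset_variables</code> is invented, and treating ''SGE_SETTINGS'' as required for every SGE cluster is an assumption based on the examples above.

```python
import os

# Variable names taken from the examples above, grouped by scheduler.
SCHEDULER_ENV_VARS = {
    "slurm": ("SBATCH_EXEC", "SQUEUE_EXEC"),
    "sge": ("QSTAT_EXEC", "QSUB_EXEC", "SGE_SETTINGS"),
}
CACHE_ENV_VARS = ("SHRTCACHE", "LONGCACHE")


def unset_variables(scheduler, environ=None):
    """Return the names of expected variables that are unset or empty.

    Hypothetical helper for illustration; not a pydock3 function.
    """
    environ = os.environ if environ is None else environ
    needed = CACHE_ENV_VARS + SCHEDULER_ENV_VARS[scheduler]
    return [name for name in needed if not environ.get(name)]


if __name__ == "__main__":
    for scheduler in SCHEDULER_ENV_VARS:
        print(scheduler, "is missing:", unset_variables(scheduler) or "nothing")
```

An empty list for your scheduler means all of the variables named in this section are present in the current shell environment.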
 
=== ''run'' ===
 
Once your job has been configured to your liking, navigate to the job directory and run ''dockopt'':
cd <JOB_DIR_NAME>
pydock3 dockopt - run <JOB_SCHEDULER_NAME>
 
where <JOB_SCHEDULER_NAME> is one of:
* ''sge''
* ''slurm''
 
This will execute the many dockopt subroutines in sequence, except for the retrodock jobs run on each docking configuration, which are run in parallel via the scheduler. The state of the program will be printed to standard output as it runs.
 
You can also set the following flags to adjust retrodock job submission behavior. This example shows the default values:
pydock3 dockopt - run <JOB_SCHEDULER_NAME> --retrodock_job_max_reattempts=0 --retrodock_job_timeout_minutes=None
 
Once the dockopt job is complete, the following files will be generated in the job directory:
* ''dockopt_job_report.pdf'': contains (1) roc.png of best retrodock job, (2) box plots of enrichment for every multi-valued config parameter, and (3) heatmaps of enrichment for every pair of multi-valued config parameters
* ''dockopt_job_results.csv'': enrichment metrics for each docking configuration
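As a rough illustration of working with ''dockopt_job_results.csv'', the sketch below picks the row with the highest value in a chosen column. The column names in that file are not documented here, so the column to rank by is passed in by the caller; the function name <code>best_row</code> is hypothetical, and it assumes that a higher value of the chosen metric is better.

```python
import csv


def best_row(csv_path, metric_column):
    """Return the row with the largest numeric value in `metric_column`.

    Hypothetical helper for illustration; not a pydock3 function.
    Rows whose metric is missing or non-numeric are skipped.
    Returns None if no row has a usable value.
    """
    with open(csv_path, newline="") as handle:
        rows = list(csv.DictReader(handle))

    scored = []
    for row in rows:
        try:
            scored.append((float(row[metric_column]), row))
        except (KeyError, ValueError, TypeError):
            continue

    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]
```

Inspect the header line of your own results file first to find the enrichment column you want to rank by, then pass its name as <code>metric_column</code>.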


In addition, the best retrodock job will be copied to its own sub-directory ''best_retrodock_job/''.


Within each retrodock job directory, there are the following files and sub-directories:
* ''working/'': intermediate files
* ''dockfiles/'': parameter files and INDOCK for the given docking configuration
* ''output/'': contains:
** joblist
** sub-directories ''1/'' for actives and ''2/'' for decoys (each containing OUTDOCK and test.mol2 files)
** log files for the retrodock jobs
* ''retrodock_job_results.csv'': data loaded from OUTDOCK files for both actives and decoys
* ''roc.png'': the ROC enrichment curve (log-scaled x-axis) for the given docking configuration

[[Category:DOCK 3.8]]
[[Category:DOCK Blaster]]

Latest revision as of 01:51, 24 August 2023

pydock3 is a Python package wrapping the DOCK Fortran program that provides tools to help standardize and automate the computational methods employed in molecular docking. It is a natural successor to DOCK Blaster, originally published in 2009, and blastermaster.py, part of the DOCK 3.7 release in 2012.


Scripts included in pydock3:

  • dockopt: generate many different docking configurations, perform retrospective docking on them in parallel using a specified job scheduler (e.g. Slurm), and analyze the results.
  • blastermaster: generate a specific docking configuration for a given receptor and ligand, intended for use by experts who wish to tune the docking model themselves. This is a direct successor of blastermaster.py from DOCK 3.7.

A docking configuration is a unique set of (1) DOCK parameter files (e.g., matching_spheres.sph), (2) an INDOCK file, and (3) a DOCK executable.

Installation

See: DOCK 3.8:How to install pydock3.

Instructions

blastermaster

See: blastermaster (pydock3 script).

dockopt

See: dockopt (pydock3 script).


Note for UCSF Shoichet Lab members

pydock3 is already installed on the following clusters. You can source the provided Python environment scripts to expose the pydock3 executable:

Wynton

 source /wynton/group/bks/soft/python_envs/python3.8.5.sh

Gimel

Only nodes other than gimel itself are supported, e.g., gimel5.

ssh gimel5
source /nfs/soft/ian/python3.8.5.sh