Global Matching Sphere Optimization: Difference between revisions

mNo edit summary
(updated for the new version)
Line 1: Line 1:
== Goal ==
To optimize your matching sphere (MS) setups getting faster docking and more high-scoring ligands with fewer spheres.


== Goal ==
To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres.
== Description ==
== Description ==
The program performs optimization of matching spheres using genetic algorithm. It selects spheres from two sets:
The program performs optimization of matching spheres by pruning and stochastic optimization. It selects spheres from two sets:
* heavy atoms of xtal-lig  
* heavy atoms of xtal-lig  
* spheres prepared by SPHGEN program
* spheres prepared by SPHGEN program
At each generation, N matching sphere sets are created, containing a maximum of M spheres each. Then retrospective docking is done for each set, and sets are ranked by the


* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]),  
Juggler generates an initial MS set consisting of 100 spheres (maximum in DOCK 3.8). This set is used for retrospective docking, and then KDTree algorithm is used to prune the set to the required number of spheres by discarding all spheres that were not used in generation of the poses of the known binders ("actives"). This procedure is repeated to account for any differences in matching produced by reducing the MS set.
* RMSD of the docked pose to the experimental one.  
 
After this, the resulting set is transferred to the stepwise optimization procedure which conducts random perturbations of the sphere sets. Retrospective docking is done for each set, and sets are ranked by the


After that, a quarter of sets "survive" and produce a new generation by direct transfer, mutations, and crossover. This process is repeated until enrichment, RMSD and minimum number of spheres do not change substantially in 10 generations.
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]),
* the average score of the top 1% of ligands.


The program consists of two main modules:
The program consists of two main modules:
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.
* a Bash script, that watches created directory structure, runs docking and processes docking results
* a Bash script (<code>rundockd.sh</code>), that watches created directory structure, runs docking and processes docking results.
 
== Setup & Running ==
== Setup & Running ==
So far, the program is running on Wynton and Gimel. LMK if you are interested in launching it on other clusters.
The scripts and example config file are in
Wynton
<code>/wynton/group/bks/soft/juggler</code>
Gimel
<code>/nfs/home/ak87/PROGRAM/juggler</code>


=== Setup ===
=== Setup ===
Install [https://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8 SUBDOCK]<syntaxhighlight lang="bash">
Dependencies:
git clone https://github.com/docking-org/SUBDOCK.git
* python 3.8.1 or higher
</syntaxhighlight>Install [https://wiki.docking.org/index.php/Docking_Analysis_in_DOCK3.8#top_poses.py top_poses.py]<syntaxhighlight lang="bash">
* [https://dock.docking.org/DOCK3.8/ DOCK3.8]
git clone https://github.com/docking-org/docktop.git
* [https://github.com/docking-org/pydock3 pydock3]
</syntaxhighlight>
* [https://github.com/docking-org/SUBDOCK subdock]
* rdkit
* pandas
* numpy
* yaml


=== Preparation ===
=== Preparation ===
What you need to prepare:
What you need to prepare:


* <code>dockfiles</code> directory with any tools of your liking (blastermaster, dockopt etc). You will also need
* <code>dockfiles</code> directory with any tools of your liking (blastermaster, dockopt etc).
* <code>rec.pdb</code>
* <code>rec.crg.pdb</code>
* <code>rec.crg.pdb</code>  
* <code>xtal-lig.pdb</code>: To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code>.
* <code>xtal-lig.pdb</code>: To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code>
* <code>ligands.names</code>
* <code>ligands.names</code>
* <code>decoys.names</code>
* <code>decoys.names</code>
* <code>sdi</code> file with the paths to ligand .tgz files..
* <code>sdi</code> file with the paths to ligand <code>.tgz</code> files.


Prepare <code>juggler_config.yml</code> file. Queue type is <code>sge</code> for Wynton and <code>slurm</code> for Gimel (newer machines, like gimel5/gimel2/n-1-XXX...). Put the config into an empty directory.<syntaxhighlight lang="yaml">
Prepare <code>juggler_config.yml</code> file. Put the config into an empty directory.<syntaxhighlight lang="yaml">
################################################
################################################
# Paths for your target
# Paths for your target
# NSP14 -- example
receptor_file_path: "/test/rec.crg.pdb"
receptor_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/rec.pdb"
xtal_lig_file_path: "/test/xtal-lig.pdb"
rec_crg_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/rec.crg.pdb"
dock_files_dir_path: "/test/dockfiles"
xtal_lig_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/TMP/xtal-lig-no-ring.pdb"
lig_names_file_path: "/test/ligands.names"
dock_files_dir_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/RECEPTOR/TYR368_0.2-ALA353_0.2-GLY333_0.2/ZINC611-XTAL/LSD/ALL-SPH/dockfiles"
dec_names_file_path: "/test/decoys.names"
lig_names_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/ligands.names"
sdi_file_path: "test/ligands_sdi"
dec_names_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/decoys.names"
sdi_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/ligands_repack_wynton_sdi"


################################################
################################################
# Executables and running
# Executables and running
dockbase: "/wynton/group/bks/soft/DOCK"
dockbase: "/path/to/DOCK"
subdock_bash_file_path: "/wynton/home/irwin/ak87/PROGRAM/SUBDOCK/subdock.bash"
dock64_bin: "path/to/dock64"
subdock_bash_file_path: "/path/to/subdock.bash"
queue_type: "sge" # "slurm" or "sge"
queue_type: "sge" # "slurm" or "sge"
top_poses_file: "/wynton/home/irwin/ak87/PROGRAM/docktop/top_poses.py"


###############################################
###############################################
Line 70: Line 62:
min_sph: 4 # min is 4
min_sph: 4 # min is 4
max_sph: 10 # max is 100
max_sph: 10 # max is 100
</syntaxhighlight>


###############################################
The <code>dock64_bin</code> parameter is optional; if absent, <code>{dockbase}/docking/DOCK/bin/dock64</code> will be used.
# Parameters for genetic algorithm
# Don't need to be adjusted for most purposes


# sample_size: 20 # Please, note that 1/4 of the sample size survives.
=== Running ===
# mutation_prob: 0.01
You can either:
# crossover_prob: 0.10
* Enter a screen environment so your run is not interrupted if you disconnect your SSH session, or
# go_to_next_gen_prob: 0.30
* Run Juggler using a queuing system. See example files for the slurm and sge below.
# gen_new_prob: 0.01


In both cases you need to launch Juggler and the docking daemon simultaneously.


###############################################
==== In a screen ====
# Parameters for sphere generation
<syntaxhighlight lang="bash">
# Don't need to be adjusted for most purposes
source /path/to/python/env
# or
conda activate pydock3
# rundockd should run in the background to manage docking jobs
sh rundockd.sh 2>&1 > rundockd.log &
python juggler.py 2>&1 > juggler.log
</syntaxhighlight>


# # changed to 1.0 for compatability with xtal-lig spheres
==== Via a queue ====
# close_dist: 1.0
# far_dist: 5.0
# min_sph: 4
# max_sph: 10
#


===== SGE =====
<syntaxhighlight lang="bash">
#! /bin/bash
#$ -cwd
#$ -q long.q
#$ -o stdout_juggler
#$ -e stdout_juggler
#$ -l s_rt=72:58:00
#$ -l h_rt=73:00:00
#$ -l mem_free=10G
#$ -pe smp 2
source /path/to/pydock3/env.sh
# or conda activate pydock3
sh /path/to/juggler/rundockd.sh 2>&1 > rundockd.log &
python /path/to/juggler/juggler.py 2>&1 > juggler.log
</syntaxhighlight>
</syntaxhighlight>


=== Running ===
===== SLURM =====
 
<syntaxhighlight lang="bash">
==== Launch Juggler: ====
#! /bin/bash
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:
#$ -cwd
 
#$ -q long.q
* Wynton
#$ -o stdout_juggler
 
#$ -e stdout_juggler
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code>
#$ -l s_rt=23:58:00
 
#$ -l h_rt=24:00:00
<code>python /wynton/group/bks/soft/juggler/juggler-v0.8.py 2&>1 > juggler.log</code>
#$ -l mem_free=10G
 
source /path/to/pydock3/env.sh
* Gimel (gimel2/gimel5/n-1-XXX ...)
# or conda activate pydock3
 
sh /path/to/juggler/rundockd.sh 2>&1 > rundockd.log &
<code>source /nfs/soft/ian/python3.8.5.sh</code>
python /path/to/juggler/juggler.py 2>&1 > juggler.log
 
</syntaxhighlight>
<code>python /nfs/home/ak87/PROGRAM/juggler/juggler-v0.8.py 2&>1 > juggler.log</code>
 
You can detach from the screen (Ctrl-A d).
 
==== Launch docking daemon😈: ====
Being in the same directory, open a new screen. Launch a docking daemon:
 
* Wynton
 
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code>
 
<code>sh /wynton/group/bks/soft/juggler/rundockd-v0.8.sh</code>
 
* Gimel (gimel2/gimel5/n-1-XXX ...)
 
<code>source /nfs/soft/ian/python3.8.5.sh</code>
 
<code>sh /nfs/home/ak87/PROGRAM/juggler/rundockd-v0.8.sh</code>
 
You can run other calculations in the meantime, as Juggler will track the task IDs that it launched.
 
The run will take few hours to ~2 days depending on the number of actives and decoys and the load of Wynton.


=== Processing results ===
=== Processing results ===
At the end of a run you will get a message that convergence was reached. The script will print the paths to where three best matching sphere sets are:
At the end of a run you will get a message that convergence was reached. You will see the directory <code>best_set</code> that contains <code>dockfiles</code> and docking results for the best matching sphere set found. This directory is updated at each step, so if the run fails or convergence is not reached, you can still access the optimal set.


* best enrichment
Other output files:
* best RMSD
* <code>stepwise_opt_best_sets.dat</code> — lists the IDs and the nlogAUC values for the best set in each stepwise optimization round.
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).
* <code>stepwise_opt_metrics.dat</code> — lists IDs, nlogAUC, RMSD and average scores for the top 1% ligands for all sets tested during the stepwise optimization.
* optional: <code>juggler.log</code> or <code>stdout</code> — contains the log of the run.


You can use <code>dockfiles</code> from the listed directories.
=== For BKS lab users ===
==== Gimel ====
Juggler is in <code>/mnt/nfs/exa/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER</code>
Subdock is in <code>/mnt/nfs/exa/work/ak87/PROGRAM/SUBDOCK/SUBDOCK</code>
You can use this file to submit to SLURM queue
<syntaxhighlight lang="bash">
#! /bin/bash
#$ -cwd
#$ -q long.q
#$ -o stdout_juggler
#$ -e stdout_juggler
#$ -l s_rt=23:58:00
#$ -l h_rt=24:00:00
#$ -l mem_free=10G
source /nfs/soft/ian/python3.8.5.sh
sh /mnt/nfs/exa/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER/rundockd.sh 2>&1 > rundockd.log & #/dev/null &
python /mnt/nfs/exa/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER/juggler.py 2>&1 > orbebb.log
</syntaxhighlight>


You can also track the optimization progress running the following script in your working directory:
==== Wynton ====
 
Juggler is in <code>/wynton/group/bks/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER</code>
* Wynton
Subdock is in <code>/wynton/group/bks/work/ak87/UCSF/JUGGLER/SCRIPTS/SUBDOCK</code>
 
You can use this file to submit to SLURM queue
<code>/wynton/group/bks/soft/juggler/plot_all_metrics.py</code>
<syntaxhighlight lang="bash">
 
#! /bin/bash
* Gimel
#$ -cwd
 
#$ -q long.q
<code>/nfs/home/ak87/PROGRAM/juggler/plot_all_metrics.py</code>
#$ -o stdout_juggler
 
#$ -e stdout_juggler
It produces <code>combined_metrics.png</code>
#$ -l s_rt=72:58:00
[[File:Combined metrics plot from a GA run.png|thumb|Combined metrics plot from a GA run]]
#$ -l h_rt=73:00:00
 
#$ -l mem_free=10G
'''If not converged'''
#$ -pe smp 2
 
source /wynton/group/bks/soft/python_envs/env.sh
The program will stop after 200 generations if convergence is not reached. In case it takes too long you can stop it any time by pressing Ctrl-C. It doesn't mean that you have no results, though. Juggler generates <code>combined_metrics.dat</code> file in the working directory, which contains metrics for all sets explored. It contains the following columns:
sh /wynton/group/bks/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER/rundockd.sh 2>&1 > rundockd.log & #/dev/null &
 
python /wynton/group/bks/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER/juggler.py 2>&1 > orbebb.log
<code>Generations Set# NormLogAUC RMSD Nsph Combined_metrics</code>
</syntaxhighlight>
 
You can paste its content into Excel, sort by the highest NormLogAUC and pick a MS set of your liking.

Revision as of 22:16, 17 April 2026

Goal

To optimize your matching sphere (MS) setup so that docking runs faster and yields more high-scoring ligands with fewer spheres.

Description

The program performs optimization of matching spheres by pruning and stochastic optimization. It selects spheres from two sets:

  • heavy atoms of xtal-lig
  • spheres prepared by SPHGEN program

Juggler generates an initial MS set consisting of 100 spheres (the maximum in DOCK 3.8). This set is used for retrospective docking, and a KDTree algorithm is then used to prune the set to the required number of spheres by discarding all spheres that were not used in generating the poses of the known binders ("actives"). This procedure is repeated to account for any differences in matching caused by reducing the MS set.
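The pruning idea can be illustrated with a minimal sketch. It is not Juggler's code: the function name `prune_spheres`, the 1.5 Å `cutoff`, and the rule that a sphere counts as "used" when a heavy atom of an active's docked pose lies within the cutoff are all assumptions made for illustration. A brute-force distance search stands in for the KDTree lookup to keep the example dependency-free.

```python
import math

def prune_spheres(spheres, active_pose_atoms, cutoff=1.5):
    """Return indices of spheres that lie within `cutoff` of any pose atom.

    spheres: list of (x, y, z) sphere centres (hypothetical input format).
    active_pose_atoms: (x, y, z) heavy-atom coordinates pooled over the
        docked poses of the actives (hypothetical input format).
    A KD-tree would answer the same nearest-neighbour query faster; the
    brute-force loop here is only a stand-in.
    """
    kept = []
    for idx, centre in enumerate(spheres):
        nearest = min(math.dist(centre, atom) for atom in active_pose_atoms)
        if nearest <= cutoff:
            kept.append(idx)
    return kept

# Toy coordinates, made up for the example.
spheres = [(0, 0, 0), (1, 0, 0), (5, 5, 5), (0, 1, 0), (9, 9, 9)]
atoms = [(0.2, 0, 0), (1.1, 0.1, 0), (0, 1.3, 0)]
kept = prune_spheres(spheres, atoms)
print(kept)
```

In this toy case the two spheres far from every pose atom are discarded. Repeating the docking and pruning, as described above, would then be a loop around this step.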

After this, the resulting set is passed to a stepwise optimization procedure that applies random perturbations to the sphere set. Retrospective docking is done for each perturbed set, and the sets are ranked by:

  • enrichment (normalized logAUC, see Ian's paper),
  • the average score of the top 1% of ligands.
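A rough sketch of one perturbation move and the ranking step follows. The source does not say how Juggler perturbs a set or how it combines the two metrics, so everything here is an assumption: `perturb` swaps a single sphere for an unused one from the pool, and `rank_sets` orders sets by enrichment first and breaks ties with the mean score of the top 1% of ligands (DOCK scores are energies, so lower is better). All function names are hypothetical.

```python
import random

def top_fraction_mean(scores, fraction=0.01):
    """Mean of the best-scoring fraction of ligands (lowest scores)."""
    n_top = max(1, int(len(scores) * fraction))
    return sum(sorted(scores)[:n_top]) / n_top

def perturb(sphere_set, pool, rng):
    """Replace one randomly chosen sphere with a pool sphere not in the set."""
    candidates = [s for s in pool if s not in sphere_set]
    new_set = list(sphere_set)
    if candidates:
        new_set[rng.randrange(len(new_set))] = rng.choice(candidates)
    return new_set

def rank_sets(results):
    """Sort (set_id, nlogauc, top1_mean) records.

    Higher enrichment comes first; ties go to the lower (better) mean
    score of the top 1%. This ordering is an assumption, not Juggler's rule.
    """
    return sorted(results, key=lambda r: (-r[1], r[2]))

rng = random.Random(0)
new_set = perturb([1, 2, 3, 4], [1, 2, 3, 4, 5, 6], rng)
ranked = rank_sets([("a", 0.30, -40.0), ("b", 0.45, -38.0), ("c", 0.45, -42.0)])
print(new_set, [r[0] for r in ranked])
```

In a real run each perturbed set would be docked retrospectively before it can be ranked; the metric values above are made up.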

The program consists of two main modules:

  • a Python script (juggler.py) that performs MS generation, optimization, and ranking.
  • a Bash script (rundockd.sh) that watches the created directory structure, runs docking, and processes the docking results.

Setup & Running

Setup

Dependencies:

  • python 3.8.1 or higher
  • DOCK3.8
  • pydock3
  • subdock
  • rdkit
  • pandas
  • numpy
  • yaml

Preparation

What you need to prepare:

  • dockfiles directory prepared with any tool of your liking (blastermaster, dockopt, etc.).
  • rec.crg.pdb
  • xtal-lig.pdb: To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as xtal-lig.pdb.
  • ligands.names
  • decoys.names
  • sdi file with the paths to ligand .tgz files.

Prepare a juggler_config.yml file and put it into an empty directory.

################################################
# Paths for your target
receptor_file_path: "/test/rec.crg.pdb"
xtal_lig_file_path: "/test/xtal-lig.pdb"
dock_files_dir_path: "/test/dockfiles"
lig_names_file_path: "/test/ligands.names"
dec_names_file_path: "/test/decoys.names"
sdi_file_path: "/test/ligands_sdi"

################################################
# Executables and running
dockbase: "/path/to/DOCK"
dock64_bin: "/path/to/dock64"
subdock_bash_file_path: "/path/to/subdock.bash"
queue_type: "sge" # "slurm" or "sge"

###############################################
# Max and min number of spheres
min_sph: 4 # min is 4
max_sph: 10 # max is 100

The dock64_bin parameter is optional; if absent, {dockbase}/docking/DOCK/bin/dock64 will be used.
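The fallback rule for dock64_bin and the sphere-count limits from the config comments can be sketched as below. This is an illustration, not Juggler's code: the helper names `resolve_dock64` and `check_sphere_limits` are hypothetical, and the config is shown as an already-loaded dictionary instead of being read from juggler_config.yml.

```python
import posixpath

def resolve_dock64(config):
    """Return dock64_bin if given, else {dockbase}/docking/DOCK/bin/dock64."""
    explicit = config.get("dock64_bin")
    if explicit:
        return explicit
    return posixpath.join(config["dockbase"], "docking", "DOCK", "bin", "dock64")

def check_sphere_limits(config):
    """Check 4 <= min_sph <= max_sph <= 100, the limits noted in the config."""
    lo, hi = config["min_sph"], config["max_sph"]
    if not (4 <= lo <= hi <= 100):
        raise ValueError(f"invalid sphere limits: min_sph={lo}, max_sph={hi}")
    return lo, hi

# Config without dock64_bin, so the default path under dockbase is used.
config = {"dockbase": "/path/to/DOCK", "min_sph": 4, "max_sph": 10}
dock64 = resolve_dock64(config)
limits = check_sphere_limits(config)
print(dock64, limits)
```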

Running

You can either:

  • Enter a screen environment so your run is not interrupted if you disconnect your SSH session, or
  • Run Juggler using a queuing system. See the example SGE and SLURM submission scripts below.

In both cases you need to launch Juggler and the docking daemon simultaneously.

In a screen

source /path/to/python/env
# or
conda activate pydock3
# rundockd should run in the background to manage docking jobs
sh rundockd.sh > rundockd.log 2>&1 &
python juggler.py > juggler.log 2>&1

Via a queue

SGE

#! /bin/bash
#$ -cwd
#$ -q long.q
#$ -o stdout_juggler
#$ -e stdout_juggler
#$ -l s_rt=72:58:00
#$ -l h_rt=73:00:00
#$ -l mem_free=10G
#$ -pe smp 2
source /path/to/pydock3/env.sh
# or conda activate pydock3
sh /path/to/juggler/rundockd.sh > rundockd.log 2>&1 &
python /path/to/juggler/juggler.py > juggler.log 2>&1

SLURM

#! /bin/bash
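#SBATCH --output=stdout_juggler
#SBATCH --error=stdout_juggler
#SBATCH --time=24:00:00
#SBATCH --mem=10G
# SLURM starts the job in the submission directory by default.
# Add "#SBATCH --partition=<name>" if your cluster requires a partition.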
source /path/to/pydock3/env.sh
# or conda activate pydock3
sh /path/to/juggler/rundockd.sh > rundockd.log 2>&1 &
python /path/to/juggler/juggler.py > juggler.log 2>&1

Processing results

At the end of a run you will get a message that convergence was reached. You will see the directory best_set, which contains dockfiles and docking results for the best matching sphere set found. This directory is updated at each step, so if the run fails or convergence is not reached, you can still access the best set found so far.

Other output files:

  • stepwise_opt_best_sets.dat — lists the IDs and the nlogAUC values for the best set in each stepwise optimization round.
  • stepwise_opt_metrics.dat — lists IDs, nlogAUC, RMSD and average scores for the top 1% ligands for all sets tested during the stepwise optimization.
  • optional: juggler.log or stdout — contains the log of the run.
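To pick a set from stepwise_opt_metrics.dat programmatically, something like the following sketch may help. The file layout is not documented here, so the parser assumes whitespace-separated columns in the order ID, nlogAUC, RMSD, average top-1% score, with optional comment lines starting with `#`; check this against your own file before relying on it. The sample rows are made-up values, and `read_metrics` is a hypothetical helper.

```python
def read_metrics(text):
    """Parse metric rows; assumed columns: id, nlogAUC, RMSD, top-1% score."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        set_id, nlogauc, rmsd, top1 = line.split()[:4]
        rows.append({
            "id": set_id,
            "nlogauc": float(nlogauc),
            "rmsd": float(rmsd),
            "top1_score": float(top1),
        })
    return rows

# Made-up sample content standing in for stepwise_opt_metrics.dat.
sample = """# id nlogAUC RMSD top1_score
set_001 0.312 2.10 -41.5
set_002 0.447 1.35 -44.2
set_003 0.401 0.98 -43.0
"""
rows = read_metrics(sample)
best = max(rows, key=lambda r: r["nlogauc"])
print(best["id"], best["nlogauc"])
```

For a real run, replace `sample` with the file contents, for example `open("stepwise_opt_metrics.dat").read()`.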

For BKS lab users

Gimel

Juggler is in /mnt/nfs/exa/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER

Subdock is in /mnt/nfs/exa/work/ak87/PROGRAM/SUBDOCK/SUBDOCK

You can use this file to submit to the SLURM queue:

#! /bin/bash
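#SBATCH --output=stdout_juggler
#SBATCH --error=stdout_juggler
#SBATCH --time=24:00:00
#SBATCH --mem=10G
# SLURM starts the job in the submission directory by default.
# Add "#SBATCH --partition=<name>" if your cluster requires a partition.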
source /nfs/soft/ian/python3.8.5.sh
sh /mnt/nfs/exa/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER/rundockd.sh > rundockd.log 2>&1 &
python /mnt/nfs/exa/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER/juggler.py > orbebb.log 2>&1

Wynton

Juggler is in /wynton/group/bks/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER

Subdock is in /wynton/group/bks/work/ak87/UCSF/JUGGLER/SCRIPTS/SUBDOCK

You can use this file to submit to the SGE queue:

#! /bin/bash
#$ -cwd
#$ -q long.q
#$ -o stdout_juggler
#$ -e stdout_juggler
#$ -l s_rt=72:58:00
#$ -l h_rt=73:00:00
#$ -l mem_free=10G
#$ -pe smp 2
source /wynton/group/bks/soft/python_envs/env.sh
sh /wynton/group/bks/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER/rundockd.sh > rundockd.log 2>&1 &
python /wynton/group/bks/work/ak87/UCSF/JUGGLER/SCRIPTS/JUGGLER/juggler.py > orbebb.log 2>&1