Global Matching Sphere Optimization

Revision as of 19:52, 1 May 2023

Goal

To optimize matching sphere (MS) setups so that they give better enrichment with fewer spheres.

Description

The program optimizes matching spheres with a genetic algorithm. It selects spheres from two sources:

  • heavy atoms of xtal-lig
  • spheres generated by the SPHGEN program
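The following is a minimal Python sketch of how the candidate pool could be built from these two sources. It reads heavy-atom coordinates from the ATOM/HETATM records of xtal-lig.pdb using the standard fixed PDB columns. It also assumes the SPHGEN sphere centers have already been parsed into (x, y, z) tuples. The Sphere class and both function names are hypothetical and are not part of Juggler.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sphere:
    source: str  # "xtal" (ligand heavy atom) or "sphgen" (hypothetical labels)
    index: int   # position within its source
    x: float
    y: float
    z: float

def xtal_heavy_atoms(pdb_lines):
    """Heavy atoms from the ATOM/HETATM records of xtal-lig.pdb, as Sphere objects."""
    atoms = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Element symbol is in columns 77-78; if blank, guess from the atom name.
        # The fallback only needs to tell hydrogens apart from heavy atoms.
        element = line[76:78].strip() or line[12:16].strip().lstrip("0123456789")[:1]
        if element.upper() in ("H", "D"):
            continue  # skip hydrogen and deuterium
        # Coordinates occupy columns 31-38, 39-46 and 47-54.
        x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))
        atoms.append(Sphere("xtal", len(atoms), x, y, z))
    return atoms

def candidate_pool(pdb_lines, sphgen_centers):
    """Combine xtal-lig heavy atoms with SPHGEN sphere centers (assumed pre-parsed)."""
    pool = xtal_heavy_atoms(pdb_lines)
    pool += [Sphere("sphgen", i, *xyz) for i, xyz in enumerate(sphgen_centers)]
    return pool
```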

In each generation, N matching sphere sets are created, each containing at most M spheres. Retrospective docking is then run for each set, and the sets are ranked by:

  • enrichment (normalized logAUC; see Ian's paper, http://arxiv.org/abs/2210.10905),
  • RMSD of the docked xtal-lig pose to the experimental pose.
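The two ranking criteria can be sketched in Python as follows. The rmsd function applies the plain formula to atoms listed in the same order in both poses. It does no superposition, because docked poses already share the receptor frame, and no symmetry correction. ScoredSet and rank are hypothetical names. Sorting by enrichment first, with RMSD as the tie-breaker, is an assumption about how Juggler combines the two metrics.

```python
import math
from dataclasses import dataclass

def rmsd(pose, reference):
    """Plain RMSD between two poses given as equal-length lists of (x, y, z).

    Atoms must be listed in the same order in both poses; no superposition
    and no symmetry correction are applied.
    """
    if not pose or len(pose) != len(reference):
        raise ValueError("poses must be non-empty and have the same number of atoms")
    sq = sum((a - b) ** 2 for p, r in zip(pose, reference) for a, b in zip(p, r))
    return math.sqrt(sq / len(pose))

@dataclass
class ScoredSet:
    spheres: frozenset  # indices into the candidate pool
    enrichment: float   # e.g. normalized logAUC from retrospective docking
    rmsd: float         # docked xtal-lig pose vs. experimental pose

def rank(sets):
    """Best first: higher enrichment, then lower RMSD (an assumed ordering)."""
    return sorted(sets, key=lambda s: (-s.enrichment, s.rmsd))
```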

The top quarter of the sets then "survive" and produce the next generation through direct transfer, mutation, and crossover. This process is repeated until the enrichment, RMSD, and minimum number of spheres stop changing substantially over 10 generations.
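Below is a minimal sketch of one generation step and of the stopping rule. It assumes each set is a Python set of indices into the candidate pool and that the previous generation is already ranked best-first. Several details are illustrative assumptions and do not reflect Juggler's actual operators: the function names, the union-based crossover, the mutation scheme (drop one sphere or add a random one), and the per-metric tolerances in converged.

```python
import random

def next_generation(ranked, pool_size, n_sets, max_spheres, rng=None, mutation_rate=0.2):
    """Build a new generation from non-empty sets ranked best-first (at least 2 sets).

    The top quarter survives unchanged (direct transfer); the remaining slots
    are filled by crossover of two random survivors, followed by mutation.
    """
    if rng is None:
        rng = random.Random()
    survivors = [set(s) for s in ranked[: max(2, len(ranked) // 4)]]
    new_gen = [set(s) for s in survivors]  # direct transfer (copies)
    while len(new_gen) < n_sets:
        mom, dad = rng.sample(survivors, 2)
        # Crossover: the child draws a random subset from the union of both parents.
        genes = sorted(mom | dad)
        child = set(rng.sample(genes, rng.randint(1, min(max_spheres, len(genes)))))
        # Mutation: occasionally drop one sphere or add a random one from the pool.
        if rng.random() < mutation_rate and len(child) > 1:
            child.discard(rng.choice(sorted(child)))
        if rng.random() < mutation_rate and len(child) < max_spheres:
            child.add(rng.randrange(pool_size))
        new_gen.append(child)
    return new_gen

def converged(history, window=10, tol=(0.01, 0.1, 0)):
    """True if (best enrichment, best RMSD, min sphere count) each stayed within
    its tolerance of the latest value over the last `window` generations."""
    if len(history) <= window:
        return False
    latest = history[-1]
    return all(
        abs(value - ref) <= t
        for past in history[-(window + 1):-1]
        for value, ref, t in zip(past, latest, tol)
    )
```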

The program consists of two main modules:

  • a Python script (juggler.py) that performs MS generation, optimization, and ranking.
  • a Bash script that watches the created directory structure, runs docking, and processes the docking results.
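The hand-off between the two modules can be pictured as a shared directory tree. In this picture, one side writes a directory per matching sphere set, and the daemon docks every directory that has inputs but no results yet. The Python sketch below only illustrates that polling idea. The marker names dockfiles and docking_done, and the nested layout, are hypothetical placeholders rather than the real Juggler directory structure.

```python
from pathlib import Path

def pending_sets(workdir, marker="dockfiles", done="docking_done"):
    """Directories that contain `marker` but no `done` file yet.

    Both names are hypothetical placeholders, not the real Juggler layout.
    """
    root = Path(workdir)
    return sorted(
        d for d in root.rglob("*")
        if d.is_dir() and (d / marker).exists() and not (d / done).exists()
    )
```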

Setup & Running

So far, the program runs only on Wynton. Let me know if you are interested in running it on Gimel or other clusters.

The scripts and an example config file are in /wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE

Preparation

Prepare a dockfiles directory with any tool you like (blastermaster, dockopt, etc.). You will also need rec.pdb, rec.crg.pdb, xtal-lig.pdb, ligands.names, decoys.names, and an sdi directory with the paths to the ligand .tgz files. To compute the RMSD of the docked xtal-lig poses to the experimental pose, xtal-lig.pdb must have correct bond orders and atom valences. You can fix these in Schrodinger and save the result as xtal-lig.pdb.
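Before launching a run, it can help to check that the inputs listed above are present. The sketch below assumes they all sit in a single directory, which may not match how your juggler_config.yml points to them. missing_inputs is a hypothetical helper, and it does not check bond orders or valences in xtal-lig.pdb.

```python
from pathlib import Path

# Input names taken from the preparation instructions above.
REQUIRED_FILES = ["rec.pdb", "rec.crg.pdb", "xtal-lig.pdb", "ligands.names", "decoys.names"]
REQUIRED_DIRS = ["dockfiles", "sdi"]

def missing_inputs(directory):
    """Return the required files and directories that are absent (dirs end in '/')."""
    root = Path(directory)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```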

Prepare a juggler_config.yml file and put it into an empty directory.

Running

Start a screen session so that your run is not interrupted if your SSH session disconnects. Then run:

source /wynton/group/bks/soft/python_envs/python3.8.5.sh

python /wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/juggler.py

You can then detach from the screen session (Ctrl-A d).

Open a new screen session and, in the same directory, launch the docking daemon:

/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/rundockd-wynton-taskid.sh

You can run other calculations on Wynton in the meantime, because Juggler tracks the IDs of the tasks it launched.

A run takes from a few hours to about 2 days, depending on the number of actives and decoys and on the load on Wynton.

Processing results

The script prints the paths to the three best matching sphere sets:

  • best enrichment
  • best RMSD
  • best balance of metrics (the best trade-off between high enrichment, low RMSD, and low Nsph, the number of spheres).

You can use the dockfiles from any of the listed directories.

You can also track the optimization progress by running the following script in your working directory:

/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/plot_all_metrics.py

It produces combined_metrics.png.