http://wiki.docking.org/api.php?action=feedcontributions&user=Iamkaant&feedformat=atomDISI - User contributions [en]2024-03-29T01:13:21ZUser contributionsMediaWiki 1.39.1http://wiki.docking.org/index.php?title=Extended_Search_of_Analogs_via_Bioisosteric_Replacements&diff=15534Extended Search of Analogs via Bioisosteric Replacements2023-09-21T06:41:17Z<p>Iamkaant: disclaimer added</p>
<hr />
<div>=== Rationale ===<br />
Our standard pipeline for finding analogs (at least as I know it) consists of entering the SMILES of a molecule of interest into every flavor of SmallWorld and Arthor available in the lab. This procedure has two drawbacks:<br />
<br />
# Limited diversity. SmallWorld and Arthor measure the distance between an analog and the parent compound as graph edit distance.<ref>I am probably using slightly wrong terminology here, so be it. You can learn more about it from NextMove's many marvelous presentations, like this one: https://www.nextmovesoftware.com/talks/Sayle_SmallWorld_Oxford_202003.pdf</ref> This metric, while useful and robust, differs somewhat from a chemist's idea of similarity. For example, the graph edit distance between benzene and cyclohexane is 6. That is quite far, and normally we do not consider such distant analogs. But as part of a lead-like molecule, these two rings may in certain cases replace each other without loss of the biological activity of the whole compound. <br />
# Time investment. Searching the databases manually takes quite some time, especially if you need to find analogs for many compounds. <br />
<br />
I wanted to create an automated procedure for analog searching. The SmallWorld API is well suited to that, although it is sometimes unstable. To overcome the issue of limited diversity, I decided to use the bioisosteric replacement program that is currently being developed by Maksim Tsukanov.<br />
<br />
=== How it works ===<br />
The pipeline for the extended analog search works in two steps:<br />
<br />
# Create bioisosteres of the original molecule (method created by Maksim Tsukanov, currently under development)<br />
# Search for their closest analogs in SmallWorld (distance up to 2)<br />
<br />
=== How to use ===<br />
'''DISCLAIMER''': The Bioisostere pipeline is under development, so it is not guaranteed to yield results. The SmallWorld API is sometimes unstable. Every unsuccessful request is retried 4 times, but it may still return no results in certain cases. An exhaustive search for analogs is not guaranteed.<br />
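<br />
The retry behaviour mentioned in the disclaimer can be pictured with the minimal Python sketch below. It is not the pipeline's actual code: <code>call_with_retries</code>, <code>flaky_search</code> and the pause length are hypothetical, and "retried 4 times" is read here as one initial attempt plus up to four retries.<br />

```python
import time


def call_with_retries(request, retries=4, pause_s=0.0):
    """Call request() and retry up to `retries` more times if it raises.

    Returns the result, or None if every attempt failed (an assumption:
    the real pipeline may report failures differently).
    """
    for attempt in range(1 + retries):
        try:
            return request()
        except Exception:
            if attempt < retries:
                time.sleep(pause_s)  # wait before the next attempt
    return None


# Hypothetical stand-in for an unstable API call: fails twice, then succeeds.
calls = {"n": 0}


def flaky_search():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return ["analog-1", "analog-2"]


result = call_with_retries(flaky_search)
```

A request that keeps failing returns <code>None</code> in this sketch, which mirrors the warning that some searches may come back empty.<br />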
<br />
The scripts are currently available on Gimel only. Once the bioisostere program is published, running the whole pipeline on any Linux/MacOS machine will be possible. <br />
<br />
All scripts are deposited in <code>~ak87/PROGRAM/ANALOGS</code><br />
<br />
To look for analogs, do the following:<br />
<br />
# Log on to Gimel<br />
# ssh to any of the newer machines (Gimel5, epyc, n-1-XX...)<br />
# Prepare a file with <code>SMILES name</code>, separated by a tab. So far, I've tested the pipeline with one compound at a time. In theory, you should be able to enter as many compounds as you like, but the analogs will be mixed up.<br />
# <code>sh ~ak87/PROGRAM/ANALOGS/analog-search.sh <input.smi></code><br />
<br />
The run should take about 10-20 minutes, depending on the size of your molecule and its "popularity" in the commercial databases. I deliberately did not parallelize the requests to the databases, to avoid overloading the API. <br />
<br />
The list of analogs will be stored in <code>final_analogs.smi</code>. The format is <code>SMILES ID Distance</code><br />
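<br />
As a rough illustration of working with this output, the Python sketch below parses <code>SMILES ID Distance</code> records and lists the closest analogs first. It assumes the three fields are separated by whitespace and that the distance is an integer; the function names and the example records are made up for illustration.<br />

```python
def parse_analogs(lines):
    """Parse 'SMILES ID Distance' records; skip blank or malformed lines."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        smiles, mol_id, distance = fields
        records.append((smiles, mol_id, int(distance)))
    return records


def closest_first(records):
    """Sort by distance, keeping the original order within each distance."""
    return sorted(records, key=lambda rec: rec[2])


# Made-up example records, not real pipeline output.
example = [
    "CCO cmpd-b 2",
    "CCN cmpd-a 1",
    "",
    "CCC cmpd-c 0",
]
ranked = closest_first(parse_analogs(example))
```

For a real run, the same functions can be fed an open file handle for <code>final_analogs.smi</code> instead of the example list.<br />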
<br />
You can also run any of the stages separately. To perform a bulk SmallWorld search, run <code>sh ~ak87/PROGRAM/ANALOGS/bulk-analogs-bioisostere-sw1.sh <input.smi></code>. Currently, it only searches for analogs at a distance of up to 2, but you can copy the script and modify it as you like.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Extended_Search_of_Analogs_via_Bioisosteric_Replacements&diff=15533Extended Search of Analogs via Bioisosteric Replacements2023-09-21T06:16:12Z<p>Iamkaant: Created page with "=== Rationale === Our standard pipeline of searching for analogs (at least as I know it) consists of entering the SMILES of a molecule of interest into all flavors of SmallWorld and Arthor, available in the lab. This procedure has two drawbacks: # Limited diversity. SmallWorld's and Arthor's measure of distance between the analogs and the parent compound is graph edit distance.<ref>I am probably using slightly wrong terminology here, so be it. You can learn more about..."</p>
<hr />
<div>=== Rationale ===<br />
Our standard pipeline for finding analogs (at least as I know it) consists of entering the SMILES of a molecule of interest into every flavor of SmallWorld and Arthor available in the lab. This procedure has two drawbacks:<br />
<br />
# Limited diversity. SmallWorld and Arthor measure the distance between an analog and the parent compound as graph edit distance.<ref>I am probably using slightly wrong terminology here, so be it. You can learn more about it from NextMove's many marvelous presentations, like this one: https://www.nextmovesoftware.com/talks/Sayle_SmallWorld_Oxford_202003.pdf</ref> This metric, while useful and robust, differs somewhat from a chemist's idea of similarity. For example, the graph edit distance between benzene and cyclohexane is 6. That is quite far, and normally we do not consider such distant analogs. But as part of a lead-like molecule, these two rings may in certain cases replace each other without loss of the biological activity of the whole compound. <br />
# Time investment. Searching the databases manually takes quite some time, especially if you need to find analogs for many compounds. <br />
<br />
I wanted to create an automated procedure for analog searching. The SmallWorld API is well suited to that, although it is sometimes unstable. To overcome the issue of limited diversity, I decided to use the bioisosteric replacement program that is currently being developed by Maksim Tsukanov.<br />
<br />
=== How it works ===<br />
The pipeline for the extended analog search works in two steps:<br />
<br />
# Create bioisosteres of the original molecule (method created by Maksim Tsukanov, currently under development)<br />
# Search for their closest analogs in SmallWorld (distance up to 2)<br />
<br />
=== How to use ===<br />
The scripts are currently available on Gimel only. Once the bioisostere program is published, running the whole pipeline on any Linux/MacOS machine will be possible. <br />
<br />
All scripts are deposited in <code>~ak87/PROGRAM/ANALOGS</code><br />
<br />
To look for analogs, do the following:<br />
<br />
# Log on to Gimel<br />
# ssh to any of the newer machines (Gimel5, epyc, n-1-XX...)<br />
# Prepare a file with <code>SMILES name</code>, separated by a tab. So far, I've tested the pipeline with one compound at a time. In theory, you should be able to enter as many compounds as you like, but the analogs will be mixed up.<br />
# <code>sh ~ak87/PROGRAM/ANALOGS/analog-search.sh <input.smi></code><br />
<br />
The run should take about 10-20 minutes, depending on the size of your molecule and its "popularity" in the commercial databases. I deliberately did not parallelize the requests to the databases, to avoid overloading the API. <br />
<br />
The list of analogs will be stored in <code>final_analogs.smi</code>. The format is <code>SMILES ID Distance</code><br />
<br />
You can also run any of the stages separately. To perform a bulk SmallWorld search, run <code>sh ~ak87/PROGRAM/ANALOGS/bulk-analogs-bioisostere-sw1.sh <input.smi></code>. Currently, it only searches for analogs at a distance of up to 2, but you can copy the script and modify it as you like.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15465Global Matching Sphere Optimization2023-07-20T00:20:02Z<p>Iamkaant: </p>
<hr />
<div><br />
== Goal ==<br />
To optimize your matching sphere (MS) setup so that it gives more enrichment with fewer spheres.<br />
== Description ==<br />
The program optimizes matching spheres using a genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by the SPHGEN program<br />
At each generation, N matching sphere sets are created, each containing at most M spheres. Retrospective docking is then run for each set, and the sets are ranked by:<br />
<br />
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]), <br />
* RMSD of the docked pose to the experimental one. <br />
<br />
After that, a quarter of the sets "survive" and produce a new generation by direct transfer, mutation, and crossover. This process is repeated until the enrichment, the RMSD, and the minimum number of spheres have not changed substantially for 10 generations.<br />
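<br />
The generation step and the stopping rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Juggler's implementation: the function names, the representation of a sphere set as a set of indices, the way the probabilities are applied and the tolerance <code>tol</code> are all hypothetical. The sketch respects the maximum number of spheres but does not enforce a minimum.<br />

```python
import random


def next_generation(ranked_sets, pool, max_sph, rng,
                    mutation_prob=0.01, crossover_prob=0.10):
    """Build a new generation from sphere sets ranked best-first.

    ranked_sets: list of frozensets of sphere indices, best first.
    pool: list of all candidate sphere indices.
    """
    size = len(ranked_sets)
    survivors = ranked_sets[: max(1, size // 4)]  # top quarter survives
    new_gen = list(survivors)                     # direct transfer
    while len(new_gen) < size:
        child = set(rng.choice(survivors))
        if rng.random() < crossover_prob:
            # crossover: draw spheres from the union of two survivors
            merged = sorted(child | rng.choice(survivors))
            child = set(rng.sample(merged, min(max_sph, len(merged))))
        # mutation: occasionally swap a sphere for a random one from the pool
        child = {rng.choice(pool) if rng.random() < mutation_prob else s
                 for s in child}
        new_gen.append(frozenset(child))
    return new_gen


def converged(best_history, window=10, tol=1e-3):
    """True if the best score moved by less than tol over `window` generations."""
    if len(best_history) <= window:
        return False
    recent = best_history[-(window + 1):]
    return max(recent) - min(recent) < tol


# Made-up example: 8 ranked sets of 4 spheres drawn from a pool of 32.
rng = random.Random(0)
ranked = [frozenset(range(4 * i, 4 * i + 4)) for i in range(8)]
generation = next_generation(ranked, pool=list(range(32)), max_sph=6, rng=rng)
```

In this sketch a single score stands in for the three tracked quantities (enrichment, RMSD and number of spheres); a fuller version would apply the same test to each of them.<br />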
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script that watches the created directory structure, runs docking, and processes the docking results<br />
== Setup & Running ==<br />
So far, the program runs on Wynton and Gimel. Let me know if you are interested in launching it on other clusters.<br />
<br />
The scripts and example config file are in<br />
<br />
Wynton<br />
<br />
<code>/wynton/group/bks/soft/juggler</code><br />
<br />
Gimel<br />
<br />
<code>/nfs/home/ak87/PROGRAM/juggler</code><br />
<br />
=== Setup ===<br />
Install [https://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8 SUBDOCK]<syntaxhighlight lang="bash"><br />
git clone https://github.com/docking-org/SUBDOCK.git<br />
</syntaxhighlight>Install [https://wiki.docking.org/index.php/Docking_Analysis_in_DOCK3.8#top_poses.py top_poses.py]<syntaxhighlight lang="bash"><br />
git clone https://github.com/docking-org/docktop.git<br />
</syntaxhighlight><br />
<br />
=== Preparation ===<br />
What you need to prepare:<br />
<br />
* a <code>dockfiles</code> directory prepared with any tool of your liking (blastermaster, dockopt, etc.). You will also need: <br />
* <code>rec.pdb</code><br />
* <code>rec.crg.pdb</code> <br />
* <code>xtal-lig.pdb</code>: To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code><br />
* <code>ligands.names</code><br />
* <code>decoys.names</code><br />
* an <code>sdi</code> file with the paths to the ligand .tgz files.<br />
<br />
Prepare a <code>juggler_config.yml</code> file. The queue type is <code>sge</code> for Wynton and <code>slurm</code> for Gimel (newer machines, like gimel5/gimel2/n-1-XXX...). Put the config into an empty directory.<syntaxhighlight lang="yaml"><br />
################################################<br />
# Paths for your target<br />
# NSP14 -- example<br />
receptor_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/rec.pdb"<br />
rec_crg_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/rec.crg.pdb"<br />
xtal_lig_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/TMP/xtal-lig-no-ring.pdb"<br />
dock_files_dir_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/RECEPTOR/TYR368_0.2-ALA353_0.2-GLY333_0.2/ZINC611-XTAL/LSD/ALL-SPH/dockfiles"<br />
lig_names_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/ligands.names"<br />
dec_names_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/decoys.names"<br />
sdi_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/ligands_repack_wynton_sdi"<br />
<br />
################################################<br />
# Executables and running<br />
dockbase: "/wynton/group/bks/soft/DOCK"<br />
subdock_bash_file_path: "/wynton/home/irwin/ak87/PROGRAM/SUBDOCK/subdock.bash"<br />
queue_type: "sge" # "slurm" or "sge"<br />
top_poses_file: "/wynton/home/irwin/ak87/PROGRAM/docktop/top_poses.py"<br />
<br />
###############################################<br />
# Max and min number of spheres<br />
min_sph: 4 # min is 4<br />
max_sph: 10 # max is 100<br />
<br />
###############################################<br />
# Parameters for genetic algorithm<br />
# Don't need to be adjusted for most purposes<br />
<br />
# sample_size: 20 # Please, note that 1/4 of the sample size survives.<br />
# mutation_prob: 0.01<br />
# crossover_prob: 0.10<br />
# go_to_next_gen_prob: 0.30<br />
# gen_new_prob: 0.01<br />
<br />
<br />
###############################################<br />
# Parameters for sphere generation<br />
# Don't need to be adjusted for most purposes<br />
<br />
# # changed to 1.0 for compatibility with xtal-lig spheres<br />
# close_dist: 1.0<br />
# far_dist: 5.0<br />
# min_sph: 4<br />
# max_sph: 10<br />
#<br />
<br />
</syntaxhighlight><br />
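<br />
Before launching a run, it can help to sanity-check the values that the comments in the config constrain. The Python sketch below works on a plain dictionary of already-loaded settings; <code>check_config</code> is a hypothetical helper, not part of Juggler, and the fallback values 4 and 10 are assumed from the example config.<br />

```python
def check_config(cfg):
    """Return a list of problems found in a dict of config values."""
    problems = []
    if cfg.get("queue_type") not in ("sge", "slurm"):
        problems.append('queue_type must be "sge" or "slurm"')
    min_sph = cfg.get("min_sph", 4)    # assumed fallback value
    max_sph = cfg.get("max_sph", 10)   # assumed fallback value
    if min_sph < 4:
        problems.append("min_sph must be at least 4")
    if max_sph > 100:
        problems.append("max_sph must be at most 100")
    if min_sph > max_sph:
        problems.append("min_sph must not exceed max_sph")
    return problems


good = check_config({"queue_type": "sge", "min_sph": 4, "max_sph": 10})
bad = check_config({"queue_type": "pbs", "min_sph": 2, "max_sph": 200})
```

An empty list means that none of the checked constraints is violated; the file paths in the config are not checked here.<br />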
<br />
=== Running ===<br />
<br />
==== Launch Juggler: ====<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
<br />
* Wynton<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/group/bks/soft/juggler/juggler-v0.8.py > juggler.log 2>&1</code><br />
<br />
* Gimel (gimel2/gimel5/n-1-XXX ...)<br />
<br />
<code>source /nfs/soft/ian/python3.8.5.sh</code><br />
<br />
<code>python /nfs/home/ak87/PROGRAM/juggler/juggler-v0.8.py > juggler.log 2>&1</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
==== Launch docking daemon😈: ====<br />
In the same directory, open a new screen and launch a docking daemon:<br />
<br />
* Wynton<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>sh /wynton/group/bks/soft/juggler/rundockd-v0.8.sh</code><br />
<br />
* Gimel (gimel2/gimel5/n-1-XXX ...)<br />
<br />
<code>source /nfs/soft/ian/python3.8.5.sh</code><br />
<br />
<code>sh /nfs/home/ak87/PROGRAM/juggler/rundockd-v0.8.sh</code><br />
<br />
You can run other calculations in the meantime, as Juggler will track the task IDs that it launched.<br />
<br />
The run will take from a few hours to ~2 days, depending on the number of actives and decoys and the load on Wynton.<br />
<br />
=== Processing results ===<br />
At the end of a run you will get a message that convergence was reached. The script will print the paths to the three best matching sphere sets:<br />
<br />
* best enrichment<br />
* best RMSD<br />
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).<br />
<br />
You can use <code>dockfiles</code> from the listed directories.<br />
<br />
You can also track the optimization progress by running the following script in your working directory:<br />
<br />
* Wynton<br />
<br />
<code>/wynton/group/bks/soft/juggler/plot_all_metrics.py</code><br />
<br />
* Gimel<br />
<br />
<code>/nfs/home/ak87/PROGRAM/juggler/plot_all_metrics.py</code><br />
<br />
It produces <code>combined_metrics.png</code><br />
[[File:Combined metrics plot from a GA run.png|thumb|Combined metrics plot from a GA run]]<br />
<br />
'''If not converged'''<br />
<br />
The program will stop after 200 generations if convergence is not reached. If it takes too long, you can stop it at any time by pressing Ctrl-C. This does not mean that you have no results, though. Juggler generates a <code>combined_metrics.dat</code> file in the working directory, which contains the metrics for all sets explored. It contains the following columns:<br />
<br />
<code>Generations Set# NormLogAUC RMSD Nsph Combined_metrics</code><br />
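<br />
The ranking can also be done without a spreadsheet. The Python sketch below sorts rows with these columns by NormLogAUC. It assumes whitespace-separated numeric fields and skips any line that does not parse (for example a header); the function names and the example rows are made up.<br />

```python
COLUMNS = ["Generations", "Set#", "NormLogAUC", "RMSD", "Nsph",
           "Combined_metrics"]


def read_metrics(lines):
    """Parse whitespace-separated rows into dicts keyed by column name."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue
        try:
            values = [float(f) for f in fields]
        except ValueError:
            continue  # header or malformed line
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def best_by_enrichment(rows, top=3):
    """Return the `top` rows with the highest NormLogAUC."""
    return sorted(rows, key=lambda r: r["NormLogAUC"], reverse=True)[:top]


# Made-up example rows, not real Juggler output.
example = [
    "Generations Set# NormLogAUC RMSD Nsph Combined_metrics",
    "1 0 0.21 2.5 8 0.40",
    "1 1 0.35 1.9 6 0.62",
    "2 0 0.30 1.2 5 0.66",
]
best = best_by_enrichment(read_metrics(example), top=2)
```

Changing the sort key to <code>RMSD</code> or <code>Combined_metrics</code> gives the other rankings mentioned above.<br />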
<br />
You can paste its content into Excel, sort by the highest NormLogAUC and pick an MS set of your liking.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15464Global Matching Sphere Optimization2023-07-20T00:19:13Z<p>Iamkaant: </p>
<hr />
<div><br />
== Goal ==<br />
To optimize your matching sphere (MS) setup so that it gives more enrichment with fewer spheres.<br />
== Description ==<br />
The program optimizes matching spheres using a genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by the SPHGEN program<br />
At each generation, N matching sphere sets are created, each containing at most M spheres. Retrospective docking is then run for each set, and the sets are ranked by:<br />
<br />
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]), <br />
* RMSD of the docked pose to the experimental one. <br />
<br />
After that, a quarter of the sets "survive" and produce a new generation by direct transfer, mutation, and crossover. This process is repeated until the enrichment, the RMSD, and the minimum number of spheres have not changed substantially for 10 generations.<br />
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script that watches the created directory structure, runs docking, and processes the docking results<br />
== Setup & Running ==<br />
So far, the program runs on Wynton and Gimel. Let me know if you are interested in launching it on other clusters.<br />
<br />
The scripts and example config file are in<br />
<br />
Wynton<br />
<br />
<code>/wynton/group/bks/soft/juggler</code><br />
<br />
Gimel<br />
<br />
<code>/nfs/home/ak87/PROGRAM/juggler</code><br />
<br />
=== Setup ===<br />
Install [https://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8 SUBDOCK]<syntaxhighlight lang="bash"><br />
git clone https://github.com/docking-org/SUBDOCK.git<br />
</syntaxhighlight>Install [https://wiki.docking.org/index.php/Docking_Analysis_in_DOCK3.8#top_poses.py top_poses.py]<syntaxhighlight lang="bash"><br />
git clone https://github.com/docking-org/docktop.git<br />
</syntaxhighlight><br />
<br />
=== Preparation ===<br />
What you need to prepare:<br />
<br />
* a <code>dockfiles</code> directory prepared with any tool of your liking (blastermaster, dockopt, etc.). You will also need: <br />
* <code>rec.pdb</code><br />
* <code>rec.crg.pdb</code> <br />
* <code>xtal-lig.pdb</code>: To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code><br />
* <code>ligands.names</code><br />
* <code>decoys.names</code><br />
* an <code>sdi</code> file with the paths to the ligand .tgz files.<br />
<br />
Prepare a <code>juggler_config.yml</code> file. The queue type is <code>sge</code> for Wynton and <code>slurm</code> for Gimel (newer machines, like gimel5/gimel2/n-1-XXX...). Put the config into an empty directory.<syntaxhighlight lang="yaml"><br />
################################################<br />
# Paths for your target<br />
# NSP14 -- example<br />
receptor_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/rec.pdb"<br />
rec_crg_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/rec.crg.pdb"<br />
xtal_lig_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/TMP/xtal-lig-no-ring.pdb"<br />
dock_files_dir_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/RECEPTOR/TYR368_0.2-ALA353_0.2-GLY333_0.2/ZINC611-XTAL/LSD/ALL-SPH/dockfiles"<br />
lig_names_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/ligands.names"<br />
dec_names_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/decoys.names"<br />
sdi_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/ligands_repack_wynton_sdi"<br />
<br />
################################################<br />
# Executables and running<br />
dockbase: "/wynton/group/bks/soft/DOCK"<br />
subdock_bash_file_path: "/wynton/home/irwin/ak87/PROGRAM/SUBDOCK/subdock.bash"<br />
queue_type: "sge" # "slurm" or "sge"<br />
top_poses_file: "/wynton/home/irwin/ak87/PROGRAM/docktop/top_poses.py"<br />
<br />
###############################################<br />
# Max and min number of spheres<br />
min_sph: 4 # min is 4<br />
max_sph: 10 # max is 100<br />
<br />
###############################################<br />
# Parameters for genetic algorithm<br />
# Don't need to be adjusted for most purposes<br />
<br />
# sample_size: 20 # Please, note that 1/4 of the sample size survives.<br />
# mutation_prob: 0.01<br />
# crossover_prob: 0.10<br />
# go_to_next_gen_prob: 0.30<br />
# gen_new_prob: 0.01<br />
<br />
<br />
###############################################<br />
# Parameters for sphere generation<br />
# Don't need to be adjusted for most purposes<br />
<br />
# # changed to 1.0 for compatibility with xtal-lig spheres<br />
# close_dist: 1.0<br />
# far_dist: 5.0<br />
# min_sph: 4<br />
# max_sph: 10<br />
#<br />
<br />
</syntaxhighlight><br />
<br />
=== Running ===<br />
<br />
==== Launch Juggler: ====<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
<br />
* Wynton<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/group/bks/soft/juggler/juggler-v0.8.py > juggler.log 2>&1</code><br />
<br />
* Gimel (gimel2/gimel5/n-1-XXX ...)<br />
<br />
<code>source /nfs/soft/ian/python3.8.5.sh</code><br />
<br />
<code>python /nfs/home/ak87/PROGRAM/juggler/juggler-v0.8.py > juggler.log 2>&1</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
==== Launch docking daemon😈: ====<br />
In the same directory, open a new screen and launch a docking daemon:<br />
<br />
* Wynton<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>sh /wynton/group/bks/soft/juggler/rundockd-v0.8.sh</code><br />
<br />
* Gimel (gimel2/gimel5/n-1-XXX ...)<br />
<br />
<code>source /nfs/soft/ian/python3.8.5.sh</code><br />
<br />
<code>sh /nfs/home/ak87/PROGRAM/juggler/rundockd-v0.8.sh</code><br />
<br />
You can run other calculations in the meantime, as Juggler will track the task IDs that it launched.<br />
<br />
The run will take from a few hours to ~2 days, depending on the number of actives and decoys and the load on Wynton.<br />
<br />
=== Processing results ===<br />
At the end of a run you will get a message that convergence was reached. The script will print the paths to the three best matching sphere sets:<br />
<br />
* best enrichment<br />
* best RMSD<br />
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).<br />
<br />
You can use <code>dockfiles</code> from the listed directories.<br />
<br />
You can also track the optimization progress by running the following script in your working directory:<br />
<br />
* Wynton<br />
<br />
<code>/wynton/group/bks/soft/juggler/plot_all_metrics.py</code><br />
<br />
* Gimel<br />
<br />
<code>/nfs/home/ak87/PROGRAM/juggler/plot_all_metrics.py</code><br />
<br />
It produces <code>combined_metrics.png</code><br />
[[File:Combined metrics plot from a GA run.png|thumb|Combined metrics plot from a GA run]]<br />
<br />
'''If not converged'''<br />
<br />
The program will stop after 200 generations if convergence is not reached. If it takes too long, you can stop it at any time by pressing Ctrl-C. This does not mean that you have no results, though. Juggler generates a <code>combined_metrics.dat</code> file in the working directory, which contains the metrics for all sets explored. It contains the following columns:<br />
<br />
<code>Generations Set# NormLogAUC RMSD Nsph Combined_metrics</code><br />
<br />
You can paste its content into Excel, sort by the highest NormLogAUC and pick an MS set of your liking.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15463Global Matching Sphere Optimization2023-07-20T00:16:39Z<p>Iamkaant: </p>
<hr />
<div><br />
== Goal ==<br />
To optimize your matching sphere (MS) setup so that it gives more enrichment with fewer spheres.<br />
== Description ==<br />
The program optimizes matching spheres using a genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by the SPHGEN program<br />
At each generation, N matching sphere sets are created, each containing at most M spheres. Retrospective docking is then run for each set, and the sets are ranked by:<br />
<br />
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]), <br />
* RMSD of the docked pose to the experimental one. <br />
<br />
After that, a quarter of the sets "survive" and produce a new generation by direct transfer, mutation, and crossover. This process is repeated until the enrichment, the RMSD, and the minimum number of spheres have not changed substantially for 10 generations.<br />
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script that watches the created directory structure, runs docking, and processes the docking results<br />
== Setup & Running ==<br />
So far, the program runs on Wynton and Gimel. Let me know if you are interested in launching it on other clusters.<br />
<br />
The scripts and example config file are in<br />
<br />
Wynton<br />
<br />
<code>/wynton/group/bks/soft/juggler</code><br />
<br />
Gimel<br />
<br />
<code>/nfs/home/ak87/PROGRAM/juggler</code><br />
<br />
=== Setup ===<br />
Install [https://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8 SUBDOCK]<syntaxhighlight lang="bash"><br />
git clone https://github.com/docking-org/SUBDOCK.git<br />
</syntaxhighlight>Install [https://wiki.docking.org/index.php/Docking_Analysis_in_DOCK3.8#top_poses.py top_poses.py]<syntaxhighlight lang="bash"><br />
git clone https://github.com/docking-org/docktop.git<br />
</syntaxhighlight><br />
<br />
=== Preparation ===<br />
What you need to prepare:<br />
<br />
* a <code>dockfiles</code> directory prepared with any tool of your liking (blastermaster, dockopt, etc.). You will also need: <br />
* <code>rec.pdb</code><br />
* <code>rec.crg.pdb</code> <br />
* <code>xtal-lig.pdb</code>: To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code><br />
* <code>ligands.names</code><br />
* <code>decoys.names</code><br />
* an <code>sdi</code> file with the paths to the ligand .tgz files.<br />
<br />
Prepare a <code>juggler_config.yml</code> file. The queue type is <code>sge</code> for Wynton and <code>slurm</code> for Gimel (newer machines, like gimel5/gimel2/n-1-XXX...). Put the config into an empty directory.<syntaxhighlight lang="yaml"><br />
################################################<br />
# Paths for your target<br />
# NSP14 -- example<br />
receptor_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/rec.pdb"<br />
rec_crg_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/rec.crg.pdb"<br />
xtal_lig_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/TMP/xtal-lig-no-ring.pdb"<br />
dock_files_dir_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/RECEPTOR/TYR368_0.2-ALA353_0.2-GLY333_0.2/ZINC611-XTAL/LSD/ALL-SPH/dockfiles"<br />
lig_names_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/ligands.names"<br />
dec_names_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/decoys.names"<br />
sdi_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/ligands_repack_wynton_sdi"<br />
<br />
################################################<br />
# Executables and running<br />
dockbase: "/wynton/group/bks/soft/DOCK"<br />
subdock_bash_file_path: "/wynton/home/irwin/ak87/PROGRAM/SUBDOCK/subdock.bash"<br />
queue_type: "sge" # "slurm" or "sge"<br />
top_poses_file: "/wynton/home/irwin/ak87/PROGRAM/docktop/top_poses.py"<br />
<br />
###############################################<br />
# Max and min number of spheres<br />
min_sph: 4 # min is 4<br />
max_sph: 10 # max is 100<br />
<br />
###############################################<br />
# Parameters for genetic algorithm<br />
# Don't need to be adjusted for most purposes<br />
<br />
# sample_size: 20 # Please, note that 1/4 of the sample size survives.<br />
# mutation_prob: 0.01<br />
# crossover_prob: 0.10<br />
# go_to_next_gen_prob: 0.30<br />
# gen_new_prob: 0.01<br />
<br />
<br />
###############################################<br />
# Parameters for sphere generation<br />
# Don't need to be adjusted for most purposes<br />
<br />
# # changed to 1.0 for compatibility with xtal-lig spheres<br />
# close_dist: 1.0<br />
# far_dist: 5.0<br />
# min_sph: 4<br />
# max_sph: 10<br />
#<br />
<br />
</syntaxhighlight><br />
<br />
=== Running ===<br />
<br />
==== Launch Juggler: ====<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
<br />
* Wynton<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/group/bks/soft/juggler/juggler-v0.8.py > juggler.log 2>&1</code><br />
<br />
* Gimel<br />
<br />
<code>source /nfs/soft/ian/python3.8.5.sh</code><br />
<br />
<code>python /nfs/home/ak87/PROGRAM/juggler/juggler-v0.8.py > juggler.log 2>&1</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
==== Launch docking daemon😈: ====<br />
In the same directory, open a new screen and launch a docking daemon:<br />
<br />
* Wynton<br />
<br />
<code>sh /wynton/group/bks/soft/juggler/rundockd-v0.8.sh</code><br />
<br />
* Gimel<br />
<br />
<code>sh /nfs/home/ak87/PROGRAM/juggler/rundockd-v0.8.sh</code><br />
<br />
You can run other calculations in the meantime, as Juggler will track the task IDs that it launched.<br />
<br />
The run will take from a few hours to ~2 days, depending on the number of actives and decoys and the load on Wynton.<br />
<br />
=== Processing results ===<br />
At the end of a run you will get a message that convergence was reached. The script will print the paths to the three best matching sphere sets:<br />
<br />
* best enrichment<br />
* best RMSD<br />
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).<br />
<br />
You can use <code>dockfiles</code> from the listed directories.<br />
<br />
You can also track the optimization progress by running the following script in your working directory:<br />
<br />
* Wynton<br />
<br />
<code>/wynton/group/bks/soft/juggler/plot_all_metrics.py</code><br />
<br />
* Gimel<br />
<br />
<code>/nfs/home/ak87/PROGRAM/juggler/plot_all_metrics.py</code><br />
<br />
It produces <code>combined_metrics.png</code><br />
[[File:Combined metrics plot from a GA run.png|thumb|Combined metrics plot from a GA run]]<br />
<br />
'''If not converged'''<br />
<br />
The program will stop after 200 generations if convergence is not reached. If it takes too long, you can stop it at any time by pressing Ctrl-C. This does not mean that you have no results, though. Juggler generates a <code>combined_metrics.dat</code> file in the working directory, which contains the metrics for all sets explored. It contains the following columns:<br />
<br />
<code>Generations Set# NormLogAUC RMSD Nsph Combined_metrics</code><br />
<br />
You can paste its content into Excel, sort by the highest NormLogAUC and pick an MS set of your liking.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15462Global Matching Sphere Optimization2023-07-17T16:04:16Z<p>Iamkaant: updated to describe the new version of Juggler</p>
<hr />
<div><br />
== Goal ==<br />
To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres.<br />
== Description ==<br />
The program optimizes matching spheres using a genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by SPHGEN program<br />
At each generation, N matching sphere sets are created, containing a maximum of M spheres each. Then retrospective docking is done for each set, and sets are ranked by the <br />
<br />
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]), <br />
* RMSD of the docked pose to the experimental one. <br />
<br />
After that, a quarter of the sets "survive" and produce a new generation by direct transfer, mutation, and crossover. This process is repeated until the enrichment, RMSD, and minimum number of spheres no longer change substantially over 10 generations.<br />
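<br />
The selection step described above can be illustrated with a minimal sketch. This is not Juggler's actual code: the function name <code>next_generation</code>, the use of integers as sphere identifiers, and the exact way crossover and mutation are applied are assumptions made for illustration, and the <code>go_to_next_gen_prob</code> and <code>gen_new_prob</code> parameters of the config are not modeled. The sets are assumed to be already ranked best-first.<br />
<syntaxhighlight lang="python">
import random


def next_generation(ranked_sets, pool, sample_size=20, max_sph=10,
                    mutation_prob=0.01, crossover_prob=0.10, rng=None):
    """Build a new generation from sphere sets that are ranked best-first.

    Illustrative only: a quarter of the sets survive, survivors are copied
    unchanged (direct transfer), and the rest of the generation is filled
    with children produced by crossover and mutation.
    """
    rng = rng or random.Random()
    survivors = [list(s) for s in ranked_sets[:max(1, sample_size // 4)]]
    new_gen = [list(s) for s in survivors]  # direct transfer
    while len(new_gen) < sample_size:
        child = list(rng.choice(survivors))
        # Crossover: keep the head of one parent, append the spheres of
        # another parent that are not already present.
        if len(survivors) > 1 and rng.random() < crossover_prob:
            other = rng.choice(survivors)
            cut = rng.randint(1, len(child) - 1)
            head = child[:cut]
            child = head + [s for s in other if s not in head]
        # Mutation: replace single spheres with unused ones from the pool.
        for i in range(len(child)):
            if rng.random() < mutation_prob:
                unused = [s for s in pool if s not in child]
                if unused:
                    child[i] = rng.choice(unused)
        new_gen.append(child[:max_sph])  # never exceed the sphere limit
    return new_gen
</syntaxhighlight>
In this sketch the survivors are always carried over unchanged, so the best set found so far is never lost between generations.<br />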
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script that watches the created directory structure, runs docking, and processes the docking results.<br />
== Setup & Running ==<br />
So far, the program runs on Wynton and Gimel. Let me know if you are interested in launching it on other clusters.<br />
<br />
The scripts and example config file are in<br />
<br />
Wynton<br />
<br />
<code>/wynton/group/bks/soft/juggler</code><br />
<br />
Gimel<br />
<br />
<code>/nfs/home/ak87/PROGRAM/juggler</code><br />
<br />
=== Setup ===<br />
Install [https://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8 SUBDOCK]<syntaxhighlight lang="bash"><br />
git clone https://github.com/docking-org/SUBDOCK.git<br />
</syntaxhighlight>Install [https://wiki.docking.org/index.php/Docking_Analysis_in_DOCK3.8#top_poses.py top_poses.py]<syntaxhighlight lang="bash"><br />
git clone https://github.com/docking-org/docktop.git<br />
</syntaxhighlight><br />
<br />
=== Preparation ===<br />
What you need to prepare:<br />
<br />
* <code>dockfiles</code> directory prepared with any tool of your liking (blastermaster, dockopt, etc.). You will also need:<br />
* <code>rec.pdb</code><br />
* <code>rec.crg.pdb</code> <br />
* <code>xtal-lig.pdb</code>: To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code><br />
* <code>ligands.names</code><br />
* <code>decoys.names</code><br />
* <code>sdi</code> file with the paths to the ligand .tgz files.<br />
<br />
Prepare <code>juggler_config.yml</code> file. Queue type is <code>sge</code> for Wynton and <code>slurm</code> for Gimel (newer machines, like gimel5/gimel2/n-1-XXX...). Put the config into an empty directory.<syntaxhighlight lang="yaml"><br />
################################################<br />
# Paths for your target<br />
# NSP14 -- example<br />
receptor_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/rec.pdb"<br />
rec_crg_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/rec.crg.pdb"<br />
xtal_lig_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/GENERATION/TMP/xtal-lig-no-ring.pdb"<br />
dock_files_dir_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/RECEPTOR/TYR368_0.2-ALA353_0.2-GLY333_0.2/ZINC611-XTAL/LSD/ALL-SPH/dockfiles"<br />
lig_names_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/ligands.names"<br />
dec_names_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/decoys.names"<br />
sdi_file_path: "/wynton/home/irwin/ak87/ak87/UCSF/NSP14/LIGANDS/REPACK/ligands_repack_wynton_sdi"<br />
<br />
################################################<br />
# Executables and running<br />
dockbase: "/wynton/group/bks/soft/DOCK"<br />
subdock_bash_file_path: "/wynton/home/irwin/ak87/PROGRAM/SUBDOCK/subdock.bash"<br />
queue_type: "sge" # "slurm" or "sge"<br />
top_poses_file: "/wynton/home/irwin/ak87/PROGRAM/docktop/top_poses.py"<br />
<br />
###############################################<br />
# Max and min number of spheres<br />
min_sph: 4 # min is 4<br />
max_sph: 10 # max is 100<br />
<br />
###############################################<br />
# Parameters for genetic algorithm<br />
# Don't need to be adjusted for most purposes<br />
<br />
# sample_size: 20 # Please, note that 1/4 of the sample size survives.<br />
# mutation_prob: 0.01<br />
# crossover_prob: 0.10<br />
# go_to_next_gen_prob: 0.30<br />
# gen_new_prob: 0.01<br />
<br />
<br />
###############################################<br />
# Parameters for sphere generation<br />
# Don't need to be adjusted for most purposes<br />
<br />
# # changed to 1.0 for compatibility with xtal-lig spheres<br />
# close_dist: 1.0<br />
# far_dist: 5.0<br />
# min_sph: 4<br />
# max_sph: 10<br />
#<br />
<br />
</syntaxhighlight><br />
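<br />
Before launching a long run, it can be worth checking the config for typos in paths. The following is a minimal sketch, not part of Juggler: the helper names <code>read_flat_config</code> and <code>check_config</code> are hypothetical, the parser only understands flat <code>key: value</code> lines like those in the example above (it is not a general YAML parser and will misread paths that contain <code>#</code>), and the limits on the number of spheres are taken from the comments in the example config.<br />
<syntaxhighlight lang="python">
import os

# Keys from the example config whose values are expected to exist on disk.
PATH_KEYS = (
    "receptor_file_path", "rec_crg_file_path", "xtal_lig_file_path",
    "dock_files_dir_path", "lig_names_file_path", "dec_names_file_path",
    "sdi_file_path", "dockbase", "subdock_bash_file_path", "top_poses_file",
)


def read_flat_config(path):
    """Parse top-level `key: value` lines, ignoring comments."""
    config = {}
    with open(path) as handle:
        for line in handle:
            line = line.split("#", 1)[0].strip()  # drop comments
            if ":" not in line:
                continue
            key, value = line.split(":", 1)
            config[key.strip()] = value.strip().strip('"')
    return config


def check_config(config):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for key in PATH_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not os.path.exists(config[key]):
            problems.append(f"path does not exist: {key} = {config[key]}")
    if config.get("queue_type") not in ("sge", "slurm"):
        problems.append("queue_type must be 'sge' or 'slurm'")
    try:
        min_sph, max_sph = int(config["min_sph"]), int(config["max_sph"])
        if not 4 <= min_sph <= max_sph <= 100:
            problems.append("expected 4 <= min_sph <= max_sph <= 100")
    except (KeyError, ValueError):
        problems.append("min_sph and max_sph must be integers")
    return problems


if __name__ == "__main__" and os.path.exists("juggler_config.yml"):
    for problem in check_config(read_flat_config("juggler_config.yml")):
        print(problem)
</syntaxhighlight>
Run it in the directory that contains <code>juggler_config.yml</code>; it prints nothing when no problems are found.<br />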
<br />
=== Running ===<br />
<br />
==== Launch Juggler: ====<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
<br />
* Wynton<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/group/bks/soft/juggler/juggler-v0.8.py > juggler.log 2>&1</code><br />
<br />
* Gimel<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /nfs/home/ak87/PROGRAM/juggler/juggler-v0.8.py > juggler.log 2>&1</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
==== Launch docking daemon😈: ====<br />
In the same directory, open a new screen and launch a docking daemon:<br />
<br />
* Wynton<br />
<br />
<code>sh /wynton/group/bks/soft/juggler/rundockd-v0.8.sh</code><br />
<br />
* Gimel<br />
<br />
<code>sh /nfs/home/ak87/PROGRAM/juggler/rundockd-v0.8.sh</code><br />
<br />
You can run other calculations in the meantime, as Juggler will track the task IDs that it launched.<br />
<br />
The run will take from a few hours to ~2 days, depending on the number of actives and decoys and on the load of the cluster.<br />
<br />
=== Processing results ===<br />
At the end of a run you will get a message that convergence was reached. The script will then print the paths to the three best matching sphere sets:<br />
<br />
* best enrichment<br />
* best RMSD<br />
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).<br />
<br />
You can use <code>dockfiles</code> from the listed directories.<br />
<br />
You can also track the optimization progress by running the following script in your working directory:<br />
<br />
* Wynton<br />
<br />
<code>/wynton/group/bks/soft/juggler/plot_all_metrics.py</code><br />
<br />
* Gimel<br />
<br />
<code>/nfs/home/ak87/PROGRAM/juggler/plot_all_metrics.py</code><br />
<br />
It produces <code>combined_metrics.png</code><br />
[[File:Combined metrics plot from a GA run.png|thumb|Combined metrics plot from a GA run]]<br />
<br />
'''If not converged'''<br />
<br />
The program stops after 200 generations if convergence is not reached. If it takes too long, you can stop it at any time by pressing Ctrl-C. Either way, you still have results: Juggler writes a <code>combined_metrics.dat</code> file in the working directory with the metrics for all sets explored. It contains the following columns:<br />
<br />
<code>Generations Set# NormLogAUC RMSD Nsph Combined_metrics</code><br />
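<br />
If you prefer a script to a spreadsheet, the file can also be ranked in a few lines of Python. This is a minimal sketch under stated assumptions: the file is taken to be whitespace-delimited with exactly the six columns listed above, the generation and Nsph columns are taken to be integers, and any line that does not parse (for example a header) is skipped. The function names <code>load_metrics</code> and <code>best_sets</code> are made up for this example.<br />
<syntaxhighlight lang="python">
def load_metrics(path="combined_metrics.dat"):
    """Read the metrics table into a list of dictionaries."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 6:
                continue
            try:
                rows.append({
                    "generation": int(fields[0]),
                    "set": fields[1],
                    "norm_log_auc": float(fields[2]),
                    "rmsd": float(fields[3]),
                    "nsph": int(fields[4]),
                    "combined": float(fields[5]),
                })
            except ValueError:
                continue  # header or malformed line
    return rows


def best_sets(rows, top=5):
    """Highest NormLogAUC first; ties broken by lower RMSD, then fewer spheres."""
    return sorted(
        rows, key=lambda r: (-r["norm_log_auc"], r["rmsd"], r["nsph"])
    )[:top]
</syntaxhighlight>
For example, <code>best_sets(load_metrics())</code> returns the five sets with the highest enrichment explored so far, which also works for a run that was stopped before convergence.<br />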
<br />
You can paste its content into Excel, sort by NormLogAUC in descending order, and pick an MS set of your liking.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=15413Membrane Modeling2023-05-31T21:36:51Z<p>Iamkaant: </p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
In order to account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid bilayer is generated around the target receptor and included in the docking score grid generation.<br />
Aiming at a fast, robust and computationally effective equilibration of the lipid bilayer around the embedded transmembrane receptor, coarse-grained (CG) molecular dynamics (MD) simulations and (if needed) subsequent atomistic simulations are employed.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in case of GPCRs, try to model missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Login to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
-posre.itp - Position restraints for heavy atoms of the atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
-prot-rot.pdb - Coarse-grained protein structure aligned along the z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10nm, z=11nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change that to arbitrary lipid compositions using the -l and -u options of insane.py.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check if your protein is embedded correctly in the lipid bilayer. <br />
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running a MD simulation or minimization using gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and select lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tool: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions of MD trajectory, e.g. making molecules broken over the periodic boundary conditions whole again. <br />
<br />
initram.sh calls the backward.py program which performs the backmapping of input coarse-grained to atomistic systems, and performs a small series of short minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. There is a while-loop that stops only once all relaxation steps have finished.'''<br />
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy, there might be an issue with the PyMOL version you're using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations will be run.<br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This will select carbon and hydrogen atoms of the hydrophobic lipid tail segments within a radius of 1.7 nm around the protein and assign them to the atom type "LPD".<br />
<br />
At this step, you need to provide the rec.pdb that you want to use for docking (potentially with missing loops) as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file with the following line appended:<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
<br />
= Membrane modelling in Schrodinger =<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
== MD of protein and membrane ==<br />
Import structure '''without''' ligand, use <u>Preparation wizard</u> as described in "Code for Controls..." to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website https://opm.phar.umich.edu/proteins/ and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
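<br />
When the residue list is long, typing the ranges by hand is error-prone. The sketch below builds a string in the format shown above from a plain list of residue numbers. The function name <code>residue_ranges</code> is hypothetical, and it is an assumption that you already have the transmembrane residue numbers as integers (for example, copied from the OPM page).<br />
<syntaxhighlight lang="python">
def residue_ranges(residues):
    """Collapse residue numbers into a 'res.num a-b,c-d,e' string."""
    numbers = sorted(set(residues))
    parts = []
    start = prev = None
    for number in numbers:
        if start is None:
            start = prev = number
        elif number == prev + 1:
            prev = number  # still inside the current run
        else:
            # Close the current run and start a new one.
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = number
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return "res.num " + ",".join(parts)


if __name__ == "__main__":
    helix_residues = list(range(76, 98)) + list(range(112, 137)) + [141]
    print(residue_ranges(helix_residues))
</syntaxhighlight>
Duplicate and unsorted residue numbers are handled, so lists collected from several helices can simply be concatenated before the call.<br />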
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use <u>Molecular Dynamics</u> menu to set up the calculation. Select prepared system, click <code>Load</code> on top of the menu. Put simulation time of 5 ns, <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select protein, and put <code>Force Constant</code> of 100, click <code>Apply</code> and <code>OK.</code> Then click down arrow left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
'''''NOTE''': In my experience, restrained MD runs with NgPT in Schrodinger often fail with the following message:''<br />
<br />
<code>''Allowed momentum exceeded on 17 particles.''</code><br />
<br />
''This seems to be related to the interference of restraints and the ensemble, as no such error is observed if no restraints are imposed, or if the NVT ensemble is used. Still, the NgPT ensemble is important for correct membrane sampling, because NVT simulations long enough to permit membrane relaxation lead to smearing of the lipid bilayer and the formation of empty space.'' <br />
<br />
''If the mentioned problem occurs, import the last coordinates of the run (*-out.cms) into Maestro, open Minimization menu, load the structure from the workspace, and apply restraints on lipid and protein (force restraint of 10 is usually enough). Then run a minimization for 100 ps, and try to run NgPT MD starting from the optimized structure.''<br />
<br />
''''' NPT ensemble can safely be used instead of NgPT'''''<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, login to <u>gimel5</u>, edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Export <code>$SCHRODINGER</code> variable:<br />
<br />
<code>export SCHRODINGER=/nfs/soft2/schrodinger/2023-3/</code> (or a newer version)<br />
<br />
Run .sh file, and your task will be submitted to a queue. For my system it took 1.5 hr to complete.<br />
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open <code>-out.cms</code>, click the <code>T</code> icon at the new entry in the Project Table and click <code>Display Trajectory Snapshots</code>. Select the last one, click <code>Display</code> and check that the protein did not change position during the MD run, then click <code>Export</code>, to Project Table, Frames Selected only. You will get a new entry in the Project Table. Export it to a <code>.pdb</code> file.<br />
<br />
== Preparation of the structure for Blastermaster ==<br />
Use the <code>prepare.pml</code> script. At this step, you need to provide the <code>rec.pdb</code> that you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>. Rename the MD system to <code>last-mol.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SPC<br />
<br />
create MEM, ////POPC<br />
<br />
# these atom numbers do not exist in POPC or DPPC<br />
# therefore, we do not remove protons from the lipid<br />
# structure to make more spheres. Uncomment these<br />
# lines and change to proper H numbers if needed.<br />
#remove /MEM////HS<br />
#remove /MEM////HX<br />
#remove /MEM////HY<br />
#remove /MEM////H*B<br />
#remove /MEM////H*A<br />
#remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
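<br />
It is worth confirming that the renaming worked before the file goes into grid generation. The following sketch is a hypothetical helper (the name <code>summarize_shell</code> is not part of any existing script) that assumes standard fixed-column PDB records. It counts the atoms in <code>shell-LPD.pdb</code>, collects any atom that is not named <code>C</code> in residue <code>LPD</code>, and lists the other record types present, such as <code>CRYST1</code>, <code>CONECT</code> or <code>END</code>, which the grid preparation scripts on this page delete.<br />
<syntaxhighlight lang="python">
def summarize_shell(path="shell-LPD.pdb"):
    """Return (atom count, unexpected atom lines, other record types)."""
    atoms = 0
    unexpected = []
    other_records = set()
    with open(path) as handle:
        for line in handle:
            record = line[:6].strip()
            if record in ("ATOM", "HETATM"):
                atoms += 1
                atom_name = line[12:16].strip()  # PDB columns 13-16
                res_name = line[17:20].strip()   # PDB columns 18-20
                if atom_name != "C" or res_name != "LPD":
                    unexpected.append(line.rstrip("\n"))
            elif record:
                other_records.add(record)
    return atoms, unexpected, sorted(other_records)


if __name__ == "__main__":
    import os

    if os.path.exists("shell-LPD.pdb"):
        count, unexpected, records = summarize_shell()
        print(f"{count} LPD atoms, {len(unexpected)} unexpected atom lines")
        print("other records:", ", ".join(records) or "none")
</syntaxhighlight>
An empty list of unexpected atom lines means every selected lipid atom carries the name and residue name that the extra <code>amb.crg.oxt</code> line refers to.<br />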
<br />
== Preparation of grids with thinspheres ==<br />
Prepare grids with thinspheres for the protein without lipid as described in https://wiki.docking.org/index.php/How_to_do_parameter_scanning<br />
<br />
Create an empty directory and put <code>shell-LPD.pdb</code> there. Then run the following script:<br />
<br />
<code>sh blast-membrane-thinsph-scan.sh {path to the collection of "es_ld_thin_sph_rad_X.X" directories} {path to the dir with original working and dockfiles directories}</code><br />
<br />
This script runs qnifft and solvmap for each <code>es_ld_thin_sph_rad_X.X</code> directory, and then uses the second script of parameter scanning protocol to combine files into <code>dockfiles</code> directories.<br />
<br />
'''Important! LOOK AT YOUR GRIDS! Desolvation -- larger solvation where water is. Electrostatics -- no electrostatics in the lipid region. vdW -- no vdW in the lipid region.'''<br />
<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK with thinsphere scan<br />
# PREREQ -- run first step of https://wiki.docking.org/index.php/How_to_do_parameter_scanning (new_0001_generate_ES_LD_generation.py )<br />
# first argument -- path to the directory where dirs "es_ld_thin_sph_rad_X.X" are stored. <br />
# second argument -- path to the dir with original working and dockfiles<br />
# run in a new directory with shell-LPD.pdb<br />
<br />
run_once () {<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all records except HETATM from shell-LPD.pdb (edited in place)<br />
cp $curr_dir/shell-LPD.pdb .<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
#cp -r $blastermaster_Prot/dockfiles .<br />
#cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
need_files="amb.crg.oxt<br />
qnifft.parm<br />
vdw.siz"<br />
for file in $need_files<br />
do<br />
if [ -e $blastermaster_Prot/working/$file ]<br />
then<br />
cp $blastermaster_Prot/working/$file .<br />
else<br />
cp $blast_orig/working/$file .<br />
fi<br />
done<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
#cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
#echo "Check if the grid size changed, compare this with INDOCK"<br />
#python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
#head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids if they are present in the folder<br />
if [ -e $blastermaster_Prot/working/heavy ]<br />
then<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
fi<br />
echo $dir " DONE!"<br />
}<br />
<br />
curr_dir=$(pwd)<br />
workdirs=$1<br />
blast_orig=$2<br />
dirs=$(ls -d $workdirs/es_ld_thin_sph_rad_*)<br />
for dir in $dirs<br />
do<br />
blastermaster_Prot=$dir<br />
local_dir=$(echo $dir | awk -F"\/" '{print $NF}')<br />
mkdir -p $local_dir/working<br />
cd $local_dir/working || exit<br />
run_once<br />
cd $curr_dir || exit<br />
done<br />
<br />
<br />
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0002_combine_es_ld_grids_into_combos.py -p $blast_orig<br />
<br />
</syntaxhighlight><br />
<br />
== Running blastermaster with default parameters ==<br />
'''Warning! Do not use these grids, as the default grids with lipid spheres give incorrect solvation energies. Use the ones with thinspheres instead!'''<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK <br />
# first argument -- path to blastermaster files of the protein without membrane<br />
# run in a new directory with shell-LPD.pdb<br />
blastermaster_Prot=$1<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all records except HETATM from shell-LPD.pdb (edited in place)<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
cp -r $blastermaster_Prot/dockfiles .<br />
cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
cp $blastermaster_Prot/working/amb.crg.oxt .<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
echo "Check if the grid size changed, compare this with INDOCK"<br />
python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.hydrogen ../dockfiles/.<br />
echo "DONE!"<br />
</syntaxhighlight><br />
<br />
== Lipid membrane models from MemProtMD ==<br />
If a protein-membrane complex was already modeled for your system and deposited at [http://memprotmd.bioch.ox.ac.uk/home/ MemProtMD] website, you can use it and skip doing MD in Schrodinger. The steps are very similar to the ones after Schrodinger run.<br />
<br />
Download the <code>*_default_dppc.mpmd.finalframe.atomistic.pdb</code> file from the bottom of the page. Rename it to <code>last-mol.pdb</code>. <br />
<br />
Use the prepare.pml script (below). At this step, you need to provide the <code>rec.pdb</code> that you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
The script below differs from the one for processing Schrodinger results in two points: solvent residue is <code>SOL</code> instead of <code>SPC</code>, and lipids are called <code>DPPC</code> instead of <code>POPC</code>.<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SOL<br />
<br />
create MEM, ////DPPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
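<br />
The two differences named above can be captured in a small lookup. The following is a minimal Python sketch, not part of the lab scripts: the helper name <code>prepare_pml_header</code> and the source labels are hypothetical, and only the residue names come from the text above.<br />
<br />
```python
# Hypothetical helper (illustration only): builds the first commands of
# prepare.pml for a given membrane source. The residue names are taken from
# the text above; the function and dictionary names are assumptions.
RESIDUE_NAMES = {
    "schrodinger": {"solvent": "SPC", "lipid": "POPC"},
    "memprotmd": {"solvent": "SOL", "lipid": "DPPC"},
}


def prepare_pml_header(source: str) -> str:
    """Return the align/remove/create commands for the chosen source."""
    names = RESIDUE_NAMES[source]
    lines = [
        "align last-mol, xtal-prot",
        f"remove ////{names['solvent']}",
        f"create MEM, ////{names['lipid']}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(prepare_pml_header("memprotmd"))
```
<br />
Only the first three commands of <code>prepare.pml</code> are generated here; the remaining commands are the ones listed in the script above.<br />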
<br />
== Positioning of the membrane ==<br />
If there is no precomputed membrane position from OPM or MemProtMD, you can model it using the PPM 3.0 web server (https://opm.phar.umich.edu/ppm_server3) or the standalone software. The software is installed in <code>/nfs/home/ak87/exa/PROGRAM/OPM-MEMBRANE</code>. Copy the <code>res.lib</code> file from the program directory to the directory with your .pdb file. Create an input file like this:<syntaxhighlight lang="shell"><br />
1<br />
0 PMm out rec.pdb<br />
</syntaxhighlight><blockquote>0 or 1 -“do not use” or “use” heteroatoms in the input PDB file, respectively (solvent molecules are always excluded).<br />
<br />
MOM - type of membrane (see list of 3-letter codes for membranes below)<br />
<br />
“in” or “out” means topology of N-terminus of first subunit included in the corresponding input PDB file<br />
<br />
With this option, for every input PDB file, the program will automatically select the flat or curved membrane boundaries, whichever has the lower calculated transfer energy.</blockquote>Then run the program:<br />
<br />
<code>~ak87/exa/PROGRAM/OPM-MEMBRANE/immers<1membrane.inp>rec-opm.out</code><br />
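<br />
The input file shown above can also be generated programmatically. This is a minimal Python sketch under stated assumptions: the function name <code>ppm_input</code> is hypothetical, the leading <code>1</code> is assumed to be the number of PDB entries that follow, and the single-space field separation is copied from the example above. Check <code>ppm3_instructions.docx</code> before relying on it.<br />
<br />
```python
# Hypothetical helper (illustration only): composes the text of a PPM input
# file. Assumption: the first line is the number of PDB entries listed.
def ppm_input(pdb_files, use_hetero=False, membrane="PMm", topology="out"):
    """Return the input-file text for the given PDB file names."""
    if topology not in ("in", "out"):
        raise ValueError("topology must be 'in' or 'out'")
    flag = 1 if use_hetero else 0  # 0 = ignore heteroatoms, 1 = use them
    lines = [str(len(pdb_files))]
    lines += [f"{flag} {membrane} {topology} {name}" for name in pdb_files]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the file name used in the run command above.
    with open("1membrane.inp", "w") as handle:
        handle.write(ppm_input(["rec.pdb"]))
```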
<br />
Use the <code>datasub1</code> file to extract the residue numbers for membrane placement in Maestro.<br />
See ppm3_instructions.docx file in the program directory for more detail.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=15412Membrane Modeling2023-05-31T21:27:25Z<p>Iamkaant: </p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
To account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid bilayer is generated around the target receptor and included in the docking score grid generation.<br />
To equilibrate the lipid bilayer around the embedded transmembrane receptor quickly, robustly, and at low computational cost, coarse-grained (CG) molecular dynamics (MD) simulations are used, followed (if needed) by atomistic simulations.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' - Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in case of GPCRs, try to model missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Login to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
-posre.itp - Position restraints for heavy atoms of the atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
-prot-rot.pdb - Coarse-grained protein structure aligned along the z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10nm, z=11nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change that to arbitrary lipid compositions using the -l and -u options of insane.py.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check if your protein is embedded correctly in the lipid bilayer.<br />
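<br />
Besides the visual check, the box size can be read from the coordinate file. The sketch below assumes a rectangular box and relies on the Gromacs .gro convention that the last line holds the box vectors in nm; the function names and the tolerance are hypothetical.<br />
<br />
```python
# Hypothetical helpers (illustration only): read the box size from the text
# of a .gro file and compare it with the default insane.py box stated above.
def gro_box(gro_text):
    """Return (x, y, z) box lengths in nm from the last non-empty line."""
    last = [line for line in gro_text.splitlines() if line.strip()][-1]
    return tuple(float(value) for value in last.split()[:3])


def box_matches(box, expected=(10.0, 10.0, 11.0), tol=0.5):
    """True if every box length is within tol (nm) of the expected value."""
    return all(abs(b - e) <= tol for b, e in zip(box, expected))


if __name__ == "__main__":
    with open("cg-membrane.gro") as handle:
        box = gro_box(handle.read())
    print(box, box_matches(box))
```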
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running an MD simulation or minimization using gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
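<br />
The two tools named above are typically chained as sketched below. This is an assumption about how the commands could look, not a copy of the lab script: the file names are the ones mentioned on this page, the flags (<code>-f</code>, <code>-c</code>, <code>-p</code>, <code>-o</code>, <code>-deffnm</code>) are standard Gromacs options, and the function name is hypothetical.<br />
<br />
```python
import shlex


# Hypothetical helper (illustration only): returns the two command lines for
# a minimization. The actual script may pass additional options.
def minimization_commands(mdp="martini_new-rf_min.mdp",
                          coords="cg-membrane.gro",
                          top="topol-cg.top",
                          name="min"):
    """Return [grompp_command, mdrun_command] as argument lists."""
    grompp = ["gmx", "grompp", "-f", mdp, "-c", coords, "-p", top,
              "-o", f"{name}.tpr"]
    # -deffnm makes mdrun read <name>.tpr and name its outputs <name>.*
    mdrun = ["gmx", "mdrun", "-deffnm", name]
    return [grompp, mdrun]


if __name__ == "__main__":
    for command in minimization_commands():
        print(shlex.join(command))
```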
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and selecting lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tool: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions of MD trajectories, e.g. making molecules that are broken over the periodic boundary conditions whole again.<br />
<br />
initram.sh calls the backward.py program, which backmaps the input coarse-grained system to atomistic resolution and then runs a short series of minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. The script contains a while-loop that stops only once all relaxation steps have finished.'''<br />
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy, there might be an issue with the PyMOL version you're using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
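<br />
The coordinate count check described above can be scripted. A minimal Python sketch, assuming that "number of coordinates" means the number of ATOM and HETATM records; the function names are hypothetical.<br />
<br />
```python
# Hypothetical helpers (illustration only): compare the number of coordinate
# records in two PDB files given as text.
def count_atoms(pdb_text):
    """Count ATOM and HETATM records."""
    return sum(1 for line in pdb_text.splitlines()
               if line.startswith(("ATOM", "HETATM")))


def same_atom_count(pdb_a, pdb_b):
    """True if both PDB texts contain the same number of coordinate records."""
    return count_atoms(pdb_a) == count_atoms(pdb_b)


if __name__ == "__main__":
    with open("fitted_system.pdb") as a, open("backmapped-mol.pdb") as b:
        print(same_atom_count(a.read(), b.read()))
```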
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations will be run.<br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This will select carbon and hydrogen atoms of the hydrophobic lipid tail segments in a radius of 1.7 nm around the protein and assign them to the atom type "LPD".<br />
<br />
At this step, provide the rec.pdb that you want to use for docking (potentially with missing loops) as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file with the following line added:<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
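<br />
Adding the LPD entry to amb.crg.oxt can be made safe to repeat. The sketch below is an illustration with a hypothetical function name; the entry text is copied from the line above, and its column spacing should be checked against the existing entries in your amb.crg.oxt.<br />
<br />
```python
# Hypothetical helper (illustration only): append the lipid entry once.
# Assumption: entries can be compared after collapsing runs of whitespace.
LPD_LINE = "C lpd 0.000 LIPID SPHERE"


def with_lipid_entry(text):
    """Return text with the LPD entry appended unless it is already present."""
    existing = [" ".join(line.split()) for line in text.splitlines()]
    if LPD_LINE in existing:
        return text
    if text and not text.endswith("\n"):
        text += "\n"
    return text + LPD_LINE + "\n"


if __name__ == "__main__":
    with open("amb.crg.oxt") as handle:
        updated = with_lipid_entry(handle.read())
    with open("amb.crg.oxt", "w") as handle:
        handle.write(updated)
```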
<br />
= Membrane modelling in Schrodinger =<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
== MD of protein and membrane ==<br />
Import structure '''without''' ligand, use <u>Preparation wizard</u> as described in "Code for Controls..." to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website https://opm.phar.umich.edu/proteins/ and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
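<br />
If the transmembrane residues are available as a plain list of numbers, the string in the format above can be assembled automatically. A minimal Python sketch with a hypothetical function name:<br />
<br />
```python
# Hypothetical helper (illustration only): collapse residue numbers into the
# "res.num a-b,c,..." format shown above.
def format_residue_ranges(residues):
    """Return e.g. 'res.num 76-78,112-113,141' for a list of residue numbers."""
    numbers = sorted(set(residues))
    if not numbers:
        raise ValueError("no residues given")
    ranges = []
    start = prev = numbers[0]
    for number in numbers[1:]:
        if number == prev + 1:  # still inside a consecutive run
            prev = number
            continue
        ranges.append((start, prev))
        start = prev = number
    ranges.append((start, prev))
    parts = [str(a) if a == b else f"{a}-{b}" for a, b in ranges]
    return "res.num " + ",".join(parts)


if __name__ == "__main__":
    print(format_residue_ranges([76, 77, 78, 112, 113, 141]))
```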
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use the <u>Molecular Dynamics</u> menu to set up the calculation. Select the prepared system and click <code>Load</code> at the top of the menu. Set the simulation time to 5 ns, then go to <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select the protein, set <code>Force Constant</code> to 100, and click <code>Apply</code> and <code>OK.</code> Then click the down arrow to the left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
'''''NOTE''': In my experience, restrained MD runs with NgPT in Schrodinger often fail with the following message:''<br />
<br />
<code>''Allowed momentum exceeded on 17 particles.''</code><br />
<br />
''This seems to be caused by an interaction between the restraints and the ensemble, as no such error is observed if no restraints are imposed or if the NVT ensemble is used. Still, the NgPT ensemble is important for correct membrane sampling, because NVT simulations long enough to permit membrane relaxation lead to smearing of the lipid bilayer and the formation of empty space.''<br />
<br />
''If the mentioned problem occurs, import the last coordinates of the run (*-out.cms) into Maestro, open Minimization menu, load the structure from the workspace, and apply restraints on lipid and protein (force restraint of 10 is usually enough). Then run a minimization for 100 ps, and try to run NgPT MD starting from the optimized structure.''<br />
<br />
'' NPT ensemble can safely be used instead of NgPT''<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, log in to <u>gimel5</u>, and edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Export the <code>$SCHRODINGER</code> variable:<br />
<br />
<code>SCHRODINGER=/nfs/soft2/schrodinger/2023-3/</code> (or a newer version)<br />
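<br />
The edit of the job script described above can be done with a plain text substitution. This is a sketch under an assumption: placing <code>-HOST</code> at the position of the removed license flag is a guess, since the text only says to delete one and insert the other. The function name is hypothetical.<br />
<br />
```python
# Hypothetical helper (illustration only): swap the license flag for a host
# flag in the text of desmond_md_job_X.sh.
LICENSE_FLAG = "-lic DESMOND_GPGPU:16"


def retarget_desmond_script(script_text, host="gimel5.gpu"):
    """Return the script text with the license flag replaced by -HOST <host>."""
    if LICENSE_FLAG not in script_text:
        raise ValueError("license flag not found; edit the script by hand")
    return script_text.replace(LICENSE_FLAG, f"-HOST {host}")


if __name__ == "__main__":
    path = "desmond_md_job_X.sh"  # replace X with your job number
    with open(path) as handle:
        text = retarget_desmond_script(handle.read())
    with open(path, "w") as handle:
        handle.write(text)
```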
<br />
Run .sh file, and your task will be submitted to a queue. For my system it took 1.5 hr to complete.<br />
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open <code>-out.cms</code>, then click the <code>T</code> icon on the new entry in the Project Table and click <code>Display Trajectory Snapshots</code>. Select the last snapshot, click <code>Display</code>, and check that the protein did not change position during the MD run. Then click <code>Export</code> and choose Project Table, Frames Selected only. You will get a new entry in the Project Table. Export it to a <code>.pdb</code> file.<br />
<br />
== Preparation of the structure for Blastermaster ==<br />
Use the <code>prepare.pml</code> script. At this step, provide the <code>rec.pdb</code> that you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>. Rename the MD system to <code>last-mol.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SPC<br />
<br />
create MEM, ////POPC<br />
<br />
# these atom numbers do not exist in POPC or DPPC<br />
# therefore, we do not remove protons from the lipid<br />
# structure to make more spheres. Uncomment these<br />
# lines and change to proper H numbers if needed.<br />
#remove /MEM////HS<br />
#remove /MEM////HX<br />
#remove /MEM////HY<br />
#remove /MEM////H*B<br />
#remove /MEM////H*A<br />
#remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
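<br />
The <code>create shell, MEM_C within 20 of xtal-prot</code> step above keeps only the lipid atoms that lie within 20 Å of any protein atom. The same idea in plain Python, as an illustrative sketch on coordinate tuples (the function name is hypothetical, and PyMOL itself is not used):<br />
<br />
```python
# Hypothetical helper (illustration only): distance-based selection on
# (x, y, z) tuples in Angstrom, mirroring the "within" selection above.
def atoms_within(candidates, reference, cutoff=20.0):
    """Return candidates that lie within cutoff of at least one reference atom."""
    cutoff_sq = cutoff * cutoff
    selected = []
    for atom in candidates:
        for ref in reference:
            dist_sq = sum((a - b) ** 2 for a, b in zip(atom, ref))
            if dist_sq <= cutoff_sq:
                selected.append(atom)
                break  # one close reference atom is enough
    return selected


if __name__ == "__main__":
    lipid_atoms = [(0.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
    protein_atoms = [(5.0, 0.0, 0.0)]
    print(atoms_within(lipid_atoms, protein_atoms))
```
<br />
This brute-force loop is quadratic in the number of atoms, which is acceptable for a single snapshot but slower than PyMOL's own selection.<br />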
<br />
== Preparation of grids with thinspheres ==<br />
Prepare grids with thinspheres for the protein without lipid as described in https://wiki.docking.org/index.php/How_to_do_parameter_scanning<br />
<br />
Create an empty directory and put <code>shell-LPD.pdb</code> there. Then run the following script:<br />
<br />
<code>sh blast-membrane-thinsph-scan.sh {path to the collection of "es_ld_thin_sph_rad_X.X" directories} {path to the dir with original working and dockfiles directories}</code><br />
<br />
This script runs qnifft and solvmap for each <code>es_ld_thin_sph_rad_X.X</code> directory, and then uses the second script of parameter scanning protocol to combine files into <code>dockfiles</code> directories.<br />
<br />
'''Important! LOOK AT YOUR GRIDS! Desolvation -- larger solvation where water is. Electrostatics -- no electrostatics in the lipid region. vdW -- no vdW in the lipid region.'''<br />
<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK with thinsphere scan<br />
# PREREQ -- run first step of https://wiki.docking.org/index.php/How_to_do_parameter_scanning (new_0001_generate_ES_LD_generation.py )<br />
# first argument -- path to the directory where dirs "es_ld_thin_sph_rad_X.X" are stored. <br />
# second argument -- path to the dir with original working and dockfiles<br />
# run in a new directory with shell-LPD.pdb<br />
<br />
run_once () {<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
cp $curr_dir/shell-LPD.pdb .<br />
sed -ir '/^CRYST1/d' shell-LPD.pdb<br />
sed -ir '/^CONECT/d' shell-LPD.pdb<br />
sed -ir '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
#cp -r $blastermaster_Prot/dockfiles .<br />
#cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
need_files="amb.crg.oxt<br />
qnifft.parm<br />
vdw.siz"<br />
for file in $need_files<br />
do<br />
if [ -e $blastermaster_Prot/working/$file ]<br />
then<br />
cp $blastermaster_Prot/working/$file .<br />
else<br />
cp $blast_orig/working/$file .<br />
fi<br />
done<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
#cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
#echo "Check if the grid size changed, compare this with INDOCK"<br />
#python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
#head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids if they are present in the folder<br />
if [ -e $blastermaster_Prot/working/heavy ]<br />
then<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
fi<br />
echo $dir " DONE!"<br />
}<br />
<br />
curr_dir=$(pwd)<br />
workdirs=$1<br />
blast_orig=$2<br />
dirs=$(ls -d $workdirs/es_ld_thin_sph_rad_*)<br />
for dir in $dirs<br />
do<br />
blastermaster_Prot=$dir<br />
local_dir=$(echo $dir | awk -F"\/" '{print $NF}')<br />
mkdir -p $local_dir/working<br />
cd $local_dir/working || exit<br />
run_once<br />
cd $curr_dir || exit<br />
done<br />
<br />
<br />
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0002_combine_es_ld_grids_into_combos.py -p $blast_orig<br />
<br />
</syntaxhighlight><br />
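<br />
The script above takes <code>amb.crg.oxt</code>, <code>qnifft.parm</code> and <code>vdw.siz</code> from the scan directory when they exist there and falls back to the original <code>working</code> directory otherwise. A minimal Python sketch of that lookup, with a hypothetical function name:<br />
<br />
```python
from pathlib import Path

# File names taken from the need_files list in the script above.
NEEDED_FILES = ("amb.crg.oxt", "qnifft.parm", "vdw.siz")


# Hypothetical helper (illustration only): prefer the file from the scan
# directory, otherwise use the one from the original working directory.
def locate(name, scan_working, orig_working):
    """Return the path a needed file should be copied from."""
    candidate = Path(scan_working) / name
    if candidate.exists():
        return candidate
    return Path(orig_working) / name


if __name__ == "__main__":
    for needed in NEEDED_FILES:
        print(locate(needed, "es_ld_thin_sph_rad_X.X/working", "orig/working"))
```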
<br />
== Running blastermaster with default parameters ==<br />
'''Warning! Do not use these grids, as the default grids with lipid spheres give incorrect solvation energies. Use the ones with thinspheres instead!'''<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK <br />
# first argument -- path to blastermaster files of the protein without membrane<br />
# run in a new directory with shell-LPD.pdb<br />
blastermaster_Prot=$1<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
sed -ir '/^CRYST1/d' shell-LPD.pdb<br />
sed -ir '/^CONECT/d' shell-LPD.pdb<br />
sed -ir '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
cp -r $blastermaster_Prot/dockfiles .<br />
cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
cp $blastermaster_Prot/working/amb.crg.oxt .<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
echo "Check if the grid size changed, compare this with INDOCK"<br />
python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.hydrogen ../dockfiles/.<br />
echo "DONE!"<br />
</syntaxhighlight><br />
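<br />
The three <code>sed</code> calls at the top of the script drop every line of <code>shell-LPD.pdb</code> that starts with CRYST1, CONECT or END. An equivalent filter in Python, as an illustrative sketch with a hypothetical function name:<br />
<br />
```python
# Record prefixes removed by the sed commands in the script above.
DROP_PREFIXES = ("CRYST1", "CONECT", "END")


# Hypothetical helper (illustration only): keep only the lines that do not
# start with one of the prefixes above.
def strip_records(pdb_text):
    """Return the PDB text without CRYST1, CONECT and END* lines."""
    kept = [line for line in pdb_text.splitlines()
            if not line.startswith(DROP_PREFIXES)]
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    with open("shell-LPD.pdb") as handle:
        cleaned = strip_records(handle.read())
    with open("shell-LPD.pdb", "w") as handle:
        handle.write(cleaned)
```
<br />
Unlike <code>sed -ir</code>, which GNU sed reads as in-place editing with the backup suffix <code>r</code>, this sketch does not leave a backup copy.<br />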
<br />
== Lipid membrane models from MemProtMD ==<br />
If a protein-membrane complex was already modeled for your system and deposited at [http://memprotmd.bioch.ox.ac.uk/home/ MemProtMD] website, you can use it and skip doing MD in Schrodinger. The steps are very similar to the ones after Schrodinger run.<br />
<br />
Download the <code>*_default_dppc.mpmd.finalframe.atomistic.pdb</code> file from the bottom of the page. Rename it to <code>last-mol.pdb</code>. <br />
<br />
Use the prepare.pml script (below). At this step, provide the <code>rec.pdb</code> that you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
The script below differs from the one for processing Schrodinger results in two points: solvent residue is <code>SOL</code> instead of <code>SPC</code>, and lipids are called <code>DPPC</code> instead of <code>POPC</code>.<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SOL<br />
<br />
create MEM, ////DPPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
<br />
== Positioning of the membrane ==<br />
If there is no precomputed membrane position from OPM or MemProtMD, you can model it using the PPM 3.0 web server (https://opm.phar.umich.edu/ppm_server3) or the standalone software. The software is installed in <code>/nfs/home/ak87/exa/PROGRAM/OPM-MEMBRANE</code>. Copy the <code>res.lib</code> file from the program directory to the directory with your .pdb file. Create an input file like this:<syntaxhighlight lang="shell"><br />
1<br />
0 PMm out rec.pdb<br />
</syntaxhighlight><blockquote>0 or 1 -“do not use” or “use” heteroatoms in the input PDB file, respectively (solvent molecules are always excluded).<br />
<br />
MOM - type of membrane (see list of 3-letter codes for membranes below)<br />
<br />
“in” or “out” means topology of N-terminus of first subunit included in the corresponding input PDB file<br />
<br />
With this option, for every input PDB file, the program will automatically select the flat or curved membrane boundaries, whichever has the lower calculated transfer energy.</blockquote>Then run the program:<br />
<br />
<code>~ak87/exa/PROGRAM/OPM-MEMBRANE/immers<1membrane.inp>rec-opm.out</code><br />
<br />
Use the <code>datasub1</code> file to extract the residue numbers for membrane placement in Maestro.<br />
See ppm3_instructions.docx file in the program directory for more detail.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15378Global Matching Sphere Optimization2023-05-03T20:38:56Z<p>Iamkaant: </p>
<hr />
<div><br />
== Goal ==<br />
To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres.<br />
== Description ==<br />
The program optimizes matching spheres using a genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by SPHGEN program<br />
At each generation, N matching sphere sets are created, containing a maximum of M spheres each. Then retrospective docking is done for each set, and sets are ranked by the <br />
<br />
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]), <br />
* RMSD of the docked pose to the experimental one. <br />
<br />
After that, a quarter of the sets "survive" and produce a new generation by direct transfer, mutations, and crossover. This process is repeated until the enrichment, RMSD, and minimum number of spheres no longer change substantially over 10 generations.<br />
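<br />
The generation step described above can be sketched in Python. This is not the code of <code>juggler.py</code>: the function name, the 50/50 choice between crossover and mutation, and the specific operators are assumptions made for illustration. Only the "top quarter survives" rule and the limit of M spheres per set come from the text.<br />
<br />
```python
import random


# Hypothetical sketch (illustration only) of one genetic-algorithm step.
# ranked_sets: sphere-id lists sorted from best to worst.
# pool: all sphere ids available for mutation.
def next_generation(ranked_sets, pool, max_spheres, rng):
    """Return a new generation with the same number of sets."""
    n_sets = len(ranked_sets)
    survivors = [list(s) for s in ranked_sets[: max(1, n_sets // 4)]]
    children = [list(s) for s in survivors]  # direct transfer
    while len(children) < n_sets:
        if len(survivors) > 1 and rng.random() < 0.5:
            # Crossover: draw a random subset of the union of two parents.
            parent_a, parent_b = rng.sample(survivors, 2)
            merged = sorted(set(parent_a) | set(parent_b))
            size = rng.randint(1, min(max_spheres, len(merged)))
            child = rng.sample(merged, size)
        else:
            # Mutation: replace one sphere with a random one from the pool.
            child = list(rng.choice(survivors))
            child[rng.randrange(len(child))] = rng.choice(pool)
            child = list(dict.fromkeys(child))  # drop duplicates
        children.append(sorted(child)[:max_spheres])
    return children


if __name__ == "__main__":
    ranked = [[1, 2, 3], [2, 3, 4], [4, 5], [6], [7, 8], [1, 9], [3], [5, 6]]
    rng = random.Random(0)
    for sphere_set in next_generation(ranked, list(range(1, 11)), 4, rng):
        print(sphere_set)
```
<br />
In a full run, each new set would be docked and re-ranked before the next call, and the loop would stop once the metrics stay flat for 10 generations.<br />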
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script that watches the created directory structure, runs docking, and processes the docking results.<br />
== Setup & Running ==<br />
So far, the program runs only on Wynton. Let me know if you are interested in launching it on Gimel or other clusters.<br />
<br />
The scripts and example config file are in <code>/wynton/group/bks/soft/juggler</code><br />
<br />
=== Preparation ===<br />
Prepare <code>dockfiles</code> directory with any tools of your liking (blastermaster, dockopt etc). You will also need <code>rec.pdb</code>, <code>rec.crg.pdb</code>, <code>xtal-lig.pdb</code>, <code>ligands.names</code>, <code>decoys.names</code> and a <code>sdi</code> directory with the paths to ligand .tgz files. To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code>.<br />
<br />
Prepare <code>juggler_config.yml</code> file. Put it into an empty directory.<br />
<br />
=== Running ===<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/group/bks/soft/juggler/juggler.py</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
Open a new screen. In the same directory launch a docking daemon<br />
<br />
<code>/wynton/group/bks/soft/juggler/rundockd-wynton-taskid.sh</code><br />
<br />
You can run other calculations on Wynton in the meantime, as Juggler will track the task IDs that it launched.<br />
<br />
The run will take from a few hours to ~2 days, depending on the number of actives and decoys and the load on Wynton.<br />
<br />
=== Processing results ===<br />
The script will print the paths to the three best matching sphere sets:<br />
<br />
* best enrichment<br />
* best RMSD<br />
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).<br />
<br />
You can use <code>dockfiles</code> from the listed directories.<br />
<br />
You can also track the optimization progress by running the following script in your working directory:<br />
<br />
<code>/wynton/group/bks/soft/juggler/plot_all_metrics.py</code><br />
<br />
It produces <code>combined_metrics.png</code><br />
[[File:Combined metrics plot from a GA run.png|thumb|Combined metrics plot from a GA run]]<br />
<br />
'''If not converged'''<br />
<br />
The program stops after 200 generations if convergence is not reached. If it takes too long, you can stop it at any time by pressing Ctrl-C. Stopping early does not mean that you have no results: Juggler writes a <code>combined_metrics.dat</code> file in the working directory, which contains the metrics for all sets explored so far. It has the following columns:<br />
<br />
Generations Set# NormLogAUC RMSD Nsph Combined_metrics<br />
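<br />
The same table can also be sorted without a spreadsheet. The sketch below is not part of Juggler; it assumes that <code>combined_metrics.dat</code> is whitespace-separated, that every value (including <code>Set#</code>) is numeric, and that a header line may or may not be present. The function names are hypothetical.<br />

```python
# Column names as listed above.
COLUMNS = ["Generations", "Set#", "NormLogAUC", "RMSD", "Nsph", "Combined_metrics"]


def load_metrics(path):
    """Read whitespace-separated rows into dicts keyed by the column names."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != len(COLUMNS):
                continue  # skip blank or malformed lines
            try:
                values = [float(x) for x in fields]
            except ValueError:
                continue  # skip a header line, if the file has one
            rows.append(dict(zip(COLUMNS, values)))
    return rows


def top_by_enrichment(rows, n=5):
    """Return the n rows with the highest NormLogAUC, best first."""
    return sorted(rows, key=lambda r: r["NormLogAUC"], reverse=True)[:n]
```

For example, <code>top_by_enrichment(load_metrics("combined_metrics.dat"))</code> lists the five sets with the highest enrichment; sorting by <code>RMSD</code> or <code>Nsph</code> instead only requires changing the key.<br />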
<br />
You can paste its content into Excel, sort by the highest NormLogAUC and pick an MS set of your liking.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15377Global Matching Sphere Optimization2023-05-02T00:02:35Z<p>Iamkaant: </p>
<hr />
<div><br />
== Goal ==<br />
To optimize your matching sphere (MS) setup so that it gives more enrichment with fewer spheres.<br />
== Description ==<br />
The program optimizes matching spheres using a genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by the SPHGEN program<br />
At each generation, N matching sphere sets are created, each containing at most M spheres. Retrospective docking is then run for each set, and the sets are ranked by:<br />
<br />
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]), <br />
* RMSD of the docked pose to the experimental one. <br />
<br />
After that, a quarter of the sets "survive" and produce a new generation by direct transfer, mutation, and crossover. This process is repeated until the enrichment, the RMSD and the minimum number of spheres stop changing substantially for 10 generations.<br />
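<br />
The loop described above can be illustrated with a short sketch. This is not Juggler's code: the mutation and crossover operators, the 50/50 choice between them, the 1% tolerance and the names <code>next_generation</code> and <code>converged</code> are all assumptions made for illustration. Only the survival of a quarter of the sets, the cap of M spheres per set and the 10-generation window come from the description.<br />

```python
import random


def next_generation(ranked_sets, pool, max_spheres, rng=random):
    """Build a new generation from non-empty sphere sets ranked best-first.

    ranked_sets: list of frozensets of integer sphere IDs, best first.
    pool: all candidate sphere IDs (xtal-lig heavy atoms plus SPHGEN spheres).
    """
    n = len(ranked_sets)
    survivors = ranked_sets[: max(1, n // 4)]  # a quarter of the sets survive
    children = list(survivors)                 # direct transfer
    while len(children) < n:
        child = set(rng.choice(survivors))
        if rng.random() < 0.5:
            # Mutation (assumed operator): swap one sphere for one from the pool.
            outside = [s for s in pool if s not in child]
            if outside and child:
                child.discard(rng.choice(sorted(child)))
                child.add(rng.choice(outside))
        else:
            # Crossover (assumed operator): sample from the union of two parents.
            union = sorted(child | rng.choice(survivors))
            child = set(rng.sample(union, min(max_spheres, len(union))))
        children.append(frozenset(child))
    return children


def converged(history, window=10, tol=0.01):
    """history: per-generation tuples (best enrichment, best RMSD, min Nsph).

    True when none of the three values moved by more than tol (relative,
    an assumed threshold) over the last `window` generations.
    """
    if len(history) <= window:
        return False
    reference, recent = history[-window - 1], history[-window:]
    return all(abs(value - ref) <= tol * max(abs(ref), 1.0)
               for row in recent for value, ref in zip(row, reference))
```

In a full run, each new generation would be docked and re-ranked before the next call, with the loop ending once <code>converged</code> returns true.<br />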
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script that watches the created directory structure, runs docking, and processes the docking results.<br />
== Setup & Running ==<br />
So far, the program runs only on Wynton. Let me know if you are interested in launching it on Gimel or other clusters.<br />
<br />
The scripts and example config file are in <code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE</code><br />
<br />
=== Preparation ===<br />
Prepare a <code>dockfiles</code> directory with any tools of your liking (blastermaster, dockopt, etc.). You will also need <code>rec.pdb</code>, <code>rec.crg.pdb</code>, <code>xtal-lig.pdb</code>, <code>ligands.names</code>, <code>decoys.names</code> and an <code>sdi</code> directory with the paths to the ligand .tgz files. To get the RMSD of the docked xtal-lig poses to the experimental pose, your <code>xtal-lig.pdb</code> must have correct bond orders and atom valences. You can edit it in Schrödinger and save it as <code>xtal-lig.pdb</code>.<br />
<br />
Prepare a <code>juggler_config.yml</code> file and put it into an empty directory.<br />
<br />
=== Running ===<br />
Start a <code>screen</code> session so that your run is not interrupted if your SSH connection drops. Then run:<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/juggler.py</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
Open a new screen. In the same directory, launch the docking daemon:<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/rundockd-wynton-taskid.sh</code><br />
<br />
You can run other calculations on Wynton in the meantime, as Juggler will track the task IDs that it launched.<br />
<br />
The run takes from a few hours to ~2 days, depending on the number of actives and decoys and on the load of Wynton.<br />
<br />
=== Processing results ===<br />
The script prints the paths to the three best matching sphere sets:<br />
<br />
* best enrichment<br />
* best RMSD<br />
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).<br />
<br />
You can use <code>dockfiles</code> from the listed directories.<br />
<br />
You can also track the optimization progress by running the following script in your working directory:<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/plot_all_metrics.py</code><br />
<br />
It produces <code>combined_metrics.png</code><br />
[[File:Combined metrics plot from a GA run.png|thumb|Combined metrics plot from a GA run]]<br />
<br />
'''If not converged'''<br />
<br />
The program stops after 200 generations if convergence is not reached. If it takes too long, you can stop it at any time by pressing Ctrl-C. Stopping early does not mean that you have no results: Juggler writes a <code>combined_metrics.dat</code> file in the working directory, which contains the metrics for all sets explored so far. It has the following columns:<br />
<br />
Generations Set# NormLogAUC RMSD Nsph Combined_metrics<br />
<br />
You can paste its content into Excel, sort by the highest NormLogAUC and pick an MS set of your liking.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15376Global Matching Sphere Optimization2023-05-02T00:01:45Z<p>Iamkaant: </p>
<hr />
<div><br />
== Goal ==<br />
To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres.<br />
== Description ==<br />
The program performs optimization of matching spheres using genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by SPHGEN program<br />
At each generation, N matching sphere sets are created, containing a maximum of M spheres each. Then retrospective docking is done for each set, and sets are ranked by the <br />
<br />
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]), <br />
* RMSD of the docked pose to the experimental one. <br />
<br />
After that, a quarter of sets "survive" and produce a new generation by direct transfer, mutations, and crossover. This process is repeated until enrichment, RMSD and minimum number of spheres do not change substantially in 10 generations.<br />
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script, that watches created directory structure, runs docking and processes docking results<br />
== Setup & Running ==<br />
So far, the program is running on Wynton. LMK if you are interested in launching it on Gimel or other clusters.<br />
<br />
The scripts and example config file are in <code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE</code><br />
<br />
=== Preparation ===<br />
Prepare <code>dockfiles</code> directory with any tools of your liking (blastermaster, dockopt etc). You will also need <code>rec.pdb</code>, <code>rec.crg.pdb</code>, <code>xtal-lig.pdb</code>, <code>ligands.names</code>, <code>decoys.names</code> and a <code>sdi</code> directory with the paths to ligand .tgz files. To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code>.<br />
<br />
Prepare <code>juggler_config.yml</code> file. Put it into an empty directory.<br />
<br />
=== Running ===<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/juggler.py</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
Open a new screen. In the same directory launch a docking daemon<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/rundockd-wynton-taskid.sh</code><br />
<br />
You can run other calculations on Wynton in the meantime, as Juggler will track the task IDs that it launched.<br />
<br />
The run will take few hours to ~2 days depending on the number of actives and decoys and the load of Wynton.<br />
<br />
=== Processing results ===<br />
The script will print the paths to where three best matching sphere sets are:<br />
<br />
* best enrichment<br />
* best RMSD<br />
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).<br />
<br />
You can use <code>dockfiles</code> from the listed directories.<br />
<br />
You can also track the optimization progress running the following script in your working directory:<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/plot_all_metrics.py</code><br />
<br />
It produces <code>combined_metrics.png</code><br />
[[File:Combined metrics plot from a GA run.png|thumb|Combined metrics plot from a GA run]]<br />
<br />
'''If not converged'''<br />
<br />
The program will stop after 200 generations if convergence is not reached. In case it takes too long you can stop it any time by pressing Ctrl-C. It doesn't mean that you have no results, though. Juggler generates <code>combined_metrics.dat</code> file in the working directory, which contains metrics for all sets explored. It contains the following columns:<br />
Generations Set# NormLogAUC RMSD Nsph Combined_metrics<br />
You can paste its content into Excel, sort by the highest NormLogAUC and pick a MS set of your liking.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15375Global Matching Sphere Optimization2023-05-02T00:01:25Z<p>Iamkaant: </p>
<hr />
<div><br />
== Goal ==<br />
To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres.<br />
== Description ==<br />
The program performs optimization of matching spheres using genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by SPHGEN program<br />
At each generation, N matching sphere sets are created, containing a maximum of M spheres each. Then retrospective docking is done for each set, and sets are ranked by the <br />
<br />
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]), <br />
* RMSD of the docked pose to the experimental one. <br />
<br />
After that, a quarter of sets "survive" and produce a new generation by direct transfer, mutations, and crossover. This process is repeated until enrichment, RMSD and minimum number of spheres do not change substantially in 10 generations.<br />
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script, that watches created directory structure, runs docking and processes docking results<br />
== Setup & Running ==<br />
So far, the program is running on Wynton. LMK if you are interested in launching it on Gimel or other clusters.<br />
<br />
The scripts and example config file are in <code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE</code><br />
<br />
=== Preparation ===<br />
Prepare <code>dockfiles</code> directory with any tools of your liking (blastermaster, dockopt etc). You will also need <code>rec.pdb</code>, <code>rec.crg.pdb</code>, <code>xtal-lig.pdb</code>, <code>ligands.names</code>, <code>decoys.names</code> and a <code>sdi</code> directory with the paths to ligand .tgz files. To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code>.<br />
<br />
Prepare <code>juggler_config.yml</code> file. Put it into an empty directory.<br />
<br />
=== Running ===<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/juggler.py</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
Open a new screen. In the same directory launch a docking daemon<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/rundockd-wynton-taskid.sh</code><br />
<br />
You can run other calculations on Wynton in the meantime, as Juggler will track the task IDs that it launched.<br />
<br />
The run will take few hours to ~2 days depending on the number of actives and decoys and the load of Wynton.<br />
<br />
=== Processing results ===<br />
The script will print the paths to where three best matching sphere sets are:<br />
<br />
* best enrichment<br />
* best RMSD<br />
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).<br />
<br />
You can use <code>dockfiles</code> from the listed directories.<br />
<br />
You can also track the optimization progress running the following script in your working directory:<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/plot_all_metrics.py</code><br />
<br />
It produces <code>combined_metrics.png</code><br />
[[File:Combined metrics plot from a GA run.png|thumb|Combined metrics plot from a GA run]]<br />
<br />
'''If not converged'''<br />
The program will stop after 200 generations if convergence is not reached. In case it takes too long you can stop it any time by pressing Ctrl-C. It doesn't mean that you have no results, though. Juggler generates <code>combined_metrics.dat</code> file in the working directory, which contains metrics for all sets explored. It contains the following columns:<br />
Generations Set# NormLogAUC RMSD Nsph Combined_metrics<br />
You can paste its content into Excel, sort by the highest NormLogAUC and pick a MS set of your liking.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15374Global Matching Sphere Optimization2023-05-02T00:00:47Z<p>Iamkaant: </p>
<hr />
<div><br />
== Goal ==<br />
To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres.<br />
== Description ==<br />
The program performs optimization of matching spheres using genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by SPHGEN program<br />
At each generation, N matching sphere sets are created, containing a maximum of M spheres each. Then retrospective docking is done for each set, and sets are ranked by the <br />
<br />
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]), <br />
* RMSD of the docked pose to the experimental one. <br />
<br />
After that, a quarter of sets "survive" and produce a new generation by direct transfer, mutations, and crossover. This process is repeated until enrichment, RMSD and minimum number of spheres do not change substantially in 10 generations.<br />
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script, that watches created directory structure, runs docking and processes docking results<br />
== Setup & Running ==<br />
So far, the program is running on Wynton. LMK if you are interested in launching it on Gimel or other clusters.<br />
<br />
The scripts and example config file are in <code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE</code><br />
<br />
=== Preparation ===<br />
Prepare <code>dockfiles</code> directory with any tools of your liking (blastermaster, dockopt etc). You will also need <code>rec.pdb</code>, <code>rec.crg.pdb</code>, <code>xtal-lig.pdb</code>, <code>ligands.names</code>, <code>decoys.names</code> and a <code>sdi</code> directory with the paths to ligand .tgz files. To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code>.<br />
<br />
Prepare <code>juggler_config.yml</code> file. Put it into an empty directory.<br />
<br />
=== Running ===<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/juggler.py</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
Open a new screen. In the same directory launch a docking daemon<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/rundockd-wynton-taskid.sh</code><br />
<br />
You can run other calculations on Wynton in the meantime, as Juggler will track the task IDs that it launched.<br />
<br />
The run will take few hours to ~2 days depending on the number of actives and decoys and the load of Wynton.<br />
<br />
=== Processing results ===<br />
The script will print the paths to where three best matching sphere sets are:<br />
<br />
* best enrichment<br />
* best RMSD<br />
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).<br />
<br />
You can use <code>dockfiles</code> from the listed directories.<br />
<br />
You can also track the optimization progress running the following script in your working directory:<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/plot_all_metrics.py</code><br />
<br />
It produces <code>combined_metrics.png</code><br />
[[File:Combined metrics plot from a GA run.png|thumb|Combined metrics plot from a GA run]]<br />
<br />
* If not converged*<br />
The program will stop after 200 generations if convergence is not reached. In case it takes too long you can stop it any time by pressing Ctrl-C. It doesn't mean that you have no results, though. Juggler generates <code>combined_metrics.dat</code> file in the working directory, which contains metrics for all sets explored. It contains the following columns:<br />
Generations Set# NormLogAUC RMSD Nsph Combined_metrics<br />
You can paste its content into Excel, sort by the highest NormLogAUC and pick a MS set of your liking.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15373Global Matching Sphere Optimization2023-05-01T23:01:45Z<p>Iamkaant: </p>
<hr />
<div><br />
== Goal ==<br />
To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres.<br />
== Description ==<br />
The program performs optimization of matching spheres using genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by SPHGEN program<br />
At each generation, N matching sphere sets are created, containing a maximum of M spheres each. Then retrospective docking is done for each set, and sets are ranked by the <br />
<br />
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]), <br />
* RMSD of the docked pose to the experimental one. <br />
<br />
After that, a quarter of sets "survive" and produce a new generation by direct transfer, mutations, and crossover. This process is repeated until enrichment, RMSD and minimum number of spheres do not change substantially in 10 generations.<br />
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script, that watches created directory structure, runs docking and processes docking results<br />
== Setup & Running ==<br />
So far, the program is running on Wynton. LMK if you are interested in launching it on Gimel or other clusters.<br />
<br />
The scripts and example config file are in <code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE</code><br />
<br />
=== Preparation ===<br />
Prepare <code>dockfiles</code> directory with any tools of your liking (blastermaster, dockopt etc). You will also need <code>rec.pdb</code>, <code>rec.crg.pdb</code>, <code>xtal-lig.pdb</code>, <code>ligands.names</code>, <code>decoys.names</code> and a <code>sdi</code> directory with the paths to ligand .tgz files. To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code>.<br />
<br />
Prepare <code>juggler_config.yml</code> file. Put it into an empty directory.<br />
<br />
=== Running ===<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/juggler.py</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
Open a new screen. In the same directory launch a docking daemon<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/rundockd-wynton-taskid.sh</code><br />
<br />
You can run other calculations on Wynton in the meantime, as Juggler will track the task IDs that it launched.<br />
<br />
The run will take few hours to ~2 days depending on the number of actives and decoys and the load of Wynton.<br />
<br />
=== Processing results ===<br />
The script will print the paths to where three best matching sphere sets are:<br />
<br />
* best enrichment<br />
* best RMSD<br />
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).<br />
<br />
You can use <code>dockfiles</code> from the listed directories.<br />
<br />
You can also track the optimization progress running the following script in your working directory:<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/plot_all_metrics.py</code><br />
<br />
It produces <code>combined_metrics.png</code><br />
[[File:Combined metrics plot from a GA run.png|thumb|Combined metrics plot from a GA run]]</div>Iamkaanthttp://wiki.docking.org/index.php?title=File:Combined_metrics_plot_from_a_GA_run.png&diff=15372File:Combined metrics plot from a GA run.png2023-05-01T23:01:24Z<p>Iamkaant: </p>
<hr />
<div>combined metrics plot from a GA run</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15371Global Matching Sphere Optimization2023-05-01T19:52:22Z<p>Iamkaant: </p>
<hr />
<div><br />
== Goal ==<br />
To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres.<br />
== Description ==<br />
The program performs optimization of matching spheres using genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by SPHGEN program<br />
At each generation, N matching sphere sets are created, containing a maximum of M spheres each. Then retrospective docking is done for each set, and sets are ranked by the <br />
<br />
* enrichment (normalized logAUC, see [http://arxiv.org/abs/2210.10905 Ian's paper]), <br />
* RMSD of the docked pose to the experimental one. <br />
<br />
After that, a quarter of sets "survive" and produce a new generation by direct transfer, mutations, and crossover. This process is repeated until enrichment, RMSD and minimum number of spheres do not change substantially in 10 generations.<br />
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script, that watches created directory structure, runs docking and processes docking results<br />
== Setup & Running ==<br />
So far, the program is running on Wynton. LMK if you are interested in launching it on Gimel or other clusters.<br />
<br />
The scripts and example config file are in <code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE</code><br />
<br />
=== Preparation ===<br />
Prepare <code>dockfiles</code> directory with any tools of your liking (blastermaster, dockopt etc). You will also need <code>rec.pdb</code>, <code>rec.crg.pdb</code>, <code>xtal-lig.pdb</code>, <code>ligands.names</code>, <code>decoys.names</code> and a <code>sdi</code> directory with the paths to ligand .tgz files. To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code>.<br />
<br />
Prepare <code>juggler_config.yml</code> file. Put it into an empty directory.<br />
<br />
=== Running ===<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/juggler.py</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
Open a new screen. In the same directory launch a docking daemon<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/rundockd-wynton-taskid.sh</code><br />
<br />
You can run other calculations on Wynton in the meantime, as Juggler will track the task IDs that it launched.<br />
<br />
The run will take few hours to ~2 days depending on the number of actives and decoys and the load of Wynton.<br />
<br />
=== Processing results ===<br />
The script will print the paths to where three best matching sphere sets are:<br />
<br />
* best enrichment<br />
* best RMSD<br />
* best balanced metrics (highest enrichment, lowest RMSD and lowest Nsph).<br />
<br />
You can use <code>dockfiles</code> from the listed directories.<br />
<br />
You can also track the optimization progress running the following script in your working directory:<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/plot_all_metrics.py</code><br />
<br />
It produces <code>combined_metrics.png</code></div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15370Global Matching Sphere Optimization2023-04-30T07:06:18Z<p>Iamkaant: </p>
<hr />
<div><br />
<br />
<br />
== Goal ==<br />
To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres.<br />
== Description ==<br />
The program performs optimization of matching spheres using genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by SPHGEN program<br />
At each generation, N matching sphere sets are created, containing a maximum of M spheres each. Then retrospective docking is done for each set, and sets are ranked by the enrichment, RMSD of the docked pose to the experimental one. After that, a quarter of sets "survive" and produce a new generation by direct transfer, mutations, and crossover. This process is repeated until enrichment, RMSD and minimum number of spheres do not change substantially in 10 generations.<br />
<br />
The program consists of two main modules:<br />
* a Python script (<code>juggler.py</code>) that performs MS generation, optimization, and ranking.<br />
* a Bash script, that watches created directory structure, runs docking and processes docking results<br />
== Setup ==<br />
So far, the program is running on Wynton. LMK if you are interested in launching it on Gimel or other clusters.<br />
<br />
Prepare <code>dockfiles</code> directory with any tools of your liking (blastermaster, dockopt etc). You will also need <code>rec.pdb</code>, <code>rec.crg.pdb</code>, <code>xtal-lig.pdb</code>, <code>ligands.names</code>, <code>decoys.names</code> and a <code>sdi</code> directory with the paths to ligand .tgz files. To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as <code>xtal-lig.pdb</code>.<br />
<br />
Prepare <code>juggler_config.yml</code> file. Put it into an empty directory.<br />
<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
<br />
<code>source /wynton/group/bks/soft/python_envs/python3.8.5.sh</code><br />
<br />
<code>python /wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/juggler.py</code><br />
<br />
You can detach from the screen (Ctrl-A d).<br />
<br />
Open a new screen. In the same directory launch a docking daemon<br />
<br />
<code>/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/rundockd-wynton-taskid.sh</code><br />
<br />
You can run other calculations on Wynton in the meantime, as Juggler will track the task IDs that it launched.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15369Global Matching Sphere Optimization2023-04-30T07:01:38Z<p>Iamkaant: </p>
<hr />
<div>== Goal ==<br />
To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres.<br />
== Description ==<br />
The program performs optimization of matching spheres using genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig <br />
* spheres prepared by SPHGEN program<br />
At each generation, N matching sphere sets are created, containing a maximum of M spheres each. Then retrospective docking is done for each set, and sets are ranked by the enrichment, RMSD of the docked pose to the experimental one. After that, a quarter of sets "survive" and produce a new generation by direct transfer, mutations, and crossover. This process is repeated until enrichment, RMSD and minimum number of spheres do not change substantially in 10 generations.<br />
The program consists of two main modules:<br />
* a Python script (juggler.py) that performs MS generation, optimization, and ranking.<br />
* a Bash script, that watches created directory structure, runs docking and processes docking results<br />
== Setup ==<br />
So far, the program is running on Wynton. LMK if you are interested in launching it on Gimel or other clusters.<br />
Prepare dockfiles directory with any tools of your liking (blastermaster, dockopt etc). You will also need rec.pdb, rec.crg.pdb, xtal-lig.pdb, ligands.names, decoys.names and a "sdi" directory with the paths to ligand .tgz files. To get RMSD of xtal-lig docked poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save as xtal-lig.pdb.<br />
Prepare juggler_config.yml file. Put it into an empty directory.<br />
Enter a screen environment so your run is not interrupted if you disconnect your SSH session. Then do:<br />
source /wynton/group/bks/soft/python_envs/python3.8.5.sh<br />
python /wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/juggler.py<br />
You can detach from the screen (Ctrl-A d).<br />
Open a new screen. In the same directory launch a docking daemon<br />
/wynton/home/irwin/ak87/ak87/UCSF/NEOCORTEX/SCRIPTS/RELEASE/rundockd-wynton-taskid.sh</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15368Global Matching Sphere Optimization2023-04-30T06:59:10Z<p>Iamkaant: </p>
<hr />
<div>== Goal ==<br />
To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres.<br />
== Description ==<br />
The program optimizes matching spheres using a genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig<br />
* spheres prepared by the SPHGEN program<br />
In each generation, N matching sphere sets are created, each containing at most M spheres. Retrospective docking is then run for each set, and the sets are ranked by enrichment and by the RMSD of the docked pose to the experimental one. A quarter of the sets then "survive" and produce the next generation by direct transfer, mutation, and crossover. This process is repeated until the enrichment, the RMSD, and the minimum number of spheres have not changed substantially for 10 generations.<br />
The program consists of two main modules:<br />
* a Python script (juggler.py) that performs MS generation, optimization, and ranking.<br />
* a Bash script that watches the created directory structure, runs docking, and processes the docking results.<br />
== Setup ==<br />
So far, the program runs on Wynton. Let me know if you are interested in launching it on Gimel or other clusters.<br />
Prepare a dockfiles directory with any tool you like (blastermaster, dockopt, etc.). You will also need rec.pdb, rec.crg.pdb, xtal-lig.pdb, ligands.names, decoys.names, and an "sdi" directory with the paths to the ligand .tgz files. To compute the RMSD of the docked xtal-lig poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save it as xtal-lig.pdb.<br />
Prepare a juggler_config.yml file and put it into an empty directory.<br />
Start a screen session so that your run is not interrupted if your SSH session disconnects. Then run:<br />
source /wynton/group/bks/soft/python_envs/python3.8.5.sh<br />
python ../../juggler-v0.7.py<br />
You can detach from the screen (Ctrl-A d).<br />
Open a new screen and, in the same directory, launch the docking daemon:<br />
sh ../../rundockd-wynton-taskid.sh</div>Iamkaanthttp://wiki.docking.org/index.php?title=Global_Matching_Sphere_Optimization&diff=15367Global Matching Sphere Optimization2023-04-30T06:58:48Z<p>Iamkaant: Created page with "== Goal == To optimize your matching sphere (MS) setups getting more enrichment with fewer spheres. == Description == The program performs optimization of matching spheres using genetic algorithm. It selects spheres from two sets: * heavy atoms of xtal-lig * spheres prepared by SPHGEN program At each generation, N matching sphere sets are created, containing a maximum of M spheres each. Then retrospective docking is done for each set, and sets are ranked by the enrichme..."</p>
<hr />
<div>== Goal ==<br />
To optimize your matching sphere (MS) setup so that it gives more enrichment with fewer spheres.<br />
== Description ==<br />
The program optimizes matching spheres using a genetic algorithm. It selects spheres from two sets:<br />
* heavy atoms of xtal-lig<br />
* spheres prepared by the SPHGEN program<br />
In each generation, N matching sphere sets are created, each containing at most M spheres. Retrospective docking is then run for each set, and the sets are ranked by enrichment and by the RMSD of the docked pose to the experimental one. A quarter of the sets then "survive" and produce the next generation by direct transfer, mutation, and crossover. This process is repeated until the enrichment, the RMSD, and the minimum number of spheres have not changed substantially for 10 generations.<br />
The program consists of two main modules:<br />
* a Python script (juggler.py) that performs MS generation, optimization, and ranking.<br />
* a Bash script that watches the created directory structure, runs docking, and processes the docking results.<br />
== Setup ==<br />
So far, the program runs on Wynton. Let me know if you are interested in launching it on Gimel or other clusters.<br />
Prepare a dockfiles directory with any tool you like (blastermaster, dockopt, etc.). You will also need rec.pdb, rec.crg.pdb, xtal-lig.pdb, ligands.names, decoys.names, and an "sdi" directory with the paths to the ligand .tgz files. To compute the RMSD of the docked xtal-lig poses to the experimental pose, your xtal-lig.pdb must have correct bond orders and atom valences. You can edit it in Schrodinger and save it as xtal-lig.pdb.<br />
Prepare a juggler_config.yml file and put it into an empty directory.<br />
Start a screen session so that your run is not interrupted if your SSH session disconnects. Then run:<br />
source /wynton/group/bks/soft/python_envs/python3.8.5.sh<br />
python ../../juggler-v0.7.py<br />
You can detach from the screen (Ctrl-A d).<br />
Open a new screen and, in the same directory, launch the docking daemon:<br />
sh ../../rundockd-wynton-taskid.sh</div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=15081Membrane Modeling2023-01-10T17:59:57Z<p>Iamkaant: </p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
To account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid bilayer is generated around the target receptor and included in the docking score grid generation.<br />
To equilibrate the lipid bilayer around the embedded transmembrane receptor quickly, robustly, and at low computational cost, coarse-grained (CG) molecular dynamics (MD) simulations and, if needed, subsequent atomistic simulations are employed.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' - Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in the case of GPCRs, try to model the missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Log in to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
-posre.itp - Position restraints for heavy atoms of the atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
-prot-rot.pdb - Coarse-grained protein structure aligned along the z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10nm, z=11nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change it to an arbitrary lipid composition using the -l and -u options of insane.py.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check that your protein is embedded correctly in the lipid bilayer.<br />
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running a MD simulation or minimization using gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and select lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tool: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions on MD trajectories, e.g. making molecules that are broken across the periodic boundaries whole again.<br />
<br />
initram.sh calls the backward.py program which performs the backmapping of input coarse-grained to atomistic systems, and performs a small series of short minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. There is a while-loop that stops only once all relaxation steps have finished.'''<br />
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy, there might be an issue with the PyMOL version you're using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
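The atom-count comparison can be scripted. The following is a small sketch, not part of the original scripts; the helper names are hypothetical, and it assumes that "number of coordinates" means the number of ATOM and HETATM records in each PDB file.<br />

```python
def count_atom_records(lines):
    """Count ATOM and HETATM records in an iterable of PDB lines."""
    return sum(1 for line in lines if line.startswith(("ATOM", "HETATM")))


def count_atoms_in_file(path):
    """Count ATOM and HETATM records in the PDB file at `path`."""
    with open(path) as handle:
        return count_atom_records(handle)


def same_atom_count(path_a, path_b):
    """Return True if both PDB files hold the same number of atom records."""
    return count_atoms_in_file(path_a) == count_atoms_in_file(path_b)
```

For example, <code>same_atom_count("fitted_system.pdb", "backmapped-mol.pdb")</code> should return <code>True</code> before you continue to the minimization step.<br />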
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations will be run.<br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This will select carbon and hydrogen atoms of the hydrophobic lipid tail segments within a radius of 1.7 nm around the protein and assign them to the atom type "LPD".<br />
<br />
At this step, you need to provide the rec.pdb you want to use for docking (potentially with missing loops) as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
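To make the distance criterion concrete, here is a plain-Python sketch of the shell selection. It is an illustration only, not a replacement for prepare.pml: the function name is hypothetical, coordinates are assumed to be (x, y, z) tuples in ångströms, and the cutoff is a parameter (17 Å corresponds to the 1.7 nm quoted above).<br />

```python
import math


def atoms_within(candidates, reference, cutoff):
    """Return the candidate coordinates lying within `cutoff` of any reference atom.

    candidates: list of (x, y, z) tuples, e.g. lipid tail atoms
    reference:  list of (x, y, z) tuples, e.g. protein atoms
    cutoff:     distance threshold in the same units as the coordinates
    """
    return [
        c for c in candidates
        if any(math.dist(c, r) <= cutoff for r in reference)
    ]
```

This brute-force loop scales with the product of the two atom counts, which is fine for a quick check but slow for a full membrane system; PyMOL's own <code>within</code> selection is the practical choice.<br />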
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file with the following line added:<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
<br />
= Membrane modelling in Schrodinger =<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
== MD of protein and membrane ==<br />
Import structure '''without''' ligand, use <u>Preparation wizard</u> as described in "Code for Controls..." to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website https://opm.phar.umich.edu/proteins/ and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
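If you have the transmembrane residues as a plain list of numbers, the string can be assembled programmatically. This is a sketch with a hypothetical helper name; it assumes positive residue numbers and the <code>res.num</code> format shown above.<br />

```python
def format_residue_ranges(residues):
    """Collapse residue numbers into a 'res.num a-b,c-d,e' selection string."""
    parts = []
    start = prev = None
    for number in sorted(set(residues)):
        if start is None:
            start = prev = number
        elif number == prev + 1:
            prev = number  # still inside the current consecutive run
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = number
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return "res.num " + ",".join(parts)
```

For example, residues 76 to 97, 112 to 136, and 141 give <code>res.num 76-97,112-136,141</code>.<br />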
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use <u>Molecular Dynamics</u> menu to set up the calculation. Select prepared system, click <code>Load</code> on top of the menu. Put simulation time of 5 ns, <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select protein, and put <code>Force Constant</code> of 100, click <code>Apply</code> and <code>OK.</code> Then click down arrow left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
'''''NOTE''': In my experience, restrained MD runs with the NPgT ensemble in Schrodinger often fail with the following message:''<br />
<br />
<code>''Allowed momentum exceeded on 17 particles.''</code><br />
<br />
''This seems to be related to an interference between the restraints and the ensemble, as no such error is observed if no restraints are imposed or if the NVT ensemble is used. Still, the NPgT ensemble is important for correct membrane sampling, because NVT simulations long enough to permit membrane relaxation lead to smearing of the lipid bilayer and the formation of empty space.'' <br />
<br />
''If this problem occurs, import the last coordinates of the run (*-out.cms) into Maestro, open the Minimization menu, load the structure from the workspace, and apply restraints on the lipid and the protein (a force constant of 10 is usually enough). Then run a minimization for 100 ps, and try to run the NPgT MD starting from the optimized structure.''<br />
<br />
'''''TO CHECK''': NPT ensemble.''<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, log in to <u>gimel5</u>, and edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Export the <code>$SCHRODINGER</code> variable:<br />
<br />
<code>SCHRODINGER=/nfs/soft2/schrodinger/2023-3/</code> (or a newer version)<br />
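The edit of the job script can also be done programmatically. Below is a hedged sketch with a hypothetical function name: it assumes the license flag appears in the script exactly as written above, and that if the script already contains a <code>-HOST</code> argument, that argument should be overwritten rather than duplicated. Check the result by eye before submitting.<br />

```python
import re

LIC_FLAG = "-lic DESMOND_GPGPU:16"


def patch_desmond_script(text, host="gimel5.gpu"):
    """Return the job script text with the license flag removed and -HOST set."""
    if LIC_FLAG not in text:
        raise ValueError("license flag not found; edit the script by hand")
    if re.search(r"-HOST\s+\S+", text):
        # A host is already given: overwrite it, then drop the license flag.
        text = re.sub(r"-HOST\s+\S+", f"-HOST {host}", text)
        return re.sub(r"[ \t]*" + re.escape(LIC_FLAG), "", text)
    # No host given: put -HOST where the license flag used to be.
    return text.replace(LIC_FLAG, f"-HOST {host}")
```

Read desmond_md_job_X.sh into a string, pass it through the function, and write the result back to the file.<br />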
<br />
Run .sh file, and your task will be submitted to a queue. For my system it took 1.5 hr to complete.<br />
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open <code>-out.cms</code>, click on the <code>T</code> icon at the new entry in the project table and click <code>Display Trajectory Snapshots</code>. Select the last one, click <code>Display</code> and check that the protein did not change position during the MD run, then click <code>Export</code>, to Project Table, Frames Selected only. You will get a new entry in the Project Table. Export it to a <code>.pdb</code> file.<br />
<br />
== Preparation of the structure for Blastermaster ==<br />
Use the <code>prepare.pml</code> script. At this step, you need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>. Rename the MD system to <code>last-mol.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SPC<br />
<br />
create MEM, ////POPC<br />
<br />
# these atom numbers do not exist in POPC or DPPC<br />
# therefore, we do not remove protons from the lipid<br />
# structure to make more spheres. Uncomment these<br />
# lines and change to proper H numbers if needed.<br />
#remove /MEM////HS<br />
#remove /MEM////HX<br />
#remove /MEM////HY<br />
#remove /MEM////H*B<br />
#remove /MEM////H*A<br />
#remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
<br />
== Preparation of grids with thinspheres ==<br />
Prepare grids with thinspheres for the protein without lipid as described in https://wiki.docking.org/index.php/How_to_do_parameter_scanning<br />
<br />
Create an empty directory and put <code>shell-LPD.pdb</code> there. Then run the following script:<br />
<br />
<code>sh blast-membrane-thinsph-scan.sh {path to the collection of "es_ld_thin_sph_rad_X.X" directories} {path to the dir with original working and dockfiles directories}</code><br />
<br />
This script runs qnifft and solvmap for each <code>es_ld_thin_sph_rad_X.X</code> directory, and then uses the second script of parameter scanning protocol to combine files into <code>dockfiles</code> directories.<br />
<br />
'''Important! LOOK AT YOUR GRIDS! Desolvation -- larger solvation where water is. Electrostatics -- no electrostatics in the lipid region. vdW -- no vdW in the lipid region.'''<br />
<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK with thinsphere scan<br />
# PREREQ -- run first step of https://wiki.docking.org/index.php/How_to_do_parameter_scanning (new_0001_generate_ES_LD_generation.py )<br />
# first argument -- path to the directory where dirs "es_ld_thin_sph_rad_X.X" are stored. <br />
# second argument -- path to the dir with original working and dockfiles<br />
# run in a new directory with shell-LPD.pdb<br />
<br />
run_once () {<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# strip CRYST1, CONECT and END records so that only atom records remain in shell-LPD.pdb<br />
cp $curr_dir/shell-LPD.pdb .<br />
sed -i -e '/^CRYST1/d' -e '/^CONECT/d' -e '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
#cp -r $blastermaster_Prot/dockfiles .<br />
#cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
need_files="amb.crg.oxt<br />
qnifft.parm<br />
vdw.siz"<br />
for file in $need_files<br />
do<br />
if [ -e $blastermaster_Prot/working/$file ]<br />
then<br />
cp $blastermaster_Prot/working/$file .<br />
else<br />
cp $blast_orig/working/$file .<br />
fi<br />
done<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
#cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
#echo "Check if the grid size changed, compare this with INDOCK"<br />
#python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
#head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids if they are present in the folder<br />
if [ -e $blastermaster_Prot/working/heavy ]<br />
then<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
fi<br />
echo $dir " DONE!"<br />
}<br />
<br />
curr_dir=$(pwd)<br />
workdirs=$1<br />
blast_orig=$2<br />
dirs=$(ls -d $workdirs/es_ld_thin_sph_rad_*)<br />
for dir in $dirs<br />
do<br />
blastermaster_Prot=$dir<br />
local_dir=$(echo $dir | awk -F"\/" '{print $NF}')<br />
mkdir -p $local_dir/working<br />
cd $local_dir/working || exit<br />
run_once<br />
cd $curr_dir || exit<br />
done<br />
<br />
<br />
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0002_combine_es_ld_grids_into_combos.py -p $blast_orig<br />
<br />
</syntaxhighlight><br />
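The SPH-extraction step in the script above (<code>grep -n SPH ... | head -n 1</code> followed by <code>tail --lines=+N</code>) keeps everything from the first line that contains "SPH" to the end of the file. The same logic in Python, as an illustrative sketch with a hypothetical function name:<br />

```python
def lines_from_first_sph(lines):
    """Return all lines from the first one containing 'SPH' to the end.

    Mirrors grep/head/tail in the shell script: the match is a plain
    substring test, and nothing is returned if no line matches.
    """
    lines = list(lines)
    for index, line in enumerate(lines):
        if "SPH" in line:
            return lines[index:]
    return []
```

Unlike the shell version, which passes an empty line number to <code>tail</code> when nothing matches, this sketch simply returns an empty list in that case.<br />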
<br />
== Running blastermaster with default parameters ==<br />
'''Warning! Do not use these grids, as the default grids with lipid spheres give incorrect solvation energies. Use the ones with thinspheres instead!'''<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK <br />
# first argument -- path to blastermaster files of the protein without membrane<br />
# run in a new directory with shell-LPD.pdb<br />
blastermaster_Prot=$1<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# strip CRYST1, CONECT and END records so that only atom records remain in shell-LPD.pdb<br />
sed -i -e '/^CRYST1/d' -e '/^CONECT/d' -e '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
cp -r $blastermaster_Prot/dockfiles .<br />
cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
cp $blastermaster_Prot/working/amb.crg.oxt .<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
echo "Check if the grid size changed, compare this with INDOCK"<br />
python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.hydrogen ../dockfiles/.<br />
echo "DONE!"<br />
</syntaxhighlight><br />
<br />
== Lipid membrane models from MemProtMD ==<br />
If a protein-membrane complex has already been modeled for your system and deposited on the [http://memprotmd.bioch.ox.ac.uk/home/ MemProtMD] website, you can use it and skip the MD in Schrodinger. The steps are very similar to the ones after the Schrodinger run.<br />
<br />
Download the <code>*_default_dppc.mpmd.finalframe.atomistic.pdb</code> file from the bottom of the page. Rename it to <code>last-mol.pdb</code>. <br />
<br />
Use the prepare.pml script (below). At this step, you need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
The script below differs from the one for processing Schrodinger results in two points: solvent residue is <code>SOL</code> instead of <code>SPC</code>, and lipids are called <code>DPPC</code> instead of <code>POPC</code>.<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SOL<br />
<br />
create MEM, ////DPPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
<br />
== Positioning of the membrane ==<br />
If there is no precomputed membrane position from OPM or MemProtMD, you can model it using the PPM 3.0 web server (https://opm.phar.umich.edu/ppm_server3) or the standalone software. The software is installed in <code>/nfs/home/ak87/exa/PROGRAM/OPM-MEMBRANE</code>. Copy the <code>res.lib</code> file from the program directory to the directory with your .pdb file. Create an input file like this:<syntaxhighlight lang="shell"><br />
1<br />
0 PMm out rec.pdb<br />
</syntaxhighlight><blockquote>0 or 1 -“do not use” or “use” heteroatoms in the input PDB file, respectively (solvent molecules are always excluded).<br />
<br />
MOM - type of membrane (see list of 3-letter codes for membranes below)<br />
<br />
“in” or “out” means topology of N-terminus of first subunit included in the corresponding input PDB file<br />
<br />
With this option, for every input pdb file, the program will automatically select the flat or curved membrane boundaries, whichever has the lower calculated transfer energy.</blockquote>Then run the program:<br />
<br />
<code>~ak87/exa/PROGRAM/OPM-MEMBRANE/immers<1membrane.inp>rec-opm.out</code><br />
<br />
Use the <code>datasub1</code> file to extract residue numbers for the membrane placement in Maestro.<br />
See ppm3_instructions.docx file in the program directory for more detail.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=15080Membrane Modeling2023-01-10T17:59:39Z<p>Iamkaant: </p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
To account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid bilayer is generated around the target receptor and included in the docking score grid generation.<br />
To equilibrate the lipid bilayer around the embedded transmembrane receptor quickly, robustly, and at low computational cost, coarse-grained (CG) molecular dynamics (MD) simulations and, if needed, subsequent atomistic simulations are employed.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in case of GPCRs, try to model missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Login to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
-posre.itp - Position restraints for heavy atoms of atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
-prot-rot.pdb - Coarse-grained protein structure aligned along the z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10nm, z=11nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change that to arbitrary lipid compositions using the -l and -u options of insane.py.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check if your protein is embedded correctly in the lipid bilayer.<br />
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running an MD simulation or minimization using gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and selecting lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tool: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions of MD trajectory, e.g. making molecules broken over the periodic boundary conditions whole again. <br />
<br />
initram.sh calls the backward.py program which performs the backmapping of input coarse-grained to atomistic systems, and performs a small series of short minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. There is a while-loop that stops only once all relaxation steps have finished.'''<br />
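<br />
The retry behaviour described above can be illustrated with a minimal Python sketch. It is not the code of 0003-backmap-and-lpd-selection.sh: <code>run_until_success</code> and <code>max_attempts</code> are hypothetical names, and the cap on attempts is an added assumption that makes the loop give up instead of running forever.<br />
<syntaxhighlight lang="python">
def run_until_success(step, max_attempts=5):
    """Call `step` until it returns True; return the number of attempts used.

    `step` is any zero-argument callable that reports success as True/False
    (for example, a wrapper that launches the relaxation and checks for the
    final output file). Raises RuntimeError when all attempts fail.
    """
    for attempt in range(1, max_attempts + 1):
        if step():
            return attempt
    raise RuntimeError(f"step did not succeed in {max_attempts} attempts")
</syntaxhighlight>
A bounded loop of this kind makes a persistent failure visible, whereas an unbounded loop would keep occupying a node.<br />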
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy, there might be an issue with the PyMOL version you're using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
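<br />
The coordinate-count check can be done with a few lines of Python. This is a minimal sketch that assumes "number of coordinates" means the number of ATOM and HETATM records; the function names are illustrative and not part of the pipeline.<br />
<syntaxhighlight lang="python">
def count_atom_records(pdb_path):
    """Count ATOM and HETATM records in a PDB file."""
    count = 0
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith(("ATOM", "HETATM")):
                count += 1
    return count


def same_atom_count(first_pdb, second_pdb):
    """Return True when both PDB files hold the same number of atom records."""
    return count_atom_records(first_pdb) == count_atom_records(second_pdb)
</syntaxhighlight>
For example, <code>same_atom_count("fitted_system.pdb", "backmapped-mol.pdb")</code> should return <code>True</code> before you continue with the minimizations.<br />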
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations will be run.<br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This will select carbon and hydrogen atoms of the hydrophobic lipid tail segments in a radius of 1.7 nm around the protein and assign them to the atom type "LPD"<br />
<br />
You need to provide the rec.pdb you want to use for docking (potentially with missing loops) at this step as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
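<br />
The distance criterion behind this selection can be sketched in plain Python. This is not prepare.pml, only an illustration of the idea: coordinates are assumed to be (x, y, z) tuples in Å, so the 1.7 nm radius corresponds to a cutoff of 17.0, and <code>atoms_within</code> is a hypothetical name.<br />
<syntaxhighlight lang="python">
import math


def atoms_within(lipid_atoms, protein_atoms, cutoff):
    """Return the lipid atoms that lie within `cutoff` of any protein atom.

    Both inputs are sequences of (x, y, z) tuples; `cutoff` must use the
    same length unit as the coordinates. Brute force, O(N * M).
    """
    selected = []
    for atom in lipid_atoms:
        if any(math.dist(atom, other) <= cutoff for other in protein_atoms):
            selected.append(atom)
    return selected
</syntaxhighlight>
For real systems PyMOL's own selection is much faster; the sketch only makes explicit which atoms end up in shell-LPD.pdb.<br />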
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file with the following line added:<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
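<br />
Adding the lipid entry to amb.crg.oxt can be made repeatable with a small helper. This is a sketch under two assumptions: the entry is written exactly as shown above (check the column spacing your DOCK version expects), and <code>ensure_lipid_entry</code> is an illustrative name, not an existing tool.<br />
<syntaxhighlight lang="python">
LIPID_LINE = "C lpd 0.000 LIPID SPHERE"  # copied from the text above


def ensure_lipid_entry(path, line=LIPID_LINE):
    """Append `line` to the file unless an identical line is already there.

    Returns True when the line was appended and False when it was present.
    """
    with open(path) as handle:
        existing = handle.read()
    if line in (row.strip() for row in existing.splitlines()):
        return False
    with open(path, "a") as handle:
        # Keep the new entry on its own line even if the file lacks a final newline.
        if existing and not existing.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True
</syntaxhighlight>
Calling the helper twice leaves a single copy of the entry, which avoids duplicated lines when the preparation is rerun in the same directory.<br />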
<br />
= Membrane modelling in Schrodinger =<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
== MD of protein and membrane ==<br />
Import structure '''without''' ligand, use <u>Preparation wizard</u> as described in "Code for Controls..." to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website https://opm.phar.umich.edu/proteins/ and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
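<br />
If you have the transmembrane residues as a plain list of numbers, the string in this format can be generated rather than typed. A minimal sketch, assuming non-negative residue numbers; <code>format_residue_ranges</code> is an illustrative helper, not part of Maestro.<br />
<syntaxhighlight lang="python">
def format_residue_ranges(residues):
    """Collapse residue numbers into a 'res.num a-b,c-d,e' string."""
    numbers = sorted(set(residues))
    parts = []
    start = prev = None
    for number in numbers:
        if start is None:
            start = prev = number
        elif number == prev + 1:
            prev = number  # still inside the current run
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = number
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return "res.num " + ",".join(parts)
</syntaxhighlight>
For instance, residues 76 to 97, 112 to 136 and 141 give <code>res.num 76-97,112-136,141</code>.<br />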
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use <u>Molecular Dynamics</u> menu to set up the calculation. Select prepared system, click <code>Load</code> on top of the menu. Put simulation time of 5 ns, <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select protein, and put <code>Force Constant</code> of 100, click <code>Apply</code> and <code>OK.</code> Then click down arrow left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
'''''NOTE''': In my experience, restrained MD runs with NgPT in Schrodinger often fail with the following message:''<br />
<br />
<code>''Allowed momentum exceeded on 17 particles.''</code><br />
<br />
''This seems to be related to the interference of restraints and the ensemble, as no such error is observed if no restraints are imposed, or if NVT ensemble is used. Still, NgPT ensemble is important for correct membrane sampling, because simulations at NVT long enough to permit membrane relaxation, lead to the smearing of the lipid bilayer and the formation of empty space.'' <br />
<br />
''If the mentioned problem occurs, import the last coordinates of the run (*-out.cms) into Maestro, open Minimization menu, load the structure from the workspace, and apply restraints on lipid and protein (force restraint of 10 is usually enough). Then run a minimization for 100 ps, and try to run NgPT MD starting from the optimized structure.''<br />
<br />
'''''TO CHECK''': NPT ensemble.''<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, login to <u>gimel5</u>, edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Export <code>$SCHRODINGER</code> variable:<br />
<br />
<code>export SCHRODINGER=/nfs/soft2/schrodinger/2023-3/</code> (or a newer version)<br />
<br />
Run .sh file, and your task will be submitted to a queue. For my system it took 1.5 hr to complete.<br />
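<br />
The edit to desmond_md_job_X.sh described above can also be scripted. This sketch assumes the new <code>-HOST</code> option may simply take the place of the removed <code>-lic</code> option on the command line; <code>retarget_desmond_job</code> is a hypothetical helper.<br />
<syntaxhighlight lang="python">
def retarget_desmond_job(script_text, host="gimel5.gpu"):
    """Swap the license flag for a -HOST flag in the job script text."""
    old_flag = "-lic DESMOND_GPGPU:16"
    if old_flag not in script_text:
        raise ValueError(f"'{old_flag}' not found in the job script")
    return script_text.replace(old_flag, f"-HOST {host}")
</syntaxhighlight>
Read the .sh file, pass its contents through the function and write the result back; use <code>host="gimel5.heavygpu"</code> for the other queue.<br />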
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open <code>-out.cms</code>, click the <code>T</code> icon on the new entry in the Project Table and click <code>Display Trajectory Snapshots</code>. Select the last one, click <code>Display</code> and check that the protein did not change position during the MD run, then click <code>Export</code>, to Project Table, Frames Selected only. You will get a new entry in the Project Table. Export it to a <code>.pdb</code> file.<br />
<br />
== Preparation of the structure for Blastermaster ==<br />
Use the <code>prepare.pml</code> script. You need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) at this step as <code>xtal-prot.pdb</code>. Rename the MD system to <code>last-mol.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SPC<br />
<br />
create MEM, ////POPC<br />
<br />
# these atom numbers do not exist in POPC or DPPC<br />
# therefore, we do not remove protons from the lipid<br />
# structure to make more spheres. Uncomment these<br />
# lines and change to proper H numbers if needed.<br />
#remove /MEM////HS<br />
#remove /MEM////HX<br />
#remove /MEM////HY<br />
#remove /MEM////H*B<br />
#remove /MEM////H*A<br />
#remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
<br />
== Preparation of grids with thinspheres ==<br />
Prepare grids with thinspheres for the protein without lipid as described in https://wiki.docking.org/index.php/How_to_do_parameter_scanning<br />
<br />
Create an empty directory and put <code>shell-LPD.pdb</code> there. Then run the following script:<br />
<br />
<code>sh blast-membrane-thinsph-scan.sh {path to the collection of "es_ld_thin_sph_rad_X.X" directories} {path to the dir with original working and dockfiles directories}</code><br />
<br />
This script runs qnifft and solvmap for each <code>es_ld_thin_sph_rad_X.X</code> directory, and then uses the second script of parameter scanning protocol to combine files into <code>dockfiles</code> directories.<br />
<br />
'''Important! LOOK AT YOUR GRIDS! Desolvation -- larger solvation where water is. Electrostatics -- no electrostatics in the lipid region. vdW -- no vdW in the lipid region.'''<br />
<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK with thinsphere scan<br />
# PREREQ -- run first step of https://wiki.docking.org/index.php/How_to_do_parameter_scanning (new_0001_generate_ES_LD_generation.py )<br />
# first argument -- path to the directory where dirs "es_ld_thin_sph_rad_X.X" are stored. <br />
# second argument -- path to the dir with original working and dockfiles<br />
# run in a new directory with shell-LPD.pdb<br />
<br />
run_once () {<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
cp $curr_dir/shell-LPD.pdb .<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
#cp -r $blastermaster_Prot/dockfiles .<br />
#cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
need_files="amb.crg.oxt<br />
qnifft.parm<br />
vdw.siz"<br />
for file in $need_files<br />
do<br />
if [ -e $blastermaster_Prot/working/$file ]<br />
then<br />
cp $blastermaster_Prot/working/$file .<br />
else<br />
cp $blast_orig/working/$file .<br />
fi<br />
done<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
#cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
#echo "Check if the grid size changed, compare this with INDOCK"<br />
#python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
#head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids if they are present in the folder<br />
if [ -e $blastermaster_Prot/working/heavy ]<br />
then<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
fi<br />
echo $dir " DONE!"<br />
}<br />
<br />
curr_dir=$(pwd)<br />
workdirs=$1<br />
blast_orig=$2<br />
dirs=$(ls -d $workdirs/es_ld_thin_sph_rad_*)<br />
for dir in $dirs<br />
do<br />
blastermaster_Prot=$dir<br />
local_dir=$(echo $dir | awk -F"\/" '{print $NF}')<br />
mkdir -p $local_dir/working<br />
cd $local_dir/working || exit<br />
run_once<br />
cd $curr_dir || exit<br />
done<br />
<br />
<br />
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0002_combine_es_ld_grids_into_combos.py -p $blast_orig<br />
<br />
</syntaxhighlight><br />
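<br />
The file assembly that <code>run_once</code> performs with <code>sed</code>, <code>cat</code>, <code>grep</code> and <code>tail</code> can be summarized in Python. This is an illustrative sketch of the logic only, working on lists of lines; <code>build_lowdielectric_receptor</code> is a hypothetical name and the sketch does not replace the script.<br />
<syntaxhighlight lang="python">
def build_lowdielectric_receptor(receptor_lines, shell_lines, reference_lines):
    """Combine receptor atoms, lipid shell atoms and thin-sphere lines.

    receptor_lines  -- lines of rec.crg.pdb
    shell_lines     -- lines of shell-LPD.pdb
    reference_lines -- lines of the existing receptor.crg.lowdielectric.pdb
    """
    # Drop the records that the script deletes from shell-LPD.pdb.
    skipped = ("CRYST1", "CONECT", "END")
    shell = [line for line in shell_lines if not line.startswith(skipped)]
    # Keep everything from the first line that mentions SPH onwards.
    first_sph = next(
        (i for i, line in enumerate(reference_lines) if "SPH" in line), None
    )
    spheres = list(reference_lines[first_sph:]) if first_sph is not None else []
    return list(receptor_lines) + shell + spheres
</syntaxhighlight>
The resulting order (protein, lipid spheres, then thin spheres) is the order in which the script concatenates the pieces.<br />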
<br />
== Running blastermaster with default parameters ==<br />
'''Warning! Do not use these grids, as the default grids with lipid spheres give incorrect solvation energies. Use the ones with thinspheres instead!'''<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK <br />
# first argument -- path to blastermaster files of the protein without membrane<br />
# run in a new directory with shell-LPD.pdb<br />
blastermaster_Prot=$1<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
cp -r $blastermaster_Prot/dockfiles .<br />
cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
cp $blastermaster_Prot/working/amb.crg.oxt .<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
echo "Check if the grid size changed, compare this with INDOCK"<br />
python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.hydrogen ../dockfiles/.<br />
echo "DONE!"<br />
</syntaxhighlight><br />
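<br />
The comparison with <code>delphi_nsize</code> mentioned in the script comments needs the value stored in INDOCK. The sketch below assumes INDOCK holds it as a whitespace-separated keyword and value on one line; <code>read_delphi_nsize</code> is an illustrative helper.<br />
<syntaxhighlight lang="python">
def read_delphi_nsize(indock_text):
    """Return the integer that follows the delphi_nsize keyword."""
    for line in indock_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "delphi_nsize":
            return int(fields[1])
    raise KeyError("delphi_nsize not found in INDOCK text")
</syntaxhighlight>
Compare the returned number with the grid size reported in the first line of <code>gridsize</code>; if they differ, update INDOCK before docking.<br />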
<br />
== Lipid membrane models from MemProtMD ==<br />
If a protein-membrane complex was already modeled for your system and deposited at [http://memprotmd.bioch.ox.ac.uk/home/ MemProtMD] website, you can use it and skip doing MD in Schrodinger. The steps are very similar to the ones after Schrodinger run.<br />
<br />
Download the <code>*_default_dppc.mpmd.finalframe.atomistic.pdb</code> file from the bottom of the page. Rename it to <code>last-mol.pdb</code>. <br />
<br />
Use the prepare.pml script (below). You need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) at this step as <code>xtal-prot.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
The script below differs from the one for processing Schrodinger results in two points: solvent residue is <code>SOL</code> instead of <code>SPC</code>, and lipids are called <code>DPPC</code> instead of <code>POPC</code>.<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SOL<br />
<br />
create MEM, ////DPPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
<br />
== Positioning of the membrane ==<br />
If there is no precomputed membrane position from OPM or MemProtMD, you can model it using PPM 3.0 webserver (https://opm.phar.umich.edu/ppm_server3) or standalone software. The software is installed in <code>/nfs/home/ak87/exa/PROGRAM/OPM-MEMBRANE</code>. Copy <code>res.lib</code> file from the program directory to the directory with your .pdb file. Create input file like this:<syntaxhighlight lang="shell"><br />
1<br />
0 PMm out rec.pdb<br />
</syntaxhighlight><blockquote>0 or 1 -“do not use” or “use” heteroatoms in the input PDB file, respectively (solvent molecules are always excluded).<br />
<br />
MOM - type of membrane (see list of 3-letter codes for membranes below)<br />
<br />
“in” or “out” means topology of N-terminus of first subunit included in the corresponding input PDB file<br />
<br />
With this option, for every input PDB file, the program will automatically select the flat or curved membrane boundaries, whichever has the lower calculated transfer energy.</blockquote>Then run the program:<br />
<br />
<code>~ak87/exa/PROGRAM/OPM-MEMBRANE/immers<1membrane.inp>rec-opm.out</code><br />
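<br />
The input file shown above can be written by a short helper when many structures have to be processed. This sketch reproduces only the two-line layout of the example; the leading <code>1</code> is copied verbatim because its meaning is not stated here, and <code>ppm_input</code> is a hypothetical name.<br />
<syntaxhighlight lang="python">
def ppm_input(pdb_name, use_heteroatoms=False, membrane_code="PMm", n_terminus="out"):
    """Return the text of a PPM input file for a single PDB file."""
    if n_terminus not in ("in", "out"):
        raise ValueError("n_terminus must be 'in' or 'out'")
    flag = 1 if use_heteroatoms else 0
    # First line copied from the example above; second line holds the options.
    return f"1\n{flag} {membrane_code} {n_terminus} {pdb_name}\n"
</syntaxhighlight>
Write the returned text to <code>1membrane.inp</code> in the directory that contains <code>res.lib</code> and your .pdb file, then run the command above.<br />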
<br />
Use <code>datasub1</code> file to extract residue numbers for the membrane placement in Maestro.<br />
See ppm3_instructions.docx file in the program directory for more detail. Test</div>Iamkaanthttp://wiki.docking.org/index.php?title=User:Iamkaant&diff=15079User:Iamkaant2023-01-10T17:58:54Z<p>Iamkaant: </p>
<hr />
<div>My name is Andrii Kyrylchuk and I’m a PhD in organic chemistry. My research work began with quantum chemistry; then I’ve devoted almost 7 years to organic synthesis and then returned to computations. Now my work is mainly focused on quantum chemistry & molecular docking, but I’m also interested in the different topics of organic chemistry, general chemistry, NMR, biochemistry, IT, astronomy etc.</div>Iamkaanthttp://wiki.docking.org/index.php?title=User:Iamkaant&diff=15078User:Iamkaant2023-01-10T17:58:46Z<p>Iamkaant: </p>
<hr />
<div>My name is Andrii Kyrylchuk and I’m a PhD in organic chemistry. My research work began with quantum chemistry; then I’ve devoted almost 7 years to organic synthesis and then returned to computations. Now my work is mainly focused on quantum chemistry & molecular docking, but I’m also interested in the different topics of organic chemistry, general chemistry, NMR, biochemistry, IT, astronomy etc. Test</div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=14951Membrane Modeling2022-10-31T22:58:41Z<p>Iamkaant: added the note to export $SCHRODINGER variable before running the script</p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
In order to account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid-bilayer is generated around the target receptor and included in the docking score grid generation.<br />
Aiming at a fast, robust and computationally effective equilibration of the lipid bilayer around the embedded transmembrane receptor, coarse-grained (CG) molecular dynamics (MD) simulations and (if needed) subsequent atomistic simulations are employed.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in case of GPCRs, try to model missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Login to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
-posre.itp - Position restraints for heavy atoms of atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
-prot-rot.pdb - Coarse-grained protein structure aligned along the z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10nm, z=11nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change that to arbitrary lipid compositions using the -l and -u options of insane.py.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check if your protein is embedded correctly in the lipid bilayer.<br />
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running a MD simulation or minimization using gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
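<br />
The two-step grompp/mdrun pattern can be sketched as follows. This only builds the command lines; the coordinate and topology file names are assumptions based on the outputs of the previous step, and the actual script may pass different arguments. With <code>-deffnm min</code>, mdrun names its outputs min.log, min.trr and min.gro.<br />

```python
def minimization_commands(mdp="martini_new-rf_min.mdp",
                          coords="cg-membrane.gro",   # assumed input name
                          top="topol-cg.top",         # assumed input name
                          name="min"):
    """Return the (grompp, mdrun) argument lists for one minimization."""
    grompp = ["gmx", "grompp",
              "-f", mdp,             # run parameters
              "-c", coords,          # starting coordinates
              "-p", top,             # system topology
              "-o", name + ".tpr"]   # portable run input file
    mdrun = ["gmx", "mdrun", "-deffnm", name]  # reads min.tpr, writes min.*
    return grompp, mdrun


if __name__ == "__main__":
    for cmd in minimization_commands():
        print(" ".join(cmd))
```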
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and select lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tool: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions of MD trajectory, e.g. making molecules broken over the periodic boundary conditions whole again. <br />
<br />
initram.sh calls the backward.py program which performs the backmapping of input coarse-grained to atomistic systems, and performs a small series of short minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. There is a while-loop that only stops once all relaxation steps have finished.'''<br />
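<br />
The retry logic can be pictured with a small bounded loop. This is a generic sketch, not the loop used in 0003-backmap-and-lpd-selection.sh: <code>step</code> and <code>is_done</code> are hypothetical callables, and the attempt limit is an addition so that a persistently failing system does not loop forever.<br />

```python
def run_until_done(step, is_done, max_attempts=5):
    """Call step() until is_done() is true; return the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        step()
        if is_done():
            return attempt
    raise RuntimeError(f"not finished after {max_attempts} attempts")
```

In practice <code>is_done</code> could test whether backmap/backmapped.gro exists, and <code>step</code> could launch initram.sh.<br />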
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy, there might be an issue with the PyMOL version you're using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
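<br />
One way to compare the two files is to count their coordinate records. The helper names below are illustrative, and the sketch assumes that "number of coordinates" means the number of ATOM/HETATM lines.<br />

```python
def count_atom_records(lines):
    """Count ATOM/HETATM records in an iterable of PDB lines."""
    return sum(1 for line in lines if line.startswith(("ATOM", "HETATM")))


def same_atom_count(path_a, path_b):
    """True if both PDB files hold the same number of coordinate records."""
    with open(path_a) as a, open(path_b) as b:
        return count_atom_records(a) == count_atom_records(b)
```

For example, <code>same_atom_count("fitted_system.pdb", "backmapped-mol.pdb")</code> should return <code>True</code> before you continue.<br />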
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations will be run. <br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This will select carbon and hydrogen atoms of the hydrophobic lipid tail segments in a radius of 1.7 nm around the protein and assign them to the atom type "LPD"<br />
<br />
At this step you need to provide the rec.pdb you want to use for docking (potentially with missing loops) as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file with the following line added:<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
<br />
= Membrane modelling in Schrodinger =<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
== MD of protein and membrane ==<br />
Import structure '''without''' ligand, use <u>Preparation wizard</u> as described in "Code for Controls..." to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website https://opm.phar.umich.edu/proteins/ and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
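<br />
If you have the transmembrane segments as residue ranges, the string for the field can be generated rather than typed. The function name is made up for this sketch; the output format is the one shown above.<br />

```python
def opm_to_resnum(segments):
    """Format (start, end) residue ranges as a 'res.num ...' expression."""
    parts = []
    for start, end in segments:
        # A single residue is written without a dash
        parts.append(str(start) if start == end else f"{start}-{end}")
    return "res.num " + ",".join(parts)


if __name__ == "__main__":
    print(opm_to_resnum([(76, 97), (112, 136), (141, 141)]))
```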
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use <u>Molecular Dynamics</u> menu to set up the calculation. Select prepared system, click <code>Load</code> on top of the menu. Put simulation time of 5 ns, <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select protein, and put <code>Force Constant</code> of 100, click <code>Apply</code> and <code>OK.</code> Then click down arrow left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
'''''NOTE''': In my experience, restrained MD runs with NgPT in Schrodinger often fail with the following message:''<br />
<br />
<code>''Allowed momentum exceeded on 17 particles.''</code><br />
<br />
''This seems to be related to the interference of restraints and the ensemble, as no such error is observed if no restraints are imposed, or if NVT ensemble is used. Still, NgPT ensemble is important for correct membrane sampling, because NVT simulations long enough to permit membrane relaxation lead to smearing of the lipid bilayer and the formation of empty space.'' <br />
<br />
''If the mentioned problem occurs, import the last coordinates of the run (*-out.cms) into Maestro, open Minimization menu, load the structure from the workspace, and apply restraints on lipid and protein (force restraint of 10 is usually enough). Then run a minimization for 100 ps, and try to run NgPT MD starting from the optimized structure.''<br />
<br />
'''''TO CHECK''': NPT ensemble.''<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, log in to <u>gimel5</u>, and edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Export the <code>$SCHRODINGER</code> variable:<br />
<br />
<code>export SCHRODINGER=/nfs/soft2/schrodinger/2023-3/</code> (or a newer version)<br />
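<br />
The edit of the job script can also be done programmatically. The sketch below assumes the license flag appears verbatim in the script and simply puts the host option in its place; if your script already contains a <code>-HOST</code> option, edit it by hand instead. The example command line in the test of this idea is hypothetical, since the exact script layout depends on the Maestro version.<br />

```python
def patch_desmond_script(text, host="gimel5.gpu"):
    """Swap the GPU license flag for a -HOST option in a Desmond job script."""
    old = "-lic DESMOND_GPGPU:16"
    if old not in text:
        raise ValueError("license flag not found; edit the script manually")
    return text.replace(old, f"-HOST {host}")
```

Read desmond_md_job_X.sh into a string, pass it through the function, and write the result back.<br />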
<br />
Run .sh file, and your task will be submitted to a queue. For my system it took 1.5 hr to complete.<br />
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open <code>-out.cms</code>, click on the <code>T</code> icon at the new entry in the project table and click <code>Display Trajectory Snapshots</code>. Select the last one, click <code>Display</code> and check that the protein did not change position during the MD run, then click <code>Export</code>, to Project Table, Frames Selected only. You will get a new entry in the Project Table. Export it to a <code>.pdb</code> file.<br />
<br />
== Preparation of the structure for Blastermaster ==<br />
Use the <code>prepare.pml</code> script. At this step you need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>. Rename the MD system to <code>last-mol.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SPC<br />
<br />
create MEM, ////POPC<br />
<br />
# these atom numbers do not exist in POPC or DPPC<br />
# therefore, we do not remove protons from the lipid<br />
# structure to make more spheres. Uncomment these<br />
# lines and change to proper H numbers if needed.<br />
#remove /MEM////HS<br />
#remove /MEM////HX<br />
#remove /MEM////HY<br />
#remove /MEM////H*B<br />
#remove /MEM////H*A<br />
#remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
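<br />
The key selection step, <code>MEM_C within 20 of xtal-prot</code>, keeps lipid atoms that lie within 20 Å of any protein atom. The pure-Python sketch below illustrates that idea on plain coordinate tuples; it is not how PyMOL implements the selection, and the function name is invented for the example.<br />

```python
import math


def within_cutoff(candidates, reference, cutoff=20.0):
    """Keep candidate points lying within `cutoff` of any reference point."""
    return [c for c in candidates
            if any(math.dist(c, r) <= cutoff for r in reference)]


if __name__ == "__main__":
    protein = [(0.0, 0.0, 0.0)]
    lipids = [(10.0, 0.0, 0.0), (0.0, 25.0, 0.0)]
    print(within_cutoff(lipids, protein))
```

The brute-force loop is fine for illustration; for full systems a neighbor search (as PyMOL does internally) is much faster.<br />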
<br />
== Preparation of grids with thinspheres ==<br />
Prepare grids with thinspheres for the protein without lipid as described in https://wiki.docking.org/index.php/How_to_do_parameter_scanning<br />
<br />
Create an empty directory and put <code>shell-LPD.pdb</code> there. Then run the following script:<br />
<br />
<code>sh blast-membrane-thinsph-scan.sh {path to the collection of "es_ld_thin_sph_rad_X.X" directories} {path to the dir with original working and dockfiles directories}</code><br />
<br />
This script runs qnifft and solvmap for each <code>es_ld_thin_sph_rad_X.X</code> directory, and then uses the second script of parameter scanning protocol to combine files into <code>dockfiles</code> directories.<br />
<br />
'''Important! LOOK AT YOUR GRIDS! Desolvation -- larger solvation where water is. Electrostatics -- no electrostatics in the lipid region. vdW -- no vdW in the lipid region.'''<br />
<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK with thinsphere scan<br />
# PREREQ -- run first step of https://wiki.docking.org/index.php/How_to_do_parameter_scanning (new_0001_generate_ES_LD_generation.py )<br />
# first argument -- path to the directory where dirs "es_ld_thin_sph_rad_X.X" are stored. <br />
# second argument -- path to the dir with original working and dockfiles<br />
# run in a new directory with shell-LPD.pdb<br />
<br />
run_once () {<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
cp $curr_dir/shell-LPD.pdb .<br />
# note: "-ir" would be read as "-i" with backup suffix "r" and leave *.pdbr files behind<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
#cp -r $blastermaster_Prot/dockfiles .<br />
#cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
need_files="amb.crg.oxt<br />
qnifft.parm<br />
vdw.siz"<br />
for file in $need_files<br />
do<br />
if [ -e $blastermaster_Prot/working/$file ]<br />
then<br />
cp $blastermaster_Prot/working/$file .<br />
else<br />
cp $blast_orig/working/$file .<br />
fi<br />
done<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
#cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
#echo "Check if the grid size changed, compare this with INDOCK"<br />
#python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
#head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids if they are present in the folder<br />
if [ -e $blastermaster_Prot/working/heavy ]<br />
then<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.hydrogen ../dockfiles/.<br />
fi<br />
echo $dir " DONE!"<br />
}<br />
<br />
curr_dir=$(pwd)<br />
workdirs=$1<br />
blast_orig=$2<br />
dirs=$(ls -d $workdirs/es_ld_thin_sph_rad_*)<br />
for dir in $dirs<br />
do<br />
blastermaster_Prot=$dir<br />
local_dir=$(echo $dir | awk -F"\/" '{print $NF}')<br />
mkdir -p $local_dir/working<br />
cd $local_dir/working || exit<br />
run_once<br />
cd $curr_dir || exit<br />
done<br />
<br />
<br />
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0002_combine_es_ld_grids_into_combos.py -p $blast_orig<br />
<br />
</syntaxhighlight><br />
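<br />
The file assembly inside <code>run_once</code> (strip header records from the lipid shell, append it to the receptor, then append the sphere lines from the original low-dielectric receptor file) can be restated in Python as below. This is an illustrative equivalent of the sed/cat/grep/tail lines, with invented function names, operating on lists of lines rather than files.<br />

```python
DROP_PREFIXES = ("CRYST1", "CONECT", "END")


def clean_lipid_shell(lines):
    """Drop CRYST1, CONECT and END* records, like the three sed calls."""
    return [line for line in lines if not line.startswith(DROP_PREFIXES)]


def sphere_tail(lines):
    """Return everything from the first line containing 'SPH' onwards."""
    for index, line in enumerate(lines):
        if "SPH" in line:
            return lines[index:]
    return []  # no spheres present


def build_receptor(rec_lines, shell_lines, old_lowdielectric_lines):
    """Receptor atoms + cleaned lipid shell + thin-sphere records."""
    return (list(rec_lines)
            + clean_lipid_shell(shell_lines)
            + sphere_tail(old_lowdielectric_lines))
```

Unlike the shell version, a file without any SPH line simply contributes nothing instead of making <code>tail</code> fail.<br />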
<br />
== Running blastermaster with default parameters ==<br />
'''Warning! Do not use these grids, as the default grids with lipid spheres give incorrect solvation energies. Use the ones with thinspheres instead!'''<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK <br />
# first argument -- path to blastermaster files of the protein without membrane<br />
# run in a new directory with shell-LPD.pdb<br />
blastermaster_Prot=$1<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
# note: "-ir" would be read as "-i" with backup suffix "r" and leave *.pdbr files behind<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
cp -r $blastermaster_Prot/dockfiles .<br />
cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
cp $blastermaster_Prot/working/amb.crg.oxt .<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
echo "Check if the grid size changed, compare this with INDOCK"<br />
python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.hydrogen ../dockfiles/.<br />
echo "DONE!"<br />
</syntaxhighlight><br />
<br />
== Lipid membrane models from MemProtMD ==<br />
If a protein-membrane complex was already modeled for your system and deposited at the [http://memprotmd.bioch.ox.ac.uk/home/ MemProtMD] website, you can use it and skip the MD in Schrodinger. The steps are very similar to the ones that follow the Schrodinger run.<br />
<br />
Download the <code>*_default_dppc.mpmd.finalframe.atomistic.pdb</code> file from the bottom of the page. Rename it to <code>last-mol.pdb</code>. <br />
<br />
Use the prepare.pml script (below). At this step you need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
The script below differs from the one for processing Schrodinger results in two points: solvent residue is <code>SOL</code> instead of <code>SPC</code>, and lipids are called <code>DPPC</code> instead of <code>POPC</code>.<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SOL<br />
<br />
create MEM, ////DPPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
<br />
== Positioning of the membrane ==<br />
If there is no precomputed membrane position from OPM or MemProtMD, you can model it using PPM 3.0 webserver (https://opm.phar.umich.edu/ppm_server3) or standalone software. The software is installed in <code>/nfs/home/ak87/exa/PROGRAM/OPM-MEMBRANE</code>. Copy <code>res.lib</code> file from the program directory to the directory with your .pdb file. Create input file like this:<syntaxhighlight lang="shell"><br />
1<br />
0 PMm out rec.pdb<br />
</syntaxhighlight><blockquote>0 or 1 -“do not use” or “use” heteroatoms in the input PDB file, respectively (solvent molecules are always excluded).<br />
<br />
MOM - type of membrane (see list of 3-letter codes for membranes below)<br />
<br />
“in” or “out” means topology of N-terminus of first subunit included in the corresponding input PDB file<br />
<br />
With this option, for every input pdb file, the program will selected automatically the flat or curved membrane boundaries, whichever had the lower calculated transfer energy.</blockquote>Then run the program:<br />
<br />
<code>~ak87/exa/PROGRAM/OPM-MEMBRANE/immers<1membrane.inp>rec-opm.out</code><br />
<br />
Use <code>datasub1</code> file to extract residue numbers for the membrane placement in Maestro.<br />
See ppm3_instructions.docx file in the program directory for more detail.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Synthesia&diff=14902Synthesia2022-10-11T17:43:11Z<p>Iamkaant: /* How to create this tree? */</p>
<hr />
<div><blockquote>Synthesia is a command-line tool that uses an entire retrosynthetic route as a guide pathway to generate optimized structural analogues of a lead compound without compromising the synthesizability of the structure. The users has the ability to guide the structural modifications in a desired direction by specifying structural constraints.</blockquote>Original publication: https://pubs.acs.org/doi/10.1021/acs.jcim.2c00246<br />
<br />
Website: https://software.zbh.uni-hamburg.de/customers/tools <br />
<br />
== Installation ==<br />
To obtain the license, you need to register and get your account approved. Then log in to the website, click on Synthesia and "Download the license file". Your license key is inside the file; it looks like <code>AAAAAAAliFQAAAAU2eM8ZjTTELGD3LzxBgt3/1DGaW4=</code>. Copy the license key from the file, download and unpack the program, and run the command:<br />
<br />
<code>./synthesia --license <your_license_here></code><br />
<br />
My installation is in <code>/mnt/nfs/exa/work/ak87/UCSF/SynthI/SYNTHESIA/synthesia_1.0.0</code><br />
<br />
== Running ==<br />
To run the program, you need:<br />
<br />
# a retrosynthetic tree and <br />
# a library of building blocks ("<code>SMILES Name</code>", no preprocessing needed).<br />
<br />
The tool returns analogs of a target molecule synthesizable by the given route from given BBs. The analogs may be filtered by 29 different parameters (see SI or README.md file):<br />
<br />
# Extended-Connectivity Fingerprints (ECFP)<br />
# Functional-Class Fingerprints (FCFP)<br />
# Connected Subgraph Fingerprints (CSFP)<br />
# Largest Ring<br />
# Largest Ringsystem<br />
# Molecular Weight<br />
# Number of Hydrogen-Bond Acceptors<br />
# Number of Anions<br />
# Number of Aromatic Atoms<br />
# Number of Aromatic Rings<br />
# Number of Aromatic Ringsystems<br />
# Number of Cations<br />
# Number of Hydrogen-Bond Donors<br />
# Number of Halogens<br />
# Number of Non-Hydrogen Atoms<br />
# Number of Hetero Atoms<br />
# Number of Hydrophobic Points<br />
# Number of Inorganic Atoms<br />
# Number of Lipinski Donors<br />
# Number of Nitrogens and Oxygens<br />
# Number of Non-Hydrogen Bonds<br />
# Number of Rings<br />
# Number of Ringsystems<br />
# Number of Rotatable Bonds<br />
# LogP-Value<br />
# Total Charge<br />
# Topological Polar Surface Area (TPSA)<br />
# Volume<br />
# Matching SMARTS pattern, either inclusion or exclusion<br />
<br />
=== Retrosynthetic tree ===<br />
This is a .json file that describes all steps needed to synthesize the molecule in question and starting reagents. The steps are encoded in SMARTS. Each reaction node should contain reaction SMARTS + SMILES of the product. The tree for [https://zinc15.docking.org/substances/ZINC000000000347/ granisetron] synthesis via amidation is shown below.<syntaxhighlight lang="json"><br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_chemical": true,<br />
"children":<br />
[<br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_reaction": true,<br />
"smartsPattern": "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)",<br />
"children":<br />
[<br />
{<br />
"smiles": "Cn1nc(C(O)=O)c2ccccc12",<br />
"is_chemical": true,<br />
"children": []<br />
},<br />
{<br />
"smiles": "CN1C2CCCC1CC(N)C2",<br />
"is_chemical": true,<br />
"children": []<br />
}<br />
]<br />
}<br />
]<br />
}<br />
</syntaxhighlight><br />
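<br />
For one-step routes the tree can be generated and sanity-checked with a few lines of Python. The sketch below uses only the field names visible in the example above (<code>smiles</code>, <code>is_chemical</code>, <code>is_reaction</code>, <code>smartsPattern</code>, <code>children</code>); the helper functions are invented here, and whether Synthesia accepts further fields is not covered.<br />

```python
import json


def chemical(smiles, children=()):
    """A chemical node; leaves (building blocks) have no children."""
    return {"smiles": smiles, "is_chemical": True, "children": list(children)}


def reaction(product_smiles, smarts, reactants):
    """A reaction node: reaction SMARTS plus the SMILES of its product."""
    return {"smiles": product_smiles, "is_reaction": True,
            "smartsPattern": smarts, "children": list(reactants)}


def leaf_smiles(node):
    """Collect the SMILES of all starting materials (childless chemicals)."""
    children = node.get("children", [])
    if node.get("is_chemical") and not children:
        return [node["smiles"]]
    leaves = []
    for child in children:
        leaves.extend(leaf_smiles(child))
    return leaves


if __name__ == "__main__":
    target = "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12"
    amidation = "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)"
    tree = chemical(target, [reaction(target, amidation, [
        chemical("Cn1nc(C(O)=O)c2ccccc12"),
        chemical("CN1C2CCCC1CC(N)C2"),
    ])])
    print(json.dumps(tree, indent=2))
    print(leaf_smiles(tree))
```

Listing the leaves is a cheap way to confirm that every starting material made it into the file before running Synthesia.<br />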
==== How to create this tree? ====<br />
* If only one stage is needed, just write it manually.<br />
* The authors used open-source ML tool AiZynthFinder: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00472-1, https://github.com/MolecularAI/aizynthfinder<br />
* Reaxys Retrosynthesis tool (was not able to find a route for granisetron though; it seems to use only published procedures)<br />
* Sci-Finder Retrosynthesis. Exports results in .pdf only, but at least you can copy compound SMILES from the Retrosynthesis Plan, just click on the structure and select "Substance Detail".<br />
* IBM RXN https://rxn.res.ibm.com/. Based on machine-extracted patent reactions. You have to manually select reactions for each step.<br />
* Spaya AI https://spaya.ai. <br />
<br />
In any case, except for one stage synthesis, I would recommend consulting a synthetic chemist before creating analogs.<br />
<br />
=== Configuration file ===<br />
<blockquote>All additional settings of Synthesia can be specified in a configuration file. This file is optional and the user does not have to use it. If both the configuration file as well as command line parameters are used to define parameters, the settings parsed via command line overwrite settings defined in the configuration file. The configuration file has to be in valid standard JSON format. An example configuration file is bundled with Synthesia.</blockquote>So far I've been using only command line parameters.<br />
<br />
=== Running ===<br />
<code>./synthesia --inputStructures ../../Enamine-BB/2022-03_Chemspace_Building_Blocks_noRU_SMILES.smiles --retroSynTree amide_tree.json --output amide_out-ecfp.json --threads 4 --verbosity 5 --allLeaves --useECFP 2 tanimoto 0.6 1.0</code><br />
<br />
<code>--inputStructures</code> -- a library of BBs<br />
<br />
<code>--retroSynTree</code> -- self-explanatory<br />
<br />
<code>--output</code> -- output .json file<br />
<br />
<code>--threads</code> -- Number of threads used for parallelization.<br />
<br />
<code>--allLeaves</code> -- '''very important''': without it you will only get suitable BBs and not final structures. The README says: ''Set this parameter to true if all chemical leaf nodes should be open for exchange. Either this parameter or the option allChemicals must be set or the nodeId parameter must be specified''.<br />
<br />
<code>--useECFP</code> -- filter analogs by ECFP. Four parameter values are expected: <Integer> <String> <Integer> <Integer>. The first number equals the appended number of the ECFP (the 2 in the command above). The second, string parameter specifies the similarity measure for the fingerprint comparison; options are 'tanimoto', 'cosine', 'hamming', 'euclidean', and 'dice'. The third number specifies the minimum threshold value for the similarity fingerprint comparison and the fourth number specifies the maximum threshold value (0.6 and 1.0 in the command above).<br />
<br />
This command executed on Gimel returns 511 analogs of granisetron in 1'56" from 436K BBs.<br />
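<br />
To make the threshold window concrete: the ECFP filter in the command above keeps analogs whose Tanimoto similarity to the target lies between 0.6 and 1.0. The sketch below computes Tanimoto similarity on plain sets of "on" bit indices. It is a textbook illustration, not Synthesia's implementation, and the bit sets in the example are made up.<br />

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def passes_window(similarity, minimum=0.6, maximum=1.0):
    """True if the similarity falls inside the inclusive threshold window."""
    return minimum <= similarity <= maximum


if __name__ == "__main__":
    target = {1, 5, 9, 12}   # hypothetical on-bits
    analog = {1, 5, 9, 20}
    score = tanimoto(target, analog)
    print(score, passes_window(score))
```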
<br />
=== Conclusion ===<br />
Pros of Synthesia -- flexibility, multistage reactions of virtually any complexity and number of stages, fine-tuning of analog properties (number of aromatic rings, halogens, cations, logP…).<br />
<br />
Cons -- closed source, licensing on a per-year basis, steep learning curve, need to create a retrosynthetic tree for each compound.<br />
<br />
My summary at the moment -- may be interesting to look into, given access to a synthetic chemistry group.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=14754Membrane Modeling2022-09-04T21:20:11Z<p>Iamkaant: </p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
In order to account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid-bilayer is generated around the target receptor and included in the docking score grid generation.<br />
Aiming at a fast, robust and computationally effective equilibration of the lipid bilayer around the embedded transmembrane receptor, coarse-grained (CG) molecular dynamics (MD) simulations and (if needed) subsequent atomistic simulations are employed.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in case of GPCRs, try to model missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Login to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
-posre.itp - Position restraints for heavy atoms of the atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
 -prot-rot.pdb - Coarse-grained protein structure aligned along z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10nm, z=11nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change that to arbitrary lipid compositions using the -l and -u options of insane.py.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check if your protein is embedded correctly in the lipid bilayer. <br />
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running an MD simulation or minimization using gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and selecting lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tool: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions of MD trajectory, e.g. making molecules broken over the periodic boundary conditions whole again. <br />
<br />
initram.sh calls the backward.py program which performs the backmapping of input coarse-grained to atomistic systems, and performs a small series of short minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. There is a while-loop that only stops once all relaxation steps have finished.'''<br />
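<br />
The retry idea can be sketched as follows. This is not the loop from 0003-backmap-and-lpd-selection.sh, only an illustration with an upper bound on attempts; <code>retry_until_ok</code> is a hypothetical helper name.<br />

```bash
#!/bin/bash
# Sketch: re-run a command until it succeeds, giving up after MAX attempts.
# Usage: retry_until_ok MAX COMMAND [ARGS...]
retry_until_ok () {
    local max_attempts=$1
    shift
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
    done
    echo "succeeded on attempt $attempt"
}

# Example: retry_until_ok 10 ./initram.sh (followed by its usual arguments)
```

Capping the number of attempts avoids an endless loop when the backmapped system cannot be relaxed at all.<br />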
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy there might be an issue with the PyMOL version you're using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
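<br />
The coordinate check above can be done from the shell. A minimal sketch, assuming both files use ordinary ATOM/HETATM records; the function names are hypothetical.<br />

```bash
#!/bin/bash
# Sketch: compare the number of coordinate records in two PDB files.
count_pdb_atoms () {
    # grep -c prints the count even when it is zero
    grep -c -E '^(ATOM|HETATM)' "$1"
}

same_atom_count () {
    local a b
    a=$(count_pdb_atoms "$1")
    b=$(count_pdb_atoms "$2")
    echo "$1: $a atoms, $2: $b atoms"
    [ "$a" -eq "$b" ]    # exit status 0 only if the counts agree
}

# Usage: same_atom_count fitted_system.pdb backmapped-mol.pdb || echo "MISMATCH"
```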
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations will be run. <br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This will select carbon and hydrogen atoms of the hydrophobic lipid tail segments in a radius of 1.7 nm around the protein and assign them to the atom type "LPD".<br />
<br />
You need to provide the rec.pdb you want to use for docking (potentially with missing loops) at that step as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file with the following line added:<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
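<br />
Adding the LPD line to amb.crg.oxt can be made safe to repeat. The sketch below appends a line only if it is not already there; <code>add_line_once</code> is a hypothetical helper. Copy the LPD line exactly from a working amb.crg.oxt, since the column spacing of that file may matter and is not guaranteed to survive on this page.<br />

```bash
#!/bin/bash
# Sketch: append a line to a file unless an identical line already exists.
# Usage: add_line_once FILE LINE
add_line_once () {
    local file=$1 line=$2
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        echo "already present in $file"
    else
        printf '%s\n' "$line" >> "$file"
        echo "added to $file"
    fi
}

# Example (spacing as printed on this page; verify it against your own file):
# add_line_once amb.crg.oxt "C lpd 0.000 LIPID SPHERE"
```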
<br />
= Membrane modelling in Schrodinger =<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
== MD of protein and membrane ==<br />
Import structure '''without''' ligand, use <u>Preparation wizard</u> as described in "Code for Controls..." to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website https://opm.phar.umich.edu/proteins/ and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use <u>Molecular Dynamics</u> menu to set up the calculation. Select prepared system, click <code>Load</code> on top of the menu. Put simulation time of 5 ns, <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select protein, and put <code>Force Constant</code> of 100, click <code>Apply</code> and <code>OK.</code> Then click down arrow left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
'''''NOTE''': In my experience, restrained MD runs with NgPT in Schrodinger often fail with the following message:''<br />
<br />
<code>''Allowed momentum exceeded on 17 particles.''</code><br />
<br />
''This seems to be related to the interference of restraints and the ensemble, as no such error is observed if no restraints are imposed, or if NVT ensemble is used. Still, NgPT ensemble is important for correct membrane sampling, because NVT simulations long enough to permit membrane relaxation lead to smearing of the lipid bilayer and the formation of empty space.'' <br />
<br />
''If the mentioned problem occurs, import the last coordinates of the run (*-out.cms) into Maestro, open Minimization menu, load the structure from the workspace, and apply restraints on lipid and protein (a force constant of 10 is usually enough). Then run a minimization for 100 ps, and try to run NgPT MD starting from the optimized structure.''<br />
<br />
'''''TO CHECK''': NPT ensemble.''<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, login to <u>gimel5</u>, edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Run .sh file, and your task will be submitted to a queue. For my system it took 1.5 hr to complete.<br />
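<br />
The edit of the job script can be scripted with sed. This is only a sketch: it assumes the license option appears literally as <code>-lic DESMOND_GPGPU:16</code> and simply puts the <code>-HOST</code> option in its place. Check the resulting line by hand, because the exact layout of the generated .sh file may differ between Schrodinger releases. <code>patch_desmond_script</code> is a hypothetical helper name.<br />

```bash
#!/bin/bash
# Sketch: swap the GPU license option for a -HOST option in a Desmond job script.
# Usage: patch_desmond_script SCRIPT [HOST]   (HOST defaults to gimel5.gpu)
patch_desmond_script () {
    local script=$1 host=${2:-gimel5.gpu}
    # -i.bak edits in place and keeps the original as SCRIPT.bak
    sed -i.bak "s/-lic DESMOND_GPGPU:16/-HOST ${host}/" "$script"
}

# Example: patch_desmond_script desmond_md_job_X.sh gimel5.heavygpu
```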
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open <code>-out.cms</code>, click on <code>T</code> icon at the new entry in project table and click <code>Display Trajectory Snapshots</code>. Select the last one, click <code>Display</code> and check that the protein did not change position during the MD run, then click <code>Export</code>, to Project Table, Frames Selected only. You will get a new entry in the Project Table. Export it to a <code>.pdb</code> file.<br />
<br />
== Preparation of the structure for Blastermaster ==<br />
Use <code>prepare.pml</code> script. You need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) at that step as <code>xtal-prot.pdb</code>. Rename MD system as <code>last-mol.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SPC<br />
<br />
create MEM, ////POPC<br />
<br />
# these atom numbers do not exist in POPC or DPPC<br />
# therefore, we do not remove protons from the lipid<br />
# structure to make more spheres. Uncomment these<br />
# lines and change to proper H numbers if needed.<br />
#remove /MEM////HS<br />
#remove /MEM////HX<br />
#remove /MEM////HY<br />
#remove /MEM////H*B<br />
#remove /MEM////H*A<br />
#remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
<br />
== Preparation of grids with thinspheres ==<br />
Prepare grids with thinspheres for the protein without lipid as described in https://wiki.docking.org/index.php/How_to_do_parameter_scanning<br />
<br />
Create an empty directory and put <code>shell-LPD.pdb</code> there. Then run the following script:<br />
<br />
<code>sh blast-membrane-thinsph-scan.sh {path to the collection of "es_ld_thin_sph_rad_X.X" directories} {path to the dir with original working and dockfiles directories}</code><br />
<br />
This script runs qnifft and solvmap for each <code>es_ld_thin_sph_rad_X.X</code> directory, and then uses the second script of parameter scanning protocol to combine files into <code>dockfiles</code> directories.<br />
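<br />
Before launching the scan it may be worth checking that the inputs described above are in place. A minimal sketch, with a hypothetical function name, that takes the same two arguments as the script and is run from the directory holding <code>shell-LPD.pdb</code>:<br />

```bash
#!/bin/bash
# Sketch: pre-flight check for the thin-sphere membrane scan.
# Usage: check_thinsph_inputs SCAN_DIR ORIGINAL_BLASTERMASTER_DIR
check_thinsph_inputs () {
    local workdirs=$1 blast_orig=$2 status=0 sub
    if [ ! -f shell-LPD.pdb ]; then
        echo "missing shell-LPD.pdb in $(pwd)"; status=1
    fi
    if ! ls -d "$workdirs"/es_ld_thin_sph_rad_* > /dev/null 2>&1; then
        echo "no es_ld_thin_sph_rad_* directories under $workdirs"; status=1
    fi
    for sub in working dockfiles; do
        if [ ! -d "$blast_orig/$sub" ]; then
            echo "missing directory $blast_orig/$sub"; status=1
        fi
    done
    return $status
}

# Example: check_thinsph_inputs ../scan ../blastermaster_orig && echo "inputs look fine"
```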
<br />
'''Important! LOOK AT YOUR GRIDS! Desolvation -- larger solvation where water is. Electrostatics -- no electrostatics in the lipid region. vdW -- no vdW in the lipid region.'''<br />
<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK with thinsphere scan<br />
# PREREQ -- run first step of https://wiki.docking.org/index.php/How_to_do_parameter_scanning (new_0001_generate_ES_LD_generation.py )<br />
# first argument -- path to the directory where dirs "es_ld_thin_sph_rad_X.X" are stored. <br />
# second argument -- path to the dir with original working and dockfiles<br />
# run in a new directory with shell-LPD.pdb<br />
<br />
run_once () {<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
cp $curr_dir/shell-LPD.pdb .<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
#cp -r $blastermaster_Prot/dockfiles .<br />
#cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
need_files="amb.crg.oxt<br />
qnifft.parm<br />
vdw.siz"<br />
for file in $need_files<br />
do<br />
if [ -e $blastermaster_Prot/working/$file ]<br />
then<br />
cp $blastermaster_Prot/working/$file .<br />
else<br />
cp $blast_orig/working/$file .<br />
fi<br />
done<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
#cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
#echo "Check if the grid size changed, compare this with INDOCK"<br />
#python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
#head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids if they are present in the folder<br />
if [ -e $blastermaster_Prot/working/heavy ]<br />
then<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.hydrogen ../dockfiles/.<br />
fi<br />
echo $dir " DONE!"<br />
}<br />
<br />
curr_dir=$(pwd)<br />
workdirs=$1<br />
blast_orig=$2<br />
dirs=$(ls -d $workdirs/es_ld_thin_sph_rad_*)<br />
for dir in $dirs<br />
do<br />
blastermaster_Prot=$dir<br />
local_dir=$(echo $dir | awk -F"\/" '{print $NF}')<br />
mkdir -p $local_dir/working<br />
cd $local_dir/working || exit<br />
run_once<br />
cd $curr_dir || exit<br />
done<br />
<br />
<br />
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0002_combine_es_ld_grids_into_combos.py -p $blast_orig<br />
<br />
</syntaxhighlight><br />
<br />
== Running blastermaster with default parameters ==<br />
'''Warning! Do not use these grids, as the default grids with lipid spheres give incorrect solvation energies. Use the ones with thinspheres instead!'''<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK <br />
# first argument -- path to blastermaster files of the protein without membrane<br />
# run in a new directory with shell-LPD.pdb<br />
blastermaster_Prot=$1<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
cp -r $blastermaster_Prot/dockfiles .<br />
cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
cp $blastermaster_Prot/working/amb.crg.oxt .<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
echo "Check if the grid size changed, compare this with INDOCK"<br />
python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.hydrogen ../dockfiles/.<br />
echo "DONE!"<br />
</syntaxhighlight><br />
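<br />
For the grid size comparison mentioned in the script comments, the value stored in INDOCK can be read with awk. A small sketch, assuming INDOCK keeps the setting as a <code>delphi_nsize</code> keyword followed by the number on the same line; <code>indock_grid_size</code> is a hypothetical helper.<br />

```bash
#!/bin/bash
# Sketch: print the delphi_nsize value stored in an INDOCK file.
# Usage: indock_grid_size [INDOCK]
indock_grid_size () {
    awk '$1 == "delphi_nsize" { print $2 }' "${1:-INDOCK}"
}

# Example: echo "INDOCK expects a grid of size $(indock_grid_size INDOCK)"
```

Compare the printed number with the first line of the <code>gridsize</code> file written by the script.<br />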
<br />
== Lipid membrane models from MemProtMD ==<br />
If a protein-membrane complex was already modeled for your system and deposited at [http://memprotmd.bioch.ox.ac.uk/home/ MemProtMD] website, you can use it and skip doing MD in Schrodinger. The steps are very similar to the ones after Schrodinger run.<br />
<br />
Download the <code>*_default_dppc.mpmd.finalframe.atomistic.pdb</code> file from the bottom of the page. Rename it to <code>last-mol.pdb</code>. <br />
<br />
Use prepare.pml script (below). You need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) at that step as <code>xtal-prot.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
The script below differs from the one for processing Schrodinger results in two points: solvent residue is <code>SOL</code> instead of <code>SPC</code>, and lipids are called <code>DPPC</code> instead of <code>POPC</code>.<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SOL<br />
<br />
create MEM, ////DPPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
<br />
== Positioning of the membrane ==<br />
If there is no precomputed membrane position from OPM or MemProtMD, you can model it using PPM 3.0 webserver (https://opm.phar.umich.edu/ppm_server3) or standalone software. The software is installed in <code>/nfs/home/ak87/exa/PROGRAM/OPM-MEMBRANE</code>. Copy <code>res.lib</code> file from the program directory to the directory with your .pdb file. Create input file like this:<syntaxhighlight lang="shell"><br />
1<br />
0 PMm out rec.pdb<br />
</syntaxhighlight><blockquote>0 or 1 -“do not use” or “use” heteroatoms in the input PDB file, respectively (solvent molecules are always excluded).<br />
<br />
MOM - type of membrane (see list of 3-letter codes for membranes below)<br />
<br />
“in” or “out” means topology of N-terminus of first subunit included in the corresponding input PDB file<br />
<br />
With this option, for every input PDB file, the program automatically selects the flat or curved membrane boundaries, whichever has the lower calculated transfer energy.</blockquote>Then run the program:<br />
<br />
<code>~ak87/exa/PROGRAM/OPM-MEMBRANE/immers<1membrane.inp>rec-opm.out</code><br />
<br />
Use <code>datasub1</code> file to extract residue numbers for the membrane placement in Maestro.<br />
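<br />
Writing the input file can be scripted as well. The sketch below only reproduces the two-line input shown above and warns if <code>res.lib</code> has not been copied yet; the function name and its argument order are hypothetical, and the meaning of each field should be taken from the program instructions.<br />

```bash
#!/bin/bash
# Sketch: write the two-line PPM input file in the current directory.
# Usage: write_ppm_input [PDB] [MEMBRANE_CODE] [in|out] [INPUT_FILE]
write_ppm_input () {
    local pdb=${1:-rec.pdb} membrane=${2:-PMm} topology=${3:-out} inp=${4:-1membrane.inp}
    if [ ! -f res.lib ]; then
        echo "warning: res.lib not found in $(pwd)" >&2
    fi
    # first line and the leading 0 (heteroatoms not used) as in the example above
    printf '1\n0 %s %s %s\n' "$membrane" "$topology" "$pdb" > "$inp"
}

# Example:
# write_ppm_input rec.pdb PMm out
# ~ak87/exa/PROGRAM/OPM-MEMBRANE/immers < 1membrane.inp > rec-opm.out
```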
See ppm3_instructions.docx file in the program directory for more detail.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=14753Membrane Modeling2022-09-04T21:19:22Z<p>Iamkaant: added the manual for membrane positioning</p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
In order to account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid bilayer is generated around the target receptor and included in the docking score grid generation.<br />
Aiming at a fast, robust and computationally effective equilibration of the lipid bilayer around the embedded transmembrane receptor, coarse-grained (CG) molecular dynamics (MD) simulations and (if needed) subsequent atomistic simulations are employed.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in case of GPCRs, try to model missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Login to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
 -posre.itp - Position restraints for heavy atoms of atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
 -prot-rot.pdb - Coarse-grained protein structure aligned along z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10nm, z=11nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change that to arbitrary lipid compositions using the -l and -u options of insane.py.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check if your protein is embedded correctly in the lipid bilayer. <br />
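<br />
Alongside the visual check, a quick tally of residue names in cg-membrane.pdb shows whether lipids and water were added at all. A minimal sketch, assuming fixed-column PDB records with residue names of up to four characters; <code>tally_residues</code> is a hypothetical helper. In a coarse-grained file the counts are beads, not molecules.<br />

```bash
#!/bin/bash
# Sketch: count coordinate records per residue name in a PDB file.
# Usage: tally_residues FILE
tally_residues () {
    awk '
        /^(ATOM|HETATM)/ {
            r = substr($0, 18, 4); gsub(/ /, "", r)   # residue name, columns 18-21
            n[r]++
        }
        END { for (r in n) print r, n[r] }
    ' "$1" | sort
}

# Example: tally_residues martini/cg-membrane.pdb
```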
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running an MD simulation or minimization using gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
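<br />
Whether the run has finished can be checked from the output files listed above. A minimal sketch with a hypothetical function name; it relies on md.gro being written at the end of the run and assumes that the GROMACS log of a completed run contains a "Finished mdrun" line, which should be verified for the installed version.<br />

```bash
#!/bin/bash
# Sketch: check whether the coarse-grained MD run has completed.
# Usage: cg_md_finished [DIRECTORY]   (defaults to martini)
cg_md_finished () {
    local dir=${1:-martini}
    if [ ! -s "$dir/md.gro" ]; then
        echo "no final snapshot $dir/md.gro yet"
        return 1
    fi
    # Assumption: a completed run leaves a "Finished mdrun" line in the log.
    if ! grep -q "Finished mdrun" "$dir/md.log" 2>/dev/null; then
        echo "$dir/md.log has no Finished mdrun line"
        return 1
    fi
    echo "CG simulation in $dir looks complete"
}

# Example: cg_md_finished martini && ./0003-backmap-and-lpd-selection.sh
```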
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and selecting lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tool: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions of MD trajectory, e.g. making molecules broken over the periodic boundary conditions whole again. <br />
<br />
initram.sh calls the backward.py program which performs the backmapping of input coarse-grained to atomistic systems, and performs a small series of short minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. There is a while-loop that only stops once all relaxation steps have finished.'''<br />
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy, there might be an issue with the PyMOL version you are using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
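<br />
The coordinate-count check can be done by hand, or with a short script. The following is a minimal sketch, assuming that "number of coordinates" means the number of ATOM and HETATM records; the function names are hypothetical.<br />

```python
def count_atom_records(pdb_path: str) -> int:
    """Count ATOM and HETATM records (one per coordinate line) in a PDB file."""
    with open(pdb_path) as handle:
        return sum(1 for line in handle if line.startswith(("ATOM", "HETATM")))


def same_atom_count(pdb_a: str, pdb_b: str) -> bool:
    """True if both PDB files contain the same number of coordinate records."""
    return count_atom_records(pdb_a) == count_atom_records(pdb_b)
```

Usage: <code>same_atom_count("fitted_system.pdb", "backmapped-mol.pdb")</code> should return <code>True</code> before you continue.<br />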
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations are run.<br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This selects carbon and hydrogen atoms of the hydrophobic lipid tail segments within a radius of 1.7 nm around the protein and assigns them the atom type "LPD".<br />
<br />
At this step, provide the rec.pdb you want to use for docking (potentially with missing loops) as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
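<br />
The distance criterion used for the lipid shell can be illustrated in plain Python. This is only a sketch of the idea (the script itself relies on a PyMOL selection); <code>atoms_within</code> is a hypothetical helper. Coordinates in PDB files are in angstrom, so 1.7 nm corresponds to a cutoff of 17.0.<br />

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def atoms_within(candidates: Sequence[Coord], reference: Sequence[Coord],
                 cutoff: float) -> List[int]:
    """Return indices of candidate atoms within `cutoff` of any reference atom.

    Coordinates and cutoff must share one unit (e.g. angstrom).
    This brute-force loop is quadratic and meant for illustration only.
    """
    selected = []
    for index, atom in enumerate(candidates):
        if any(math.dist(atom, ref) <= cutoff for ref in reference):
            selected.append(index)
    return selected
```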
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file with the following line added:<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
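<br />
Adding the LPD entry to amb.crg.oxt can be scripted so that it is not appended twice. The sketch below is an assumption-laden helper (<code>add_lpd_entry</code> is a hypothetical name); the entry is copied verbatim from the text above, so check that its column spacing matches the existing entries in your file.<br />

```python
LPD_LINE = "C lpd 0.000 LIPID SPHERE"  # copied verbatim from the text above


def add_lpd_entry(path: str) -> bool:
    """Append the LPD entry to an amb.crg.oxt file unless it is already there.

    Returns True if the line was added, False if an LPD entry already existed.
    """
    with open(path) as handle:
        text = handle.read()
    for line in text.splitlines():
        if line.split()[:2] == ["C", "lpd"]:
            return False
    # Make sure the new entry starts on its own line.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with open(path, "a") as handle:
        handle.write(separator + LPD_LINE + "\n")
    return True
```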
<br />
= Membrane modelling in Schrodinger =<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
== MD of protein and membrane ==<br />
Import the structure '''without''' the ligand and use the <u>Preparation wizard</u>, as described in "Code for Controls...", to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website https://opm.phar.umich.edu/proteins/ and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
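<br />
If you have the transmembrane segments as pairs of residue numbers, the string for the field can be assembled with a few lines of Python. This is a minimal sketch; <code>residues_to_asl</code> is a hypothetical helper that only reproduces the format shown above.<br />

```python
from typing import Iterable, Tuple


def residues_to_asl(segments: Iterable[Tuple[int, int]]) -> str:
    """Format (start, end) residue segments as 'res.num 76-97,112-136,141'."""
    parts = []
    for start, end in segments:
        # A single residue is written as one number, a segment as start-end.
        parts.append(str(start) if start == end else f"{start}-{end}")
    return "res.num " + ",".join(parts)
```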
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use the <u>Molecular Dynamics</u> menu to set up the calculation. Select the prepared system and click <code>Load</code> at the top of the menu. Set the simulation time to 5 ns, then go to <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select the protein, set <code>Force Constant</code> to 100, and click <code>Apply</code> and <code>OK.</code> Then click the down arrow to the left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
'''''NOTE''': In my experience, restrained MD runs with the NPgT ensemble in Schrodinger often fail with the following message:''<br />
<br />
<code>''Allowed momentum exceeded on 17 particles.''</code><br />
<br />
''This seems to be caused by an interaction between the restraints and the ensemble, as the error does not occur when no restraints are imposed or when the NVT ensemble is used. Still, the NPgT ensemble is important for correct membrane sampling: NVT simulations long enough to let the membrane relax smear the lipid bilayer and leave empty space.''<br />
<br />
''If the problem occurs, import the last coordinates of the run (*-out.cms) into Maestro, open the Minimization menu, load the structure from the workspace, and apply restraints on lipid and protein (a force constant of 10 is usually enough). Then run a minimization for 100 ps, and try to run NPgT MD starting from the optimized structure.''<br />
<br />
'''''TO CHECK''': NPT ensemble.''<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, log in to <u>gimel5</u>, and edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Run the .sh file, and your job will be submitted to the queue. For my system it took 1.5 hr to complete.<br />
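<br />
The edit of the launch script can also be done programmatically. The sketch below is a hedged illustration: <code>retarget_desmond_script</code> is a hypothetical helper, and placing the <code>-HOST</code> option exactly where the <code>-lic</code> option used to be is an assumption, not something the Schrodinger documentation is quoted on here.<br />

```python
def retarget_desmond_script(text: str, host: str = "gimel5.gpu") -> str:
    """Replace the license flag in a launch script with a -HOST option.

    Raises ValueError if the expected flag is not present, so that an
    already edited (or differently written) script is not silently accepted.
    """
    lic_flag = "-lic DESMOND_GPGPU:16"
    if lic_flag not in text:
        raise ValueError("license flag not found in script")
    return text.replace(lic_flag, f"-HOST {host}")
```

Read desmond_md_job_X.sh into a string, pass it through the function, and write the result back before running the script.<br />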
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open <code>-out.cms</code>. Click the <code>T</code> icon on the new entry in the project table and click <code>Display Trajectory Snapshots</code>. Select the last snapshot, click <code>Display</code> and check that the protein did not change position during the MD run. Then click <code>Export</code> (to Project Table, Frames Selected only). You will get a new entry in the Project Table. Export it to a <code>.pdb</code> file.<br />
<br />
== Preparation of the structure for Blastermaster ==<br />
Use the <code>prepare.pml</code> script. At this step, provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>. Rename the MD system to <code>last-mol.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SPC<br />
<br />
create MEM, ////POPC<br />
<br />
# these atom numbers do not exist in POPC or DPPC<br />
# therefore, we do not remove protons from the lipid<br />
# structure to make more spheres. Uncomment these<br />
# lines and change to proper H numbers if needed.<br />
#remove /MEM////HS<br />
#remove /MEM////HX<br />
#remove /MEM////HY<br />
#remove /MEM////H*B<br />
#remove /MEM////H*A<br />
#remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
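<br />
Before passing <code>shell-LPD.pdb</code> on, it can be worth checking that every atom in it really was renamed. The following is a minimal sketch using the standard PDB column positions; <code>summarize_shell</code> and <code>is_clean_lpd_shell</code> are hypothetical helper names.<br />

```python
from collections import Counter


def summarize_shell(pdb_path: str) -> Counter:
    """Count coordinate records per (residue name, atom name) pair."""
    counts = Counter()
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith(("ATOM", "HETATM")):
                atom_name = line[12:16].strip()  # PDB columns 13-16
                res_name = line[17:20].strip()   # PDB columns 18-20
                counts[(res_name, atom_name)] += 1
    return counts


def is_clean_lpd_shell(pdb_path: str) -> bool:
    """True if the file has atoms and all of them are residue LPD, atom C."""
    counts = summarize_shell(pdb_path)
    return bool(counts) and set(counts) == {("LPD", "C")}
```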
<br />
== Preparation of grids with thinspheres ==<br />
Prepare grids with thinspheres for the protein without lipid as described in https://wiki.docking.org/index.php/How_to_do_parameter_scanning<br />
<br />
Create an empty directory and put <code>shell-LPD.pdb</code> there. Then run the following script:<br />
<br />
<code>sh blast-membrane-thinsph-scan.sh {path to the collection of "es_ld_thin_sph_rad_X.X" directories} {path to the dir with original working and dockfiles directories}</code><br />
<br />
This script runs qnifft and solvmap for each <code>es_ld_thin_sph_rad_X.X</code> directory, and then uses the second script of parameter scanning protocol to combine files into <code>dockfiles</code> directories.<br />
<br />
'''Important! LOOK AT YOUR GRIDS! Desolvation -- larger solvation where water is. Electrostatics -- no electrostatics in the lipid region. vdW -- no vdW in the lipid region.'''<br />
<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK with thinsphere scan<br />
# PREREQ -- run first step of https://wiki.docking.org/index.php/How_to_do_parameter_scanning (new_0001_generate_ES_LD_generation.py )<br />
# first argument -- path to the directory where dirs "es_ld_thin_sph_rad_X.X" are stored. <br />
# second argument -- path to the dir with original working and dockfiles<br />
# run in a new directory with shell-LPD.pdb<br />
<br />
run_once () {<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
cp $curr_dir/shell-LPD.pdb .<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
#cp -r $blastermaster_Prot/dockfiles .<br />
#cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
need_files="amb.crg.oxt<br />
qnifft.parm<br />
vdw.siz"<br />
for file in $need_files<br />
do<br />
if [ -e $blastermaster_Prot/working/$file ]<br />
then<br />
cp $blastermaster_Prot/working/$file .<br />
else<br />
cp $blast_orig/working/$file .<br />
fi<br />
done<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
#cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
#echo "Check if the grid size changed, compare this with INDOCK"<br />
#python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
#head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids if they are present in the folder<br />
if [ -e $blastermaster_Prot/working/heavy ]<br />
then<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
fi<br />
echo $dir " DONE!"<br />
}<br />
<br />
curr_dir=$(pwd)<br />
workdirs=$1<br />
blast_orig=$2<br />
dirs=$(ls -d $workdirs/es_ld_thin_sph_rad_*)<br />
for dir in $dirs<br />
do<br />
blastermaster_Prot=$dir<br />
local_dir=$(echo $dir | awk -F"\/" '{print $NF}')<br />
mkdir -p $local_dir/working<br />
cd $local_dir/working || exit<br />
run_once<br />
cd $curr_dir || exit<br />
done<br />
<br />
<br />
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0002_combine_es_ld_grids_into_combos.py -p $blast_orig<br />
<br />
</syntaxhighlight><br />
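<br />
After the scan finishes, a quick way to spot failed qnifft runs is to look for directories without a trimmed electrostatics grid. This is a minimal sketch, assuming the layout created by the script above (one <code>es_ld_thin_sph_rad_X.X/working</code> directory per radius in the directory where the script was run); <code>dirs_missing_phi</code> is a hypothetical helper.<br />

```python
from pathlib import Path
from typing import List


def dirs_missing_phi(root: str) -> List[str]:
    """List es_ld_thin_sph_rad_* directories lacking working/trim.electrostatics.phi."""
    missing = []
    for directory in sorted(Path(root).glob("es_ld_thin_sph_rad_*")):
        phi = directory / "working" / "trim.electrostatics.phi"
        if directory.is_dir() and not phi.is_file():
            missing.append(directory.name)
    return missing
```

An empty list from <code>dirs_missing_phi(".")</code> means every radius produced a grid; it does not replace looking at the grids themselves.<br />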
<br />
== Running blastermaster with default parameters ==<br />
'''Warning! Do not use these grids, as the default grids with lipid spheres give incorrect solvation energies. Use the ones with thinspheres instead!'''<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK <br />
# first argument -- path to blastermaster files of the protein without membrane<br />
# run in a new directory with shell-LPD.pdb<br />
blastermaster_Prot=$1<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
cp -r $blastermaster_Prot/dockfiles .<br />
cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
cp $blastermaster_Prot/working/amb.crg.oxt .<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
echo "Check if the grid size changed, compare this with INDOCK"<br />
python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.hydrogen ../dockfiles/.<br />
echo "DONE!"<br />
</syntaxhighlight><br />
<br />
== Lipid membrane models from MemProtMD ==<br />
If a protein-membrane complex was already modeled for your system and deposited at [http://memprotmd.bioch.ox.ac.uk/home/ MemProtMD] website, you can use it and skip doing MD in Schrodinger. The steps are very similar to the ones after Schrodinger run.<br />
<br />
Download the <code>*_default_dppc.mpmd.finalframe.atomistic.pdb</code> file from the bottom of the page. Rename it to <code>last-mol.pdb</code>. <br />
<br />
Use the prepare.pml script (below). At this step, provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
The script below differs from the one for processing Schrodinger results in three points: the solvent residue is <code>SOL</code> instead of <code>SPC</code>, the lipids are called <code>DPPC</code> instead of <code>POPC</code>, and the hydrogen-removal lines are active rather than commented out.<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SOL<br />
<br />
create MEM, ////DPPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
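<br />
Instead of keeping two copies of <code>prepare.pml</code>, the residue names can be swapped in the Schrodinger variant. The sketch below is a hypothetical helper (<code>adapt_prepare_script</code>) that only replaces the solvent and lipid selections; the hydrogen-removal lines, which are commented out in the Schrodinger variant, are left untouched and still need to be handled by hand.<br />

```python
def adapt_prepare_script(script_text: str, solvent: str = "SOL",
                         lipid: str = "DPPC") -> str:
    """Swap the solvent and lipid residue names in a prepare.pml script.

    Only selections of the form ////SPC and ////POPC are rewritten, so
    comments that merely mention the lipid names stay as they are.
    """
    adapted = script_text.replace("////SPC", f"////{solvent}")
    return adapted.replace("////POPC", f"////{lipid}")
```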
<br />
== Positioning of the membrane ==<br />
If there is no precomputed membrane position from OPM or MemProtMD, you can model it using the PPM 3.0 web server (https://opm.phar.umich.edu/ppm_server3) or the standalone software. The software is installed in <code>/nfs/home/ak87/exa/PROGRAM/OPM-MEMBRANE</code>. Copy the <code>res.lib</code> file from the program directory to the directory with your .pdb file. Create an input file like this:<syntaxhighlight lang="shell"><br />
1<br />
0 PMm out rec.pdb<br />
</syntaxhighlight><blockquote>0 or 1 -“do not use” or “use” heteroatoms in the input PDB file, respectively (solvent molecules are always excluded).<br />
<br />
PMm - type of membrane, given as a 3-letter code (the quoted PPM documentation uses MOM as its example; the list of codes is in the PPM documentation)<br />
<br />
“in” or “out” means topology of N-terminus of first subunit included in the corresponding input PDB file<br />
<br />
With this option, for every input pdb file, the program will automatically select the flat or curved membrane boundaries, whichever has the lower calculated transfer energy.</blockquote>Then run the program:<br />
<br />
<code>~ak87/exa/PROGRAM/OPM-MEMBRANE/immers < 1membrane.inp > rec-opm.out</code><br />
<br />
Use <code>datasub1</code> file to extract residue numbers for the membrane placement in Maestro</div>Iamkaanthttp://wiki.docking.org/index.php?title=Synthesia&diff=14651Synthesia2022-08-11T18:57:24Z<p>Iamkaant: added retrosynthesis route generation by IBM RXN</p>
<hr />
<div><blockquote>Synthesia is a command-line tool that uses an entire retrosynthetic route as a guide pathway to generate optimized structural analogues of a lead compound without compromising the synthesizability of the structure. The users has the ability to guide the structural modifications in a desired direction by specifying structural constraints.</blockquote>Original publication: https://pubs.acs.org/doi/10.1021/acs.jcim.2c00246<br />
<br />
Website: https://software.zbh.uni-hamburg.de/customers/tools <br />
<br />
== Installation ==<br />
To obtain the license, you need to register and get your account approved. Then log in to the website, click on Synthesia and "Download the license file". Your license key is inside the file; it looks like <code>AAAAAAAliFQAAAAU2eM8ZjTTELGD3LzxBgt3/1DGaW4=</code>. Copy the license key from the file, download and unpack the program, and run the command:<br />
<br />
<code>./synthesia --license <your_license_here></code><br />
<br />
My installation is in <code>/mnt/nfs/exa/work/ak87/UCSF/SynthI/SYNTHESIA/synthesia_1.0.0</code><br />
<br />
== Running ==<br />
To run the program, you need:<br />
<br />
# a retrosynthetic tree and <br />
# a library of building blocks ("<code>SMILES Name</code>", no preprocessing needed).<br />
<br />
The tool returns analogs of a target molecule synthesizable by the given route from given BBs. The analogs may be filtered by 29 different parameters (see SI or README.md file):<br />
<br />
# Extended-Connectivity Fingerprints (ECFP)<br />
# Functional-Class Fingerprints (FCFP)<br />
# Connected Subgraph Fingerprints (CSFP)<br />
# Largest Ring<br />
# Largest Ringsystem<br />
# Molecular Weight<br />
# Number of Hydrogen-Bond Acceptors<br />
# Number of Anions<br />
# Number of Aromatic Atoms<br />
# Number of Aromatic Rings<br />
# Number of Aromatic Ringsystems<br />
# Number of Cations<br />
# Number of Hydrogen-Bond Donors<br />
# Number of Halogens<br />
# Number of Non-Hydrogen Atoms<br />
# Number of Hetero Atoms<br />
# Number of Hydrophobic Points<br />
# Number of Inorganic Atoms<br />
# Number of Lipinski Donors<br />
# Number of Nitrogens and Oxygens<br />
# Number of Non-Hydrogen Bonds<br />
# Number of Rings<br />
# Number of Ringsystems<br />
# Number of Rotatable Bonds<br />
# LogP-Value<br />
# Total Charge<br />
# Topological Polar Surface Area (TPSA)<br />
# Volume<br />
# Matching SMARTS pattern (either inclusion or exclusion)<br />
<br />
=== Retrosynthetic tree ===<br />
This is a .json file that describes all steps needed to synthesize the molecule in question and starting reagents. The steps are encoded in SMARTS. Each reaction node should contain reaction SMARTS + SMILES of the product. The tree for [https://zinc15.docking.org/substances/ZINC000000000347/ granisetron] synthesis via amidation is shown below.<syntaxhighlight lang="json"><br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_chemical": true,<br />
"children":<br />
[<br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_reaction": true,<br />
"smartsPattern": "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)",<br />
"children":<br />
[<br />
{<br />
"smiles": "Cn1nc(C(O)=O)c2ccccc12",<br />
"is_chemical": true,<br />
"children": []<br />
},<br />
{<br />
"smiles": "CN1C2CCCC1CC(N)C2",<br />
"is_chemical": true,<br />
"children": []<br />
}<br />
]<br />
}<br />
]<br />
}<br />
</syntaxhighlight><br />
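For a one-stage synthesis, the tree can also be generated rather than typed. The sketch below builds the same structure as the granisetron example using only the keys shown above; <code>one_step_tree</code> and <code>tree_to_json</code> are hypothetical helper names, not part of Synthesia.<br />

```python
import json
from typing import Dict, List


def one_step_tree(product: str, reaction_smarts: str,
                  reactants: List[str]) -> Dict:
    """Build a one-reaction retrosynthetic tree: product -> reaction -> reactants."""
    leaves = [{"smiles": smiles, "is_chemical": True, "children": []}
              for smiles in reactants]
    reaction = {
        "smiles": product,  # the reaction node carries the product SMILES
        "is_reaction": True,
        "smartsPattern": reaction_smarts,
        "children": leaves,
    }
    return {"smiles": product, "is_chemical": True, "children": [reaction]}


def tree_to_json(tree: Dict) -> str:
    """Serialize the tree as indented JSON text."""
    return json.dumps(tree, indent=4)
```

Writing <code>tree_to_json(...)</code> to a file such as <code>amide_tree.json</code> gives an input for <code>--retroSynTree</code>.<br />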
==== How to create this tree? ====<br />
* If only one stage is needed, just write it manually.<br />
* The authors used the open-source ML tool AiZynthFinder: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00472-1, https://github.com/MolecularAI/aizynthfinder<br />
* Reaxys Retrosynthesis tool (it was not able to find a route for granisetron, though; it seems to use only published procedures)<br />
* Sci-Finder Retrosynthesis. Exports results in .pdf only, but at least you can copy compound SMILES from the Retrosynthesis Plan, just click on the structure and select "Substance Detail".<br />
* IBM RXN https://rxn.res.ibm.com/. Based on machine-extracted patent reactions. You have to manually select reactions for each step.<br />
<br />
In any case, except for a one-stage synthesis, I would recommend consulting a synthetic chemist before creating analogs.<br />
<br />
=== Configuration file ===<br />
<blockquote>All additional settings of Synthesia can be specified in a configuration file. This file is optional and the user does not have to use it. If both the configuration file as well as command line parameters are used to define parameters, the settings parsed via command line overwrite settings defined in the configuration file. The configuration file has to be in valid standard JSON format. An example configuration file is bundled with Synthesia.</blockquote>So far I've been using only command line parameters.<br />
<br />
=== Running ===<br />
<code>./synthesia --inputStructures ../../Enamine-BB/2022-03_Chemspace_Building_Blocks_noRU_SMILES.smiles --retroSynTree amide_tree.json --output amide_out-ecfp.json --threads 4 --verbosity 5 --allLeaves --useECFP 2 tanimoto 0.6 1.0</code><br />
<br />
<code>--inputStructures</code> -- a library of BBs<br />
<br />
<code>--retroSynTree</code> -- self-explanatory<br />
<br />
<code>--output</code> -- output .json file<br />
<br />
<code>--threads</code> -- Number of threads used for parallelization.<br />
<br />
<code>--allLeaves</code> -- '''very important''': without it you will only get suitable BBs and not final structures. The README says: ''Set this parameter to true if all chemical leaf nodes should be open for exchange. Either this parameter or the option allChemicals must be set or the nodeId parameter must be specified''.<br />
<br />
<code>--useECFP</code> -- filter analogs by ECFP. Four parameter values are expected: <Integer> <String> <Integer> <Integer>. The first number is the number appended to the ECFP name. The second parameter, a string, specifies the similarity measure for the fingerprint comparison; the options are 'tanimoto', 'cosine', 'hamming', 'euclidean' and 'dice'. The third number specifies the minimum threshold for the similarity and the fourth the maximum threshold (the command above passes 0.6 and 1.0, so the thresholds are evidently not restricted to integers).<br />
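<br />
To make the threshold window concrete, here is the Tanimoto coefficient on fingerprints represented as sets of "on" bits. This is the textbook definition, not Synthesia's code; the function names are hypothetical, and whether Synthesia treats the bounds as inclusive is an assumption.<br />

```python
from typing import Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    if not bits_a and not bits_b:
        return 1.0  # convention for two empty fingerprints
    return len(bits_a & bits_b) / len(bits_a | bits_b)


def in_window(similarity: float, minimum: float = 0.6,
              maximum: float = 1.0) -> bool:
    """True if the similarity lies inside the (assumed inclusive) window."""
    return minimum <= similarity <= maximum
```

Two fingerprints sharing 3 of 4 on-bits have a similarity of 0.75 and pass the 0.6 to 1.0 window; two sharing 2 of 5 (0.4) do not.<br />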
<br />
This command, executed on Gimel, returns 511 analogs of granisetron from 436K BBs in 1 min 56 s.<br />
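<br />
When running several filters or trees, it helps to assemble the command in a script. The sketch below only uses the flags from the command shown above; <code>synthesia_command</code> is a hypothetical helper and the default values mirror that example.<br />

```python
from typing import List, Tuple


def synthesia_command(building_blocks: str, tree: str, output: str,
                      threads: int = 4,
                      ecfp: Tuple[int, str, float, float] = (2, "tanimoto", 0.6, 1.0),
                      ) -> List[str]:
    """Assemble the argument list for the Synthesia run shown above."""
    number, measure, minimum, maximum = ecfp
    return [
        "./synthesia",
        "--inputStructures", building_blocks,
        "--retroSynTree", tree,
        "--output", output,
        "--threads", str(threads),
        "--verbosity", "5",
        "--allLeaves",
        "--useECFP", str(number), measure, str(minimum), str(maximum),
    ]
```

The list can be passed to <code>subprocess.run(cmd, check=True)</code> from the directory that contains the <code>synthesia</code> executable.<br />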
<br />
=== Conclusion ===<br />
Pros of Synthesia: flexibility, multistage reactions of virtually any complexity and number of stages, fine-tuning of analog properties (number of aromatic rings, halogens, cations, logP, etc.).<br />
<br />
Cons: closed source, licensing on a per-year basis, steep learning curve, need to create a retrosynthetic tree for each compound.<br />
<br />
My conclusion at the moment: may be interesting to look into, given access to a synthetic chemistry group.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Synthesia&diff=14545Synthesia2022-08-03T21:11:54Z<p>Iamkaant: </p>
<hr />
<div><blockquote>Synthesia is a command-line tool that uses an entire retrosynthetic route as a guide pathway to generate optimized structural analogues of a lead compound without compromising the synthesizability of the structure. The users has the ability to guide the structural modifications in a desired direction by specifying structural constraints.</blockquote>Original publication: https://pubs.acs.org/doi/10.1021/acs.jcim.2c00246<br />
<br />
Website: https://software.zbh.uni-hamburg.de/customers/tools <br />
<br />
== Installation ==<br />
To obtain the license, you need to register and get your account approved. Then log in to the website, click on Synthesia, and select "Download the license file". The license key is inside the file and looks like <code>AAAAAAAliFQAAAAU2eM8ZjTTELGD3LzxBgt3/1DGaW4=</code>. Copy the key from the file, download and unpack the program, and run the command:<br />
<br />
<code>./synthesia --license <your_license_here></code><br />
<br />
My installation is in <code>/mnt/nfs/exa/work/ak87/UCSF/SynthI/SYNTHESIA/synthesia_1.0.0</code><br />
<br />
== Running ==<br />
To run the program, you need:<br />
<br />
# a retrosynthetic tree and <br />
# a library of building blocks ("<code>SMILES Name</code>", no preprocessing needed).<br />
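<br />
Since the library is read as plain "<code>SMILES Name</code>" lines, a quick format check before a long run can save time. The following is a minimal Python sketch, assuming whitespace-separated fields; the function name <code>check_bb_lines</code> and the sample names <code>BB-0001</code> and <code>BB-0002</code> are hypothetical, and the check does not validate the SMILES themselves.<br />

```python
def check_bb_lines(lines):
    """Count well-formed 'SMILES Name' lines and collect malformed ones.

    Returns (n_ok, bad), where bad is a list of (line_number, text).
    Assumption: fields are separated by whitespace; blank lines are skipped.
    """
    n_ok, bad = 0, []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        # Split into at most two fields: the SMILES and the rest (the name).
        fields = text.split(None, 1)
        if len(fields) == 2:
            n_ok += 1
        else:
            bad.append((number, text))
    return n_ok, bad


# Hypothetical sample; the third line lacks a name.
sample = [
    "Cn1nc(C(O)=O)c2ccccc12 BB-0001",
    "CN1C2CCCC1CC(N)C2 BB-0002",
    "CCO",
]
n_ok, bad = check_bb_lines(sample)
print(n_ok, bad)
```

To check a real library, pass an open file object instead of the sample list.<br />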
<br />
The tool returns analogs of a target molecule synthesizable by the given route from given BBs. The analogs may be filtered by 29 different parameters (see SI or README.md file):<br />
<br />
# Extended-Connectivity Fingerprints (ECFP)<br />
# Functional-Class Fingerprints (FCFP)<br />
# Connected Subgraph Fingerprints (CSFP)<br />
# Largest Ring<br />
# Largest Ringsystem<br />
# Molecular Weight<br />
# Number of Hydrogen-Bond Acceptors<br />
# Number of Anions<br />
# Number of Aromatic Atoms<br />
# Number of Aromatic Rings<br />
# Number of Aromatic Ringsystems<br />
# Number of Cations<br />
# Number of Hydrogen-Bond Donors<br />
# Number of Halogens<br />
# Number of Non-Hydrogen Atoms<br />
# Number of Hetero Atoms<br />
# Number of Hydrophobic Points<br />
# Number of Inorganic Atoms<br />
# Number of Lipinski Donors<br />
# Number of Nitrogens and Oxygens<br />
# Number of Non-Hydrogen Bonds<br />
# Number of Rings<br />
# Number of Ringsystems<br />
# Number of Rotatable Bonds<br />
# LogP-Value<br />
# Total Charge<br />
# Topological Polar Surface Area (TPSA)<br />
# Volume<br />
# Matching SMARTSS3 pattern. Either inclusion or exclusion<br />
<br />
=== Retrosynthetic tree ===<br />
This is a .json file that describes all steps needed to synthesize the molecule in question and starting reagents. The steps are encoded in SMARTS. Each reaction node should contain reaction SMARTS + SMILES of the product. The tree for [https://zinc15.docking.org/substances/ZINC000000000347/ granisetron] synthesis via amidation is shown below.<syntaxhighlight lang="json"><br />
{
  "smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",
  "is_chemical": true,
  "children": [
    {
      "smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",
      "is_reaction": true,
      "smartsPattern": "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)",
      "children": [
        {
          "smiles": "Cn1nc(C(O)=O)c2ccccc12",
          "is_chemical": true,
          "children": []
        },
        {
          "smiles": "CN1C2CCCC1CC(N)C2",
          "is_chemical": true,
          "children": []
        }
      ]
    }
  ]
}
</syntaxhighlight><br />
==== How to create this tree? ====<br />
* If only one stage is needed, just write the tree manually.<br />
* The authors used the open-source ML tool AiZynthFinder: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-020-00472-1, https://github.com/MolecularAI/aizynthfinder<br />
* Reaxys Retrosynthesis tool (it was not able to find a route for granisetron, though; it seems to use only published procedures)<br />
* Sci-Finder Retrosynthesis. It exports results in .pdf only, but you can at least copy compound SMILES from the Retrosynthesis Plan: click on the structure and select "Substance Detail".<br />
<br />
In any case, except for a one-stage synthesis, I would recommend consulting a synthetic chemist before creating analogs.<br />
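<br />
For the one-stage case, the tree can also be generated by a short script instead of being typed by hand. The Python sketch below reproduces the granisetron amidation tree using only the keys shown in the example (<code>smiles</code>, <code>is_chemical</code>, <code>is_reaction</code>, <code>smartsPattern</code>, <code>children</code>). The helper names are hypothetical, and whether Synthesia accepts or requires other keys is not covered here.<br />

```python
import json


def chemical(smiles, children=None):
    """A chemical node; leaf building blocks have no children."""
    return {"smiles": smiles, "is_chemical": True, "children": children or []}


def reaction(product_smiles, smarts, reactants):
    """A reaction node: reaction SMARTS plus the SMILES of its product."""
    return {
        "smiles": product_smiles,
        "is_reaction": True,
        "smartsPattern": smarts,
        "children": [chemical(s) for s in reactants],
    }


def one_step_tree(product_smiles, smarts, reactants):
    """Root chemical node -> one reaction node -> leaf reactants."""
    return chemical(product_smiles, [reaction(product_smiles, smarts, reactants)])


tree = one_step_tree(
    "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",
    "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)",
    ["Cn1nc(C(O)=O)c2ccccc12", "CN1C2CCCC1CC(N)C2"],
)
tree_json = json.dumps(tree, indent=2)
print(tree_json)
```

Writing <code>tree_json</code> to a file such as <code>amide_tree.json</code> gives an input for <code>--retroSynTree</code>.<br />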
<br />
<br />
=== Configuration file ===<br />
<blockquote>All additional settings of Synthesia can be specified in a configuration file. This file is optional and the user does not have to use it. If both the configuration file as well as command line parameters are used to define parameters, the settings parsed via command line overwrite settings defined in the configuration file. The configuration file has to be in valid standard JSON format. An example configuration file is bundled with Synthesia.</blockquote>So far I've been using only command line parameters.<br />
<br />
=== Running ===<br />
<code>./synthesia --inputStructures ../../Enamine-BB/2022-03_Chemspace_Building_Blocks_noRU_SMILES.smiles --retroSynTree amide_tree.json --output amide_out-ecfp.json --threads 4 --verbosity 5 --allLeaves --useECFP 2 tanimoto 0.6 1.0</code><br />
<br />
<code>--inputStructures</code> -- a library of BBs<br />
<br />
<code>--retroSynTree</code> -- the retrosynthetic tree .json file (see above)<br />
<br />
<code>--output</code> -- output .json file<br />
<br />
<code>--threads</code> -- Number of threads used for parallelization.<br />
<br />
<code>--allLeaves</code> -- '''very important''': without it you will only get suitable BBs and not final structures. The README says: ''Set this parameter to true if all chemical leaf nodes should be open for exchange. Either this parameter or the option allChemicals must be set or the nodeId parameter must be specified''.<br />
<br />
<code>--useECFP</code> -- filter analogs by ECFP similarity. Four parameter values are expected: <Integer> <String> <Integer> <Integer>. The first number is the number appended to the fingerprint name (ECFP for this option; 2 in the command above). The second parameter, a string, specifies the similarity measure used for the fingerprint comparison; the options are 'tanimoto', 'cosine', 'hamming', 'euclidean', and 'dice'. The third number is the minimum similarity threshold and the fourth is the maximum. Although the last two are listed as integers, the command above passes the decimal values 0.6 and 1.0.<br />
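<br />
To make the threshold window concrete, here is a small Python sketch of a Tanimoto comparison between two fingerprints represented as sets of "on" bit indices. It illustrates the measure only and is not Synthesia's implementation; the bit sets are made up, and treating both bounds as inclusive is an assumption.<br />

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(union)


def within_window(similarity, minimum=0.6, maximum=1.0):
    """True if the similarity lies in [minimum, maximum] (bounds assumed inclusive)."""
    return minimum <= similarity <= maximum


# Made-up bit sets for a parent compound and two candidate analogs.
parent = {1, 4, 7, 9, 12}
close_analog = {1, 4, 7, 9, 15}       # shares 4 bits, union of 6
distant_analog = {1, 4, 20, 21, 22}   # shares 2 bits, union of 8

close_score = tanimoto(parent, close_analog)
distant_score = tanimoto(parent, distant_analog)
print(close_score, within_window(close_score))
print(distant_score, within_window(distant_score))
```

With the window 0.6 to 1.0 from the command above, the first candidate would be kept and the second rejected.<br />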
<br />
This command executed on Gimel returns 511 analogs of granisetron in 1'56" from 436K BBs.<br />
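<br />
When the same search has to be repeated for several trees, it can help to assemble the command in a script. The Python sketch below only builds and prints the argument list, using the flags from the command above; the function name and its defaults are hypothetical, and nothing is executed.<br />

```python
import shlex


def synthesia_command(bb_library, tree, output, threads=4,
                      ecfp_number=2, measure="tanimoto",
                      min_sim=0.6, max_sim=1.0):
    """Build the argument list for one run, mirroring the command above."""
    return [
        "./synthesia",
        "--inputStructures", bb_library,
        "--retroSynTree", tree,
        "--output", output,
        "--threads", str(threads),
        "--verbosity", "5",
        "--allLeaves",
        "--useECFP", str(ecfp_number), measure, str(min_sim), str(max_sim),
    ]


cmd = synthesia_command(
    "../../Enamine-BB/2022-03_Chemspace_Building_Blocks_noRU_SMILES.smiles",
    "amide_tree.json",
    "amide_out-ecfp.json",
)
# shlex.join quotes arguments where needed, so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The list can be passed to <code>subprocess.run(cmd, check=True)</code> from the installation directory to actually start the run.<br />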
<br />
=== Conclusion ===<br />
Pros of Synthesia — flexibility, multistage reactions of virtually any complexity and number of stages, fine-tuning of analog properties (No. of aromatic rings, halogens, cations, logP…).<br />
<br />
Cons — closed sources, licensing on a per-year basis, steep learning curve, need to create retrosynthetic trees for each compound.<br />
<br />
My summary at the moment: it may be interesting to look into, given access to a synthetic chemistry group.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Synthesia&diff=14541Synthesia2022-08-03T20:49:07Z<p>Iamkaant: </p>
<hr />
<div><blockquote>Synthesia is a command-line tool that uses an entire retrosynthetic route as a guide pathway to generate optimized structural analogues of a lead compound without compromising the synthesizability of the structure. The users has the ability to guide the structural modifications in a desired direction by specifying structural constraints.</blockquote>Original publication: https://pubs.acs.org/doi/10.1021/acs.jcim.2c00246<br />
<br />
Website: https://software.zbh.uni-hamburg.de/customers/tools <br />
<br />
== Installation ==<br />
To obtain the license, you need to register and get your account approved. Then log in to the website, click on Synthesia, and select "Download the license file". The license key is inside the file and looks like <code>AAAAAAAliFQAAAAU2eM8ZjTTELGD3LzxBgt3/1DGaW4=</code>. Copy the key from the file, download and unpack the program, and run the command:<br />
<br />
<code>./synthesia --license <your_license_here></code><br />
<br />
My installation is in <code>/mnt/nfs/exa/work/ak87/UCSF/SynthI/SYNTHESIA/synthesia_1.0.0</code><br />
<br />
== Running ==<br />
To run the program, you need:<br />
<br />
# a retrosynthetic tree and <br />
# a library of building blocks ("<code>SMILES Name</code>", no preprocessing needed).<br />
<br />
The tool returns analogs of a target molecule synthesizable by the given route from given BBs. The analogs may be filtered by 29 different parameters (see SI or README.md file):<br />
<br />
# Extended-Connectivity Fingerprints (ECFP)<br />
# Functional-Class Fingerprints (FCFP)<br />
# Connected Subgraph Fingerprints (CSFP)<br />
# Largest Ring<br />
# Largest Ringsystem<br />
# Molecular Weight<br />
# Number of Hydrogen-Bond Acceptors<br />
# Number of Anions<br />
# Number of Aromatic Atoms<br />
# Number of Aromatic Rings<br />
# Number of Aromatic Ringsystems<br />
# Number of Cations<br />
# Number of Hydrogen-Bond Donors<br />
# Number of Halogens<br />
# Number of Non-Hydrogen Atoms<br />
# Number of Hetero Atoms<br />
# Number of Hydrophobic Points<br />
# Number of Inorganic Atoms<br />
# Number of Lipinski Donors<br />
# Number of Nitrogens and Oxygens<br />
# Number of Non-Hydrogen Bonds<br />
# Number of Rings<br />
# Number of Ringsystems<br />
# Number of Rotatable Bonds<br />
# LogP-Value<br />
# Total Charge<br />
# Topological Polar Surface Area (TPSA)<br />
# Volume<br />
# Matching SMARTSS3 pattern. Either inclusion or exclusion<br />
<br />
=== Retrosynthetic tree ===<br />
This is a .json file that describes all steps needed to synthesize the molecule in question and starting reagents. The steps are encoded in SMARTS. Each reaction node should contain reaction SMARTS + SMILES of the product. The tree for [https://zinc15.docking.org/substances/ZINC000000000347/ granisetron] synthesis via amidation is shown below.<syntaxhighlight lang="json"><br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_chemical": true,<br />
"children":<br />
[<br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_reaction": true,<br />
"smartsPattern": "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)",<br />
"children":<br />
[<br />
{<br />
"smiles": "Cn1nc(C(O)=O)c2ccccc12",<br />
"is_chemical": true,<br />
"children": []<br />
},<br />
{<br />
"smiles": "CN1C2CCCC1CC(N)C2",<br />
"is_chemical": true,<br />
"children": []<br />
}<br />
]<br />
}<br />
]<br />
}<br />
</syntaxhighlight>How to create this tree?<br />
<br />
=== Configuration file ===<br />
<blockquote>All additional settings of Synthesia can be specified in a configuration file. This file is optional and the user does not have to use it. If both the configuration file as well as command line parameters are used to define parameters, the settings parsed via command line overwrite settings defined in the configuration file. The configuration file has to be in valid standard JSON format. An example configuration file is bundled with Synthesia.</blockquote>So far I've been using only command line parameters.<br />
<br />
=== Running ===<br />
<code>./synthesia --inputStructures ../../Enamine-BB/2022-03_Chemspace_Building_Blocks_noRU_SMILES.smiles --retroSynTree amide_tree.json --output amide_out-ecfp.json --threads 4 --verbosity 5 --allLeaves --useECFP 2 tanimoto 0.6 1.0</code><br />
<br />
<code>--inputStructures</code> -- a library of BBs<br />
<br />
<code>--retroSynTree</code> -- self-explanatory<br />
<br />
<code>--output</code> -- output .json file<br />
<br />
<code>--threads</code> -- Number of threads used for parallelization.<br />
<br />
<code>--allLeaves</code> -- '''very important''': without it you will only get suitable BBs and not final structures. The README says: ''Set this parameter to true if all chemical leaf nodes should be open for exchange. Either this parameter or the option allChemicals must be set or the nodeId parameter must be specified''.<br />
<br />
<code>--useECFP</code> -- filter analogs by ECFP similarity. Four parameter values are expected: <Integer> <String> <Integer> <Integer>. The first number is the number appended to the fingerprint name (ECFP for this option; 2 in the command above). The second parameter, a string, specifies the similarity measure used for the fingerprint comparison; the options are 'tanimoto', 'cosine', 'hamming', 'euclidean', and 'dice'. The third number is the minimum similarity threshold and the fourth is the maximum. Although the last two are listed as integers, the command above passes the decimal values 0.6 and 1.0.<br />
<br />
This command executed on Gimel returns 511 analogs of granisetron in 1'56" from 436K BBs.<br />
<br />
=== Conclusion ===<br />
Pros of Synthesia — flexibility, multistage reactions of virtually any complexity and number of stages, fine-tuning of analog properties (No. of aromatic rings, halogens, cations, logP…).<br />
<br />
Cons — closed sources, licensing on a per-year basis, steep learning curve, need to create retrosynthetic trees for each compound.<br />
<br />
My summary at the moment: it may be interesting to look into, given access to a synthetic chemistry group.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Synthesia&diff=14540Synthesia2022-08-03T20:48:39Z<p>Iamkaant: </p>
<hr />
<div><blockquote>Synthesia is a command-line tool that uses an entire retrosynthetic route as a guide pathway to generate optimized structural analogues of a lead compound without compromising the synthesizability of the structure. The users has the ability to guide the structural modifications in a desired direction by specifying structural constraints.</blockquote>Original publication: https://pubs.acs.org/doi/10.1021/acs.jcim.2c00246<br />
<br />
Website: https://software.zbh.uni-hamburg.de/customers/tools <br />
<br />
== Installation ==<br />
To obtain the license, you need to register and get your account approved. Then log in to the website, click on Synthesia, and select "Download the license file". The license key is inside the file and looks like <code>AAAAAAAliFQAAAAU2eM8ZjTTELGD3LzxBgt3/1DGaW4=</code>. Copy the key from the file, download and unpack the program, and run the command:<br />
<br />
<code>./synthesia --license <your_license_here></code><br />
<br />
My installation is in <code>/mnt/nfs/exa/work/ak87/UCSF/SynthI/SYNTHESIA/synthesia_1.0.0</code><br />
<br />
== Running ==<br />
To run the program, you need:<br />
<br />
# a retrosynthetic tree and <br />
# a library of building blocks ("<code>SMILES Name</code>", no preprocessing needed).<br />
<br />
The tool returns analogs of a target molecule synthesizable by the given route from given BBs. The analogs may be filtered by 29 different parameters (see SI or README.md file):<br />
<br />
# Extended-Connectivity Fingerprints (ECFP)<br />
# Functional-Class Fingerprints (FCFP)<br />
# Connected Subgraph Fingerprints (CSFP)<br />
# Largest Ring<br />
# Largest Ringsystem<br />
# Molecular Weight<br />
# Number of Hydrogen-Bond Acceptors<br />
# Number of Anions<br />
# Number of Aromatic Atoms<br />
# Number of Aromatic Rings<br />
# Number of Aromatic Ringsystems<br />
# Number of Cations<br />
# Number of Hydrogen-Bond Donors<br />
# Number of Halogens<br />
# Number of Non-Hydrogen Atoms<br />
# Number of Hetero Atoms<br />
# Number of Hydrophobic Points<br />
# Number of Inorganic Atoms<br />
# Number of Lipinski Donors<br />
# Number of Nitrogens and Oxygens<br />
# Number of Non-Hydrogen Bonds<br />
# Number of Rings<br />
# Number of Ringsystems<br />
# Number of Rotatable Bonds<br />
# LogP-Value<br />
# Total Charge<br />
# Topological Polar Surface Area (TPSA)<br />
# Volume<br />
# Matching SMARTSS3 pattern. Either inclusion or exclusion<br />
<br />
=== Retrosynthetic tree ===<br />
This is a .json file that describes all steps needed to synthesize the molecule in question and starting reagents. The steps are encoded in SMARTS. Each reaction node should contain reaction SMARTS + SMILES of the product. The tree for [https://zinc15.docking.org/substances/ZINC000000000347/ granisetron] synthesis via amidation is shown below.<syntaxhighlight lang="json"><br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_chemical": true,<br />
"children":<br />
[<br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_reaction": true,<br />
"smartsPattern": "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)",<br />
"children":<br />
[<br />
{<br />
"smiles": "Cn1nc(C(O)=O)c2ccccc12",<br />
"is_chemical": true,<br />
"children": []<br />
},<br />
{<br />
"smiles": "CN1C2CCCC1CC(N)C2",<br />
"is_chemical": true,<br />
"children": []<br />
}<br />
]<br />
}<br />
]<br />
}<br />
</syntaxhighlight>How to create this tree?<br />
<br />
=== Configuration file ===<br />
<blockquote>All additional settings of Synthesia can be specified in a configuration file. This file is optional and the user does not have to use it. If both the configuration file as well as command line parameters are used to define parameters, the settings parsed via command line overwrite settings defined in the configuration file. The configuration file has to be in valid standard JSON format. An example configuration file is bundled with Synthesia.</blockquote>So far I've been using only command line parameters.<br />
<br />
=== Running ===<br />
<code>./synthesia --inputStructures ../../Enamine-BB/2022-03_Chemspace_Building_Blocks_noRU_SMILES.smiles --retroSynTree amide_tree.json --output amide_out-ecfp.json --threads 4 --verbosity 5 --allLeaves --useECFP 2 tanimoto 0.6 1.0</code><br />
<br />
<code>--inputStructures</code> -- a library of BBs<br />
<br />
<code>--retroSynTree</code> -- self-explanatory<br />
<br />
<code>--output</code> -- output .json file<br />
<br />
<code>--threads</code> -- Number of threads used for parallelization.<br />
<br />
<code>--allLeaves</code> -- '''very important''': without it you will only get suitable BBs and not final structures. The README says: ''Set this parameter to true if all chemical leaf nodes should be open for exchange. Either this parameter or the option allChemicals must be set or the nodeId parameter must be specified''.<br />
<br />
<code>--useECFP</code> -- filter analogs by ECFP similarity. Four parameter values are expected:<br />
<br />
<Integer> <String> <Integer> <Integer><br />
<br />
The first number is the number appended to the fingerprint name (ECFP for this option; 2 in the command above). The second parameter, a string, specifies the similarity measure used for the fingerprint comparison; the options are 'tanimoto', 'cosine', 'hamming', 'euclidean', and 'dice'. The third number is the minimum similarity threshold and the fourth is the maximum. Although the last two are listed as integers, the command above passes the decimal values 0.6 and 1.0.<br />
<br />
This command executed on Gimel returns 511 analogs of granisetron in 1'56" from 436K BBs.<br />
<br />
=== Conclusion ===<br />
Pros of Synthesia — flexibility, multistage reactions of virtually any complexity and number of stages, fine-tuning of analog properties (No. of aromatic rings, halogens, cations, logP…).<br />
<br />
Cons — closed sources, licensing on a per-year basis, steep learning curve, need to create retrosynthetic trees for each compound.<br />
<br />
My summary at the moment: it may be interesting to look into, given access to a synthetic chemistry group.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Synthesia&diff=14538Synthesia2022-08-03T20:46:45Z<p>Iamkaant: </p>
<hr />
<div><blockquote>Synthesia is a command-line tool that uses an entire retrosynthetic route as a guide pathway to generate optimized structural analogues of a lead compound without compromising the synthesizability of the structure. The users has the ability to guide the structural modifications in a desired direction by specifying structural constraints.</blockquote>Original publication: https://pubs.acs.org/doi/10.1021/acs.jcim.2c00246<br />
<br />
Website: https://software.zbh.uni-hamburg.de/customers/tools <br />
<br />
== Installation ==<br />
To obtain the license, you need to register and get your account approved. Then log in to the website, click on Synthesia, and select "Download the license file". The license key is inside the file and looks like <code>AAAAAAAliFQAAAAU2eM8ZjTTELGD3LzxBgt3/1DGaW4=</code>. Copy the key from the file, download and unpack the program, and run the command:<br />
<br />
<code>./synthesia --license <your_license_here></code><br />
<br />
My installation is in <code>/mnt/nfs/exa/work/ak87/UCSF/SynthI/SYNTHESIA/synthesia_1.0.0</code><br />
<br />
== Running ==<br />
To run the program, you need:<br />
<br />
# a retrosynthetic tree and <br />
# a library of building blocks ("<code>SMILES Name</code>", no preprocessing needed).<br />
<br />
The tool returns analogs of a target molecule synthesizable by the given route from given BBs. The analogs may be filtered by 29 different parameters (see SI or README.md file):<br />
<br />
# Extended-Connectivity Fingerprints (ECFP)<br />
# Functional-Class Fingerprints (FCFP)<br />
# Connected Subgraph Fingerprints (CSFP)<br />
# Largest Ring<br />
# Largest Ringsystem<br />
# Molecular Weight<br />
# Number of Hydrogen-Bond Acceptors<br />
# Number of Anions<br />
# Number of Aromatic Atoms<br />
# Number of Aromatic Rings<br />
# Number of Aromatic Ringsystems<br />
# Number of Cations<br />
# Number of Hydrogen-Bond Donors<br />
# Number of Halogens<br />
# Number of Non-Hydrogen Atoms<br />
# Number of Hetero Atoms<br />
# Number of Hydrophobic Points<br />
# Number of Inorganic Atoms<br />
# Number of Lipinski Donors<br />
# Number of Nitrogens and Oxygens<br />
# Number of Non-Hydrogen Bonds<br />
# Number of Rings<br />
# Number of Ringsystems<br />
# Number of Rotatable Bonds<br />
# LogP-Value<br />
# Total Charge<br />
# Topological Polar Surface Area (TPSA)<br />
# Volume<br />
# Matching SMARTSS3 pattern. Either inclusion or exclusion<br />
<br />
=== Retrosynthetic tree ===<br />
This is a .json file that describes all steps needed to synthesize the molecule in question and starting reagents. The steps are encoded in SMARTS. Each reaction node should contain reaction SMARTS + SMILES of the product. The tree for [https://zinc15.docking.org/substances/ZINC000000000347/ granisetron] synthesis via amidation is shown below.<syntaxhighlight lang="json"><br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_chemical": true,<br />
"children":<br />
[<br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_reaction": true,<br />
"smartsPattern": "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)",<br />
"children":<br />
[<br />
{<br />
"smiles": "Cn1nc(C(O)=O)c2ccccc12",<br />
"is_chemical": true,<br />
"children": []<br />
},<br />
{<br />
"smiles": "CN1C2CCCC1CC(N)C2",<br />
"is_chemical": true,<br />
"children": []<br />
}<br />
]<br />
}<br />
]<br />
}<br />
</syntaxhighlight>How to create this tree?<br />
<br />
=== Configuration file ===<br />
<blockquote>All additional settings of Synthesia can be specified in a configuration file. This file is optional and the user does not have to use it. If both the configuration file as well as command line parameters are used to define parameters, the settings parsed via command line overwrite settings defined in the configuration file. The configuration file has to be in valid standard JSON format. An example configuration file is bundled with Synthesia.</blockquote>So far I've been using only command line parameters.<br />
<br />
=== Running ===<br />
<code>./synthesia --inputStructures ../../Enamine-BB/2022-03_Chemspace_Building_Blocks_noRU_SMILES.smiles --retroSynTree amide_tree.json --output amide_out-ecfp.json --threads 4 --verbosity 5 --allLeaves --useECFP 2 tanimoto 0.6 1.0</code><br />
<br />
<code>--inputStructures</code> -- a library of BBs<br />
<br />
<code>--retroSynTree</code> -- self-explanatory<br />
<br />
<code>--output</code> -- output .json file<br />
<br />
<code>--threads</code> -- Number of threads used for parallelization.<br />
<br />
<code>--allLeaves</code> -- '''very important''': without it you will only get suitable BBs and not final structures. The README says: ''Set this parameter to true if all chemical leaf nodes should be open for exchange. Either this parameter or the option allChemicals must be set or the nodeId parameter must be specified''.<br />
<br />
<code>--useECFP</code> -- filter analogs by ECFP similarity. Four parameter values are expected:<br />
<br />
<Integer> <String> <Integer> <Integer><br />
<br />
The first number is the number appended to the fingerprint name (ECFP for this option; 2 in the command above). The second parameter, a string, specifies the similarity measure used for the fingerprint comparison; the options are 'tanimoto', 'cosine', 'hamming', 'euclidean', and 'dice'. The third number is the minimum similarity threshold and the fourth is the maximum. Although the last two are listed as integers, the command above passes the decimal values 0.6 and 1.0.<br />
<br />
=== Conclusion ===<br />
Pros of Synthesia — flexibility, multistage reactions of virtually any complexity and number of stages, fine-tuning of analog properties (No. of aromatic rings, halogens, cations, logP…).<br />
<br />
Cons — closed sources, licensing on a per-year basis, steep learning curve, need to create retrosynthetic trees for each compound.<br />
<br />
My summary at the moment: it may be interesting to look into, given access to a synthetic chemistry group.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Synthesia&diff=14537Synthesia2022-08-03T20:44:49Z<p>Iamkaant: Created page with "<blockquote>Synthesia is a command-line tool that uses an entire retrosynthetic route as a guide pathway to generate optimized structural analogues of a lead compound without..."</p>
<hr />
<div><blockquote>Synthesia is a command-line tool that uses an entire retrosynthetic route as a guide pathway to generate optimized structural analogues of a lead compound without compromising the synthesizability of the structure. The users has the ability to guide the structural modifications in a desired direction by specifying structural constraints.</blockquote>Original publication: https://pubs.acs.org/doi/10.1021/acs.jcim.2c00246<br />
<br />
Website: https://software.zbh.uni-hamburg.de/customers/tools <br />
<br />
== Installation ==<br />
To obtain the license, you need to register and get your account approved. Then log in to the website, click on Synthesia, and select "Download the license file". The license key is inside the file and looks like <code>AAAAAAAliFQAAAAU2eM8ZjTTELGD3LzxBgt3/1DGaW4=</code>. Copy the key from the file, download and unpack the program, and run the command:<br />
<br />
<code>./synthesia --license <your_license_here></code><br />
<br />
My installation is in <code>/mnt/nfs/exa/work/ak87/UCSF/SynthI/SYNTHESIA/synthesia_1.0.0</code><br />
<br />
== Running ==<br />
To run the program, you need:<br />
<br />
# a retrosynthetic tree and <br />
# a library of building blocks ("<code>SMILES Name</code>", no preprocessing needed).<br />
<br />
The tool returns analogs of a target molecule synthesizable by the given route from given BBs. The analogs may be filtered by 29 different parameters (see SI or README.md file):<br />
<br />
# Extended-Connectivity Fingerprints (ECFP)<br />
# Functional-Class Fingerprints (FCFP)<br />
# Connected Subgraph Fingerprints (CSFP)<br />
# Largest Ring<br />
# Largest Ringsystem<br />
# Molecular Weight<br />
# Number of Hydrogen-Bond Acceptors<br />
# Number of Anions<br />
# Number of Aromatic Atoms<br />
# Number of Aromatic Rings<br />
# Number of Aromatic Ringsystems<br />
# Number of Cations<br />
# Number of Hydrogen-Bond Donors<br />
# Number of Halogens<br />
# Number of Non-Hydrogen Atoms<br />
# Number of Hetero Atoms<br />
# Number of Hydrophobic Points<br />
# Number of Inorganic Atoms<br />
# Number of Lipinski Donors<br />
# Number of Nitrogens and Oxygens<br />
# Number of Non-Hydrogen Bonds<br />
# Number of Rings<br />
# Number of Ringsystems<br />
# Number of Rotatable Bonds<br />
# LogP-Value<br />
# Total Charge<br />
# Topological Polar Surface Area (TPSA)<br />
# Volume<br />
# Matching SMARTSS3 pattern. Either inclusion or exclusion<br />
<br />
=== Retrosynthetic tree ===<br />
This is a .json file that describes all steps needed to synthesize the molecule in question and starting reagents. The steps are encoded in SMARTS. Each reaction node should contain reaction SMARTS + SMILES of the product. The tree for [https://zinc15.docking.org/substances/ZINC000000000347/ granisetron] synthesis via amidation is shown below.<syntaxhighlight lang="json"><br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_chemical": true,<br />
"children":<br />
[<br />
{<br />
"smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",<br />
"is_reaction": true,<br />
"smartsPattern": "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)",<br />
"children":<br />
[<br />
{<br />
"smiles": "Cn1nc(C(O)=O)c2ccccc12",<br />
"is_chemical": true,<br />
"children": []<br />
},<br />
{<br />
"smiles": "CN1C2CCCC1CC(N)C2",<br />
"is_chemical": true,<br />
"children": []<br />
}<br />
]<br />
}<br />
]<br />
}<br />
</syntaxhighlight>How to create this tree?<br />
<br />
=== Configuration file ===<br />
<blockquote>All additional settings of Synthesia can be specified in a configuration file. This file is optional and the user does not have to use it. If both the configuration file as well as command line parameters are used to define parameters, the settings parsed via command line overwrite settings defined in the configuration file. The configuration file has to be in valid standard JSON format. An example configuration file is bundled with Synthesia.</blockquote>So far I've been using only command line parameters.<br />
<br />
=== Running ===<br />
<code>./synthesia --inputStructures ../../Enamine-BB/2022-03_Chemspace_Building_Blocks_noRU_SMILES.smiles --retroSynTree amide_tree.json --output amide_out-ecfp.json --threads 4 --verbosity 5 --allLeaves --useECFP 2 tanimoto 0.6 1.0</code><br />
<br />
<code>--inputStructures</code> -- a library of BBs<br />
<br />
<code>--retroSynTree</code> -- self-explanatory<br />
<br />
<code>--output</code> -- output .json file<br />
<br />
<code>--threads</code> -- Number of threads used for parallelization.<br />
<br />
<code>--allLeaves</code> -- '''very important''': without it you will only get suitable BBs and not final structures. The README says: ''Set this parameter to true if all chemical leaf nodes should be open for exchange. Either this parameter or the option allChemicals must be set or the nodeId parameter must be specified''.<br />
<br />
<code>--useECFP</code> -- filter analogs by ECFP similarity. Four parameter values are expected:<br />
<br />
<Integer> <String> <Integer> <Integer><br />
<br />
The first number is the number appended to the fingerprint name (ECFP for this option; 2 in the command above). The second parameter, a string, specifies the similarity measure used for the fingerprint comparison; the options are 'tanimoto', 'cosine', 'hamming', 'euclidean', and 'dice'. The third number is the minimum similarity threshold and the fourth is the maximum. Although the last two are listed as integers, the command above passes the decimal values 0.6 and 1.0.<br />
<br />
=== Conclusion ===<br />
Pros of Synthesia — flexibility, multistage reactions of virtually any complexity and number of stages, fine-tuning of analog properties (No. of aromatic rings, halogens, cations, logP…).<br />
<br />
Cons — closed sources, steep learning curve, need to create retrosynthetic trees for each compound.<br />
<br />
My summary at the moment: it may be interesting to look into, given access to a synthetic chemistry group.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=14296Membrane Modeling2022-06-14T02:51:31Z<p>Iamkaant: added troubleshooting of MD runs</p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
In order to account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid-bilayer is generated around the target receptor and included in the docking score grid generation.<br />
Aiming at a fast, robust and computationally effective equilibration of the lipid bilayer around the embedded transmembrane receptor, coarse-grained (CG) molecular dynamics (MD) simulations and (if needed) subsequent atomistic simulations are employed.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' - Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in case of GPCRs, try to model missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Login to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
-posre.itp - Position restraints for heavy atoms of atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
-prot-rot.pdb - Coarse-grained protein structure aligned along the z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10 nm, z=11 nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change that to arbitrary lipid compositions using the -l and -u options of insane.py.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check if your protein is embedded correctly in the lipid bilayer. <br />
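<br />
A crude numerical check can complement (not replace) the visual inspection. The sketch below reads the lines of cg-membrane.pdb and reports how many protein backbone beads lie between the lowest and highest lipid phosphate beads along z. It assumes the usual Martini bead names <code>PO4</code> (lipid phosphate) and <code>BB</code> (protein backbone) and standard fixed-column PDB records; the function name is hypothetical.<br />
<br />
```python
def embedding_report(pdb_lines, lipid_bead="PO4", protein_bead="BB"):
    """Compare the z-range of lipid beads with the z-positions of protein beads."""
    lipid_z, protein_z = [], []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip()   # atom (bead) name, PDB columns 13-16
        z = float(line[46:54])       # z coordinate in Angstrom, PDB columns 47-54
        if name == lipid_bead:
            lipid_z.append(z)
        elif name == protein_bead:
            protein_z.append(z)
    if not lipid_z or not protein_z:
        raise ValueError("no lipid or protein beads found; check the bead names")
    low, high = min(lipid_z), max(lipid_z)
    inside = sum(1 for z in protein_z if low <= z <= high)
    return {
        "membrane_z": (low, high),
        "protein_beads_inside": inside,
        "protein_beads_total": len(protein_z),
    }
```
<br />
Calling <code>embedding_report(open("cg-membrane.pdb"))</code> on a transmembrane protein should report a substantial share of backbone beads inside the membrane range; a value near zero suggests that the protein was placed outside the bilayer.<br />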
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running a MD simulation or minimization using gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and selecting lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tools: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions of MD trajectory, e.g. making molecules broken over the periodic boundary conditions whole again. <br />
<br />
initram.sh calls the backward.py program which performs the backmapping of input coarse-grained to atomistic systems, and performs a small series of short minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. The script contains a while-loop that stops only once all relaxation steps have finished.'''<br />
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy, there might be an issue with the PyMOL version you are using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
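<br />
The coordinate count can be compared without opening either file in a viewer. This is a minimal sketch that counts ATOM and HETATM records; the function name is my own and is not part of the scripts above.<br />
<br />
```python
def count_atom_records(pdb_lines):
    """Count coordinate records (ATOM/HETATM) in an iterable of PDB lines."""
    return sum(1 for line in pdb_lines if line.startswith(("ATOM", "HETATM")))
```
<br />
For example, <code>count_atom_records(open("fitted_system.pdb")) == count_atom_records(open("backmapped-mol.pdb"))</code> should evaluate to True before you continue with the minimizations.<br />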
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations will be run. <br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This will select carbon and hydrogen atoms of the hydrophobic lipid tail segments in a radius of 1.7 nm around the protein and assign them to the atom type "LPD"<br />
<br />
You need to provide the rec.pdb you want to use for docking (potentially with missing loops) at that step as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file with the following line added:<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
<br />
= Membrane modelling in Schrodinger =<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
== MD of protein and membrane ==<br />
Import structure '''without''' ligand, use <u>Preparation wizard</u> as described in "Code for Controls..." to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website https://opm.phar.umich.edu/proteins/ and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
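<br />
If the transmembrane residues are available as a plain list of residue numbers, the field value can be assembled programmatically. This is a minimal sketch; the function name is hypothetical and not part of any Schrodinger or OPM tool.<br />
<br />
```python
def format_transmembrane_residues(residues):
    """Collapse residue numbers into a string like 'res.num 76-97,112-136,141'."""
    numbers = sorted(set(residues))
    if not numbers:
        raise ValueError("no residue numbers given")
    ranges = []
    start = prev = numbers[0]
    for number in numbers[1:]:
        if number == prev + 1:
            prev = number          # still inside the current consecutive run
            continue
        ranges.append((start, prev))
        start = prev = number      # begin a new run
    ranges.append((start, prev))
    parts = [str(a) if a == b else "%d-%d" % (a, b) for a, b in ranges]
    return "res.num " + ",".join(parts)
```
<br />
Consecutive numbers are merged into ranges and isolated residues are written as single numbers, matching the format shown above.<br />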
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use <u>Molecular Dynamics</u> menu to set up the calculation. Select prepared system, click <code>Load</code> on top of the menu. Put simulation time of 5 ns, <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select protein, and put <code>Force Constant</code> of 100, click <code>Apply</code> and <code>OK.</code> Then click down arrow left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
'''''NOTE''': In my experience, restrained MD runs with NgPT in Schrodinger often fail with the following message:''<br />
<br />
<code>''Allowed momentum exceeded on 17 particles.''</code><br />
<br />
''This seems to be related to an interference between the restraints and the ensemble, as no such error is observed if no restraints are imposed or if the NVT ensemble is used. Still, the NgPT ensemble is important for correct membrane sampling, because NVT simulations long enough to permit membrane relaxation lead to smearing of the lipid bilayer and the formation of empty space.'' <br />
<br />
''If the mentioned problem occurs, import the last coordinates of the run (*-out.cms) into Maestro, open Minimization menu, load the structure from the workspace, and apply restraints on lipid and protein (force restraint of 10 is usually enough). Then run a minimization for 100 ps, and try to run NgPT MD starting from the optimized structure.''<br />
<br />
'''''TO CHECK''': NPT ensemble.''<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, log in to <u>gimel5</u>, and edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Run the .sh file, and your task will be submitted to a queue. For my system it took 1.5 hr to complete.<br />
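<br />
The edit of the .sh file can also be scripted. The sketch below works on the text of the script and puts the <code>-HOST</code> option in the place of the license flag; that placement is my assumption, since the instructions above only say to delete one and insert the other. The function name is hypothetical.<br />
<br />
```python
def retarget_desmond_script(script_text, host="gimel5.gpu"):
    """Replace the Desmond GPU license flag with a -HOST option for the given queue."""
    license_flag = "-lic DESMOND_GPGPU:16"
    if license_flag not in script_text:
        raise ValueError("license flag not found; inspect the script manually")
    return script_text.replace(license_flag, "-HOST " + host)
```
<br />
Read desmond_md_job_X.sh, pass its contents through the function, and write the result back; use <code>host="gimel5.heavygpu"</code> for the other queue.<br />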
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open <code>-out.cms</code>, click on the <code>T</code> icon at the new entry in the project table and click <code>Display Trajectory Snapshots</code>. Select the last one, click <code>Display</code> and check that the protein did not change position during the MD run, then click <code>Export</code>, to Project Table, Frames Selected only. You will get a new entry in the Project Table. Export it to a <code>.pdb</code> file.<br />
<br />
== Preparation of the structure for Blastermaster ==<br />
Use the <code>prepare.pml</code> script. You need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) at that step as <code>xtal-prot.pdb</code>. Rename the MD system to <code>last-mol.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SPC<br />
<br />
create MEM, ////POPC<br />
<br />
# these atom numbers do not exist in POPC or DPPC<br />
# therefore, we do not remove protons from the lipid<br />
# structure to make more spheres. Uncomment these<br />
# lines and change to proper H numbers if needed.<br />
#remove /MEM////HS<br />
#remove /MEM////HX<br />
#remove /MEM////HY<br />
#remove /MEM////H*B<br />
#remove /MEM////H*A<br />
#remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
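<br />
The key step of the script, <code>create shell, MEM_C within 20 of xtal-prot</code>, keeps the lipid tail atoms that lie within 20 Å of any protein atom. The same selection logic can be sketched in plain Python on coordinate tuples; this is an illustration of the distance criterion only, not a replacement for the PyMOL script, and the function name is my own.<br />
<br />
```python
def atoms_within(candidates, reference, cutoff=20.0):
    """Return candidate (x, y, z) points within `cutoff` of any reference point."""
    cutoff_sq = cutoff * cutoff
    selected = []
    for atom in candidates:
        for ref in reference:
            dist_sq = sum((a - b) ** 2 for a, b in zip(atom, ref))
            if dist_sq <= cutoff_sq:
                selected.append(atom)
                break              # one close reference atom is enough
    return selected
```
<br />
A larger cutoff yields more LPD atoms and therefore a larger low-dielectric region in the grids.<br />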
<br />
== Preparation of grids with thinspheres ==<br />
Prepare grids with thinspheres for the protein without lipid as described in https://wiki.docking.org/index.php/How_to_do_parameter_scanning<br />
<br />
Create an empty directory and put <code>shell-LPD.pdb</code> there. Then run the following script:<br />
<br />
<code>sh blast-membrane-thinsph-scan.sh {path to the collection of "es_ld_thin_sph_rad_X.X" directories} {path to the dir with original working and dockfiles directories}</code><br />
<br />
This script runs qnifft and solvmap for each <code>es_ld_thin_sph_rad_X.X</code> directory, and then uses the second script of parameter scanning protocol to combine files into <code>dockfiles</code> directories.<br />
<br />
'''Important! LOOK AT YOUR GRIDS! Desolvation -- larger solvation where water is. Electrostatics -- no electrostatics in the lipid region. vdW -- no vdW in the lipid region.'''<br />
<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK with thinsphere scan<br />
# PREREQ -- run first step of https://wiki.docking.org/index.php/How_to_do_parameter_scanning (new_0001_generate_ES_LD_generation.py )<br />
# first argument -- path to the directory where dirs "es_ld_thin_sph_rad_X.X" are stored. <br />
# second argument -- path to the dir with original working and dockfiles<br />
# run in a new directory with shell-LPD.pdb<br />
<br />
run_once () {<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
cp $curr_dir/shell-LPD.pdb .<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
#cp -r $blastermaster_Prot/dockfiles .<br />
#cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
need_files="amb.crg.oxt<br />
qnifft.parm<br />
vdw.siz"<br />
for file in $need_files<br />
do<br />
if [ -e $blastermaster_Prot/working/$file ]<br />
then<br />
cp $blastermaster_Prot/working/$file .<br />
else<br />
cp $blast_orig/working/$file .<br />
fi<br />
done<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
#cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
#echo "Check if the grid size changed, compare this with INDOCK"<br />
#python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
#head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids if they are present in the folder<br />
if [ -e $blastermaster_Prot/working/heavy ]<br />
then<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
fi<br />
echo $dir " DONE!"<br />
}<br />
<br />
curr_dir=$(pwd)<br />
workdirs=$1<br />
blast_orig=$2<br />
dirs=$(ls -d $workdirs/es_ld_thin_sph_rad_*)<br />
for dir in $dirs<br />
do<br />
blastermaster_Prot=$dir<br />
local_dir=$(echo $dir | awk -F"\/" '{print $NF}')<br />
mkdir -p $local_dir/working<br />
cd $local_dir/working || exit<br />
run_once<br />
cd $curr_dir || exit<br />
done<br />
<br />
<br />
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0002_combine_es_ld_grids_into_combos.py -p $blast_orig<br />
<br />
</syntaxhighlight><br />
<br />
== Running blastermaster with default parameters ==<br />
'''Warning! Do not use these grids, as the default grids with lipid spheres give incorrect solvation energies. Use the ones with thinspheres instead!'''<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK <br />
# first argument -- path to blastermaster files of the protein without membrane<br />
# run in a new directory with shell-LPD.pdb<br />
blastermaster_Prot=$1<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
cp -r $blastermaster_Prot/dockfiles .<br />
cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
cp $blastermaster_Prot/working/amb.crg.oxt .<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
echo "Check if the grid size changed, compare this with INDOCK"<br />
python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.hydrogen ../dockfiles/.<br />
echo "DONE!"<br />
</syntaxhighlight><br />
<br />
== Lipid membrane models from MemProtMD ==<br />
If a protein-membrane complex has already been modeled for your system and deposited at the [http://memprotmd.bioch.ox.ac.uk/home/ MemProtMD] website, you can use it and skip the MD in Schrodinger. The steps are very similar to those after a Schrodinger run.<br />
<br />
Download the <code>*_default_dppc.mpmd.finalframe.atomistic.pdb</code> file from the bottom of the page. Rename it to <code>last-mol.pdb</code>. <br />
<br />
Use the prepare.pml script (below). You need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) at that step as <code>xtal-prot.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
The script below differs from the one for processing Schrodinger results in two points: solvent residue is <code>SOL</code> instead of <code>SPC</code>, and lipids are called <code>DPPC</code> instead of <code>POPC</code>.<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SOL<br />
<br />
create MEM, ////DPPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight></div>Iamkaanthttp://wiki.docking.org/index.php?title=Conversion_of_.rxn_files_to_reaction_SMARTS&diff=14243Conversion of .rxn files to reaction SMARTS2022-06-06T21:14:39Z<p>Iamkaant: Created page with "DataWarrior folks have implemented enumeration protocol in their soft. It is published in 10.1021/acs.jcim.1c01041 They store reactions in .rxn format, which is documented in..."</p>
<hr />
<div>The DataWarrior developers have implemented an enumeration protocol in their software. It is published under DOI 10.1021/acs.jcim.1c01041.<br />
<br />
They store reactions in .rxn format, which is documented in https://docs.chemaxon.com/display/docs/mdl-molfiles-rgfiles-sdfiles-rxnfiles-rdfiles-formats.md. The files can be found in "reactions" directory in https://github.com/joewah/Virtual-Fragment-Spaces<br />
<br />
The .rxn files can be opened in ChemAxon MarvinSketch. Select the reaction, right-click, and choose Copy as -- Daylight SMARTS. This gives the correct SMARTS with all atom lists. For example, for the amidation reaction I got:<br />
<br />
[#7,#8]-[#7H1:1](-[#6:2]=[#7,#8,#16])-[#6,#16]=[#7,#8,#16].[#6:3]-[#6:4](-[#8,#17;D1])=[O:5]>>[#6:2]-[#7:1]-[#6:4](-[#6:3])=[O:5]</div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=14174Membrane Modeling2022-04-26T06:53:58Z<p>Iamkaant: changed Pymol script: Hs are not removed from lipids</p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
In order to account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid bilayer is generated around the target receptor and included in the docking score grid generation.<br />
Aiming at a fast, robust and computationally effective equilibration of the lipid bilayer around the embedded transmembrane receptor, coarse-grained (CG) molecular dynamics (MD) simulations and (if needed) subsequent atomistic simulations are employed.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in case of GPCRs, try to model missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Login to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
-posre.itp - Position restraints for heavy atoms of atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
-prot-rot.pdb - Coarse-grained protein structure aligned along the z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10 nm, z=11 nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change that to arbitrary lipid compositions using the -l and -u options of insane.py.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check if your protein is embedded correctly in the lipid bilayer. <br />
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running a MD simulation or minimization using gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and select lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tool: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions on MD trajectories, e.g. making molecules that are broken across the periodic boundaries whole again. <br />
<br />
initram.sh calls the backward.py program which performs the backmapping of input coarse-grained to atomistic systems, and performs a small series of short minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. There is a while-loop that only stops once all relaxation steps have finished.'''<br />
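<br />
The retry logic can be pictured as a loop that reruns a step until its final output file exists. The sketch below is an illustration under assumptions, not the loop from 0003-backmap-and-lpd-selection.sh: the function name <code>retry_until_file</code>, the attempt limit, and the use of a single file such as backmapped.gro as the completion marker are all chosen for the example.<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Rerun a command until the target file exists or the attempt limit is hit.
# Usage: retry_until_file TARGET MAX_ATTEMPTS COMMAND [ARGS...]
retry_until_file() {
    local target=$1
    local max=$2
    shift 2
    local attempt=1
    while [ ! -e "$target" ]; do
        if [ "$attempt" -gt "$max" ]; then
            echo "giving up after $max attempts: $target not found" >&2
            return 1
        fi
        echo "attempt $attempt of $max" >&2
        # A failed attempt is expected now and then, so do not abort on it.
        "$@" || true
        attempt=$((attempt + 1))
    done
    return 0
}
</syntaxhighlight>
<br />
Unlike an open-ended while-loop, the attempt limit makes a systematically failing system stop with an error instead of running forever.<br />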
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy, there might be an issue with the PyMOL version you're using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
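<br />
The coordinate count can be compared from the shell. This is a minimal sketch that assumes "number of coordinates" means the number of ATOM and HETATM records; the function names are hypothetical and not part of the original scripts.<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Count coordinate records (ATOM and HETATM lines) in a PDB file.
count_atoms() {
    grep -c -E '^(ATOM|HETATM)' "$1"
}

# Print both counts and succeed only if they are equal.
same_atom_count() {
    local a b
    a=$(count_atoms "$1")
    b=$(count_atoms "$2")
    echo "$1: $a atoms, $2: $b atoms"
    [ "$a" -eq "$b" ]
}
</syntaxhighlight>
<br />
For example, <code>same_atom_count fitted_system.pdb backmapped-mol.pdb || echo "MISMATCH"</code> flags a discrepancy before the minimizations are started.<br />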
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations will be run. <br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This will select carbon and hydrogen atoms of the hydrophobic lipid tail segments in a radius of 1.7 nm around the protein and assign them to the atom type "LPD".<br />
<br />
At this step, you need to provide the rec.pdb you want to use for docking (potentially with missing loops) as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file with the following line added:<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
<br />
= Membrane modelling in Schrodinger =<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
== MD of protein and membrane ==<br />
Import structure '''without''' ligand, use <u>Preparation wizard</u> as described in "Code for Controls..." to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website https://opm.phar.umich.edu/proteins/ and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
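<br />
If the residue ranges are collected in a script, the field value can be assembled as follows. This is only a convenience sketch; the function name is hypothetical and the ranges are the ones from the example above.<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Join residue ranges into the "res.num a-b,c-d,..." form shown above.
format_tm_residues() {
    local IFS=,
    # With IFS set to a comma, "$*" joins all arguments with commas.
    echo "res.num $*"
}
</syntaxhighlight>
<br />
<code>format_tm_residues 76-97 112-136 141</code> prints <code>res.num 76-97,112-136,141</code>, ready to paste into the field.<br />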
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use <u>Molecular Dynamics</u> menu to set up the calculation. Select prepared system, click <code>Load</code> on top of the menu. Put simulation time of 5 ns, <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select protein, and put <code>Force Constant</code> of 100, click <code>Apply</code> and <code>OK.</code> Then click down arrow left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, login to <u>gimel5</u>, edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Run .sh file, and your task will be submitted to a queue. For my system it took 1.5 hr to complete.<br />
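<br />
The edit of the job script can also be done with sed. The sketch below assumes GNU sed and that the license flag appears literally as <code>-lic DESMOND_GPGPU:16</code>; it puts the host flag in the place of the removed license flag, which is an assumption about where the option may go. The function name is hypothetical.<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Replace the license flag with a host flag in a Desmond job script, in place.
# Usage: set_desmond_host SCRIPT [HOST]   (HOST defaults to gimel5.gpu)
set_desmond_host() {
    local script=$1
    local host=${2:-gimel5.gpu}
    sed -i "s/-lic DESMOND_GPGPU:16/-HOST ${host}/" "$script"
}
</syntaxhighlight>
<br />
For example, <code>set_desmond_host desmond_md_job_X.sh gimel5.heavygpu</code> selects the heavy GPU queue. Check the edited file before running it.<br />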
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open <code>-out.cms</code>. Click the <code>T</code> icon next to the new entry in the Project Table and click <code>Display Trajectory Snapshots</code>. Select the last snapshot, click <code>Display</code>, and check that the protein did not change position during the MD run. Then click <code>Export</code>, choosing to Project Table and Frames Selected only. You will get a new entry in the Project Table. Export it to a <code>.pdb</code> file.<br />
<br />
== Preparation of the structure for Blastermaster ==<br />
Use the <code>prepare.pml</code> script. At this step, you need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>. Rename the MD system to <code>last-mol.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SPC<br />
<br />
create MEM, ////POPC<br />
<br />
# these atom numbers do not exist in POPC or DPPC<br />
# therefore, we do not remove protons from the lipid<br />
# structure to make more spheres. Uncomment these<br />
# lines and change to proper H numbers if needed.<br />
#remove /MEM////HS<br />
#remove /MEM////HX<br />
#remove /MEM////HY<br />
#remove /MEM////H*B<br />
#remove /MEM////H*A<br />
#remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
<br />
== Preparation of grids with thinspheres ==<br />
Prepare grids with thinspheres for the protein without lipid as described in https://wiki.docking.org/index.php/How_to_do_parameter_scanning<br />
<br />
Create an empty directory and put <code>shell-LPD.pdb</code> there. Then run the following script:<br />
<br />
<code>sh blast-membrane-thinsph-scan.sh {path to the collection of "es_ld_thin_sph_rad_X.X" directories} {path to the dir with original working and dockfiles directories}</code><br />
<br />
This script runs qnifft and solvmap for each <code>es_ld_thin_sph_rad_X.X</code> directory, and then uses the second script of parameter scanning protocol to combine files into <code>dockfiles</code> directories.<br />
<br />
'''Important! LOOK AT YOUR GRIDS! Desolvation -- larger solvation where water is. Electrostatics -- no electrostatics in the lipid region. vdW -- no vdW in the lipid region.'''<br />
<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK with thinsphere scan<br />
# PREREQ -- run first step of https://wiki.docking.org/index.php/How_to_do_parameter_scanning (new_0001_generate_ES_LD_generation.py )<br />
# first argument -- path to the directory where dirs "es_ld_thin_sph_rad_X.X" are stored. <br />
# second argument -- path to the dir with original working and dockfiles<br />
# run in a new directory with shell-LPD.pdb<br />
<br />
run_once () {<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
cp $curr_dir/shell-LPD.pdb .<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
#cp -r $blastermaster_Prot/dockfiles .<br />
#cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
need_files="amb.crg.oxt<br />
qnifft.parm<br />
vdw.siz"<br />
for file in $need_files<br />
do<br />
if [ -e $blastermaster_Prot/working/$file ]<br />
then<br />
cp $blastermaster_Prot/working/$file .<br />
else<br />
cp $blast_orig/working/$file .<br />
fi<br />
done<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
#cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
#echo "Check if the grid size changed, compare this with INDOCK"<br />
#python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
#head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids if they are present in the folder<br />
if [ -e $blastermaster_Prot/working/heavy ]<br />
then<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
fi<br />
echo $dir " DONE!"<br />
}<br />
<br />
curr_dir=$(pwd)<br />
workdirs=$1<br />
blast_orig=$2<br />
dirs=$(ls -d $workdirs/es_ld_thin_sph_rad_*)<br />
for dir in $dirs<br />
do<br />
blastermaster_Prot=$dir<br />
local_dir=$(echo $dir | awk -F"\/" '{print $NF}')<br />
mkdir -p $local_dir/working<br />
cd $local_dir/working || exit<br />
run_once<br />
cd $curr_dir || exit<br />
done<br />
<br />
<br />
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0002_combine_es_ld_grids_into_combos.py -p $blast_orig<br />
<br />
</syntaxhighlight><br />
<br />
== Running blastermaster with default parameters ==<br />
'''Warning! Do not use these grids, as the default grids with lipid spheres give incorrect solvation energies. Use the ones with thinspheres instead!'''<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK <br />
# first argument -- path to blastermaster files of the protein without membrane<br />
# run in a new directory with shell-LPD.pdb<br />
blastermaster_Prot=$1<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete all fields except HETATM from shell-LPD.pdb<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
cp -r $blastermaster_Prot/dockfiles .<br />
cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
cp $blastermaster_Prot/working/amb.crg.oxt .<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
echo "Check if the grid size changed, compare this with INDOCK"<br />
python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.hydrogen ../dockfiles/.<br />
echo "DONE!"<br />
</syntaxhighlight><br />
<br />
== Lipid membrane models from MemProtMD ==<br />
If a protein-membrane complex was already modeled for your system and deposited at [http://memprotmd.bioch.ox.ac.uk/home/ MemProtMD] website, you can use it and skip doing MD in Schrodinger. The steps are very similar to the ones after Schrodinger run.<br />
<br />
Download the <code>*_default_dppc.mpmd.finalframe.atomistic.pdb</code> file from the bottom of the page. Rename it to <code>last-mol.pdb</code>. <br />
<br />
Use the prepare.pml script (below). At this step, you need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
The script below differs from the one for processing Schrodinger results in two points: solvent residue is <code>SOL</code> instead of <code>SPC</code>, and lipids are called <code>DPPC</code> instead of <code>POPC</code>.<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SOL<br />
<br />
create MEM, ////DPPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight></div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=14168Membrane Modeling2022-04-21T01:42:06Z<p>Iamkaant: added protocol for MemProtMD</p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
In order to account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid-bilayer is generated around the target receptor and included in the docking score grid generation.<br />
Aiming at a fast, robust and computationally effective equilibration of the lipid bilayer around the embedded transmembrane receptor, coarse-grained (CG) molecular dynamics (MD) simulations and (if needed) subsequent atomistic simulations are employed.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in case of GPCRs, try to model missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Login to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
-posre.itp - Position restraints for heavy atoms of atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
-prot-rot.pdb - Coarse-grained protein structure aligned along z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10nm, z=11nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change that to arbitrary lipid compositions using the -l and -u options of insane.py.<br />
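<br />
A call with these defaults might look like the sketch below. It is an illustration under assumptions, not the command from 0001-prepare-protein-CG-membrane.sh: the input and output names are taken from the file lists in this section, the <code>run</code> wrapper with its <code>DRY_RUN</code> switch and the function name are hypothetical, and the real script may pass further options.<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Hypothetical helper: print the command when DRY_RUN is set, otherwise run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build a 10 x 10 x 11 nm membrane box around prot-rot.pdb.
# Each argument is a lipid specification for -l (NAME or NAME:RATIO);
# with no arguments, pure POPC is used.
insane_membrane() {
    local lipid_args=()
    local spec
    for spec in "${@:-POPC}"; do
        lipid_args+=(-l "$spec")
    done
    run ./insane.py -f prot-rot.pdb -o cg-membrane.gro -p out.top \
        -x 10 -y 10 -z 11 -sol W "${lipid_args[@]}"
}
</syntaxhighlight>
<br />
For example, <code>DRY_RUN=1 insane_membrane POPC:3 CHOL:1</code> prints the command for a 3:1 POPC/cholesterol mixture without running it. The -l option sets the composition of both leaflets unless -u is given to set the upper leaflet separately.<br />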
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check that your protein is embedded correctly in the lipid bilayer. <br />
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running an MD simulation or minimization with gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50 ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and select lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tool: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions on MD trajectories, e.g. making molecules that are broken across the periodic boundaries whole again. <br />
<br />
initram.sh calls the backward.py program which performs the backmapping of input coarse-grained to atomistic systems, and performs a small series of short minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. There is a while-loop that only stops once all relaxation steps have finished.'''<br />
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy, there might be an issue with the PyMOL version you're using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations will be run. <br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This will select carbon and hydrogen atoms of the hydrophobic lipid tail segments in a radius of 1.7 nm around the protein and assign them to the atom type "LPD".<br />
<br />
At this step, you need to provide the rec.pdb you want to use for docking (potentially with missing loops) as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file with the following line added:<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
<br />
= Membrane modelling in Schrodinger =<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
== MD of protein and membrane ==<br />
Import structure '''without''' ligand, use <u>Preparation wizard</u> as described in "Code for Controls..." to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website https://opm.phar.umich.edu/proteins/ and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use <u>Molecular Dynamics</u> menu to set up the calculation. Select prepared system, click <code>Load</code> on top of the menu. Put simulation time of 5 ns, <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select protein, and put <code>Force Constant</code> of 100, click <code>Apply</code> and <code>OK.</code> Then click down arrow left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, login to <u>gimel5</u>, edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Run .sh file, and your task will be submitted to a queue. For my system it took 1.5 hr to complete.<br />
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open <code>-out.cms</code>. Click the <code>T</code> icon next to the new entry in the Project Table and click <code>Display Trajectory Snapshots</code>. Select the last snapshot, click <code>Display</code>, and check that the protein did not change position during the MD run. Then click <code>Export</code>, choosing to Project Table and Frames Selected only. You will get a new entry in the Project Table. Export it to a <code>.pdb</code> file.<br />
<br />
== Preparation of the structure for Blastermaster ==<br />
Use the <code>prepare.pml</code> script. At this step, you need to provide the <code>rec.pdb</code> you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>. Rename the MD system to <code>last-mol.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SPC<br />
<br />
create MEM, ////POPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
<br />
== Preparation of grids with thinspheres ==<br />
Prepare grids with thinspheres for the protein without lipid as described in https://wiki.docking.org/index.php/How_to_do_parameter_scanning<br />
<br />
Create an empty directory and put <code>shell-LPD.pdb</code> there. Then run the following script:<br />
<br />
<code>sh blast-membrane-thinsph-scan.sh {path to the collection of "es_ld_thin_sph_rad_X.X" directories} {path to the dir with original working and dockfiles directories}</code><br />
<br />
This script runs qnifft and solvmap for each <code>es_ld_thin_sph_rad_X.X</code> directory, and then uses the second script of parameter scanning protocol to combine files into <code>dockfiles</code> directories.<br />
<br />
'''Important! LOOK AT YOUR GRIDS! Desolvation -- larger solvation where water is. Electrostatics -- no electrostatics in the lipid region. vdW -- no vdW in the lipid region.'''<br />
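<br />
Because the scan script copies several files from every <code>es_ld_thin_sph_rad_X.X</code> directory, a quick pre-flight check can save a failed run. This is only a sketch: <code>preflight_scan</code> is a hypothetical helper, and the list of required files is taken from the <code>cp</code> and <code>grep</code> calls in the script on this page.<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch of a pre-flight check for blast-membrane-thinsph-scan.sh.
# preflight_scan is a hypothetical helper, not part of the DOCK distribution.
# Usage: preflight_scan <dir with es_ld_thin_sph_rad_X.X dirs> <original blastermaster dir>
preflight_scan () {
    local workdirs=$1 blast_orig=$2 status=0 found=0 dir f
    [ -f shell-LPD.pdb ] || { echo "missing shell-LPD.pdb in $(pwd)"; status=1; }
    [ -d "$blast_orig/working" ] || { echo "missing $blast_orig/working"; status=1; }
    for dir in "$workdirs"/es_ld_thin_sph_rad_*; do
        [ -d "$dir" ] || continue
        found=$((found + 1))
        # files that the scan script reads from each working directory
        for f in rec.crg.pdb receptor.crg.lowdielectric.pdb box; do
            [ -e "$dir/working/$f" ] || { echo "missing $dir/working/$f"; status=1; }
        done
    done
    [ "$found" -gt 0 ] || { echo "no es_ld_thin_sph_rad_* directories in $workdirs"; status=1; }
    return $status
}
</syntaxhighlight><br />
<br />
Run it from the directory that holds <code>shell-LPD.pdb</code>, with the same two arguments you would give to the scan script; a non-zero return value means at least one input is missing.<br />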
<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK with thinsphere scan<br />
# PREREQ -- run first step of https://wiki.docking.org/index.php/How_to_do_parameter_scanning (new_0001_generate_ES_LD_generation.py )<br />
# first argument -- path to the directory where dirs "es_ld_thin_sph_rad_X.X" are stored. <br />
# second argument -- path to the dir with original working and dockfiles<br />
# run in a new directory with shell-LPD.pdb<br />
<br />
run_once () {<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
cp $curr_dir/shell-LPD.pdb .<br />
# delete CRYST1, CONECT and END records from shell-LPD.pdb<br />
# (plain -i: GNU sed parses "-ir" as -i with backup suffix "r")<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
#cp -r $blastermaster_Prot/dockfiles .<br />
#cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
need_files="amb.crg.oxt<br />
qnifft.parm<br />
vdw.siz"<br />
for file in $need_files<br />
do<br />
if [ -e $blastermaster_Prot/working/$file ]<br />
then<br />
cp $blastermaster_Prot/working/$file .<br />
else<br />
cp $blast_orig/working/$file .<br />
fi<br />
done<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
#cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
#echo "Check if the grid size changed, compare this with INDOCK"<br />
#python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
#head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids if they are present in the folder<br />
if [ -e $blastermaster_Prot/working/heavy ]<br />
then<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
fi<br />
echo $dir " DONE!"<br />
}<br />
<br />
curr_dir=$(pwd)<br />
workdirs=$1<br />
blast_orig=$2<br />
dirs=$(ls -d $workdirs/es_ld_thin_sph_rad_*)<br />
for dir in $dirs<br />
do<br />
blastermaster_Prot=$dir<br />
local_dir=$(echo $dir | awk -F"\/" '{print $NF}')<br />
mkdir -p $local_dir/working<br />
cd $local_dir/working || exit<br />
run_once<br />
cd $curr_dir || exit<br />
done<br />
<br />
<br />
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0002_combine_es_ld_grids_into_combos.py -p $blast_orig<br />
<br />
</syntaxhighlight><br />
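<br />
The script above appends the thin-sphere records to the receptor by locating the first line that contains <code>SPH</code> and then taking everything from that line to the end of the file. A minimal sketch of the same step as a single <code>awk</code> call is shown below; <code>sph_tail</code> is a hypothetical name, and unlike the <code>grep</code>/<code>tail</code> pair it prints nothing (instead of failing) when the file has no <code>SPH</code> line.<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch: print a file from the first line containing "SPH" to the end.
# sph_tail is a hypothetical helper mirroring the grep -n / tail --lines=+N step.
sph_tail () {
    awk '/SPH/ { found = 1 } found' "$1"
}
</syntaxhighlight><br />
<br />
For example, <code>sph_tail working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb</code> would append the sphere block to the combined receptor and lipid file.<br />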
<br />
== Running blastermaster with default parameters ==<br />
'''Warning! Do not use these grids, as the default grids with lipid spheres give incorrect solvation energies. Use the ones with thinspheres instead!'''<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK <br />
# first argument -- path to blastermaster files of the protein without membrane<br />
# run in a new directory with shell-LPD.pdb<br />
blastermaster_Prot=$1<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete CRYST1, CONECT and END records from shell-LPD.pdb<br />
# (plain -i: GNU sed parses "-ir" as -i with backup suffix "r")<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
cp -r $blastermaster_Prot/dockfiles .<br />
cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
cp $blastermaster_Prot/working/amb.crg.oxt .<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
echo "Check if the grid size changed, compare this with INDOCK"<br />
python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.hydrogen ../dockfiles/.<br />
echo "DONE!"<br />
</syntaxhighlight><br />
<br />
== Lipid membrane models from MemProtMD ==<br />
If a protein-membrane complex was already modeled for your system and deposited at [http://memprotmd.bioch.ox.ac.uk/home/ MemProtMD] website, you can use it and skip doing MD in Schrodinger. The steps are very similar to the ones after Schrodinger run.<br />
<br />
Download the <code>*_default_dppc.mpmd.finalframe.atomistic.pdb</code> file from the bottom of the page. Rename it to <code>last-mol.pdb</code>. <br />
<br />
Use the <code>prepare.pml</code> script (below). At this step, provide the <code>rec.pdb</code> that you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
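<br />
Since the PyMOL selections depend on the solvent and lipid residue names, it can be useful to list the residue names that actually occur in <code>last-mol.pdb</code> before running the script. A minimal sketch, assuming standard PDB columns with four-letter residue names occupying columns 18-21; <code>count_resnames</code> is a hypothetical helper:<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch: count atoms per residue name in a PDB file.
# Assumption: residue name sits in columns 18-21 (covers 4-letter names such as DPPC).
count_resnames () {
    awk '
        /^(ATOM|HETATM)/ { r = substr($0, 18, 4); gsub(/ /, "", r); n[r]++ }
        END { for (r in n) print r, n[r] }
    ' "$1" | sort
}
</syntaxhighlight><br />
<br />
If <code>count_resnames last-mol.pdb</code> reports names other than the ones used in the <code>remove</code> and <code>create</code> lines, adjust those selections accordingly.<br />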
<br />
The script below differs from the one for processing Schrodinger results in two points: solvent residue is <code>SOL</code> instead of <code>SPC</code>, and lipids are called <code>DPPC</code> instead of <code>POPC</code>.<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SOL<br />
<br />
create MEM, ////DPPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight></div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=14167Membrane Modeling2022-04-21T01:27:35Z<p>Iamkaant: Added protocol for MD in Schrodinger</p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
In order to account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid-bilayer is generated around the target receptor and included in the docking score grid generation.<br />
Aiming at a fast, robust and computationally effective equilibration of the lipid bilayer around the embedded transmembrane receptor, coarse-grained (CG) molecular dynamics (MD) simulations and (if needed) subsequent atomistic simulations are employed.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in case of GPCRs, try to model missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Login to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
-posre.itp - Position restraints for heavy atoms of atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
-prot-rot.pdb - Coarse-grained protein structure aligned along the z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10nm, z=11nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change that to arbitrary lipid compositions using the -l and -u options of insane.py.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check if your protein is embedded correctly in the lipid bilayer.<br />
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running a MD simulation or minimization using gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
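<br />
The grompp and mdrun sequence described above can be sketched as follows. This is an illustration, not the contents of 0002-run-CG-Minimization-and-MD.sh: the choice of <code>cg-membrane.gro</code> and <code>topol-cg.top</code> as inputs is an assumption based on the file lists on this page, and the real script may pass further options (for example an index file for the frozen protein). <code>run</code> and <code>minimize_cg</code> are hypothetical helper names.<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch of the grompp -> mdrun sequence for the coarse-grained minimization.
# Set DRY_RUN=1 to print the commands instead of executing them.
run () {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

minimize_cg () {
    # grompp bundles parameters (.mdp), coordinates and topology into min.tpr
    # (input file names are assumptions, see the text above)
    run gmx grompp -f martini_new-rf_min.mdp -c cg-membrane.gro -p topol-cg.top -o min.tpr || return 1
    # mdrun -deffnm min reads min.tpr and writes min.log, min.trr and min.gro
    run gmx mdrun -deffnm min -v
}
</syntaxhighlight><br />
<br />
With <code>DRY_RUN=1 minimize_cg</code> the two commands are only printed, which is a convenient way to review them before running inside the ''martini'' directory.<br />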
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and select lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tool: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions of MD trajectory, e.g. making molecules broken over the periodic boundary conditions whole again. <br />
<br />
initram.sh calls the backward.py program which performs the backmapping of input coarse-grained to atomistic systems, and performs a small series of short minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. The script uses a while-loop that repeats until all relaxation steps have finished.'''<br />
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy, there might be an issue with the PyMOL version you're using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
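<br />
The coordinate-count comparison can be done from the shell. A minimal sketch, assuming that "number of coordinates" means the number of <code>ATOM</code> and <code>HETATM</code> records; <code>count_atoms</code> and <code>same_atom_count</code> are hypothetical helper names:<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch: compare the number of ATOM/HETATM records in two PDB files.
count_atoms () {
    grep -c -E '^(ATOM|HETATM)' "$1"
}

same_atom_count () {
    local a b
    a=$(count_atoms "$1")
    b=$(count_atoms "$2")
    echo "$1: $a  $2: $b"
    # return status is 0 only when both counts agree
    [ "$a" -eq "$b" ]
}
</syntaxhighlight><br />
<br />
For example, <code>same_atom_count fitted_system.pdb backmapped-mol.pdb || echo "atom counts differ"</code> prints both counts and flags a mismatch.<br />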
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations will be calculated.<br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This will select carbon and hydrogen atoms of the hydrophobic lipid tail segments in a radius of 1.7 nm around the protein and assign them to the atom type "LPD"<br />
<br />
At this step, provide the rec.pdb that you want to use for docking (potentially with missing loops) as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file with the following line added:<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
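<br />
Appending the lipid entry to amb.crg.oxt can be scripted so that re-running the preparation does not add the line twice. This is a sketch under assumptions: <code>add_lpd_charge</code> is a hypothetical helper, and the default line is copied from this page as displayed, so check its column spacing against the existing entries in your own amb.crg.oxt and pass a corrected line as the second argument if needed.<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch: append the LPD charge entry to a charge file only if it is not there yet.
# Usage: add_lpd_charge <amb.crg.oxt> [line to add]
add_lpd_charge () {
    local crg=$1
    local line=${2:-"C lpd 0.000 LIPID SPHERE"}
    # -x: whole-line match, -F: fixed string, so an existing identical line is detected
    grep -q -x -F -- "$line" "$crg" || echo "$line" >> "$crg"
}
</syntaxhighlight><br />
<br />
Calling <code>add_lpd_charge amb.crg.oxt</code> several times leaves a single copy of the entry in the file.<br />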
<br />
= Membrane modelling in Schrodinger =<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
== MD of protein and membrane ==<br />
Import structure '''without''' ligand, use <u>Preparation wizard</u> as described in "Code for Controls..." to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website https://opm.phar.umich.edu/proteins/ and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use the <u>Molecular Dynamics</u> menu to set up the calculation. Select the prepared system and click <code>Load</code> at the top of the menu. Set the simulation time to 5 ns, then go to <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select the protein, set <code>Force Constant</code> to 100, and click <code>Apply</code> and <code>OK.</code> Then click the down arrow to the left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, login to <u>gimel5</u>, edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Run .sh file, and your task will be submitted to a queue. For my system it took 1.5 hr to complete.<br />
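<br />
The edit of the job script can also be done with <code>sed</code>. The sketch below assumes that the <code>-HOST</code> option may simply take the place of the removed <code>-lic DESMOND_GPGPU:16</code> token; the page does not say where to insert it, so treat that placement as an assumption. <code>retarget_desmond_job</code> is a hypothetical helper, and it keeps a <code>.bak</code> copy of the original script.<br />
<br />
<syntaxhighlight lang="bash">
#!/bin/bash
# Sketch: replace the license option with a -HOST option in a Desmond job script.
# Usage: retarget_desmond_job <desmond_md_job_X.sh> [host, default gimel5.gpu]
retarget_desmond_job () {
    local script=$1
    local host=${2:-gimel5.gpu}
    cp "$script" "$script.bak"      # keep the original for comparison
    sed -i "s/-lic DESMOND_GPGPU:16/-HOST $host/" "$script"
}
</syntaxhighlight><br />
<br />
For the heavy GPU queue, call it as <code>retarget_desmond_job desmond_md_job_X.sh gimel5.heavygpu</code>, then compare the result with the <code>.bak</code> copy before submitting.<br />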
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open <code>-out.cms</code>. Click the <code>T</code> icon at the new entry in the Project Table and click <code>Display Trajectory Snapshots</code>. Select the last snapshot, click <code>Display</code> and check that the protein did not change position during the MD run. Then click <code>Export</code>, choosing to Project Table and Frames Selected only. You will get a new entry in the Project Table. Export it to a <code>.pdb</code> file.<br />
<br />
== Preparation of the structure for Blastermaster ==<br />
Use the <code>prepare.pml</code> script. At this step, provide the <code>rec.pdb</code> that you want to use for docking (potentially with missing loops) as <code>xtal-prot.pdb</code>. Rename the MD system to <code>last-mol.pdb</code>.<br />
<br />
<code>pymol -qc last-mol.pdb xtal-prot.pdb prepare.pml</code><br />
<br />
<syntaxhighlight lang="python"><br />
align last-mol, xtal-prot<br />
<br />
remove ////SPC<br />
<br />
create MEM, ////POPC<br />
<br />
remove /MEM////HS<br />
remove /MEM////HX<br />
remove /MEM////HY<br />
remove /MEM////H*B<br />
remove /MEM////H*A<br />
remove /MEM////H*C<br />
<br />
remove /MEM////C1<br />
remove /MEM////C2<br />
remove /MEM////C3<br />
remove /MEM////C11<br />
remove /MEM////C12<br />
remove /MEM////C13<br />
remove /MEM////C14<br />
remove /MEM////C15<br />
<br />
create MEM_C, /MEM////*C* | /MEM////H*R | /MEM////H*S | /MEM////H*T | /MEM////H*X | /MEM////H*Y | /MEM////H*Z<br />
<br />
create shell, MEM_C within 20 of xtal-prot<br />
<br />
set retain_order, 1<br />
<br />
save shell.pdb, shell<br />
<br />
alter shell, resn='LPD'<br />
alter shell, name='C'<br />
alter shell, chain=''<br />
<br />
rebuild<br />
<br />
save shell-LPD.pdb, shell<br />
</syntaxhighlight><br />
<br />
== Preparation of grids with thinspheres ==<br />
Prepare grids with thinspheres for the protein without lipid as described in https://wiki.docking.org/index.php/How_to_do_parameter_scanning<br />
<br />
Create an empty directory and put <code>shell-LPD.pdb</code> there. Then run the following script:<br />
<br />
<code>sh blast-membrane-thinsph-scan.sh {path to the collection of "es_ld_thin_sph_rad_X.X" directories} {path to the dir with original working and dockfiles directories}</code><br />
<br />
This script runs qnifft and solvmap for each <code>es_ld_thin_sph_rad_X.X</code> directory, and then uses the second script of parameter scanning protocol to combine files into <code>dockfiles</code> directories.<br />
<br />
'''Important! LOOK AT YOUR GRIDS! Desolvation -- larger solvation where water is. Electrostatics -- no electrostatics in the lipid region. vdW -- no vdW in the lipid region.'''<br />
<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK with thinsphere scan<br />
# PREREQ -- run first step of https://wiki.docking.org/index.php/How_to_do_parameter_scanning (new_0001_generate_ES_LD_generation.py )<br />
# first argument -- path to the directory where dirs "es_ld_thin_sph_rad_X.X" are stored. <br />
# second argument -- path to the dir with original working and dockfiles<br />
# run in a new directory with shell-LPD.pdb<br />
<br />
run_once () {<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
cp $curr_dir/shell-LPD.pdb .<br />
# delete CRYST1, CONECT and END records from shell-LPD.pdb<br />
# (plain -i: GNU sed parses "-ir" as -i with backup suffix "r")<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
#cp -r $blastermaster_Prot/dockfiles .<br />
#cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
need_files="amb.crg.oxt<br />
qnifft.parm<br />
vdw.siz"<br />
for file in $need_files<br />
do<br />
if [ -e $blastermaster_Prot/working/$file ]<br />
then<br />
cp $blastermaster_Prot/working/$file .<br />
else<br />
cp $blast_orig/working/$file .<br />
fi<br />
done<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
#cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
#echo "Check if the grid size changed, compare this with INDOCK"<br />
#python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
#head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids if they are present in the folder<br />
if [ -e $blastermaster_Prot/working/heavy ]<br />
then<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
#cp ligand.desolv.heavy ../dockfiles/.<br />
fi<br />
echo $dir " DONE!"<br />
}<br />
<br />
curr_dir=$(pwd)<br />
workdirs=$1<br />
blast_orig=$2<br />
dirs=$(ls -d $workdirs/es_ld_thin_sph_rad_*)<br />
for dir in $dirs<br />
do<br />
blastermaster_Prot=$dir<br />
local_dir=$(echo $dir | awk -F"\/" '{print $NF}')<br />
mkdir -p $local_dir/working<br />
cd $local_dir/working || exit<br />
run_once<br />
cd $curr_dir || exit<br />
done<br />
<br />
<br />
python ~rstein/zzz.scripts/DOCK_prep_scripts/new_0002_combine_es_ld_grids_into_combos.py -p $blast_orig<br />
<br />
</syntaxhighlight><br />
<br />
== Running blastermaster with default parameters ==<br />
'''Warning! Do not use these grids, as the default grids with lipid spheres give incorrect solvation energies. Use the ones with thinspheres instead!'''<br />
<syntaxhighlight lang="bash"><br />
#!/bin/bash<br />
# a script to prepare a protein with the lipid membrane for DOCK <br />
# first argument -- path to blastermaster files of the protein without membrane<br />
# run in a new directory with shell-LPD.pdb<br />
blastermaster_Prot=$1<br />
# mkdir add_membrane<br />
# Run qnifft for electrostatic grids<br />
# cp $blastfiles_membrane/shell-LPD.pdb<br />
# delete CRYST1, CONECT and END records from shell-LPD.pdb<br />
# (plain -i: GNU sed parses "-ir" as -i with backup suffix "r")<br />
sed -i '/^CRYST1/d' shell-LPD.pdb<br />
sed -i '/^CONECT/d' shell-LPD.pdb<br />
sed -i '/^END/d' shell-LPD.pdb<br />
cp $blastermaster_Prot/working/rec.crg.pdb .<br />
#cp $blastermaster_Prot/working/lowdielectric.sph.pdb # (if wanted, if you run a parameter scan, be sure to use the correct spheres!)<br />
cp -r $blastermaster_Prot/dockfiles .<br />
cp -r $blastermaster_Prot/INDOCK .<br />
mv rec.crg.pdb rec.crg.solo.pdb<br />
cat rec.crg.solo.pdb shell-LPD.pdb > receptor.crg.lowdielectric.pdb<br />
# extracting SPH lines from receptor<br />
number=$(grep -n SPH $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb | head -n 1 | awk -F ":" '{print $1}')<br />
tail --lines=+$number $blastermaster_Prot/working/receptor.crg.lowdielectric.pdb >> receptor.crg.lowdielectric.pdb<br />
cp $blastermaster_Prot/working/amb.crg.oxt .<br />
echo "C lpd 0.000 LIPID SPHERE" >> amb.crg.oxt<br />
cp $blastermaster_Prot/working/box .<br />
cp $blastermaster_Prot/working/qnifft.parm .<br />
cp $blastermaster_Prot/working/vdw.siz .<br />
$DOCKBASE/proteins/qnifft/bin/qnifft22_193_pgf_32 qnifft.parm >& qnifft.log<br />
$DOCKBASE/proteins/blastermaster/phiTrim.py qnifft.electrostatics.phi box trim.electrostatics.phi >& trim.log<br />
<br />
cp trim.electrostatics.phi dockfiles/.<br />
<br />
# Check if the grid size changed:<br />
echo "Check if the grid size changed, compare this with INDOCK"<br />
python ~jklyu/zzz.script/pymol_movie/phi_to_dx.py trim.electrostatics.phi trim.electrostatics.phi.dx > gridsize<br />
head -n 1 gridsize<br />
# compare with delphi_nsize in INDOCK<br />
<br />
# Run solvmap for Ligand Desolvation Grids<br />
mkdir heavy<br />
cd heavy || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/heavy/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.heavy ../dockfiles/.<br />
cd ../ || exit<br />
<br />
mkdir hydrogen<br />
cd hydrogen || exit<br />
cp ../receptor.crg.lowdielectric.pdb rec.crg.lds.pdb # (with thin spheres [if needed] and membrane)<br />
cp ../box .<br />
cp $blastermaster_Prot/working/hydrogen/INSEV .<br />
$DOCKBASE/proteins/solvmap/bin/solvmap >& solvmap.log<br />
cp ligand.desolv.hydrogen ../dockfiles/.<br />
echo "DONE!"<br />
</syntaxhighlight></div>Iamkaanthttp://wiki.docking.org/index.php?title=Membrane_Modeling&diff=14166Membrane Modeling2022-04-21T01:04:12Z<p>Iamkaant: </p>
<hr />
<div>Written by Stefan Gahbauer, 2019/11/03<br />
<br />
In order to account for ligand desolvation and electrostatic interactions in the low-dielectric environment of the hydrophobic membrane core, a lipid-bilayer is generated around the target receptor and included in the docking score grid generation.<br />
Aiming at a fast, robust and computationally effective equilibration of the lipid bilayer around the embedded transmembrane receptor, coarse-grained (CG) molecular dynamics (MD) simulations and (if needed) subsequent atomistic simulations are employed.<br />
<br />
<br />
= Required software and datasets =<br />
<br />
'''Gromacs''' (v5 or newer) - Molecular Dynamics software package (http://manual.gromacs.org/)<br />
<br />
'''CHARMM36m force field''' (http://mackerell.umaryland.edu/charmm_ff.shtml)<br />
<br />
'''MARTINI''' Coarse-grained force field parameters (http://cgmartini.nl/)<br />
<br />
'''DSSP''' - Secondary Structure assignment (https://swift.cmbi.umcn.nl/gv/dssp/ , https://anaconda.org/salilab/dssp)<br />
<br />
'''martinize.py''' - Coarse-graining atomistic protein structures (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''insane.py''' - INSerting proteins in coarse-grained MembrANE (http://cgmartini.nl/index.php/tools2/proteins-and-bilayers)<br />
<br />
'''initram.sh''' and '''backward.py''' - Conversion of coarse-grained system to atomistic resolution (http://cgmartini.nl/index.php/tools2/resolution-transformation)<br />
<br />
<br />
= 1) Setting up the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0001-prepare-protein-CG-membrane.sh<br />
<br />
== 1.1) Prepare your files ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
Copy your rec.pdb to your working directory.<br />
<br />
If your rec.pdb has gaps, e.g. unresolved loops between transmembrane helices in case of GPCRs, try to model missing residues.<br />
<br />
One way is to use MODELLER following https://salilab.org/modeller/wiki/Missing%20residues.<br />
<br />
Corresponding input scripts for modeller can be found in:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/modeller<br />
<br />
== 1.2) Run the script ==<br />
<br />
Login to gimel2.<br />
<br />
./0001-prepare-protein-CG-membrane.sh<br />
<br />
The script reads rec.pdb and copies all other required files from <br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/gromacs<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Generate CHARMM36m force field parameters of your protein in a Gromacs-readable format. <br />
<br />
Used tool: gmx pdb2gmx <br />
<br />
Output files are stored in the generated ''pdb2gmx'' directory<br />
<br />
-conf.gro / conf.pdb - Gromacs coordinate file<br />
<br />
-topol.top / Protein-atomistic.itp - Gromacs topology file, i.e. force field description of your input structure<br />
<br />
-posre.itp - Position restraints for heavy atoms of the atomistic protein structure.<br />
<br />
<br />
'''b.''' Build coarse-grained structure<br />
<br />
Used tool: martinize.py<br />
<br />
Output files are stored in the generated ''martini'' directory.<br />
<br />
-chain_.ssd - Output from the DSSP program that is called by martinize.py<br />
<br />
-prot-cg.pdb - Coarse-grained protein structure<br />
<br />
-prot-cg.top - Coarse-grained Martini topology of system<br />
<br />
-Protein.itp - Coarse-grained Martini description of Protein structure<br />
<br />
-prot-rot.pdb - Coarse-grained protein structure aligned along the z-axis of the simulation box according to the protein's first principal component axis. This ensures the correct placement of the protein during membrane preparation. '''You may have to adjust the orientation of your input structure prior to membrane modeling.'''<br />
<br />
<br />
'''c.''' Build coarse-grained membrane<br />
<br />
Used tool: insane.py<br />
<br />
Here, a lipid bilayer will be created around the protein structure (in the x/y-plane) and water will be added to the system. The default box shape is rectangular and the size is set to x,y=10nm, z=11nm. This can be changed in the ./insane.py command line. The default lipid type is POPC; you can change that to arbitrary lipid compositions using the -l and -u options of insane.py.<br />
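<br />
As a rough illustration of how the box size and lipid composition enter the command line, the sketch below only prints an insane.py call and does not run it. The file names are taken from this page; the <code>-pbc</code>, <code>-sol</code> and <code>-center</code> settings are assumptions about what the preparation script passes, so compare with the actual call in 0001-prepare-protein-CG-membrane.sh before relying on it.<br />
<br />
```shell
#!/bin/sh
# Print (not run) an insane.py command line for a given box and lipid mix.
# Usage: insane_cmd X Y Z LIPID[:RATIO] [LIPID[:RATIO] ...]
# Assumptions: rectangular box, Martini water "W", protein centered on z.
insane_cmd() {
    x=$1; y=$2; z=$3
    shift 3
    lipids=""
    for spec in "$@"; do
        # -l sets the lipid composition; add -u entries for a different upper leaflet
        lipids="$lipids -l $spec"
    done
    echo "./insane.py -f prot-rot.pdb -o cg-membrane.gro -p out.top -pbc rectangular -x $x -y $y -z $z$lipids -sol W -center"
}

# Examples:
#   insane_cmd 10 10 11 POPC            # default-like setup
#   insane_cmd 12 12 11 POPC:4 CHOL:1   # 4:1 POPC/cholesterol mixture
```
<br />
Printing the command first makes it easy to review the box dimensions and lipid ratios before anything is written to the ''martini'' directory.<br />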
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-out.top / topol-cg.top - Topology of coarse-grained system<br />
<br />
-cg-membrane.gro/.pdb - Coarse-grained system coordinates. '''Carefully inspect and visualize the cg-membrane.pdb.'''<br />
<br />
Use PyMOL to check if your protein is embedded correctly in the lipid bilayer.<br />
<br />
<br />
= 2) Simulating the coarse-grained system =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0002-run-CG-Minimization-and-MD.sh<br />
<br />
== 2.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0002-run-CG-Minimization-and-MD.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Minimize coarse-grained system<br />
<br />
Used tools: gmx grompp , gmx mdrun<br />
<br />
gmx grompp generates a single .tpr file that contains all information necessary for running a MD simulation or minimization using gmx mdrun.<br />
<br />
Minimization parameters are provided in martini_new-rf_min.mdp. The system will be minimized in 500 steps using steepest descent. The protein structure will be frozen during minimization.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-min.tpr - MD run input file<br />
<br />
-min.log - Output log file from minimization<br />
<br />
-min.trr - Minimization trajectory<br />
<br />
-min.gro - Minimized system coordinates<br />
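<br />
The two GROMACS calls behind this step can be sketched as follows. The file names come from this page; the <code>run</code> wrapper and the <code>DRY_RUN</code> switch are hypothetical helpers added for illustration, and the real script may pass extra options (for example an index file for the frozen protein group).<br />
<br />
```shell
#!/bin/sh
# DRY_RUN=1 prints the commands instead of executing them (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

minimize_cg() {
    # Assemble the run input (.tpr) from parameters, coordinates and topology
    run gmx grompp -f martini_new-rf_min.mdp -c cg-membrane.gro -p topol-cg.top -o min.tpr || return 1
    # Run the minimization; -deffnm names all output files min.*
    run gmx mdrun -deffnm min -v
}

# Example (inside the martini directory):
#   DRY_RUN=1; minimize_cg   # show the commands
#   DRY_RUN=0; minimize_cg   # actually run them
```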
<br />
<br />
'''b.''' Simulate coarse-grained system<br />
<br />
MD simulation parameters are provided in martini_v2.x_new-rf.mdp. Strong position restraints are applied on the protein structure during the simulation. The system will be simulated for 50ns.<br />
<br />
Output files are stored in the ''martini'' directory.<br />
<br />
-md.tpr - MD run input file<br />
<br />
-md.log - Output log file from simulation<br />
<br />
-md.trr - lossless trajectory of simulation<br />
<br />
-md.xtc - coordinates of simulation trajectory<br />
<br />
-md.gro - coordinates of final simulation snapshot<br />
<br />
'''The simulation will run for roughly 3 hours.'''<br />
<br />
<br />
= 3) Converting coarse-grained system to atomistic resolution and selecting lipid atoms for grid generation =<br />
<br />
This is automated in the script:<br />
<br />
/mnt/nfs/home/stefan/zzz.scripts/INSERT-MEMBRANE/FILES/0003-backmap-and-lpd-selection.sh<br />
<br />
== 3.1) Run the script ==<br />
<br />
Copy the script above to your working directory.<br />
<br />
./0003-backmap-and-lpd-selection.sh<br />
<br />
=== Workflow ===<br />
<br />
'''a.''' Backmapping from coarse-grained to atomistic<br />
<br />
Used tool: gmx trjconv, initram.sh<br />
<br />
gmx trjconv can perform a variety of conversions of MD trajectories, e.g. making molecules that are broken over the periodic boundaries whole again.<br />
<br />
initram.sh calls the backward.py program which performs the backmapping of input coarse-grained to atomistic systems, and performs a small series of short minimizations and simulations to relax the backmapped system.<br />
<br />
Output files are stored in the generated ''backmap'' directory.<br />
<br />
-0-backmapped.gro / projected.gro - initial backmapped coordinates<br />
<br />
-backmapped.top - Topology of atomistic system<br />
<br />
-1-EM*/2-EM* - Output from minimizations <br />
<br />
-3-mdpr*/4-mdpr*/5-mdpr*/6-mdpr* - Output from simulations<br />
<br />
-backmapped.gro - Coordinates of final backmapped and relaxed system<br />
<br />
'''This may need a few attempts to work all the way through. The script contains a while-loop that stops only once all relaxation steps have finished.'''<br />
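<br />
A retry loop of this kind could look like the sketch below. It is not the loop from 0003-backmap-and-lpd-selection.sh; the function name, the attempt limit and the choice of the final output file as the success criterion are assumptions made for illustration.<br />
<br />
```shell
#!/bin/sh
# Re-run a command until the expected output file exists and is non-empty,
# giving up after MAX attempts.
# Usage: retry_until_file FILE MAX COMMAND [ARGS...]
retry_until_file() {
    target=$1; max=$2
    shift 2
    attempt=1
    while [ ! -s "$target" ]; do
        if [ "$attempt" -gt "$max" ]; then
            echo "giving up after $max attempts: $target not produced" >&2
            return 1
        fi
        echo "attempt $attempt" >&2
        "$@" || true   # a failed relaxation step just triggers another attempt
        attempt=$((attempt + 1))
    done
}

# Example: repeat the backmapping call (with its usual arguments) up to 10 times
#   retry_until_file backmapped.gro 10 ./initram.sh ...
```
<br />
Capping the number of attempts avoids an endless loop when the backmapping fails for a systematic reason, such as a missing mapping file.<br />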
<br />
<br />
'''b.''' Replacing backmapped protein with initial atomistic protein structure<br />
<br />
Used tool: PyMOL script align.pml<br />
<br />
The pymol script will align the initial "Gromacs"-protein structure (conf.pdb) onto the backmapped structure and combine the fitted protein coordinates with the coordinates of the lipid and solvent environment.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system'' directory.<br />
<br />
-conf-fitted.pdb - Fitted initial protein structure<br />
<br />
-backmapped-environment.pdb - All membrane and water coordinates<br />
<br />
-fitted_system.pdb - Complete system containing fitted protein and environment coordinates<br />
<br />
'''Be sure that fitted_system.pdb has the same number of coordinates as backmapped-mol.pdb.''' If there is a discrepancy, there might be an issue with the PyMOL version you're using to run align.pml. Using PyMOL v2 or newer seems to avoid any issues. You can also generate backmapped-environment.pdb manually by taking all POPC and water coordinates from backmapped-mol.pdb.<br />
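<br />
Both the coordinate check and the manual extraction can be done with standard tools. The following is a minimal sketch: the function names are made up here, and the residue names used for water (SOL, TIP3) are assumptions, so check which names your backmapped-mol.pdb actually uses.<br />
<br />
```shell
#!/bin/sh
# Count coordinate records (ATOM/HETATM lines) in a PDB file.
count_atoms() {
    grep -c -E '^(ATOM|HETATM)' "$1"
}

# Succeed only if both PDB files hold the same number of coordinates.
same_atom_count() {
    [ "$(count_atoms "$1")" -eq "$(count_atoms "$2")" ]
}

# Print only lipid and water records, selected by residue name (columns 18-21).
# Assumed residue names: POPC for lipids, SOL or TIP3 for water.
extract_environment() {
    awk '/^(ATOM|HETATM)/ {
        res = substr($0, 18, 4); gsub(/ /, "", res)
        if (res == "POPC" || res == "SOL" || res == "TIP3") print
    }' "$1"
}

# Examples:
#   same_atom_count fitted_system.pdb backmapped-mol.pdb || echo "atom counts differ"
#   extract_environment backmapped-mol.pdb > backmapped-environment.pdb
```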
<br />
'''c.''' Run minimizations of atomistic system<br />
<br />
Used tools: gmx grompp, gmx mdrun<br />
<br />
Output files are stored in the ''backmap/prepare_AA_system'' directory.<br />
<br />
Two minimizations will be run.<br />
<br />
1) Minimization with frozen protein coordinates: 1,500 steps steepest descent (min_freeze.mdp).<br />
<br />
-min_freeze* - Output files of first minimization<br />
<br />
2) Minimization of full system: 500 steps (min.mdp).<br />
<br />
-min* - Output files of second minimization<br />
<br />
<br />
'''d.''' Select lipid atoms for DOCK grid generation<br />
<br />
Used tools: PyMOL script prepare.pml<br />
<br />
This will select carbon and hydrogen atoms of the hydrophobic lipid tail segments in a radius of 1.7 nm around the protein and assign them to the atom type "LPD"<br />
<br />
At this step, you need to provide the rec.pdb you want to use for docking (potentially with missing loops) as xtal-prot.pdb.<br />
<br />
Output files are stored in the generated ''backmap/prepare_AA_system/prepare_min'' directory.<br />
<br />
-shell-LPD.pdb - all LPD atoms selected for grid generation<br />
<br />
Add these coordinates to your docking protein structure and provide an amb.crg.oxt file that includes the line<br />
<br />
C lpd 0.000 LIPID SPHERE<br />
<br />
Now you can run blastermaster.<br />
<br />
=== Membrane modelling in Schrodinger ===<br />
Written by Andrii Kyrylchuk, 2022/04/20<br />
<br />
==== MD of protein and membrane ====<br />
Import structure '''without''' ligand, use <u>Preparation wizard</u> as described in "Code for Controls..." to model missing loops and capping.<br />
<br />
Then open <u>System Builder</u>, click <code>Setup membrane</code>.<br />
<br />
Go to the website <nowiki>https://opm.phar.umich.edu/proteins/</nowiki> and find your protein. Copy residue numbers from the bottom of the page to the field <code>Transmembrane atoms...</code>. The format is as follows:<br />
<br />
<code>res.num 76-97,112-136,141,...</code><br />
<br />
Click <code>Place Automatically</code>, <code>OK</code>. Then click <code>Run</code>. Examine lipids and solvent after run completes.<br />
<br />
Then use <u>Molecular Dynamics</u> menu to set up the calculation. Select prepared system, click <code>Load</code> on top of the menu. Put simulation time of 5 ns, <code>Advanced Options</code> -- <code>Restraints</code> -- <code>Add.</code> Select protein, and put <code>Force Constant</code> of 100, click <code>Apply</code> and <code>OK.</code> Then click down arrow left of the <code>Run</code> button in the parent window and click <code>Write</code>.<br />
<br />
Copy the project folder (desmond_md_job_X) to gimel, login to <u>gimel5</u>, edit desmond_md_job_X.sh: delete <code>-lic DESMOND_GPGPU:16</code> and insert <code>-HOST gimel5.gpu</code> (or <code>gimel5.heavygpu</code>). Run .sh file, and your task will be submitted to a queue. For my system it took 1.5 hr to complete.<br />
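<br />
The edit of the job script can also be scripted. The sketch below replaces the licence flag with the <code>-HOST</code> entry in place and keeps a backup; putting <code>-HOST</code> exactly where the <code>-lic</code> flag was is an assumption, and the function name is hypothetical.<br />
<br />
```shell
#!/bin/sh
# Rewrite a Desmond job script for the local GPU queue:
# swap the licence flag for an explicit -HOST entry. Keeps a .bak copy.
# Usage: patch_desmond_script JOB.sh [HOST]
patch_desmond_script() {
    script=$1
    host=${2:-gimel5.gpu}   # or gimel5.heavygpu
    cp "$script" "$script.bak" || return 1
    sed "s/-lic DESMOND_GPGPU:16/-HOST $host/" "$script.bak" > "$script"
}

# Example:
#   patch_desmond_script desmond_md_job_X.sh gimel5.heavygpu
```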
<br />
Download the project folder to your PC, open Maestro, click <code>Import structure</code> and open -out.cms, click on <code>T</code> icon at the new entry in project table and click <code>Display Trajectory Snapshots</code>. Select the last one, click <code>Display</code> and check if the protein did not change position during MD run, then click <code>Export</code>, to Project Table, Frames Selected only. You will get a new entry in the Project Table. Export it to a .pdb file.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Andrii%27s_notes_on_SynthI&diff=14088Andrii's notes on SynthI2022-03-28T19:40:26Z<p>Iamkaant: added Synthons generated from CSSB00020671770</p>
<hr />
<div>Parent page: [[SynthI]]<br />
<br />
= Preparing analogs with SynthI =<br />
<br />
Working with https://cartblanche22.docking.org/searchZinc/ZINCoT000006Aq87, bash needed. Prepare .smi file with the list of SMILES (and names) of compds to prepare analogs for.<br />
<br />
<br />
First, we need to fragment our compounds:<br />
<br />
python /nfs/soft2/SynthI//SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --nCores 5 --MaxNumberOfStages 1<br />
<br />
In the file ligand.smi_out you will get a list of synthons and reactions that are applied to the molecule:<br />
<br />
C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12 ZINCoT000006Aq87 c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 R3.1_0|R5.2_0 3 0 AvailableSynthons: NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1<br />
<br />
As the synthons generated contain molecular fragments, you will have to manually cap the BBs according to the reactions provided. Then search for similar BBs in SmallWorld. For each of the found lists of BBs do:<br />
<br />
awk '{print $1 " " $2}' thioph-Cl.tsv | grep -v alignment > thioph-Cl.smi<br />
<br />
Then cat into one file and prepare synthons from the BBs found.<br />
<br />
python /nfs/soft2/SynthI/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_analogs.smi<br />
<br />
Leave only SMILES and names<br />
<br />
awk '{print $1 " " $NF}' bb_analogs.smi_Synthmode.smi > bb_analogs_synth.smi<br />
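<br />
The conversion of the individual SmallWorld lists and their concatenation can be combined in one helper. This is a sketch under stated assumptions: the function name is made up, the header line is recognized by the word "alignment" as in the command above, and removing duplicate entries with <code>sort -u</code> (which also reorders the lines) is an addition that the original procedure does not require.<br />
<br />
```shell
#!/bin/sh
# Convert SmallWorld .tsv exports to "SMILES name" lines and merge them,
# dropping the header line and duplicate entries.
# Usage: merge_bb_lists OUT.smi LIST1.tsv [LIST2.tsv ...]
merge_bb_lists() {
    out=$1
    shift
    for tsv in "$@"; do
        # keep the first two columns, skip the header line
        awk '{print $1 " " $2}' "$tsv" | grep -v alignment
    done | sort -u > "$out"
}

# Example:
#   merge_bb_lists bb_analogs.smi thioph-Cl.tsv amine.tsv pyrazole.tsv
```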
<br />
<br />
== A. Enumeration based on all found BBs ==<br />
The directory given under "-oD" will contain either a FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi or an AnalogsForMol1.smi file with the list of SMILES for the generated compds.<br />
<br />
Unlike analog generation, the output of enumeration '''does not''' contain the reactions and synthons used for compound generation.<br />
<br />
Using all available reactions:<br />
<br />
mkdir ENUMERATED<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto.png|300px]]<br />
<br />
Using the same reactions that were used for initial fragmentation:<br />
<br />
mkdir ENUMERATED-R3-R5<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED-R3-R5/ --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200 --fragmentationMode include_only --reactionsToWorkWith "R3, R5"<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-enum-r3-r5.png|300px]]<br />
<br />
<br />
== B. Analogs from a synthon library ==<br />
<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --SynthLibrary bb_analogs_synth.smi --simTh 0.5 --analoguesLibGen --nCores 10 -oD ANALOGS --MaxNumberOfStages 5 --desiredNumberOfNewMols 1000 --enumerationMode --MWupperTh 460 --MWlowerTh 200 <br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-analogs.png|300px]]<br />
<br />
Analog generation doesn't seem to use similarity threshold value. Usage of large synthon library may be useful for analog generation, but speed needs to be tested.<br />
<br />
== Enumeration with the updated SynthI (work in progress) ==<br />
Current version of SynthI is at /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC. The main changes from the original version are <br />
<br />
* support of two synthon files as input, one per each reagent<br />
* output of the synthon IDs and reaction ID, that led to a specific compound.<br />
<syntaxhighlight lang="shell"><br />
source /nfs/soft2/anaconda3/bin/activate SynthI-env<br />
</syntaxhighlight>Prepare synthons for a specific reaction. List of supported reactions can be found in the .pdf file in the parent dir. First, make .smi file with SMILES and id of each building block (e.g. amine and carboxylic acid). Then prepare synthons from each of the BB files:<syntaxhighlight lang="shell"><br />
python /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_list.smi<br />
</syntaxhighlight>Leave only SMILES and names<syntaxhighlight lang="shell"><br />
awk '{print $1 " " $NF}' bb_list.smi_Synthmode.smi > synth.smi<br />
</syntaxhighlight>Then you can enumerate the library based on two reagent (synthon) lists:<syntaxhighlight lang="shell"><br />
python /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i synthon_X.smi -i2 synthon_Y.smi -oD RESULTS/ --enumerationMode --nCores 2<br />
</syntaxhighlight>You will get results in RESULTS/FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi:<br />
C=C(C)C(O)(C(=O)n1cc(C(C)=O)cc1C(=O)NCc1ccccc1)C(F)(F)F R5.3_CSSB00155635782_CSSB00000019219_<br />
Where R5.3 is the reaction ID (see .pdf), followed by two synthon IDs. The script is also capable of invoking only specified reactions, by using flags <code>--fragmentationMode include_only --reactionsToWorkWith "R3, R5"</code> (not tested).<br />
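<br />
To work with the reaction and synthon IDs separately, the combined identifier can be split on underscores. The helper below is a minimal sketch with a made-up name; it assumes that neither the reaction ID nor the synthon IDs contain underscores themselves.<br />
<br />
```shell
#!/bin/sh
# Split "SMILES RXN_SYNTHON1_SYNTHON2_" records into four columns:
# SMILES, reaction ID, first synthon ID, second synthon ID.
# Reads the given files, or standard input if none are given.
split_enum_ids() {
    awk '{
        split($2, part, "_")
        # part[1] = reaction ID, part[2] and part[3] = synthon IDs
        print $1, part[1], part[2], part[3]
    }' "$@"
}

# Example:
#   split_enum_ids RESULTS/FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi > enumerated_ids.txt
```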
<br />
<br />
=TODO=<br />
<br />
# Work with 2 synthon files, instead of one. -- DONE<br />
# Output identifiers of enumerated compds: RXN_synt1_synt2 -- DONE<br />
# Processing the end of synthon file<br />
# Test sets for each reaction<br />
# Checking multistage reactions<br />
# Merging of temp files once in a while -- DONE<br />
<br />
=Synthons generated from CSSB00020671770=<br />
[[Image:CSSB00020671770.png|600px]]</div>Iamkaanthttp://wiki.docking.org/index.php?title=File:CSSB00020671770.png&diff=14087File:CSSB00020671770.png2022-03-28T19:35:21Z<p>Iamkaant: All synthons generated by SynthI from CSSB00020671770</p>
<hr />
<div>== Summary ==<br />
All synthons generated by SynthI from CSSB00020671770</div>Iamkaanthttp://wiki.docking.org/index.php?title=Andrii%27s_notes_on_SynthI&diff=14086Andrii's notes on SynthI2022-03-28T19:33:32Z<p>Iamkaant: </p>
<hr />
<div>Parent page: [[SynthI]]<br />
<br />
= Preparing analogs with SynthI =<br />
<br />
Working with https://cartblanche22.docking.org/searchZinc/ZINCoT000006Aq87, bash needed. Prepare .smi file with the list of SMILES (and names) of compds to prepare analogs for.<br />
<br />
<br />
First, we need to fragment our compounds:<br />
<br />
python /nfs/soft2/SynthI//SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --nCores 5 --MaxNumberOfStages 1<br />
<br />
In the file ligand.smi_out you will get a list of synthons and reactions that are applied to the molecule:<br />
<br />
C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12 ZINCoT000006Aq87 c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 R3.1_0|R5.2_0 3 0 AvailableSynthons: NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1<br />
<br />
As the synthons generated contain molecular fragments, you will have to manually cap the BBs according to the reactions provided. Then search for similar BBs in SmallWorld. For each of the found lists of BBs do:<br />
<br />
awk '{print $1 " " $2}' thioph-Cl.tsv | grep -v alignment > thioph-Cl.smi<br />
<br />
Then cat into one file and prepare synthons from the BBs found.<br />
<br />
python /nfs/soft2/SynthI/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_analogs.smi<br />
<br />
Leave only SMILES and names<br />
<br />
awk '{print $1 " " $NF}' bb_analogs.smi_Synthmode.smi > bb_analogs_synth.smi<br />
<br />
<br />
== A. Enumeration based on all found BBs ==<br />
The directory given under "-oD" will contain either a FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi or an AnalogsForMol1.smi file with the list of SMILES for the generated compds.<br />
<br />
Unlike analog generation, the output of enumeration '''does not''' contain the reactions and synthons used for compound generation.<br />
<br />
Using all available reactions:<br />
<br />
mkdir ENUMERATED<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto.png|300px]]<br />
<br />
Using the same reactions that were used for initial fragmentation:<br />
<br />
mkdir ENUMERATED-R3-R5<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED-R3-R5/ --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200 --fragmentationMode include_only --reactionsToWorkWith "R3, R5"<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-enum-r3-r5.png|300px]]<br />
<br />
<br />
== B. Analogs from a synthon library ==<br />
<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --SynthLibrary bb_analogs_synth.smi --simTh 0.5 --analoguesLibGen --nCores 10 -oD ANALOGS --MaxNumberOfStages 5 --desiredNumberOfNewMols 1000 --enumerationMode --MWupperTh 460 --MWlowerTh 200 <br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-analogs.png|300px]]<br />
<br />
Analog generation doesn't seem to use similarity threshold value. Usage of large synthon library may be useful for analog generation, but speed needs to be tested.<br />
<br />
== Enumeration with the updated SynthI (work in progress) ==<br />
Current version of SynthI is at /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC. The main changes from the original version are <br />
<br />
* support of two synthon files as input, one per each reagent<br />
* output of the synthon IDs and reaction ID, that led to a specific compound.<br />
<syntaxhighlight lang="shell"><br />
source /nfs/soft2/anaconda3/bin/activate SynthI-env<br />
</syntaxhighlight>Prepare synthons for a specific reaction. List of supported reactions can be found in the .pdf file in the parent dir. First, make .smi file with SMILES and id of each building block (e.g. amine and carboxylic acid). Then prepare synthons from each of the BB files:<syntaxhighlight lang="shell"><br />
python /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_list.smi<br />
</syntaxhighlight>Leave only SMILES and names<syntaxhighlight lang="shell"><br />
awk '{print $1 " " $NF}' bb_list.smi_Synthmode.smi > synth.smi<br />
</syntaxhighlight>Then you can enumerate the library based on two reagent (synthon) lists:<syntaxhighlight lang="shell"><br />
python /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i synthon_X.smi -i2 synthon_Y.smi -oD RESULTS/ --enumerationMode --nCores 2<br />
</syntaxhighlight>You will get results in RESULTS/FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi:<br />
C=C(C)C(O)(C(=O)n1cc(C(C)=O)cc1C(=O)NCc1ccccc1)C(F)(F)F R5.3_CSSB00155635782_CSSB00000019219_<br />
Where R5.3 is the reaction ID (see .pdf), followed by two synthon IDs. The script is also capable of invoking only specified reactions, by using flags <code>--fragmentationMode include_only --reactionsToWorkWith "R3, R5"</code> (not tested).<br />
<br />
<br />
=TODO=<br />
<br />
# Work with 2 synthon files, instead of one. -- DONE<br />
# Output identifiers of enumerated compds: RXN_synt1_synt2 -- DONE<br />
# Processing the end of synthon file<br />
# Test sets for each reaction<br />
# Checking multistage reactions<br />
# Merging of temp files once in a while -- DONE</div>Iamkaanthttp://wiki.docking.org/index.php?title=Andrii%27s_notes_on_SynthI&diff=14083Andrii's notes on SynthI2022-03-24T05:00:53Z<p>Iamkaant: </p>
<hr />
<div>Parent page: [[SynthI]]<br />
<br />
= Preparing analogs with SynthI =<br />
<br />
Working with https://cartblanche22.docking.org/searchZinc/ZINCoT000006Aq87, bash needed. Prepare .smi file with the list of SMILES (and names) of compds to prepare analogs for.<br />
<br />
<br />
First, we need to fragment our compounds:<br />
<br />
python /nfs/soft2/SynthI//SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --nCores 5 --MaxNumberOfStages 1<br />
<br />
In the file ligand.smi_out you will get a list of synthons and reactions that are applied to the molecule:<br />
<br />
C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12 ZINCoT000006Aq87 c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 R3.1_0|R5.2_0 3 0 AvailableSynthons: NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1<br />
<br />
As the synthons generated contain molecular fragments, you will have to manually cap the BBs according to the reactions provided. Then search for similar BBs in SmallWorld. For each of the found lists of BBs do:<br />
<br />
awk '{print $1 " " $2}' thioph-Cl.tsv | grep -v alignment > thioph-Cl.smi<br />
<br />
Then cat into one file and prepare synthons from the BBs found.<br />
<br />
python /nfs/soft2/SynthI/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_analogs.smi<br />
<br />
Leave only SMILES and names<br />
<br />
awk '{print $1 " " $NF}' bb_analogs.smi_Synthmode.smi > bb_analogs_synth.smi<br />
<br />
<br />
== A. Enumeration based on all found BBs ==<br />
The directory given under "-oD" will contain either a FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi or an AnalogsForMol1.smi file with the list of SMILES for the generated compds.<br />
<br />
Unlike analog generation, the output of enumeration '''does not''' contain the reactions and synthons used for compound generation.<br />
<br />
Using all available reactions:<br />
<br />
mkdir ENUMERATED<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto.png|300px]]<br />
<br />
Using the same reactions that were used for initial fragmentation:<br />
<br />
mkdir ENUMERATED-R3-R5<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED-R3-R5/ --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200 --fragmentationMode include_only --reactionsToWorkWith "R3, R5"<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-enum-r3-r5.png|300px]]<br />
<br />
<br />
== B. Analogs from a synthon library ==<br />
<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --SynthLibrary bb_analogs_synth.smi --simTh 0.5 --analoguesLibGen --nCores 10 -oD ANALOGS --MaxNumberOfStages 5 --desiredNumberOfNewMols 1000 --enumerationMode --MWupperTh 460 --MWlowerTh 200 <br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-analogs.png|300px]]<br />
<br />
Analog generation doesn't seem to use similarity threshold value. Usage of large synthon library may be useful for analog generation, but speed needs to be tested.<br />
<br />
== Enumeration with the updated SynthI (work in progress) ==<br />
Current version of SynthI is at /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC. The main changes from the original version are <br />
<br />
* support of two synthon files as input, one per each reagent<br />
* output of the synthon IDs and reaction ID, that led to a specific compound.<br />
<syntaxhighlight lang="shell"><br />
source /nfs/soft2/anaconda3/bin/activate SynthI-env<br />
</syntaxhighlight>Prepare synthons for a specific reaction. List of supported reactions can be found in the .pdf file in the parent dir. First, make .smi file with SMILES and id of each building block (e.g. amine and carboxylic acid). Then prepare synthons from each of the BB files:<syntaxhighlight lang="shell"><br />
python /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_list.smi<br />
</syntaxhighlight>Leave only SMILES and names<syntaxhighlight lang="shell"><br />
awk '{print $1 " " $NF}' bb_list.smi_Synthmode.smi > synth.smi<br />
</syntaxhighlight>Then you can enumerate the library based on two reagent (synthon) lists:<syntaxhighlight lang="shell"><br />
python /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i synthon_X.smi -i2 synthon_Y.smi -oD RESULTS/ --enumerationMode --nCores 2<br />
</syntaxhighlight>You will get results in RESULTS/FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi:<br />
C=C(C)C(O)(C(=O)n1cc(C(C)=O)cc1C(=O)NCc1ccccc1)C(F)(F)F R5.3_CSSB00155635782_CSSB00000019219_<br />
Where R5.3 is the reaction ID (see .pdf), followed by two synthon IDs. The script is also capable of invoking only specified reactions, by using flags <code>--fragmentationMode include_only --reactionsToWorkWith "R3, R5"</code> (not tested).<br />
<br />
<br />
=TODO=<br />
<br />
# Work with 2 synthon files, instead of one. -- DONE<br />
# Output identifiers of enumerated compds: RXN_synt1_synt2 -- DONE<br />
# Processing the end of synthon file<br />
# Test sets for each reaction<br />
# Checking multistage reactions<br />
# Merging of temp files once in a while</div>Iamkaanthttp://wiki.docking.org/index.php?title=Andrii%27s_notes_on_SynthI&diff=14076Andrii's notes on SynthI2022-03-22T02:50:16Z<p>Iamkaant: </p>
<hr />
<div>Parent page: [[SynthI]]<br />
<br />
= Preparing analogs with SynthI =<br />
<br />
Working with https://cartblanche22.docking.org/searchZinc/ZINCoT000006Aq87, bash needed. Prepare .smi file with the list of SMILES (and names) of compds to prepare analogs for.<br />
<br />
<br />
First, we need to fragment our compounds:<br />
<br />
python /nfs/soft2/SynthI//SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --nCores 5 --MaxNumberOfStages 1<br />
<br />
In the file ligand.smi_out you will get a list of synthons and reactions that are applied to the molecule:<br />
<br />
C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12 ZINCoT000006Aq87 c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 R3.1_0|R5.2_0 3 0 AvailableSynthons: NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1<br />
<br />
As the synthons generated contain molecular fragments, you will have to manually cap the BBs according to the reactions provided. Then search for similar BBs in SmallWorld. For each of the found lists of BBs do:<br />
<br />
awk '{print $1 " " $2}' thioph-Cl.tsv | grep -v alignment > thioph-Cl.smi<br />
<br />
Then cat into one file and prepare synthons from the BBs found.<br />
<br />
python /nfs/soft2/SynthI/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_analogs.smi<br />
<br />
Leave only SMILES and names<br />
<br />
awk '{print $1 " " $NF}' bb_analogs.smi_Synthmode.smi > bb_analogs_synth.smi<br />
<br />
<br />
== A. Enumeration based on all found BBs ==<br />
The directory given under "-oD" will contain either a FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi or an AnalogsForMol1.smi file with the list of SMILES for the generated compds.<br />
<br />
Unlike analog generation, the output of enumeration '''does not''' contain the reactions and synthons used for compound generation.<br />
<br />
Using all available reactions:<br />
<br />
mkdir ENUMERATED<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto.png|300px]]<br />
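<br />
As the output file name (DuplicatesCanBePresent) suggests, the enumerated list may contain repeated structures. A minimal sketch of removing exact repeats before further analysis is given here; <code>dedupe_smiles</code> is a hypothetical helper. It compares SMILES strings as text, so two different spellings of the same molecule would both be kept.<br />

```shell
# Hypothetical helper: keep the first line seen for each distinct SMILES
# string (column 1). Plain text comparison, not chemical canonicalization.
dedupe_smiles() {
    awk '!seen[$1]++' "$1"
}

# Usage:
# dedupe_smiles ENUMERATED/FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi > unique.smi
```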
<br />
Using the same reactions that were used for initial fragmentation:<br />
<br />
mkdir ENUMERATED-R3-R5<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED-R3-R5/ --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200 --fragmentationMode include_only --reactionsToWorkWith "R3, R5"<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-enum-r3-r5.png|300px]]<br />
<br />
<br />
== B. Analogs from a synthon library ==<br />
<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --SynthLibrary bb_analogs_synth.smi --simTh 0.5 --analoguesLibGen --nCores 10 -oD ANALOGS --MaxNumberOfStages 5 --desiredNumberOfNewMols 1000 --enumerationMode --MWupperTh 460 --MWlowerTh 200 <br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-analogs.png|300px]]<br />
<br />
Analog generation doesn't seem to use the similarity threshold value (--simTh). Using a large synthon library may be useful for analog generation, but its speed needs to be tested.<br />
<br />
== Enumeration with the updated SynthI (work in progress) ==<br />
The current version of SynthI is at /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC. The main changes from the original version are:<br />
<br />
* support for two synthon files as input, one per reagent<br />
* output of the reaction ID and synthon IDs that led to each compound.<br />
<syntaxhighlight lang="shell"><br />
source /nfs/soft2/anaconda3/bin/activate SynthI-env<br />
</syntaxhighlight>Prepare synthons for a specific reaction. The list of supported reactions can be found in the .pdf file in the parent dir. First, make a .smi file with the SMILES and ID of each building block (e.g. amine and carboxylic acid). Then prepare synthons from each of the BB files:<syntaxhighlight lang="shell"><br />
python /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_list.smi<br />
</syntaxhighlight>Leave only SMILES and names<syntaxhighlight lang="shell"><br />
awk '{print $1 " " $NF}' bb_list.smi_Synthmode.smi > synth.smi<br />
</syntaxhighlight>Then you can enumerate the library based on two reagent (synthon) lists:<syntaxhighlight lang="shell"><br />
python /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i synthon_X.smi -i2 synthon_Y.smi -oD RESULTS/ --enumerationMode --nCores 2<br />
</syntaxhighlight>You will get results in RESULTS/FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi:<br />
C=C(C)C(O)(C(=O)n1cc(C(C)=O)cc1C(=O)NCc1ccccc1)C(F)(F)F R5.3_CSSB00155635782_CSSB00000019219_<br />
Where R5.3 is the reaction ID (see .pdf), followed by two synthon IDs. The script is also capable of invoking only specified reactions, by using flags <code>--fragmentationMode include_only --reactionsToWorkWith "R3, R5"</code> (not tested).<br />
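<br />
The compound ID in the second column can be split back into its reaction and synthon parts, for example to count how many products each reaction gave. This is a sketch that assumes the ID format of the example line (reaction ID, then two synthon IDs, joined by underscores) and that synthon IDs themselves contain no underscores; <code>split_enum_ids</code> is a hypothetical helper.<br />

```shell
# Hypothetical helper: split "RXN_synthon1_synthon2_" compound IDs (column 2)
# into tab-separated columns: SMILES, reaction ID, first and second synthon ID.
# Assumes synthon IDs contain no underscores.
split_enum_ids() {
    awk '{
        split($2, parts, "_")
        print $1 "\t" parts[1] "\t" parts[2] "\t" parts[3]
    }' "$1"
}

# Usage:
# split_enum_ids RESULTS/FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi | cut -f2 | sort | uniq -c
```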
<br />
<br />
=TODO=<br />
<br />
# Work with 2 synthon files, instead of one. -- DONE<br />
# Output identifiers of enumerated compds: RXN_synt1_synt2 -- DONE<br />
# Processing the end of synthon file<br />
# Test sets for each reaction<br />
# Checking multistage reactions</div>Iamkaanthttp://wiki.docking.org/index.php?title=Andrii%27s_notes_on_SynthI&diff=14075Andrii's notes on SynthI2022-03-22T02:48:46Z<p>Iamkaant: Manual for enumeration using new code.</p>
<hr />
<div>Parent page: [[SynthI]]<br />
<br />
= Preparing analogs with SynthI =<br />
<br />
The example below uses https://cartblanche22.docking.org/searchZinc/ZINCoT000006Aq87; the commands require bash. Prepare a .smi file with the list of SMILES (and names) of the compounds you want analogs for.<br />
<br />
<br />
First, we need to fragment our compounds:<br />
<br />
python /nfs/soft2/SynthI//SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --nCores 5 --MaxNumberOfStages 1<br />
<br />
In the file ligand.smi_out you will get a list of synthons and reactions that are applied to the molecule:<br />
<br />
C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12 ZINCoT000006Aq87 c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 R3.1_0|R5.2_0 3 0 AvailableSynthons: NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1<br />
<br />
As the synthons generated contain molecular fragments, you will have to manually cap the BBs according to the reactions provided. Then search for similar BBs in SmallWorld. For each of the found lists of BBs do:<br />
<br />
awk '{print $1 " " $2}' thioph-Cl.tsv | grep -v alignment > thioph-Cl.smi<br />
<br />
Then cat into one file and prepare synthons from the BBs found.<br />
<br />
python /nfs/soft2/SynthI/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_analogs.smi<br />
<br />
Leave only SMILES and names<br />
<br />
awk '{print $1 " " $NF}' bb_analogs.smi_Synthmode.smi > bb_analogs_synth.smi<br />
<br />
<br />
== A. Enumeration based on all found BBs ==<br />
The directory given with "-oD" will contain either FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi or AnalogsForMol1.smi, a file with the list of SMILES for the generated compounds.<br />
<br />
Unlike analog generation, the output of enumeration '''does not''' contain the reactions and synthons used to generate each compound.<br />
<br />
Using all available reactions:<br />
<br />
mkdir ENUMERATED<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto.png|300px]]<br />
<br />
Using the same reactions that were used for initial fragmentation:<br />
<br />
mkdir ENUMERATED-R3-R5<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED-R3-R5/ --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200 --fragmentationMode include_only --reactionsToWorkWith "R3, R5"<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-enum-r3-r5.png|300px]]<br />
<br />
<br />
== B. Analogs from a synthon library ==<br />
<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --SynthLibrary bb_analogs_synth.smi --simTh 0.5 --analoguesLibGen --nCores 10 -oD ANALOGS --MaxNumberOfStages 5 --desiredNumberOfNewMols 1000 --enumerationMode --MWupperTh 460 --MWlowerTh 200 <br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-analogs.png|300px]]<br />
<br />
Analog generation doesn't seem to use the similarity threshold value (--simTh). Using a large synthon library may be useful for analog generation, but its speed needs to be tested.<br />
<br />
== Enumeration with the updated SynthI (work in progress) ==<br />
The current version of SynthI is at /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC. The main changes from the original version are:<br />
<br />
* support for two synthon files as input, one per reagent<br />
* output of the reaction ID and synthon IDs that led to each compound.<br />
<syntaxhighlight lang="shell"><br />
source /nfs/soft2/anaconda3/bin/activate SynthI-env<br />
</syntaxhighlight>Prepare synthons for a specific reaction. The list of supported reactions can be found in the .pdf file in the parent dir. First, make a .smi file with the SMILES and ID of each building block (e.g. amine and carboxylic acid). Then prepare synthons from each of the BB files:<syntaxhighlight lang="shell"><br />
python /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_list.smi<br />
</syntaxhighlight>Leave only SMILES and names<syntaxhighlight lang="shell"><br />
awk '{print $1 " " $NF}' bb_list.smi_Synthmode.smi > synth.smi<br />
</syntaxhighlight>Then you can enumerate the library based on two reagent (synthon) lists:<syntaxhighlight lang="shell"><br />
python /mnt/nfs/exa/work/ak87/PROGRAM/SynthI-master-CC/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i synthon_X.smi -i2 synthon_Y.smi -oD RESULTS/ --enumerationMode --nCores 2<br />
</syntaxhighlight>You will get results in RESULTS/FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi:<br />
C=C(C)C(O)(C(=O)n1cc(C(C)=O)cc1C(=O)NCc1ccccc1)C(F)(F)F R5.3_CSSB00155635782_CSSB00000019219_<br />
Where R5.3 is the reaction ID (see .pdf), followed by two synthon IDs. The script is also capable of invoking only specified reactions, by using flags <code>--fragmentationMode include_only --reactionsToWorkWith "R3, R5"</code> (not tested).<br />
<br />
<br />
=TODO=<br />
<br />
# Work with 2 synthon files, instead of one. -- DONE<br />
# Output identifiers of enumerated compds: RXN_synt1_synt2 -- DONE<br />
# Processing the end of synthon file</div>Iamkaanthttp://wiki.docking.org/index.php?title=Andrii%27s_notes_on_SynthI&diff=14024Andrii's notes on SynthI2022-03-16T23:53:32Z<p>Iamkaant: </p>
<hr />
<div>Parent page: [[SynthI]]<br />
<br />
= Preparing analogs with SynthI =<br />
<br />
The example below uses https://cartblanche22.docking.org/searchZinc/ZINCoT000006Aq87; the commands require bash. Prepare a .smi file with the list of SMILES (and names) of the compounds you want analogs for.<br />
<br />
<br />
First, we need to fragment our compounds:<br />
<br />
python /nfs/soft2/SynthI//SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --nCores 5 --MaxNumberOfStages 1<br />
<br />
In the file ligand.smi_out you will get a list of synthons and reactions that are applied to the molecule:<br />
<br />
C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12 ZINCoT000006Aq87 c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 R3.1_0|R5.2_0 3 0 AvailableSynthons: NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1<br />
<br />
As the synthons generated contain molecular fragments, you will have to manually cap the BBs according to the reactions provided. Then search for similar BBs in SmallWorld. For each of the found lists of BBs do:<br />
<br />
awk '{print $1 " " $2}' thioph-Cl.tsv | grep -v alignment > thioph-Cl.smi<br />
<br />
Then cat into one file and prepare synthons from the BBs found.<br />
<br />
python /nfs/soft2/SynthI/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_analogs.smi<br />
<br />
Leave only SMILES and names<br />
<br />
awk '{print $1 " " $NF}' bb_analogs.smi_Synthmode.smi > bb_analogs_synth.smi<br />
<br />
<br />
== A. Enumeration based on all found BBs ==<br />
The directory given with "-oD" will contain either FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi or AnalogsForMol1.smi, a file with the list of SMILES for the generated compounds.<br />
<br />
Unlike analog generation, the output of enumeration '''does not''' contain the reactions and synthons used to generate each compound.<br />
<br />
Using all available reactions:<br />
<br />
mkdir ENUMERATED<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto.png|300px]]<br />
<br />
Using the same reactions that were used for initial fragmentation:<br />
<br />
mkdir ENUMERATED-R3-R5<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED-R3-R5/ --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200 --fragmentationMode include_only --reactionsToWorkWith "R3, R5"<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-enum-r3-r5.png|300px]]<br />
<br />
<br />
== B. Analogs from a synthon library ==<br />
<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --SynthLibrary bb_analogs_synth.smi --simTh 0.5 --analoguesLibGen --nCores 10 -oD ANALOGS --MaxNumberOfStages 5 --desiredNumberOfNewMols 1000 --enumerationMode --MWupperTh 460 --MWlowerTh 200 <br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-analogs.png|300px]]<br />
<br />
Analog generation doesn't seem to use the similarity threshold value (--simTh). Using a large synthon library may be useful for analog generation, but its speed needs to be tested.<br />
<br />
=TODO=<br />
<br />
# Work with 2 synthon files, instead of one.<br />
# Output identifiers of enumerated compds: RXN_synt1_synt2<br />
# Processing the end of synthon file</div>Iamkaanthttp://wiki.docking.org/index.php?title=Andrii%27s_notes_on_SynthI&diff=13998Andrii's notes on SynthI2022-03-15T21:56:28Z<p>Iamkaant: </p>
<hr />
<div>Parent page: [[SynthI]]<br />
<br />
== Preparing analogs with SynthI ==<br />
<br />
The example below uses https://cartblanche22.docking.org/searchZinc/ZINCoT000006Aq87; the commands require bash. Prepare a .smi file with the list of SMILES (and names) of the compounds you want analogs for.<br />
<br />
<br />
First, we need to fragment our compounds:<br />
<br />
python /nfs/soft2/SynthI//SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --nCores 5 --MaxNumberOfStages 1<br />
<br />
In the file ligand.smi_out you will get a list of synthons and reactions that are applied to the molecule:<br />
<br />
C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12 ZINCoT000006Aq87 c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 R3.1_0|R5.2_0 3 0 AvailableSynthons: NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1<br />
<br />
As the synthons generated contain molecular fragments, you will have to manually cap the BBs according to the reactions provided. Then search for similar BBs in SmallWorld. For each of the found lists of BBs do:<br />
<br />
awk '{print $1 " " $2}' thioph-Cl.tsv | grep -v alignment > thioph-Cl.smi<br />
<br />
Then cat into one file and prepare synthons from the BBs found.<br />
<br />
python /nfs/soft2/SynthI/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_analogs.smi<br />
<br />
Leave only SMILES and names<br />
<br />
awk '{print $1 " " $NF}' bb_analogs.smi_Synthmode.smi > bb_analogs_synth.smi<br />
<br />
<br />
=== A. Enumeration based on all found BBs ===<br />
The directory given with "-oD" will contain either FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi or AnalogsForMol1.smi, a file with the list of SMILES for the generated compounds.<br />
<br />
Unlike analog generation, the output of enumeration '''does not''' contain the reactions and synthons used to generate each compound.<br />
<br />
Using all available reactions:<br />
<br />
mkdir ENUMERATED<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto.png|300px]]<br />
<br />
Using the same reactions that were used for initial fragmentation:<br />
<br />
mkdir ENUMERATED-R3-R5<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED-R3-R5/ --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200 --fragmentationMode include_only --reactionsToWorkWith "R3, R5"<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-enum-r3-r5.png|300px]]<br />
<br />
<br />
=== B. Analogs from a synthon library ===<br />
<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --SynthLibrary bb_analogs_synth.smi --simTh 0.5 --analoguesLibGen --nCores 10 -oD ANALOGS --MaxNumberOfStages 5 --desiredNumberOfNewMols 1000 --enumerationMode --MWupperTh 460 --MWlowerTh 200 <br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-analogs.png|300px]]<br />
<br />
Analog generation doesn't seem to use the similarity threshold value (--simTh). Using a large synthon library may be useful for analog generation, but its speed needs to be tested.</div>Iamkaanthttp://wiki.docking.org/index.php?title=Andrii%27s_notes_on_SynthI&diff=13997Andrii's notes on SynthI2022-03-15T21:53:18Z<p>Iamkaant: </p>
<hr />
<div>Parent page: [[SynthI]]<br />
<br />
== Preparing analogs with SynthI ==<br />
<br />
The example below uses https://cartblanche22.docking.org/searchZinc/ZINCoT000006Aq87; the commands require bash. Prepare a .smi file with the list of SMILES (and names) of the compounds you want analogs for.<br />
<br />
<br />
First, we need to fragment our compounds:<br />
<br />
python /nfs/soft2/SynthI//SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --nCores 5 --MaxNumberOfStages 1<br />
<br />
In the file ligand.smi_out you will get a list of synthons and reactions that are applied to the molecule:<br />
<br />
C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12 ZINCoT000006Aq87 c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 R3.1_0|R5.2_0 3 0 AvailableSynthons: NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1<br />
<br />
As the synthons generated contain molecular fragments, you will have to manually cap the BBs according to the reactions provided. Then search for similar BBs in SmallWorld. For each of the found lists of BBs do:<br />
<br />
awk '{print $1 " " $2}' thioph-Cl.tsv | grep -v alignment > thioph-Cl.smi<br />
<br />
Then cat into one file and prepare synthons from the BBs found.<br />
<br />
python /nfs/soft2/SynthI/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_analogs.smi<br />
<br />
Leave only SMILES and names<br />
<br />
awk '{print $1 " " $NF}' bb_analogs.smi_Synthmode.smi > bb_analogs_synth.smi<br />
<br />
<br />
=== A. Enumeration based on all found BBs ===<br />
The directory given with "-oD" will contain either FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi or AnalogsForMol1.smi, a file with the list of SMILES for the generated compounds.<br />
Unlike analog generation, the output of enumeration does not contain the reactions and synthons used to generate each compound.<br />
<br />
Using all available reactions:<br />
<br />
mkdir ENUMERATED<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto.png|300px]]<br />
<br />
Using the same reactions that were used for initial fragmentation:<br />
<br />
mkdir ENUMERATED-R3-R5<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED-R3-R5/ --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200 --fragmentationMode include_only --reactionsToWorkWith "R3, R5"<br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-enum-r3-r5.png|300px]]<br />
<br />
<br />
=== B. Analogs from a synthon library ===<br />
<br />
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --SynthLibrary bb_analogs_synth.smi --simTh 0.5 --analoguesLibGen --nCores 10 -oD ANALOGS --MaxNumberOfStages 5 --desiredNumberOfNewMols 1000 --enumerationMode --MWupperTh 460 --MWlowerTh 200 <br />
Morgan2 Tc of obtained compds to the parent.<br />
[[Image:Tanimoto-synthi-analogs.png|300px]]<br />
<br />
Analog generation doesn't seem to use the similarity threshold value (--simTh). Using a large synthon library may be useful for analog generation, but its speed needs to be tested.</div>Iamkaant