Generating decoys (Reed's way)

Revision as of 01:12, 7 February 2019

Written by Reed Stein on April 3, 2018.

This pipeline will generate property-matched decoys for a ligand SMILES file, and will copy decoy .db2.gz files into a directory for you. To build ligands yourself, see "ligand prep" in:

   http://wiki.docking.org/index.php/DOCK_3.7_tutorial_%28Anat%29

All scripts for this tutorial can be found in:

   /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/

Before running any scripts, make sure to source the current version of Python:

  source /nfs/soft/python/envs/complete/current/env.csh

Additionally, JChem needs to be sourced in your ~/.cshrc file with the command:

  source /nfs/soft/jchem/current/env.csh
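
The steps on this page call the numbered scripts in the directory above, one after another. As a rough overview, the sketch below assembles the command line for each step and prints it without running anything. The function name `pipeline_commands` and the example arguments are hypothetical. Only the script names and their arguments come from this page. Each step submits work to the queue, so the commands should still be run one at a time, waiting for each to finish.

```python
SCRIPT_DIR = "/mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS"


def pipeline_commands(smiles_file, new_dir, copy_to_dir, smiles_mode=False):
    """Build (but do not run) the command for each step of the pipeline.

    smiles_mode=False follows "Querying ZINC for Protomers";
    smiles_mode=True follows "Querying ZINC for SMILES".
    """
    if smiles_mode:
        final_script = "0004b_write_out_ligands_decoys.py"
    else:
        final_script = "0004_copy_decoys_to_new_dir.py"
    steps = [
        ("0000_protonate_setup_dirs.py", [smiles_file, new_dir]),
        ("0001_qsub_generate_decoys.py", [new_dir]),
        ("0002_remove_similar_compounds.py", [new_dir]),
        ("0003_qsub_filter_decoys.py", [new_dir]),
        (final_script, [new_dir, copy_to_dir]),
    ]
    return [["python", f"{SCRIPT_DIR}/{script}", *args] for script, args in steps]


# Hypothetical file and directory names, for illustration only.
for command in pipeline_commands("ligands.smi", "my_decoy_run", "my_decoys"):
    print(" ".join(command))
```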


Querying ZINC for Protomers

This procedure is advised if you want decoys to be charge-matched to ligands.

Step 1) Setting up directories for Protomers

Before starting, you need a SMILES file with the format (SMILES first, ID second):

  S(Nc1c(O)cc(C(=O)O)cc1)(c2c(scc2)C(=O)O)(=O)=O 116
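
Before starting, it can help to check that every line of the SMILES file follows this layout. The sketch below is a minimal check that assumes exactly two whitespace-separated fields per line. The pipeline scripts themselves may be more permissive, and the function name `read_smiles_lines` is hypothetical.

```python
def read_smiles_lines(lines):
    """Parse 'SMILES ID' lines into a list of (smiles, ligand_id) tuples."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_number}: expected 'SMILES ID', got {line!r}"
            )
        records.append((fields[0], fields[1]))
    return records


# The example line from this page; pass an open file handle to check a real file.
example = ["S(Nc1c(O)cc(C(=O)O)cc1)(c2c(scc2)C(=O)O)(=O)=O 116\n"]
for smiles, ligand_id in read_smiles_lines(example):
    print(ligand_id, smiles)
```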

You also need an input file named "decoy_generation.in" with the following lines:

   PROTONATE YES
   MWT 20 125
   LOGP 0.4 3.6
   RB 1 5
   HBA 0 4
   HBD 0 3
   CHARGE 0 2
   DECOYS PER LIGAND 50
   

Notice that this is not the same "decoy_generation.in" as the one used in "Querying ZINC for SMILES".

If your SMILES file is already protonated as you want it, set "PROTONATE NO".
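
To double-check the input file before submitting jobs, it can be read back into a dictionary. The parser below is only a sketch. It assumes that each line is a keyword (possibly several words, such as "DECOYS PER LIGAND") followed by YES, NO, or one or two numbers, as in the example above. It is not the parser the pipeline scripts use, and the function names are hypothetical.

```python
def _is_value(token):
    """True if the token looks like a setting value rather than part of a keyword."""
    if token in ("YES", "NO"):
        return True
    try:
        float(token)
    except ValueError:
        return False
    return True


def _convert(token):
    """Convert YES/NO to booleans and numeric tokens to int or float."""
    if token in ("YES", "NO"):
        return token == "YES"
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_decoy_input(text):
    """Parse the contents of a decoy_generation.in file into a dict."""
    settings = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Walk back from the end of the line while tokens look like values;
        # the first token is always treated as part of the keyword.
        split_at = len(tokens)
        while split_at > 1 and _is_value(tokens[split_at - 1]):
            split_at -= 1
        key = " ".join(tokens[:split_at])
        values = [_convert(token) for token in tokens[split_at:]]
        settings[key] = values[0] if len(values) == 1 else values
    return settings


example = """\
PROTONATE YES
MWT 20 125
LOGP 0.4 3.6
DECOYS PER LIGAND 50
"""
for key, value in parse_decoy_input(example).items():
    print(key, value)
```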

This file specifies that for each ligand protomer, 50 decoys will be retrieved with the following properties:

    - within 125 Daltons
    - within 3.6 logP
    - within 5 rotatable bonds
    - within 4 hydrogen bond acceptors
    - within 3 hydrogen bond donors
    - within +/- 2 charge
    - 0.35 or less Tanimoto
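
For reference, the Tanimoto coefficient between two fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below computes it on plain Python sets of bit positions. The fingerprint type used by the pipeline is not stated on this page, so the bit sets here are made-up placeholders and the function name is hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of set-bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treat as no similarity
    return len(a & b) / union


# Made-up bit positions, for illustration only.
ligand_bits = {1, 2, 3, 4}
decoy_bits = {3, 4, 5, 6}
similarity = tanimoto(ligand_bits, decoy_bits)
print(round(similarity, 3), similarity <= 0.35)
```

In this example two bits are shared out of six set in total, so the decoy would pass the 0.35 cutoff.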

These are the default values. You can instead enter your own minimum and maximum amounts by which decoys may differ from the ligands.

For "DECOYS PER LIGAND", input the number of decoys that each ligand protomer should have.

Once you have created this file, run the following command to create the decoy generation directory:

  python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0000_protonate_setup_dirs.py {SMILES_FILE} {NEW_DIR_NAME}

Replace {NEW_DIR_NAME} with the directory name you want. This will create the directory with subdirectories named "ligand_${number}" for each of the ligands in the SMILES file you input.
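
A quick way to confirm that the setup step created one subdirectory per ligand is to list the "ligand_${number}" entries. The sketch below assumes that ${number} is a plain integer. The directory name "my_decoy_run" in the comment is a hypothetical stand-in for {NEW_DIR_NAME}.

```python
import re

LIGAND_DIR_PATTERN = re.compile(r"ligand_(\d+)$")


def ligand_dir_numbers(names):
    """Return the sorted ligand numbers found among directory entry names."""
    numbers = []
    for name in names:
        match = LIGAND_DIR_PATTERN.match(name)
        if match:
            numbers.append(int(match.group(1)))
    return sorted(numbers)


# Against a real run directory (hypothetical name):
#   import os
#   print(ligand_dir_numbers(os.listdir("my_decoy_run")))
print(ligand_dir_numbers(["ligand_2", "decoy_generation.in", "ligand_1"]))
```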

Step 2) Retrieving protomer decoys from ZINC15

Once the "decoy_generation.in" file, which is now located in {NEW_DIR_NAME}, is edited the way you want, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0001_qsub_generate_decoys.py {NEW_DIR_NAME}

Jobs will run 5 at a time until completed. This should take a few hours, depending on how many ligands you input.

Step 3) Removing protomer decoys that are too similar to known ligands

To remove any retrieved decoys that are too similar to any of the ligands you have retrieved decoys for, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0002_remove_similar_compounds.py {NEW_DIR_NAME}

This will run on the queue.

Step 4) Assigning accepted protomer decoys to each ligand protomer

Now that the previous script has removed any decoys that were too similar to known ligands, we can assign the remaining decoys to the ligand protomers. Make sure the "decoy_generation.in" file from before is still in {NEW_DIR_NAME}.

To filter the decoys, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0003_qsub_filter_decoys.py {NEW_DIR_NAME}

This will run on the queue. A log file called "FILTER_DECOYS.log" will be generated in {NEW_DIR_NAME} with information and any errors.
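
Once the job finishes, the log can be scanned for problems. The format of "FILTER_DECOYS.log" is not described on this page, so the sketch below simply assumes that problem lines contain the word "error" in some capitalization. The function name and the directory name in the comment are hypothetical.

```python
def find_error_lines(lines):
    """Return (line_number, text) pairs for lines that mention 'error'."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if "error" in line.lower()
    ]


# Usage (replace my_decoy_run with your {NEW_DIR_NAME}):
#   with open("my_decoy_run/FILTER_DECOYS.log") as handle:
#       for number, text in find_error_lines(handle):
#           print(number, text)
```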

Step 5) Copying decoy .db2.gz files into your directories

To copy property-matched decoys into your own directory of choice, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0004_copy_decoys_to_new_dir.py {NEW_DIR_NAME} {COPY_TO_DIR}

where {COPY_TO_DIR} is a new directory that will be created to hold the copied decoys. Two subdirectories will be created inside it:

    "ligands" - this will include "ligands.smi", which contains the SMILES strings of all ligands that have at least 50 property-matched decoys
    "decoys" - this will include the decoy .db2.gz files for docking and "decoys.smi", which contains the SMILES strings of all property-matched decoys

IMPORTANT: It is possible that there were not 50 property-matched decoys for all of your ligand protomers. The "ligands.smi" file in {COPY_TO_DIR} will not include these. Make sure you do not dock these if you calculate enrichment values.
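
One way to see which ligands were dropped is to compare the IDs in your input SMILES file with those in "ligands.smi". The sketch below assumes that "ligands.smi" uses the same "SMILES ID" layout and the same IDs as the input file. If the pipeline renames protomers, the comparison would need adjusting. The function names and file paths are hypothetical.

```python
def ids_from_smiles_lines(lines):
    """Collect the ID (second field) from each 'SMILES ID' line."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.add(fields[1])
    return ids


def missing_ligand_ids(input_lines, kept_lines):
    """Return the IDs present in the input file but absent from ligands.smi."""
    return sorted(ids_from_smiles_lines(input_lines) - ids_from_smiles_lines(kept_lines))


# Usage with hypothetical paths:
#   with open("input.smi") as original, open("my_decoys/ligands/ligands.smi") as kept:
#       print(missing_ligand_ids(original, kept))
```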

Querying ZINC for SMILES

If you would like to query ZINC for decoy SMILES so that you can build decoys yourself, or if your ligands are >400 Da, continue here. If not, go to "Querying ZINC for Protomers".

Step 1) Setting up SMILES directory

Before starting, you need a SMILES file with the format (SMILES first, ID second):

  S(Nc1c(O)cc(C(=O)O)cc1)(c2c(scc2)C(=O)O)(=O)=O 116

You also need an input file named "decoy_generation.in" with the following lines:

   SMILES YES
   PROTONATE YES
   MWT 125
   LOGP 3.6
   RB 5
   HBA 4
   HBD 3
   CHARGE 2
   DECOYS PER LIGAND 50
   GENERATE DECOYS 750
   

If your SMILES file is already protonated as you want it, set "PROTONATE NO". "SMILES" tells the function you want to query ZINC for SMILES, not built protomers.

This file specifies that for each ligand protomer, 50 decoys will be retrieved with the following properties:

    - within 125 Daltons
    - within 3.6 logP
    - within 5 rotatable bonds
    - within 4 hydrogen bond acceptors
    - within 3 hydrogen bond donors
    - within +/- 2 charge
    - 0.35 or less Tanimoto
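
Read literally, each "within" criterion is an absolute difference between a decoy property and the corresponding ligand property. The sketch below checks a decoy against the default tolerances under that reading. It is an illustration, not the pipeline's own filter. The property values are made-up numbers and the function names are hypothetical.

```python
# Default tolerances from the "decoy_generation.in" example for SMILES queries.
TOLERANCES = {"MWT": 125, "LOGP": 3.6, "RB": 5, "HBA": 4, "HBD": 3, "CHARGE": 2}


def failed_properties(ligand, decoy, tolerances=TOLERANCES):
    """Return the names of properties where the decoy differs by more than the tolerance."""
    return [
        name
        for name, limit in tolerances.items()
        if abs(decoy[name] - ligand[name]) > limit
    ]


def within_tolerances(ligand, decoy, tolerances=TOLERANCES):
    """True if every property of the decoy is within tolerance of the ligand."""
    return not failed_properties(ligand, decoy, tolerances)


# Made-up property values, for illustration only.
ligand = {"MWT": 300.0, "LOGP": 1.0, "RB": 4, "HBA": 5, "HBD": 2, "CHARGE": -1}
decoy = {"MWT": 380.5, "LOGP": 5.0, "RB": 6, "HBA": 3, "HBD": 1, "CHARGE": 0}
print(within_tolerances(ligand, decoy), failed_properties(ligand, decoy))
```

Here the decoy fails only on logP, because it differs from the ligand by 4.0, which is more than 3.6.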

"GENERATE DECOYS" specifies how many potential decoys you want to check for property matching with your ligands. A smaller number results in faster decoy generation, but a smaller pool of potential decoys to compare your ligand against. A larger number results in slower decoy generation, and greater likelihood of property-matched decoys for all your ligands.

These are the default values. You can instead enter your own amounts by which decoys may differ from the ligands.

For "DECOYS PER LIGAND", input the number of decoys that each ligand protomer should have.

Once you have created this file, run the following command to create the decoy generation directory:

  python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0000_protonate_setup_dirs.py {SMILES_FILE} {NEW_DIR_NAME}

Replace {NEW_DIR_NAME} with the directory name you want. This will create the directory with subdirectories named "ligand_${number}" for each of the ligands in the SMILES file you input.

Step 2) Retrieving SMILES decoys from ZINC15

Once the "decoy_generation.in" file, which is now located in {NEW_DIR_NAME}, is edited the way you want, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0001_qsub_generate_decoys.py {NEW_DIR_NAME}

Jobs will run 5 at a time until completed. This should take a few hours, depending on how many ligands you input.

Step 3) Removing SMILES decoys that are too similar to known ligands

To remove any retrieved decoys that are too similar to any of the ligands you have retrieved decoys for, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0002_remove_similar_compounds.py {NEW_DIR_NAME}

This will run on the queue.

Step 4) Assigning accepted SMILES decoys to each ligand protomer

Now that the previous script has removed any decoys that were too similar to known ligands, we can assign the remaining decoys to the ligand protomers. Make sure the "decoy_generation.in" file from before is still in {NEW_DIR_NAME}.

To filter the decoys, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0003_qsub_filter_decoys.py {NEW_DIR_NAME}

This will run on the queue. A log file called "FILTER_DECOYS.log" will be generated in {NEW_DIR_NAME} with information and any errors.

Step 5) Setting up ligand/decoy directories for building SMILES

If you have queried ZINC for SMILES, you need to build the decoys yourself. To write the SMILES file, run the following command:

  python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0004b_write_out_ligands_decoys.py {NEW_DIR_NAME} {COPY_TO_DIR}

SMILES for decoys can now be built.

Note that building decoys can generate multiple protomers for the same ligand. This may result in docking decoys that were not property matched to your ligands. A script to identify the correct decoy protomer is under construction.


Visualizing Decoy Properties

Visualizing property distributions

To visualize the distributions of molecular properties of matched decoys relative to the ligands, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0005_plot_properties.py {NEW_DIR_NAME}

There will be 6 images in {NEW_DIR_NAME} for molecular weight, logP, number of rotatable bonds, number of hydrogen bond donors, number of hydrogen bond acceptors, and net charge of ligands and decoys.

Visualizing decoy Tanimotos to ligands

To visualize how different the matched decoys are to the input ligands, run the following command:

  python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0006_plot_tanimoto_to_lig.py {NEW_DIR_NAME}

There will be a box and whisker plot image in {NEW_DIR_NAME} showing the Tanimotos calculated between each ligand and all decoys.