Generating decoys (Reed's way)

Revision as of 16:26, 6 April 2018

Written on April 3, 2018.

All scripts for this tutorial can be found in:

   /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/

Setting up SMILES directory

Before starting, you need a SMILES file with the following format (SMILES first, ID second):

  S(Nc1c(O)cc(C(=O)O)cc1)(c2c(scc2)C(=O)O)(=O)=O 116
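Since every later step depends on this two-column layout, it can help to check the file before running anything. The following is a minimal sketch in Python 3, not part of DUDE_SCRIPTS; the function name `read_smiles_file` is hypothetical, and it assumes the columns are separated by whitespace.

```python
def read_smiles_file(path):
    """Return a list of (smiles, ligand_id) tuples from a 'SMILES ID' file.

    Assumption: columns are whitespace-separated, one ligand per line.
    Raises ValueError on any non-blank line that does not have two fields.
    """
    records = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2:
                raise ValueError(
                    "line %d: expected 'SMILES ID', got %r"
                    % (line_number, line.rstrip("\n"))
                )
            records.append((fields[0], fields[1]))
    return records
```

Calling `read_smiles_file("ligands.smi")` (a hypothetical file name) on the example line above would return one record with the SMILES string and the ID "116".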

You also need an input file named "decoy_generation.in" with the following lines:

   PROTONATE YES
   MWT 20 125
   LOGP 0.4 3.6
   RB 1 5
   HBA 0 4
   HBD 0 3
   CHARGE 0 2
   DECOYS PER LIGAND 50
   

If your SMILES file is already protonated as you want it, set "PROTONATE NO".
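To make the layout of "decoy_generation.in" concrete, here is a small Python 3 sketch that reads the lines shown above into a dictionary. It is an illustration only and is not how the DUDE_SCRIPTS parse the file; the function name `parse_decoy_generation` is hypothetical, and the sketch assumes each property line has exactly one keyword followed by two numbers.

```python
EXAMPLE = """PROTONATE YES
MWT 20 125
LOGP 0.4 3.6
RB 1 5
HBA 0 4
HBD 0 3
CHARGE 0 2
DECOYS PER LIGAND 50
"""


def parse_decoy_generation(text):
    """Parse the contents of decoy_generation.in into a dict.

    PROTONATE becomes a bool, DECOYS PER LIGAND an int, and every
    other keyword a (first_value, second_value) tuple of floats.
    """
    settings = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if fields[0] == "PROTONATE" and len(fields) == 2:
            settings["PROTONATE"] = fields[1].upper() == "YES"
        elif fields[:3] == ["DECOYS", "PER", "LIGAND"] and len(fields) == 4:
            settings["DECOYS PER LIGAND"] = int(fields[3])
        elif len(fields) == 3:
            settings[fields[0]] = (float(fields[1]), float(fields[2]))
        else:
            raise ValueError("unrecognized line: %r" % line)
    return settings


settings = parse_decoy_generation(EXAMPLE)
```

A check like this catches a misspelled keyword or a missing number before any jobs are submitted.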

This file specifies that for each ligand protomer, 50 decoys will be retrieved with the following properties:

    - within 125 Daltons
    - within 3.6 logP
    - within 5 rotatable bonds
    - within 4 hydrogen bond acceptors
    - within 3 hydrogen bond donors
    - within +/- 2 charge
    - 0.35 or less Tanimoto

These are the default values. For each property, you can set your own minimum and maximum values by which decoys may differ from the ligands. For "DECOYS PER LIGAND", enter the number of decoys that each ligand protomer should have. Once you have created this file, run the following command to create the decoy generation directory:

  python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0000_protonate_setup_dirs.py {SMILES_FILE} {NEW_DIR_NAME}

Replace {NEW_DIR_NAME} with the directory name you want. The script creates this directory with a subdirectory named "ligand_${number}" for each of the ligands in the SMILES file you input.
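A quick way to confirm that the setup step worked is to list the "ligand_*" subdirectories and compare their count with the number of ligands in your SMILES file. The Python 3 sketch below is not part of DUDE_SCRIPTS; the function name `list_ligand_dirs` is hypothetical, and it assumes only that the subdirectory names start with "ligand_".

```python
import glob
import os


def list_ligand_dirs(decoy_dir):
    """Return the sorted paths of all 'ligand_*' subdirectories of decoy_dir."""
    pattern = os.path.join(decoy_dir, "ligand_*")
    # Keep directories only, in case files with a similar name exist.
    return sorted(path for path in glob.glob(pattern) if os.path.isdir(path))
```

For example, `len(list_ligand_dirs("my_decoys"))` (with "my_decoys" standing in for {NEW_DIR_NAME}) should equal the number of non-blank lines in the input SMILES file if one subdirectory was created per ligand.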

Retrieving decoys from ZINC15

Once the "decoy_generation.in" file, which is now located in {NEW_DIR_NAME}, is set up the way you want, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0001_qsub_generate_decoys.py {NEW_DIR_NAME}

Jobs will run 5 at a time until completed. This should take a few hours, depending on how many ligands you input.

Removing decoys that are too similar to known ligands

To remove any retrieved decoys that are too similar to any of the ligands you have retrieved decoys for, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0002_remove_similar_compounds.py {NEW_DIR_NAME}

This will run on the queue.

Assigning accepted decoys to each ligand protomer

Now that the previous script has removed any decoys that were too similar to known ligands, we can assign the remaining decoys to the ligand protomers. Make sure the "decoy_generation.in" file from before is still in {NEW_DIR_NAME}.

To filter the decoys, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0003_qsub_filter_decoys.py {NEW_DIR_NAME}

This will run on the queue. A log file called "FILTER_DECOYS.log" will be generated in {NEW_DIR_NAME} with information and any errors.
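Once the queue job has finished, the log can be scanned for problems. The format of "FILTER_DECOYS.log" is not described here, so the Python 3 sketch below simply assumes that problem lines contain the word "error" in some capitalization; the function name `find_error_lines` is hypothetical.

```python
def find_error_lines(log_path):
    """Return (line_number, text) for every log line mentioning 'error'.

    Assumption: errors are reported on lines containing the word
    'error' (case-insensitive); adapt the test if the log differs.
    """
    hits = []
    with open(log_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if "error" in line.lower():
                hits.append((line_number, line.rstrip("\n")))
    return hits
```

An empty list does not guarantee success under this assumption, so it is still worth reading the log itself.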

Copying decoy .db2.gz files into your directories

Now that we have assigned decoys to your ligand protomers, we can copy these decoys into your own directory of choice. To do this, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0004_copy_decoys_to_new_dir.py {NEW_DIR_NAME} {COPY_TO_DIR}

where {COPY_TO_DIR} is a new directory that will be created to hold your decoys. Two subdirectories will be created in this directory:

    "ligands" - this will include "ligands.smi", which contains the SMILES strings of all ligands that have at least 50 property-matched decoys
    "decoys" - this will include the decoy .db2.gz files for docking and "decoys.smi", which contains all the SMILES strings for property-matched decoys

IMPORTANT: It is possible that there were not 50 property-matched decoys for all of your ligand protomers. The "ligands.smi" file in {COPY_TO_DIR} will not include these. Make sure you do not dock these if you calculate enrichment values.
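To see which ligands were dropped, you can compare the IDs in your input file with those in "ligands.smi". The Python 3 sketch below is not part of DUDE_SCRIPTS. It assumes that "ligands.smi" uses the same whitespace-separated "SMILES ID" layout as the input file and that the IDs are unchanged by the scripts; if protomers are renamed, the comparison would need adapting. The function names are hypothetical.

```python
def read_ids(path):
    """Return the set of IDs (second column) found in a 'SMILES ID' file."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                ids.add(fields[1])
    return ids


def ligands_without_enough_decoys(input_smiles, kept_smiles):
    """Return the sorted IDs present in input_smiles but absent from kept_smiles."""
    return sorted(read_ids(input_smiles) - read_ids(kept_smiles))
```

Here `input_smiles` would be your original {SMILES_FILE} and `kept_smiles` the "ligands.smi" file under the "ligands" subdirectory of {COPY_TO_DIR}. Any ID returned should be left out of docking when you calculate enrichment.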

Visualizing property distributions

To visualize the distributions of molecular properties of matched decoys relative to the ligands, run the following command:

   python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0005_plot_properties.py {NEW_DIR_NAME}

This creates 6 images in {NEW_DIR_NAME}, one each for the molecular weight, logP, number of rotatable bonds, number of hydrogen bond donors, number of hydrogen bond acceptors, and net charge of the ligands and decoys.