Generating decoys (Reed's way)
Written by Reed Stein on April 3, 2018. updated 5/3/2019
This pipeline will generate property-matched decoys for a ligand SMILES file, and will copy decoy .db2.gz files into a directory for you. To build ligands yourself, see "ligand prep" in:
http://wiki.docking.org/index.php/DOCK_3.7_tutorial_%28Anat%29
All scripts for this tutorial can be found in:
/mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/
Before running any scripts, make sure to source the current version of Python
source /nfs/soft/python/envs/complete/current/env.csh
Additionally, JChem needs to be sourced in your ~/.cshrc file with the command:
source /nfs/soft/jchem/current/env.csh
Querying ZINC for Protomers
This procedure is advised if you want decoys to be charge-matched to ligands.
Step 1) Setting up directories for Protomers
Before starting, you need a SMILES file with the format (SMILES first, ID second):
S(Nc1c(O)cc(C(=O)O)cc1)(c2c(scc2)C(=O)O)(=O)=O 116
You also need an input file named "decoy_generation.in" with the following lines:
PROTONATE YES MWT 20 125 LOGP 0.4 3.6 RB 1 5 HBA 0 4 HBD 0 3 CHARGE 0 2 MINIMUM DECOYS PER LIGAND 20 DECOYS PER LIGAND 50 MAXIMUM TC BETWEEN DECOYS 0.8 TANIMOTO YES
Notice that this is not the same "decoy_generation.in" as for "Generating Decoys with SMILES"
If your SMILES file is already protonated as you want it, set "PROTONATE NO".
This file specifies that for each ligand protomer, 50 decoys will be retrieved with the following properties:
- within 125 Daltons - within 3.6 logP - within 5 rotatable bonds - within 4 hydrogen bond acceptors - within 3 hydrogen bond donors - within +/- 2 charge - 0.35 or less Tanimoto - minimum 20 decoys per ligand protomer, if available - preferred 50 decoys per ligand protomer, if available - the maximum TC between decoy molecules should be 0.8 - "TANIMOTO" refers to whether a Tanimoto calculation should be performed - see step 3 for when this is necessary
These are the default values, but you can input your desired minimum and maximum values that decoys can differ by, relative to the ligands.
Once you have created this file, run the following command to create the decoy generation directory:
python /mnt/nfs/home/rstein/zzz.scripts/new_DUDE_SCRIPTS/0000_protonate_setup_dirs.py {SMILES_FILE} {NEW_DIR_NAME}
Provide a directory name that you want in place of {NEW_DIR_NAME}. This will create the directory with subdirectories named "ligand_${number}" for each of the ligands in the SMILES file you input.
Step 2) Retrieving protomer decoys from ZINC15
If you have edited the "decoy_generation.in" file which is now located in {NEW_DIR_NAME} as you want, you can run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/new_DUDE_SCRIPTS/0001_qsub_generate_decoys.py {NEW_DIR_NAME}
This should take 15 minutes to an hour, depending on how many ligands you input.
Step 3) Assigning accepted protomer decoys to each ligand protomer
We can assign the property-matched decoys to the ligand protomers. Make sure you have the "decoy_generation_input.in" file from before in {NEW_DIR_NAME}.
To filter the decoys, run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/new_DUDE_SCRIPTS/0002_qsub_filter_decoys.py {NEW_DIR_NAME}
This will run on the queue. A log file called "FILTER_DECOYS.log" will be generated in {NEW_DIR_NAME} with information and any errors.
If you don't get enough decoys, the "decoy_generation.in" file can be modified by changing "MAXIMUM TC BETWEEN DECOYS", "MINIMUM DECOYS PER LIGAND", etc. To not run the time-consuming Tanimoto calculation between all ligands and decoys again, simply add/change this in the "decoy_generation.in" file:
TANIMOTO NO
If you set Tanimoto to "NO", make sure that your {NEW_DIR_NAME} still has the original files:
"test_ligdecoy_smiles.smi" "Tc_matrix" "Max_Tc_col"
Otherwise, this step will not run.
If these original files still remain, this will skip the Tanimoto calculation step, and filter property matched decoys based on the new parameters in the "decoy_generation.in" file.
If you still cannot get enough decoys for your ligands, consider reducing the number of ligands you have by clustering, for example, or using the SMILES decoy generation below, which is not limited to only molecules that are already built in ZINC15.
Step 4) Copying decoy .db2.gz files into your directories
To copy property-matched decoys into your own directory of choice, run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/new_DUDE_SCRIPTS/0003_copy_decoys_to_new_dir.py {NEW_DIR_NAME} {COPY_TO_DIR}
where {COPY_TO_DIR} is a new directory that will be created where your decoys will be copied into. In this directory, two subdirectories will be created:
"ligands" - this will include "ligands.smi" which includes all the SMILES strings that have at least 50 property matched decoys "decoys" - this will include the decoy .db2.gz files for docking and "decoys.smi" which contains all the SMILES strings for property matched decoys
IMPORTANT: It is possible that there were not 50 property-matched decoys for all of your ligand protomers. The "ligands.smi" file in {COPY_TO_DIR} will not include these. Make sure you do not dock these if you calculate enrichment values.
Querying ZINC for SMILES
If you would like to query ZINC for decoy SMILES so that you can build decoys yourself or if your ligands are >400 Da, continue here. If not, go to "Querying ZINC for Protomers"
Step 1) Setting up SMILES directory
Before starting, you need a SMILES file with the format (SMILES first, ID second):
S(Nc1c(O)cc(C(=O)O)cc1)(c2c(scc2)C(=O)O)(=O)=O 116
You also need an input file named "decoy_generation.in" with the following lines:
SMILES YES PROTONATE YES MWT 20 125 LOGP 0.4 3.6 RB 1 5 HBA 0 4 HBD 0 3 CHARGE 0 2 MINIMUM DECOYS PER LIGAND 20 DECOYS PER LIGAND 50 MAXIMUM TC BETWEEN DECOYS 0.8 TANIMOTO YES GENERATE DECOYS 750
If your SMILES file is already protonated as you want it, set "PROTONATE NO". "SMILES" tells the function you want to query ZINC for SMILES, not built protomers.
This file specifies that for each ligand protomer, 50 decoys will be retrieved with the following properties:
- within 125 Daltons - within 3.6 logP - within 5 rotatable bonds - within 4 hydrogen bond acceptors - within 3 hydrogen bond donors - within +/- 2 charge - 0.35 or less Tanimoto
"GENERATE DECOYS" specifies how many potential decoys you want to check for property matching with your ligands. A smaller number results in faster decoy generation, but a smaller pool of potential decoys to compare your ligand against. A larger number results in slower decoy generation, and greater likelihood of property-matched decoys for all your ligands.
As with protomers, "MINIMUM DECOYS PER LIGAND" refers to the minimum number of decoys you want for each ligand protomer;
"DECOYS PER LIGAND" refers to your preferred number of decoys for each ligand protomer;
"MAXIMUM TC BETWEEN DECOYS" refers to the maximum Tc allowed between decoys (the lower, the more dissimilar your decoys will be);
and "TANIMOTO" refers to whether the ligand-decoy full Tc matrix should be calculated - this must be done at least once and should not be set to "NO" unless you are re-running step 3.
These are the default values, but you can input your desired minimum and maximum values that decoys can differ by, relative to the ligands.
Once you have created this file, run the following command to create the decoy generation directory:
python /mnt/nfs/home/rstein/zzz.scripts/new_DUDE_SCRIPTS/0000_protonate_setup_dirs.py {SMILES_FILE} {NEW_DIR_NAME}
Provide a directory name that you want in place of {NEW_DIR_NAME}. This will create the directory with subdirectories named "ligand_${number}" for each of the ligands in the SMILES file you input.
Step 2) Retrieving SMILES decoys from ZINC15
If you have edited the "decoy_generation.in" file which is now located in {NEW_DIR_NAME} as you want, you can run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/new_DUDE_SCRIPTS/0001_qsub_generate_decoys.py {NEW_DIR_NAME}
Jobs will run for 15 minutes to 1-2 hours depending on how many ligands you input.
Step 3) Assigning decoys to ligands
To assign property matched decoys to your ligand protomers, run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/new_DUDE_SCRIPTS/0002_qsub_filter_decoys.py {NEW_DIR_NAME}
This will run on the queue. As with "Querying ZINC for Protomers":
If you don't get enough decoys, the "decoy_generation.in" file can be modified by changing "MAXIMUM TC BETWEEN DECOYS", "MINIMUM DECOYS PER LIGAND", etc. To not run the time-consuming Tanimoto calculation between all ligands and decoys again, simply add/change this in the "decoy_generation.in" file:
TANIMOTO NO
If you set Tanimoto to "NO", make sure that your {NEW_DIR_NAME} still has the original files:
"test_ligdecoy_smiles.smi" "Tc_matrix" "Max_Tc_col"
Otherwise, this step will not run.
If these original files still remain, this will skip the Tanimoto calculation step, and filter property matched decoys based on the new parameters in the "decoy_generation.in" file.
Step 4) Setting up ligand/decoy directories for building SMILES
If you have queried ZINC for SMILES, you need to build the decoys yourself. To write the SMILES file, run the following command:
python /mnt/nfs/home/rstein/new_DUDE_SCRIPTS/0003b_write_out_ligands_decoys.py {NEW_DIR_NAME} {COPY_TO_DIR}
SMILES for decoys can now be built.
- Note that building decoys can generate multiple protomers for the same ligand. This may result in docking decoys that were not property matched to your ligands. A script to identify the correct decoy protomer is under construction.
Visualizing Decoy Properties
Visualizing property distributions
To visualize the distributions of molecular properties of matched decoys relative to the ligands, run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/new_DUDE_SCRIPTS/0004_plot_properties.py {NEW_DIR_NAME}
There will be 6 images in {NEW_DIR_NAME} for molecular weight, logP, number of rotatable bonds, number of hydrogen bond donors, number of hydrogen bond acceptors, and net charge of ligands and decoys.
Visualizing decoy Tanimotos to ligands
To visualize how different the matched decoys are to the input ligands, run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/new_DUDE_SCRIPTS/0005_plot_tanimoto_to_lig.py {NEW_DIR_NAME}
There will be a box and whisker plot image in {NEW_DIR_NAME} showing the Tanimotos calculated between each ligand and all decoys.