Generating decoys (Reed's way)
Written on April 3, 2018.
All scripts for this tutorial can be found in:
/mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/
Input SMILES file
Start with a SMILES file in the following format (SMILES string first, ID second, one ligand per line):
S(Nc1c(O)cc(C(=O)O)cc1)(c2c(scc2)C(=O)O)(=O)=O 116
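As a quick sanity check before running the scripts, the input format can be parsed with a few lines of Python. This is a minimal sketch, assuming whitespace-separated columns and one ligand per line. The helper names `parse_smiles_lines` and `read_smiles_file` are hypothetical and are not part of DUDE_SCRIPTS.

```python
from pathlib import Path

def parse_smiles_lines(lines):
    """Return (smiles, ligand_id) tuples from lines in 'SMILES ID' format.

    Hypothetical helper: assumes whitespace-separated columns; extra columns are ignored.
    """
    entries = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected 'SMILES ID', got {line!r}")
        smiles, ligand_id = fields[0], fields[1]
        entries.append((smiles, ligand_id))
    return entries

def read_smiles_file(path):
    """Read a SMILES file from disk and parse it with parse_smiles_lines."""
    return parse_smiles_lines(Path(path).read_text().splitlines())
```

For the example line above, `parse_smiles_lines` returns a single tuple whose second element is the ID `"116"`. A line with only one column raises an error early instead of failing later in the pipeline.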
Run the following command to protonate the SMILES and create the decoy generation directory:
python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0000_protonate_setup_dirs.py {SMILES_FILE} {NEW_DIR_NAME}
Replace {NEW_DIR_NAME} with the name of the directory you want to create. The script creates this directory with one subdirectory, named "ligand_${number}", for each ligand in your input SMILES file.
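To picture the resulting layout, here is a minimal sketch that builds a directory of the same shape from already parsed `(smiles, ligand_id)` entries. It does not protonate anything. The function name `setup_ligand_dirs`, numbering from 0, and the per-ligand `ligand.smi` file are all assumptions for illustration; the real 0000_protonate_setup_dirs.py may number or populate the subdirectories differently.

```python
from pathlib import Path

def setup_ligand_dirs(new_dir_name, entries):
    """Create new_dir_name/ligand_<n>/ for each (smiles, ligand_id) entry.

    Hypothetical sketch: numbering from 0 and the 'ligand.smi' file inside
    each subdirectory are assumptions, not the script's documented behavior.
    """
    root = Path(new_dir_name)
    root.mkdir(parents=True, exist_ok=False)  # refuse to overwrite an existing run
    created = []
    for number, (smiles, ligand_id) in enumerate(entries):
        ligand_dir = root / f"ligand_{number}"
        ligand_dir.mkdir()
        # Store the ligand's SMILES and ID in the same 'SMILES ID' format as the input.
        (ligand_dir / "ligand.smi").write_text(f"{smiles} {ligand_id}\n")
        created.append(ligand_dir)
    return created
```

Using `exist_ok=False` makes an accidental rerun into an existing directory fail loudly rather than mixing two runs.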
Only create SMILES directory
If you already have a correctly protonated SMILES file, you can skip protonation and just create the directory with the correct format. To do this, run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/alt_0000_setup_dirs.py {SMILES_FILE} {NEW_DIR_NAME}
Retrieving Decoys from ZINC15
Now that you have a decoy generation directory, run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0001_qsub_generate_decoys.py {NEW_DIR_NAME}
For each ligand protomer, 50 decoys are retrieved that satisfy the following criteria relative to that protomer:
- within 125 Daltons
- within 3.6 logP
- within 5 rotatable bonds
- within 4 hydrogen bond acceptors
- within 3 hydrogen bond donors
- within +/- 2 charge
- 0.35 or less Tanimoto
These are the original parameters used for DUD-E, and the ranges can be altered if desired. To run CHARGE MATCHED decoy retrieval (i.e., each decoy has the same charge as its ligand protomer), run the following command instead:
python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0001_qsub_CHARGE_MATCHED_generate_decoys.py {NEW_DIR_NAME}
Jobs run on the queue 5 at a time until all are complete. Depending on how many ligands you input, this can take a few hours.
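The property-matching criteria listed above, together with the CHARGE MATCHED option, can be summarized as a simple predicate. This is a minimal sketch, not the code in 0001_qsub_generate_decoys.py: the `MolProps` container, the `is_property_matched` name, and treating each bound as inclusive are assumptions, and the Tanimoto similarity is taken as precomputed rather than derived from any particular fingerprint.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MolProps:
    """Properties compared during matching (hypothetical container)."""
    mw: float       # molecular weight, Daltons
    logp: float
    rot_bonds: int  # rotatable bonds
    hba: int        # hydrogen bond acceptors
    hbd: int        # hydrogen bond donors
    charge: int     # net charge

# Maximum absolute differences, taken from the list above; edit to change the ranges.
TOLERANCES = {"mw": 125.0, "logp": 3.6, "rot_bonds": 5, "hba": 4, "hbd": 3, "charge": 2}
MAX_TANIMOTO = 0.35  # maximum decoy-ligand similarity

def is_property_matched(ligand, decoy, tanimoto, tolerances=TOLERANCES,
                        max_tanimoto=MAX_TANIMOTO, charge_matched=False):
    """Return True if decoy matches ligand under the tolerances (bounds inclusive).

    tanimoto is the precomputed decoy-ligand fingerprint similarity.
    charge_matched=True additionally requires an identical net charge.
    """
    for name, tol in tolerances.items():
        if abs(getattr(ligand, name) - getattr(decoy, name)) > tol:
            return False
    if charge_matched and decoy.charge != ligand.charge:
        return False
    return tanimoto <= max_tanimoto
```

Keeping the tolerances in one dictionary mirrors the note that the ranges can be altered: a tighter or looser window is a one-line change, and charge matching is simply an extra equality check on top of the +/- 2 window.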
Removing Decoys that are too similar to known ligands
To remove retrieved decoys that are too similar to any of the ligands you generated decoys for, run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0002_remove_similar_compounds.py {NEW_DIR_NAME}
This will run on the queue.
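The idea behind this step is that a decoy is discarded if it resembles any ligand in the set, not just the protomer it was retrieved for. A minimal sketch of that check follows, with fingerprints represented as sets of on-bit indices. The fingerprint type used by 0002_remove_similar_compounds.py is not stated here, and the 0.35 cutoff is borrowed from the retrieval criteria as an assumption; `tanimoto` and `remove_similar_decoys` are hypothetical names.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here: two empty fingerprints share nothing
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def remove_similar_decoys(decoy_fps, ligand_fps, max_similarity=0.35):
    """Keep decoys whose similarity to every ligand is <= max_similarity.

    decoy_fps, ligand_fps: dict id -> set of on-bit indices.
    max_similarity=0.35 is an assumed cutoff, not read from the script.
    """
    kept = {}
    for decoy_id, d_fp in decoy_fps.items():
        if all(tanimoto(d_fp, l_fp) <= max_similarity for l_fp in ligand_fps.values()):
            kept[decoy_id] = d_fp
    return kept
```

Comparing against all ligands matters because a decoy retrieved for one ligand may happen to look like a different ligand in the set, which would make it a likely false decoy in enrichment calculations.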
Assigning accepted decoys to each ligand protomer
Now that the previous script has removed any decoys that were too similar to known ligands, we can assign the remaining decoys to the ligand protomers. To do this, run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0003_qsub_filter_decoys.py {NEW_DIR_NAME}
If you are running CHARGE MATCHED decoy retrieval, use the following command instead of the one above:
python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0003_CHARGE_MATCHED_filter_decoys.py {NEW_DIR_NAME}
This will run on the queue.
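Conceptually, assignment walks through each protomer's retrieved decoys and keeps those that survived the similarity filter, up to 50 per protomer. The sketch below assumes that candidates are already in preference order and that a decoy is given to at most one protomer; both are assumptions for illustration, and `assign_decoys` is a hypothetical name rather than a function in 0003_qsub_filter_decoys.py.

```python
def assign_decoys(candidates, accepted, n_decoys=50):
    """Assign up to n_decoys accepted decoys to each ligand protomer.

    candidates: dict protomer_id -> list of decoy ids retrieved for it, in preference order
    accepted:   set of decoy ids that survived the similarity filter
    Assumption: a decoy is used for at most one protomer.
    """
    used = set()
    assignment = {}
    for protomer_id, decoy_ids in candidates.items():
        chosen = []
        for decoy_id in decoy_ids:
            if len(chosen) == n_decoys:
                break  # this protomer already has enough decoys
            if decoy_id in accepted and decoy_id not in used:
                chosen.append(decoy_id)
                used.add(decoy_id)
        assignment[protomer_id] = chosen
    return assignment
```

Under these assumptions, a protomer whose candidates were mostly removed or already taken ends up with fewer than `n_decoys` decoys, which is exactly the situation the IMPORTANT note at the end of this page warns about.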
Copying decoy .db2.gz files into your directories
Now that decoys have been assigned to your ligand protomers, you can copy them into a directory of your choice. To do this, run the following command:
python /mnt/nfs/home/rstein/zzz.scripts/DUDE_SCRIPTS/0004_copy_decoys_to_new_dir.py {NEW_DIR_NAME} {COPY_TO_DIR}
where {COPY_TO_DIR} is the name of a new directory that will be created to hold the copied decoys. Two subdirectories are created inside it:
- "ligands": contains "ligands.smi", which lists the SMILES strings of all ligands that have at least 50 property-matched decoys
- "decoys": contains the decoy .db2.gz files for docking and "decoys.smi", which lists the SMILES strings of all property-matched decoys
IMPORTANT: Some of your ligand protomers may not have 50 property-matched decoys. These protomers are not included in the "ligands.smi" file in {COPY_TO_DIR}. Do not dock them if you are calculating enrichment values.
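The 50-decoy rule for "ligands.smi" can be expressed as a small filter. This is a minimal sketch, not the logic of 0004_copy_decoys_to_new_dir.py: the function name `write_ligands_smi`, the input dictionaries, and the "SMILES ID" line format of the output are assumptions. It also returns the excluded protomer IDs so you can see which ones to leave out of enrichment calculations.

```python
from pathlib import Path

def write_ligands_smi(copy_to_dir, ligand_smiles, assignment, min_decoys=50):
    """Write <copy_to_dir>/ligands/ligands.smi, keeping protomers with >= min_decoys decoys.

    ligand_smiles: dict protomer_id -> SMILES string
    assignment:    dict protomer_id -> list of assigned decoy ids
    Returns the protomer ids that were left out.
    """
    ligands_dir = Path(copy_to_dir) / "ligands"
    ligands_dir.mkdir(parents=True, exist_ok=True)
    kept_lines, excluded = [], []
    for protomer_id, smiles in ligand_smiles.items():
        if len(assignment.get(protomer_id, [])) >= min_decoys:
            kept_lines.append(f"{smiles} {protomer_id}\n")
        else:
            excluded.append(protomer_id)  # too few property-matched decoys
    (ligands_dir / "ligands.smi").write_text("".join(kept_lines))
    return excluded
```

Printing the returned list after a run is a cheap way to double-check that no under-matched protomer slips into your docking set.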