Filtering ligands for novelty

From DISI
Revision as of 22:14, 1 October 2018 by Chasemwebb

Written by Chase Webb 09-01-2018

After a large-scale docking campaign, it is important to remove prospective ligands that are too similar to compounds already known to modulate the receptor. In this way, we can focus on assessing new chemical interactions. This filtering is best done after clustering, as described in Processing Results from LSD.

This process proceeds in the following steps:

Make a new directory to do similarity filtering.

Make a symbolic link to the location where clustering occurred.
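The two setup steps above can be sketched as a small Python helper. This is only an illustration: the directory names and the link name "clustering" are hypothetical placeholders, not paths from the original workflow, so substitute the location where your own clustering was run.

```python
from pathlib import Path


def setup_filter_dir(work_dir, clustering_dir, link_name="clustering"):
    """Create work_dir and, inside it, a symlink pointing at clustering_dir.

    All three names are hypothetical; adapt them to your own layout.
    """
    work = Path(work_dir)
    # Make the new directory for similarity filtering (no error if it exists)
    work.mkdir(parents=True, exist_ok=True)
    link = work / link_name
    # Only create the link if it is not already there, so reruns are harmless
    if not link.is_symlink():
        link.symlink_to(Path(clustering_dir).resolve(), target_is_directory=True)
    return link


# Example call (hypothetical paths):
# setup_filter_dir("novelty_filter", "../clustering_run")
```

The same result can be obtained by hand with mkdir and ln -s; the helper is just convenient when the setup is repeated for several targets.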

1. Generate a list of SMILES for the known compounds. The simplest way to do this is to download them from ZINC. For the mu-opioid receptor (OPRM1), for instance, go to the ZINC15 Genes page.

Search for your molecules in ZINC using the gene name of your target as listed in UniProt, for example OPRM1 for the mu-opioid receptor.


2. Generate fingerprints for the known compounds. Run the following script, written by TEB and JKL. The inputs are the name of the knowns file and the prefix of the output fingerprint file (the output is used as knowns.fp in the next step).

python ~jklyu/zzz.github/ChemInfTools/utils/teb_chemaxon_cheminf_tools/generate_chemaxon_fingerprints.py knowns_list.smi knowns

3. Convert the fingerprints from binary to unsigned integers. Run the following script, written by TEB and JKL. The inputs are the bitstring fingerprint file generated in step 2, the SMILES file used to generate those fingerprints, and the prefix of the output file. You will need to do this for both the knowns and the clusterheads that were calculated in the previous tutorial, Processing Results from LSD.

python ~jklyu/zzz.github/ChemInfTools/utils/convert_fp_2_fp_in_16unit/convert_fp_2_fp_in_uint16 knowns.fp knowns_list.smi knowns
python ~jklyu/zzz.github/ChemInfTools/utils/convert_fp_2_fp_in_16unit/convert_fp_2_fp_in_uint16 extract_all.topN.sort.uniq.fp extract_all.topN.zincid.sort.uniq.smi topN_clusterhead
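To illustrate what converting a binary fingerprint to unsigned integers means, here is a minimal Python sketch that packs a string of 0/1 characters into 16-bit words. It is not the lab script: the function name, the right-padding, and the bit order within each word are assumptions made for the example, and the actual on-disk format written by convert_fp_2_fp_in_uint16 may differ.

```python
def bitstring_to_uint16(bits):
    """Pack a string of '0'/'1' characters into a list of 16-bit unsigned ints.

    Assumption for this sketch: the leftmost character of each 16-character
    chunk is the most significant bit of the resulting word.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError("fingerprint must contain only '0' and '1'")
    # Pad with zeros on the right so the length is a multiple of 16
    padded = bits + "0" * (-len(bits) % 16)
    # Interpret each 16-character chunk as a base-2 integer (0..65535)
    return [int(padded[i:i + 16], 2) for i in range(0, len(padded), 16)]


# Example: a 1024-bit fingerprint becomes 64 integers
# words = bitstring_to_uint16("0" * 1024)
```

Packing the bits this way keeps the fingerprints compact and lets later comparisons use fast bitwise operations on whole words rather than on individual characters.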

4. Calculate an all-by-all Tanimoto coefficient (Tc) matrix for the knowns against the clusterheads.
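As a rough sketch of what this step computes, the Python below builds a Tc matrix from fingerprints stored as lists of unsigned 16-bit integers, as produced conceptually in step 3. The Tc between two fingerprints is the number of bits set in both divided by the number of bits set in either. The function names and the in-memory representation are assumptions for illustration; this is not the lab's own implementation, which would need to be far faster for large clusterhead sets.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(fp_a, fp_b):
    """Tc = |A and B| / |A or B| for two equal-length lists of integer words."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    common = sum(popcount(a & b) for a, b in zip(fp_a, fp_b))
    union = sum(popcount(a | b) for a, b in zip(fp_a, fp_b))
    # Two all-zero fingerprints share nothing; define their Tc as 0.0
    return common / union if union else 0.0


def tc_matrix(knowns, clusterheads):
    """Rows are clusterheads, columns are knowns."""
    return [[tanimoto(c, k) for k in knowns] for c in clusterheads]


def max_tc_to_knowns(knowns, clusterheads):
    """For each clusterhead, the highest Tc to any known compound."""
    return [max(row, default=0.0) for row in tc_matrix(knowns, clusterheads)]
```

The per-clusterhead maximum is one natural summary of the matrix for novelty filtering: a clusterhead whose maximum Tc to every known is low is dissimilar to all of them. The cutoff to apply is a project decision and is not specified here.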