Filtering ligands for novelty

From DISI
Revision as of 22:14, 1 October 2018

Written by Chase Webb 09-01-2018

After a large-scale docking (LSD) campaign, it is important to remove prospective ligands that are too similar to compounds already known to modulate the receptor. This lets us focus on assessing new chemical interactions. This filtering is best done after the docking results have been clustered, as described in Processing Results from LSD (http://wiki.bkslab.org/index.php/How_to_process_results_from_a_large-scale_docking).

This process proceeds in the following steps:

Make a new directory in which to do the similarity filtering.

In it, make a symbolic link to the directory where clustering was performed.
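The setup can be scripted with Python's pathlib. This is a minimal sketch: the directory and link names are hypothetical, and in a shell the same thing is done with mkdir and ln -s.

```python
from pathlib import Path


def setup_filter_dir(work_dir: str, clustering_dir: str, link_name: str = "clustering") -> Path:
    """Create a working directory for similarity filtering and symlink the clustering results into it.

    work_dir, clustering_dir and link_name are hypothetical names used for illustration.
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)  # new directory for the filtering
    link = work / link_name
    # is_symlink() also catches a dangling link, which exists() would miss
    if not (link.is_symlink() or link.exists()):
        # Point the link at the absolute location where clustering was run
        link.symlink_to(Path(clustering_dir).resolve(), target_is_directory=True)
    return link
```

The function is safe to rerun. An existing link is left untouched rather than overwritten.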

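1. Generate a list of SMILES for the known compounds. The simplest way to do this is to download them from ZINC. For the mu opioid receptor (OPRM1), for instance, start from the ZINC15 Genes page (https://zinc15.docking.org/genes/home/). Save the list as knowns_list.smi, the file name used in the commands below.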

Search ZINC for your target's molecules using its gene name, for example OPRM1 for the mu opioid receptor.
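Before fingerprinting, it can help to check that the downloaded knowns file parses cleanly. The sketch below makes two assumptions about the layout:

- the file has whitespace-separated columns, with the SMILES first and the compound ID second;
- it may begin with a header line whose first column is "smiles".

The placeholder IDs given to lines without an ID are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def read_smi(path: str) -> List[Tuple[str, str]]:
    """Read a .smi file into (smiles, compound_id) pairs.

    Assumes whitespace-separated columns (SMILES first, ID second) and skips an
    optional header line whose first column is "smiles" (an assumption about the download format).
    """
    records: List[Tuple[str, str]] = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].lower() == "smiles":
            continue  # skip a header line
        # Hypothetical placeholder ID when the line has no second column
        cid = fields[1] if len(fields) > 1 else f"cmpd_{len(records) + 1}"
        records.append((fields[0], cid))
    return records
```

Comparing len(read_smi("knowns_list.smi")) with the number of compounds listed for the target in ZINC is a quick sanity check on the download.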


2. Generate fingerprints for the known compounds. Run the following script, written by TEB and JKL. Its inputs are the name of the knowns SMILES file and the name of the output fingerprint file.

python ~jklyu/zzz.github/ChemInfTools/utils/teb_chemaxon_cheminf_tools/generate_chemaxon_fingerprints.py knowns_list.smi knowns
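If you drive this step from Python rather than the shell, note that subprocess does not expand "~jklyu". The path has to be expanded explicitly. The sketch below makes several assumptions:

- the script path is the one given on this page;
- expansion only works on a machine where the user jklyu exists;
- the script needs a Python environment with ChemAxon available, which is not shown here.

```python
import os
import subprocess
from typing import List

# Script path as given on this page; "~jklyu" resolves only where that user exists.
FP_SCRIPT = "~jklyu/zzz.github/ChemInfTools/utils/teb_chemaxon_cheminf_tools/generate_chemaxon_fingerprints.py"


def fingerprint_command(smi_file: str, out_name: str, python: str = "python") -> List[str]:
    """Build the argument list: interpreter, script, input SMILES file, output name."""
    # No shell is involved, so "~user" must be expanded here.
    return [python, os.path.expanduser(FP_SCRIPT), smi_file, out_name]


def generate_fingerprints(smi_file: str, out_name: str) -> None:
    """Run the fingerprint script; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(fingerprint_command(smi_file, out_name), check=True)
```

For the knowns, generate_fingerprints("knowns_list.smi", "knowns") corresponds to the command above.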

3. Convert the fingerprints from binary bitstrings to unsigned 16-bit integers. Run the following script, also written by TEB and JKL. Its inputs are the fingerprint bitstrings (for the knowns, the output of step 2), the SMILES file used to generate them, and the prefix of the output file. Run it for both the knowns and the cluster heads calculated in the previous tutorial, Processing Results from LSD (http://wiki.bkslab.org/index.php/How_to_process_results_from_a_large-scale_docking).

python ~jklyu/zzz.github/ChemInfTools/utils/convert_fp_2_fp_in_16unit/convert_fp_2_fp_in_uint16 knowns.fp knowns_list.smi knowns
python ~jklyu/zzz.github/ChemInfTools/utils/convert_fp_2_fp_in_16unit/convert_fp_2_fp_in_uint16 extract_all.topN.sort.uniq.fp extract_all.topN.zincid.sort.uniq.smi topN_clusterhead
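To illustrate what this conversion does, here is a minimal sketch that packs a '0'/'1' fingerprint string into 16-bit words. It is not the convert_fp_2_fp_in_uint16 script itself. The assumed bit order (first character is the most significant bit of the first word) and the requirement that the length be a multiple of 16 are choices made for illustration.

```python
from typing import List


def bits_to_uint16(bitstring: str) -> List[int]:
    """Pack a '0'/'1' fingerprint string into unsigned 16-bit integers.

    Assumes the length is a multiple of 16 and that the first character is the
    most significant bit of the first word (illustrative convention only).
    """
    if len(bitstring) % 16 != 0:
        raise ValueError("fingerprint length must be a multiple of 16")
    if set(bitstring) - {"0", "1"}:
        raise ValueError("fingerprint must contain only '0' and '1'")
    # Each 16-character slice is read as a base-2 number in the range 0..65535
    return [int(bitstring[i:i + 16], 2) for i in range(0, len(bitstring), 16)]
```

For example, a 32-bit string whose first and last bits are set packs to [32768, 1]. Storing words instead of characters shrinks the files 16-fold and lets the Tanimoto step use bitwise AND/OR.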

4. Calculate an all-by-all Tanimoto coefficient (TC) matrix of the knowns against the cluster heads.
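The TC between two binary fingerprints A and B is |A AND B| / |A OR B|. With uint16 words, that ratio can be computed word by word. The sketch below builds the knowns-by-cluster-heads matrix. It then keeps cluster heads whose highest TC to any known falls below a cutoff. Assumptions:

- the fingerprints are already loaded as equal-length lists of uint16 words, keyed by compound ID;
- the cutoff max_tc is a parameter you must choose, since this page does not specify one;
- scoring two all-zero fingerprints as 0.0 is a convention adopted here.

```python
from typing import Dict, List, Sequence


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two equal-length uint16 fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(bin(x & y).count("1") for x, y in zip(a, b))    # bits set in both
    either = sum(bin(x | y).count("1") for x, y in zip(a, b))  # bits set in either
    return both / either if either else 0.0  # two empty fingerprints score 0.0 by convention


def tc_matrix(knowns: List[Sequence[int]], heads: List[Sequence[int]]) -> List[List[float]]:
    """All-by-all TC matrix: one row per known, one column per cluster head."""
    return [[tanimoto(k, h) for h in heads] for k in knowns]


def novel_heads(known_fps: Dict[str, Sequence[int]],
                head_fps: Dict[str, Sequence[int]],
                max_tc: float) -> List[str]:
    """IDs of cluster heads whose highest TC to any known is below max_tc."""
    head_ids = list(head_fps)
    matrix = tc_matrix(list(known_fps.values()), [head_fps[h] for h in head_ids])
    keep = []
    for j, hid in enumerate(head_ids):
        best = max((row[j] for row in matrix), default=0.0)  # nearest known for this head
        if best < max_tc:
            keep.append(hid)
    return keep
```

This pure-Python loop is O(knowns × heads × words). It is fine for checking a few pairs but slow for a full docking hit list, where a vectorized or dedicated tool is preferable.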