Substructure searching

Latest revision as of 19:14, 13 September 2017

Written by Jiankun Lyu, 2017/09/13

The directory hierarchy:

substructure_searching
    |
    |------ working
    |          |------ ZINC-downloader-2D-smi.database_index
    |          |------ sub_pattern.smarts
    |
    |------ scripts
               |------ submit.csh
               |------ setup_substructure_searching_files.py

1) Create the directories shown above.

mkdir substructure_searching
cd substructure_searching
mkdir working
mkdir scripts

2) Download the database index from ZINC

2.1) Go to ZINC: http://zinc15.docking.org/tranches/home/#

2.2) Choose the tranches you want to search.

[Figure: Choose the tranches you want to do substructure searching]

2.3) Download the database index file.

[Figure: download the databases index file]

2.4) Save the downloaded file as ZINC-downloader-2D-smi.database_index, then upload it to the working directory.

3) Copy the scripts from my directory.

cd scripts
cp /mnt/nfs/home/jklyu/zzz.script/analogs_searching/setup_substructure_searching_files.py .
cp /mnt/nfs/home/jklyu/zzz.script/analogs_searching/multi_sub_searching/submit.csh .
cd ../

4) Put the SMARTS patterns you want to search for in the sub_pattern.smarts file, and give each SMARTS pattern a unique number or name.

Here is an example sub_pattern.smarts file:
NS(=O)(=O)c1cccc([F,Cl,Br,I])c1[OD1] 1
NS(=O)(=O)c1cc([F,Cl,Br,I])ccc1[OD1] 2
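
Before submitting jobs, it may help to check that every pattern has a label and that no label is repeated. The following is a minimal sketch, assuming each non-blank line holds a SMARTS string and its label separated by whitespace, as in the example above. The function name parse_smarts_lines is hypothetical and is not part of the scripts copied in step 3. It does not check that the SMARTS itself is chemically valid.

```python
def parse_smarts_lines(lines):
    """Return a list of (smarts, label) pairs; raise ValueError on problems.

    Assumes the two-column layout shown in the example: SMARTS, whitespace, label.
    """
    entries = []
    seen_labels = set()
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'SMARTS label', got {line!r}")
        smarts, label = parts
        if label in seen_labels:
            raise ValueError(f"line {lineno}: duplicate label {label!r}")
        seen_labels.add(label)
        entries.append((smarts, label))
    return entries


example = [
    "NS(=O)(=O)c1cccc([F,Cl,Br,I])c1[OD1] 1",
    "NS(=O)(=O)c1cc([F,Cl,Br,I])ccc1[OD1] 2",
]
entries = parse_smarts_lines(example)
print(len(entries), "patterns with unique labels")
```

To check a real file, pass the open file object instead of the list, for example parse_smarts_lines(open("sub_pattern.smarts")).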

5) Split the ZINC-downloader-2D-smi.database_index file into chunks

cd working
python ../scripts/setup_substructure_searching_files.py . sub_searching_ ZINC-downloader-2D-smi.database_index number_of_chunks count

Replace number_of_chunks with an actual integer. I suggest about 3 SMILES files per chunk, so choose number_of_chunks according to how many tranche files you selected.
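
The arithmetic behind that suggestion can be sketched as follows. This is only an illustration, assuming the index file lists one SMILES file per line. The helper names number_of_chunks and split_into_chunks are hypothetical, and the way setup_substructure_searching_files.py actually divides the index is not documented here, so the split shown is just one reasonable scheme.

```python
import math


def number_of_chunks(n_files, files_per_chunk=3):
    """Smallest chunk count that keeps every chunk at or below files_per_chunk."""
    if n_files <= 0 or files_per_chunk <= 0:
        raise ValueError("n_files and files_per_chunk must be positive")
    return math.ceil(n_files / files_per_chunk)


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks contiguous groups whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Example: an index with 10 SMILES files (file names are made up for illustration).
index_lines = [f"tranche_{i}.smi" for i in range(10)]
n = number_of_chunks(len(index_lines))
print(n)  # 4
print([len(c) for c in split_into_chunks(index_lines, n)])  # [3, 3, 2, 2]
```

On the cluster, the number of files can be counted with wc -l ZINC-downloader-2D-smi.database_index before choosing the value.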

6) Submit the substructure searching jobs

csh ../scripts/submit.csh full_path_of_sub_pattern.smarts

Replace full_path_of_sub_pattern.smarts with the full path to your sub_pattern.smarts file.

7) Collect results

cat sub_searching_*/*.extract.output.smi > output.smi
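
If you prefer to collect the results from Python, the same concatenation can be sketched as below. It assumes the jobs write files matching sub_searching_*/*.extract.output.smi under the working directory, as in the cat command above. The function name collect_results is hypothetical.

```python
from pathlib import Path


def collect_results(working_dir, output_name="output.smi"):
    """Concatenate every per-chunk result file into one file.

    Returns the number of result files that were merged.
    """
    working = Path(working_dir)
    # Sort so the output order is reproducible between runs.
    parts = sorted(working.glob("sub_searching_*/*.extract.output.smi"))
    with (working / output_name).open("w") as out:
        for part in parts:
            with part.open() as handle:
                for line in handle:
                    out.write(line)
    return len(parts)


if __name__ == "__main__":
    merged = collect_results(".")
    print(f"merged {merged} result files into output.smi")
```

A return value of 0 means no result files were found, which usually indicates that the jobs have not finished or that the script was run outside the working directory.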