Large-scale SMILES Requesting and Fingerprints Converting

Revision as of 23:46, 18 September 2017

Written by Jiankun Lyu, 20170918

The hierarchy of the directories:

smiles_requesting/----- working/ 
              |                |
              |                |------ extract_all.sort.uniq.txt file(soft link)
              |                | 
              |                |------ db_zincid/
              |                                                 
              |                                                 
              |
              ------- scripts/ ------ submit.csh
                              |
                              |------ make_chunks_for_file_new.py
                              |
                              |------ setup_converting_fps_files.py
                              |
                              |------ combine_smi_and_fp.py
                              |
                              |------ check_outputs.csh


This tutorial describes how to request SMILES from the ZINC server for a large set of docking results, typically more than 5 million ZINC IDs.

1) Make directories and copy scripts

mkdir smiles_requesting
cd smiles_requesting
mkdir working
mkdir scripts
cd working
mkdir db_zincid
ln -s /path/to/extract_all.sort.uniq.txt
cd ../scripts
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/submit.csh .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/setup_converting_fps_files.py .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/combine_smi_and_fp.py .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/check_outputs.csh .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/make_chunks_for_file_new.py .
cd ../

2) Get ZINC ID and energy columns from the extract_all.sort.uniq.txt file and split the zincid file

In the commands below, replace every (number), brackets included, with the number of lines to take from the top of the file.

cd working/db_zincid
head -(number) ../extract_all.sort.uniq.txt | awk '{print $3" "$22}' > extract_all.top(number).sort.uniq.zincid.energy
python ../../scripts/make_chunks_for_file_new.py extract_all.top(number).sort.uniq.zincid.energy top(number).zincid 500 .
cd ../
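
For readers who want to see what step 2 does to the data, here is a minimal Python sketch. The first function mirrors the head/awk pipeline (first N lines, whitespace-separated fields 3 and 22). The second function is only an assumption about make_chunks_for_file_new.py, whose source is not shown here: it treats 500 as the maximum number of lines per chunk. Both function names are hypothetical.

```python
from itertools import islice


def top_zincid_energy(lines, number):
    """Mimic: head -number file | awk '{print $3" "$22}'."""
    records = []
    for line in islice(lines, number):
        fields = line.split()
        # awk prints an empty string for a missing field; do the same here.
        zincid = fields[2] if len(fields) > 2 else ""
        energy = fields[21] if len(fields) > 21 else ""
        records.append(f"{zincid} {energy}")
    return records


def split_into_chunks(records, chunk_size):
    """ASSUMPTION: chunks are consecutive runs of at most chunk_size records.

    The real make_chunks_for_file_new.py may interpret its numeric
    argument differently.
    """
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
```

Any iterable of lines works as input, for example an open file handle on extract_all.sort.uniq.txt.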

3) Create a zincid.sdi file

ls /full/path/to/db_zincid/top(number).zincid_* > zincid.sdi
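
The ls command above can also be sketched in Python, which may help when the number of chunk files is too large for the shell to expand. The function name write_sdi and its parameters are hypothetical. It writes one absolute path per line, as the ls command does, but sorts by code point rather than by locale, so the order may differ slightly from ls.

```python
import glob
import os


def write_sdi(chunk_dir, chunk_prefix, sdi_path):
    """Write the absolute paths of all files starting with chunk_prefix."""
    base = os.path.join(os.path.abspath(chunk_dir), chunk_prefix)
    # Escape the literal part so characters such as '[' are not treated
    # as glob wildcards; only the trailing '*' is a pattern.
    paths = sorted(glob.glob(glob.escape(base) + "*"))
    with open(sdi_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

For example, write_sdi("db_zincid", "top1000000.zincid_", "zincid.sdi") run from the working directory would correspond to the command above with (number) set to 1000000.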

4) Set up requesting files and directories

python ../scripts/setup_converting_fps_files.py . converting_fps_ zincid.sdi 500 count

5) Submit requesting and converting jobs

csh ../scripts/submit.csh

6) Check that every job has finished

csh ../scripts/check_outputs.csh (number_of_jobs) (prefix)
If any output files are missing, edit dirlist in the working directory and resubmit the corresponding jobs.
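
As a rough illustration of the check in step 6, the sketch below lists job directories that lack an output file. It does not reproduce check_outputs.csh, whose source is not shown here. It assumes that dirlist holds one job directory per line and that each finished job leaves a non-empty file with a known name in its directory. The function name and the output_name parameter are hypothetical.

```python
import os


def unfinished_jobs(dirlist_lines, output_name):
    """Return the job directories that have no usable output file.

    ASSUMPTIONS: one job directory per line in dirlist, and a finished
    job writes a non-empty file called output_name into its directory.
    """
    missing = []
    for line in dirlist_lines:
        job_dir = line.strip()
        if not job_dir:
            continue  # skip blank lines
        output = os.path.join(job_dir, output_name)
        # Treat an absent or empty output file as an unfinished job.
        if not os.path.isfile(output) or os.path.getsize(output) == 0:
            missing.append(job_dir)
    return missing
```

The returned list could be written back to dirlist before resubmitting, so that only the unfinished jobs run again.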

7) Collect data from the compressed files

cd db_zincid
python ../../scripts/combine_smi_and_fp.py