Large-scale SMILES requesting

Latest revision as of 23:01, 18 September 2017

Written by Jiankun Lyu, 20170918

The hierarchy of the directories:

smiles_requesting/
        |
        |------ working/
        |          |
        |          |------ extract_all.sort.uniq.txt (soft link)
        |          |
        |          |------ db_zincid/
        |
        |------ scripts/
                   |
                   |------ submit.csh
                   |
                   |------ make_chunks_for_file_new.py
                   |
                   |------ setup_converting_fps_files.py
                   |
                   |------ combine_smi_and_fp.py
                   |
                   |------ check_outputs.csh


This tutorial describes how to request the SMILES for a large number of docking results from the ZINC server. It is intended for requests that typically exceed 5 million ZINC IDs.

1) Make directories and copy scripts

mkdir smiles_requesting
cd smiles_requesting
mkdir working
mkdir scripts
cd working
mkdir db_zincid
ln -s /path/to/extract_all.sort.uniq.txt
cd ../scripts
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/submit.csh .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/setup_converting_fps_files.py .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/combine_smi_and_fp.py .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/check_outputs.csh .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/make_chunks_for_file_new.py .
cd ../

2) Get ZINC ID and energy columns from the extract_all.sort.uniq.txt file

cd working/db_zincid
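head -(number) ../extract_all.sort.uniq.txt | awk '{print $3" "$22}' > extract_all.top(number).sort.uniq.zincid.energy
Note: replace (number), including the brackets, with the number of lines to take from the top of extract_all.sort.uniq.txt. Use the same value in the next command and in step 3.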
python ../../scripts/make_chunks_for_file_new.py extract_all.top(number).sort.uniq.zincid.energy top(number).zincid 500 .
cd ../
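
The two commands in step 2 can be pictured with the following Python sketch. It is not the code of make_chunks_for_file_new.py, whose internals are not shown on this page. It assumes that the third argument (500) is the number of lines per chunk and that chunk files are named prefix_0, prefix_1, and so on; both are guesses based on the command line and on the top(number).zincid_* pattern used in step 3. The function names extract_columns and write_chunks are hypothetical.

```python
import os


def extract_columns(lines, columns=(3, 22)):
    """Mimic awk '{print $3" "$22}': keep the given 1-based whitespace-separated fields."""
    out = []
    for line in lines:
        fields = line.split()
        # awk prints an empty string for a missing field; do the same here.
        picked = [fields[c - 1] if c <= len(fields) else "" for c in columns]
        out.append(" ".join(picked))
    return out


def write_chunks(records, prefix, lines_per_chunk, out_dir="."):
    """Write records to out_dir/prefix_0, prefix_1, ... (assumed naming scheme)."""
    paths = []
    for start in range(0, len(records), lines_per_chunk):
        path = os.path.join(out_dir, "%s_%d" % (prefix, start // lines_per_chunk))
        with open(path, "w") as handle:
            handle.write("\n".join(records[start:start + lines_per_chunk]) + "\n")
        paths.append(path)
    return paths
```

If the real script instead treats 500 as the number of chunks, only the slicing arithmetic in write_chunks would change.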

3) Create a zincid.sdi file

ls /full/path/to/db_zincid/top(number).zincid_* > zincid.sdi
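
For readers who prefer to build zincid.sdi from a script, the ls command above amounts to writing one absolute chunk-file path per line. A minimal Python sketch follows, assuming the chunk files match a glob pattern such as top(number).zincid_*; the function name write_sdi is hypothetical.

```python
import glob
import os


def write_sdi(chunk_dir, pattern, sdi_path):
    """Write one absolute chunk-file path per line, sorted by name."""
    paths = sorted(glob.glob(os.path.join(os.path.abspath(chunk_dir), pattern)))
    with open(sdi_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return paths
```

Note that sorted() orders names by character code, which can differ from the locale-dependent order of ls; this only matters if later steps depend on the order of the lines.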

4) Set up requesting files and directories

python ../scripts/setup_converting_fps_files.py . converting_fps_ zincid.sdi 500 count

5) Submit requesting and converting jobs

csh ../scripts/submit.csh

6) Check that every job has finished

csh ../scripts/check_outputs.csh (number_of_jobs) (prefix)
If any output files are missing, edit the dirlist file in the working directory and resubmit the corresponding jobs.
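
The kind of check performed in this step can be sketched in Python as follows. This is not the logic of check_outputs.csh, which is not shown on this page. It assumes that job directories are named by appending an index, counted from 0, to the prefix (for example converting_fps_0), and that each finished job leaves a non-empty output file. The function name find_unfinished and the expected_file parameter are hypothetical, and the real output file name must be supplied by the user.

```python
import os


def find_unfinished(work_dir, prefix, number_of_jobs, expected_file):
    """Return job directories that lack a non-empty expected_file."""
    missing = []
    for index in range(number_of_jobs):
        # Assumed layout: work_dir/<prefix><index>/<expected_file>
        job_dir = os.path.join(work_dir, "%s%d" % (prefix, index))
        target = os.path.join(job_dir, expected_file)
        if not os.path.isfile(target) or os.path.getsize(target) == 0:
            missing.append(job_dir)
    return missing
```

The returned list shows which directories would need to be kept in dirlist before resubmitting.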

7) Collect data from the compressed files

cd db_zincid
python ../../scripts/combine_smi_and_fp.py