Large-scale SMILES requesting
Written by Jiankun Lyu, 20170918
The hierarchy of the directories:
smiles_requesting/----- working/
                  |        |------ extract_all.sort.uniq.txt (soft link)
                  |        |
                  |        |------ db_zincid/
                  |
                  ------- scripts/ ------ submit.csh
                                     |------ make_chunks_for_file_new.py
                                     |------ setup_converting_fps_files.py
                                     |------ combine_smi_and_fp.py
                                     |------ check_outputs.csh
This tutorial describes how to request a large number of SMILES for docking results from the ZINC server, typically when the list is larger than 5 million ZINC IDs.
1) Make directories and copy scripts
mkdir smiles_requesting
cd smiles_requesting
mkdir working
mkdir scripts
cd working
mkdir db_zincid
ln -s /path/to/extract_all.sort.uniq.txt
cd ../scripts
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/submit.csh .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/setup_converting_fps_files.py .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/combine_smi_and_fp.py .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/check_outputs.csh .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/make_chunks_for_file_new.py .
cd ../
2) Get ZINC ID and energy columns from the extract_all.sort.uniq.txt file
cd working/db_zincid
head -(number) ../extract_all.sort.uniq.txt | awk '{print $3" "$22}' > extract_all.top(number).sort.uniq.zincid.energy
Note: replace (number) in the commands with how many top-ranked entries you want to keep.
python ../../scripts/make_chunks_for_file_new.py extract_all.top(number).sort.uniq.zincid.energy top(number).zincid 500 .
cd ../
3) Create a zincid.sdi file
ls /full/path/to/db_zincid/top(number).zincid_* > zincid.sdi
4) Set up requesting files and directories
python ../scripts/setup_converting_fps_files.py . converting_fps_ zincid.sdi 500 count
5) Submit requesting and converting jobs
csh ../scripts/submit.csh
6) Check that every job finished
csh ../scripts/check_outputs.csh (number_of_jobs) (prefix)
If you find any missing files, edit dirlist in the working directory and resubmit those jobs.
7) Collect data from the compressed files
cd db_zincid
python ../../scripts/combine_smi_and_fp.py