Large-scale SMILES Requesting and Fingerprints Converting

Revision as of 00:13, 19 September 2017

Written by Jiankun Lyu, 20170918

The hierarchy of the directories:

smiles_requesting/
      |
      |------ working/
      |          |
      |          |------ extract_all.sort.uniq.txt (soft link)
      |          |
      |          |------ db_zincid/
      |
      |------ scripts/
                 |
                 |------ submit.csh
                 |
                 |------ make_chunks_for_file_new.py
                 |
                 |------ setup_converting_fps_files.py
                 |
                 |------ combine_smi_and_fp.py
                 |
                 |------ check_outputs.csh


This tutorial describes how to request SMILES from the ZINC server for a large set of docking results, typically more than 5M ZINC IDs.

1) Make directories and copy scripts

mkdir smiles_requesting
cd smiles_requesting
mkdir working
mkdir scripts
cd working
mkdir db_zincid
ln -s /path/to/extract_all.sort.uniq.txt
cd ../scripts
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/submit.csh .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/setup_converting_fps_files.py .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/combine_smi_and_fp.py .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/check_outputs.csh .
cp /mnt/nfs/home/jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering/converting_fps/make_chunks_for_file_new.py .
cd ../

2) Get ZINC ID and energy columns from the extract_all.sort.uniq.txt file and split the zincid file

cd working/db_zincid
head -(number) ../extract_all.sort.uniq.txt | awk '{print $3" "$22}' > extract_all.top(number).sort.uniq.zincid.energy
Note: replace (number), brackets included, with the number of lines to keep, in this command and in the next one.
python ../../scripts/make_chunks_for_file_new.py extract_all.top(number).sort.uniq.zincid.energy top(number).zincid 500 .
cd ../
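
To make the head/awk step above concrete, here is a minimal Python sketch of the same column extraction, followed by a hypothetical chunking helper. It assumes whitespace-separated fields. The chunking part is only an illustration: it assumes that the 500 argument means 500 lines per chunk and that chunks are named prefix_index.zincid, neither of which this page confirms. The names extract_columns and write_chunks are made up for this example and are not the contents of make_chunks_for_file_new.py.

```python
from itertools import islice
from pathlib import Path


def extract_columns(lines, top_n, id_col=3, energy_col=22):
    """Mimic: head -top_n file | awk '{print $3" "$22}' (columns are 1-based)."""
    pairs = []
    for line in islice(lines, top_n):
        fields = line.split()
        # awk prints an empty string for a missing field; do the same here.
        zinc_id = fields[id_col - 1] if len(fields) >= id_col else ""
        energy = fields[energy_col - 1] if len(fields) >= energy_col else ""
        pairs.append(f"{zinc_id} {energy}")
    return pairs


def write_chunks(pairs, out_dir, prefix, lines_per_chunk=500):
    """Hypothetical helper: write pairs into files of at most lines_per_chunk lines.

    The file naming (prefix_index.zincid) is an assumption for illustration.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(pairs), lines_per_chunk):
        path = out_dir / f"{prefix}_{start // lines_per_chunk}.zincid"
        path.write_text("\n".join(pairs[start:start + lines_per_chunk]) + "\n")
        paths.append(path)
    return paths
```

With an open file handle, extract_columns(handle, 1000000) reads only the first million lines, so the full extract_all.sort.uniq.txt file never has to fit in memory.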

3) Create a zincid.sdi file

ls /full/path/to/db_zincid/top(number)_*.zincid > zincid.sdi
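
The ls command above simply writes one absolute path per line into zincid.sdi. For reference, a rough Python equivalent is sketched below. The function name write_sdi is hypothetical, and the sort order is plain string order, which can differ from the locale-dependent order that ls uses.

```python
from pathlib import Path


def write_sdi(chunk_dir, pattern, sdi_path):
    """Write the absolute paths of files in chunk_dir matching pattern, one per line."""
    chunk_dir = Path(chunk_dir).resolve()
    # Sort by plain string order; ls may order names differently depending on locale.
    paths = sorted(str(p) for p in chunk_dir.glob(pattern))
    Path(sdi_path).write_text("".join(p + "\n" for p in paths))
    return paths
```

A call such as write_sdi("db_zincid", "top1000000_*.zincid", "zincid.sdi") would then play the role of the ls line, with 1000000 standing in for whatever value was chosen for (number).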

4) Set up requesting files and directories

python ../scripts/setup_converting_fps_files.py . converting_fps_ zincid.sdi 500 count

5) Submit requesting and converting jobs

csh ../scripts/submit.csh

6) Check that every job has finished

cd db_zincid
csh ../../scripts/check_outputs.csh (number_of_jobs) (prefix)
If any files are missing, edit dirlist in the working directory and resubmit the corresponding jobs.
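
The kind of check done in this step can be sketched in Python as follows. This is not the logic of check_outputs.csh, which is not shown on this page. It assumes, purely for illustration, that each job has its own directory named prefix followed by an index starting at 0, and that a finished job leaves a non-empty file whose name is passed in as expected_name. The function name find_missing_jobs is hypothetical.

```python
from pathlib import Path


def find_missing_jobs(base_dir, prefix, number_of_jobs, expected_name):
    """Return the names of job directories that lack a non-empty expected output file.

    Assumed layout (hypothetical): base_dir/<prefix><index>/<expected_name>,
    with index running from 0 to number_of_jobs - 1.
    """
    base = Path(base_dir)
    missing = []
    for index in range(number_of_jobs):
        job_dir = base / f"{prefix}{index}"
        output = job_dir / expected_name
        # A missing directory, a missing file, and an empty file all count as unfinished.
        if not output.is_file() or output.stat().st_size == 0:
            missing.append(job_dir.name)
    return missing
```

The returned names are the ones that would need to stay in dirlist before resubmitting.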

7) Collect data from the compressed files

cd db_zincid
python ../../scripts/combine_smi_and_fp.py