How to do parallel search of smi files on the cluster
Latest revision as of 18:02, 19 July 2018
This tutorial shows how to search smi files in parallel on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel on gimel.compbio.ucsf.edu. Indexing and parallel computing are used to speed up searching. The performance of a qsub job depends on the workload of the whole cluster, but in general searching with qsub scales well.
Create a folder with the following files and scripts
SUBMIT.sh input.txt search_smi.sh merge.sh
SUBMIT.sh
SUBMIT.sh contains the bash code that submits the job with qsub. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for that script. An example is shown below.
#!/bin/bash
# Arguments to qsub-mr, in order:
#   -l 5             number of input lines handled by each task (here 5)
#   -N test          job name (listed as test-map in the qstat output)
#   input.txt        file listing the input file names and directory
#   ./search_smi.sh  the searching function to be performed
#   -q "..."         parameter for search_smi.sh: the input query for searching
/nfs/soft/tools/utils/qsub-slice/qsub-mr \
    -l 5 \
    -N test \
    input.txt \
    ./search_smi.sh \
    -q "CS(=O)(=O)CCNCc1ccccc1"
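With -l 5, each task handles five lines of input.txt, so the number of tasks grows with the length of the list. The sketch below estimates that number. It assumes qsub-mr splits the list into consecutive chunks and lets the last chunk be shorter; count_tasks is a hypothetical helper, not part of qsub-mr.

```shell
#!/bin/bash
# Hypothetical helper (not part of qsub-mr): estimate how many tasks a list
# produces when each task handles PER_TASK lines.
# Assumption: the list is split into consecutive chunks and the last one may be shorter.
count_tasks() {
    local list=$1 per_task=$2 n
    n=$(grep -c '' "$list")                     # line count, even without a trailing newline
    echo $(( (n + per_task - 1) / per_task ))   # integer ceiling of n / per_task
}
```

For example, count_tasks input.txt 5 prints 4 for a list of 18 files.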
input.txt
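input.txt lists the smi files to search, one path per line. An example of input.txt is shown below. You can use ls *.smi > input.txt to generate this file; to get full paths as in the example, give ls the directory, e.g. ls /nfs/home/jizhou/ex7/2D/CD/*.smi > input.txt.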
/nfs/home/jizhou/ex7/2D/CD/CDAA.smi
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi
...
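Before submitting, it can help to confirm that every path in input.txt exists, since a mistyped path would otherwise only surface after the job has started. The following is a minimal sketch; check_inputs is a hypothetical helper and is not among the scripts in the tutorial folder.

```shell
#!/bin/bash
# Hypothetical pre-submission check: report (on stderr) every path in the list
# that is missing or unreadable, and return non-zero if any was found.
check_inputs() {
    local list=$1 path missing=0
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue        # skip blank lines
        if [ ! -r "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done < "$list"
    return "$missing"
}
```

A typical use would be check_inputs input.txt && ./SUBMIT.sh, so that the job is submitted only when the list is clean.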
search_smi.sh
search_smi.sh is the searching function that qsub runs for each task. Its core is mol2img_trial, located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/", which generates an index for the smi file to speed up searching. search_smi.sh requires an input query for searching, given with -q. An example is shown below.
-q "CS(=O)(=O)CCNCc1ccccc1"
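The contents of search_smi.sh are not reproduced in this tutorial. As an illustration only, the sketch below shows one common way a bash script can read a -q option with getopts; parse_query is a hypothetical function, and the real script may handle its arguments differently.

```shell
#!/bin/bash
# Hypothetical sketch of -q handling; not the actual code of search_smi.sh.
# Prints the query given with -q, or returns 1 if it is missing or an unknown option is used.
parse_query() {
    local OPTIND=1 opt query=""
    while getopts "q:" opt; do
        case "$opt" in
            q) query=$OPTARG ;;
            *) echo "usage: -q SMILES" >&2; return 1 ;;
        esac
    done
    if [ -z "$query" ]; then
        echo "missing -q SMILES query" >&2
        return 1
    fi
    printf '%s\n' "$query"
}
```

The query should stay in double quotes on the command line, as in the example, because the parentheses in a SMILES string are otherwise interpreted by the shell.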
Run SUBMIT.sh
Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory named outputs is created in the current folder, and the results are stored in outputs/. You can use the following command to check the status of the job. For more information, please refer to the qstat documentation at http://web.mit.edu/longjobs/www/status.html.
qstat    # check the status of jobs, example is shown below.

-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
6511305 1.25000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl     1 1
6511305 0.75000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl     1 2
6511305 0.58333 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks     1 3
6511305 0.50000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl     1 4
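Because each task appears as its own row in the qstat listing, counting rows gives a quick view of progress. The sketch below counts the tasks of a named job that are in a given state; tasks_in_state is a hypothetical helper that relies only on the column order shown in the listing above.

```shell
#!/bin/bash
# Count the tasks of job NAME that are in STATE (e.g. r for running),
# reading qstat output on stdin. Column positions follow the listing above:
# field 3 is the job name and field 5 is the state.
tasks_in_state() {
    awk -v name="$1" -v state="$2" \
        '$3 == name && $5 == state { n++ } END { print n + 0 }'
}
```

For example, qstat | tasks_in_state test-map r prints the number of running tasks of the job named test-map.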
merge.sh
When all jobs are completed, run merge.sh to check the outputs. Sample outputs are shown below.
CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0
...
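The merged hits are not ordered by the number after the | character. A short filter can rank them, as sketched below. The sketch assumes that this number is a similarity score where higher means a closer match to the query, and that merge.sh writes its results to standard output; sort_hits is a hypothetical helper.

```shell
#!/bin/bash
# Sort hits read from stdin by the number after "|", largest first.
# Assumption: that number is a similarity score where higher is better.
sort_hits() {
    sort -t '|' -k2,2nr
}
```

Under those assumptions, ./merge.sh | sort_hits | head would list the ten best-scoring hits.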
Clean up
To clean up, run /nfs/soft/tools/utils/qsub-slice/qsub-mr --clean. The outputs directory and its files will be removed.
/nfs/soft/tools/utils/qsub-slice/qsub-mr --clean