How to do parallel search of smi files on the cluster


Latest revision as of 18:02, 19 July 2018

This tutorial shows how to search smi files in parallel on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel on gimel.compbio.ucsf.edu. Indexing and parallel computing are used to speed up searching. The performance of qsub depends on the workload of the whole cluster. Generally, searching with qsub scales well.

Create a folder with the following files and scripts

SUBMIT.sh
input.txt
search_smi.sh
merge.sh

SUBMIT.sh

SUBMIT.sh contains the bash code for qsub. It specifies the qsub command, the parameters for qsub, the input file, the function script, and the parameters for the function. An example is shown below.

#!/bin/bash

# Arguments to qsub-mr, in order (comments cannot follow a trailing backslash):
#   -l 5               number of input lines handled by each task, here 5
#   -N test            job name (shown as test-map in the qstat output)
#   input.txt          file listing the smi files to search
#   ./search_smi.sh    the search script to run on each slice
#   -q "..."           parameter for search_smi.sh, the query to search for
/nfs/soft/tools/utils/qsub-slice/qsub-mr \
    -l 5 \
    -N test \
    input.txt \
    ./search_smi.sh \
    -q "CS(=O)(=O)CCNCc1ccccc1"
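
The effect of -l 5 can be pictured with the standard split utility. This is only an illustration under the assumption that each task receives a consecutive block of lines from the input file; the internals of qsub-mr are not shown in this tutorial, and the file names below are made up.

```shell
#!/bin/bash
# Illustration only: mimic "-l 5" with split. This is not how qsub-mr is
# known to work internally; it just shows how 12 lines fall into slices.
workdir=$(mktemp -d)

# Build a hypothetical input list of 12 file names.
for i in $(seq 1 12); do
    echo "file_${i}.smi"
done > "$workdir/input.txt"

# Cut the list into slices of at most 5 lines: slice_aa, slice_ab, slice_ac.
split -l 5 "$workdir/input.txt" "$workdir/slice_"

# One slice would correspond to one task; 12 lines give 3 slices (5, 5, 2).
slice_count=$(ls "$workdir"/slice_* | wc -l | tr -d ' ')
echo "number of slices: $slice_count"
wc -l "$workdir"/slice_*
```

With 5 lines per task, the four tasks in the qstat example would be consistent with an input list of 16 to 20 files.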


input.txt

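input.txt lists the smi files to search, one path per line. An example of input.txt is shown below. You can use ls *.smi > input.txt to generate this file. To get full paths as in the example, pass the directory to ls, for instance ls /nfs/home/jizhou/ex7/2D/CD/*.smi > input.txt.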

/nfs/home/jizhou/ex7/2D/CD/CDAA.smi
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi
...
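
A small helper can build the list with absolute paths regardless of the current directory. The function name make_input_list is hypothetical and not part of the original scripts; the demo runs on a temporary directory with empty placeholder files.

```shell
#!/bin/bash
# Write the absolute paths of all .smi files in a directory to a list file.
# make_input_list is a hypothetical helper, not part of the tutorial scripts.
make_input_list() {
    local smi_dir=$1 out_file=$2
    local abs_dir f
    abs_dir=$(cd "$smi_dir" && pwd) || return 1
    # A glob on an absolute directory expands to absolute paths, sorted by name.
    for f in "$abs_dir"/*.smi; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done > "$out_file"
}

# Demo on a throwaway directory; notes.txt must not appear in the list.
demo_dir=$(mktemp -d)
touch "$demo_dir/CDAA.smi" "$demo_dir/CDAB.smi" "$demo_dir/notes.txt"
make_input_list "$demo_dir" "$demo_dir/input.txt"
cat "$demo_dir/input.txt"
```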


search_smi.sh

search_smi.sh is the search function run by qsub. Its core is mol2img_trial, which is located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/". mol2img_trial generates an index for each smi file to speed up searching. search_smi.sh requires an input query for searching. An example is shown below.

-q "CS(=O)(=O)CCNCc1ccccc1"
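
As a rough sketch of how a script could read such a -q option, the following uses the bash builtin getopts. The function parse_query is hypothetical; the real search_smi.sh and the way it calls mol2img_trial are not shown in this tutorial, so only the option handling is illustrated.

```shell
#!/bin/bash
# Hypothetical option parsing for a script invoked as: script -q "SMILES".
# The actual search step (mol2img_trial) is deliberately left out.
parse_query() {
    local opt query=""
    local OPTIND=1
    while getopts "q:" opt; do
        case "$opt" in
            q) query=$OPTARG ;;
            *) echo "usage: -q SMILES" >&2; return 2 ;;
        esac
    done
    if [ -z "$query" ]; then
        echo "error: a query is required (-q)" >&2
        return 2
    fi
    printf '%s\n' "$query"
}

# The query must be quoted: unquoted parentheses are shell syntax.
parse_query -q "CS(=O)(=O)CCNCc1ccccc1"
```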


Run SUBMIT.sh

Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory named outputs is created in the current folder, and the results are stored in outputs/. Use the following command to check the status of the job. For more information, please refer to the qstat documentation (http://web.mit.edu/longjobs/www/status.html).

qstat                         # check the status of jobs, example is shown below.

-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
6511305 1.25000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl     1 1
6511305 0.75000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl     1 2
6511305 0.58333 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks     1 3
6511305 0.50000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl     1 4
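
To see at a glance how many tasks of a job are still running, the qstat output can be filtered with awk. This is a sketch that assumes the column layout of the sample above (job-ID in column 1, state in column 5); count_tasks is a hypothetical helper, and the demo reads two rows copied from the sample rather than calling qstat.

```shell
#!/bin/bash
# Count the tasks of one job that are in a given state, reading qstat-style
# output on stdin. Column positions follow the sample: 1 = job-ID, 5 = state.
count_tasks() {
    local job_id=$1 state=$2
    awk -v id="$job_id" -v st="$state" \
        '$1 == id && $5 == st { n++ } END { print n + 0 }'
}

# Demo on two rows copied from the sample output. On the cluster one would
# pipe the live output instead: qstat | count_tasks 6511305 r
sample='6511305 1.25000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl     1 1
6511305 0.75000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl     1 2'
printf '%s\n' "$sample" | count_tasks 6511305 r
```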


merge.sh

When all jobs are completed, run merge.sh to check the outputs. Sample outputs are shown below.

CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0
...
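
The merged lines can be ranked with sort. This sketch assumes that the number after the | is a score where higher means a closer match (the query itself scores 100.0 in the sample); rank_hits is a hypothetical helper, and merge.sh itself is not shown in this tutorial.

```shell
#!/bin/bash
# Sort result lines of the form "SMILES ID|score" by score, highest first.
# Assumption: the field after "|" is a numeric score, higher is better.
rank_hits() {
    # LC_ALL=C keeps "." as the decimal point whatever the user's locale.
    LC_ALL=C sort -t '|' -k2,2nr
}

# Demo on three lines taken from the sample output.
hits='CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0'
printf '%s\n' "$hits" | rank_hits
```

Piping the result through head -n 10 would keep only the ten best-scoring hits.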


Clean up

To clean up, run /nfs/soft/tools/utils/qsub-slice/qsub-mr --clean. The outputs directory and its files will be removed.

/nfs/soft/tools/utils/qsub-slice/qsub-mr --clean