How to do parallel search of smi files on the cluster

Revision as of 17:57, 19 July 2018 by Jizhou (talk | contribs)
Jump to navigation Jump to search

This tutorial shows how to do parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel

Create a folder with the following files and scripts
input.txt contains bash code for qsub. specify the qsub command, parameters for qsub, input file, the function script, parameters for the function. A example is shown below.


.../qsub-mr \                                              #  The qsub command
    -l 5 \                                                 #  The number of lines to be handled by each task, here is 5
    -N test \                                              #  The name of the queue to submit to
    input.txt \                                            #  The input file names and directory
    ./ \                                      #  The searching function to be performed 
    -q "CS(=O)(=O)CCNCc1ccccc1"                            #  Parameter for, the input query for searching


The input file names and directory. An example of input.txt is shown below. You can use ls *.smi > input.txt to generate this file.


The searching function used by qsub. The core function of is mol2img_trial which is located in "/nfs/home/jizhou/work/Projects/smi_index/dotmatics/". mol2img_trial generates index for the smi file to speedup searching. requires an input query for searching. An example is shown below

-q "CS(=O)(=O)CCNCc1ccccc1"


Run to submit the job to cluster. The job will be run on the background. When it finishes, a new directory outputs will be created in current folder. The outputs will be stored in outputs/. You can use the following command to check qsub status, start or stop a job. For more information, please refer to qstat

qstat                         # check the status of jobs, example is shown below.

-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
6511305 1.25000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl     1 1
6511305 0.75000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl     1 2
6511305 0.58333 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks     1 3
6511305 0.50000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl     1 4

When all jobs are completed, run to check the outputs. Sample outputs are shown below

CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0

Clean up

To clean up, run .../qsub-mr --clean. The outputs directory and its files will be removed.

.../qsub-mr --clean