How to do parallel search of smi files on the cluster

This tutorial shows how to run a parallel search of smi files on the cluster. The files and scripts can be found in /nfs/home/jizhou/ex7/2D/test-parallel on gimel.compbio.ucsf.edu. Indexing and parallel computing are used to speed up searching. The performance of qsub depends on the workload of the whole cluster, but searching with qsub generally scales well.

Create a folder with the following files and scripts

SUBMIT.sh
input.txt
search_smi.sh
merge.sh
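
Before submitting, it can help to confirm that the folder is complete. The following is a minimal sketch, not part of the original scripts: check_setup is a hypothetical helper that only tests whether the four files listed above exist and whether the three scripts are executable.

```shell
#!/bin/bash
# check_setup is a hypothetical helper (not shipped with qsub-mr).
# It reports any of the four required files that is missing from DIR,
# and any of the three scripts that lacks execute permission.
check_setup() {
    local dir=${1:-.} status=0 f
    for f in SUBMIT.sh input.txt search_smi.sh merge.sh; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    for f in SUBMIT.sh search_smi.sh merge.sh; do
        if [ -f "$dir/$f" ] && [ ! -x "$dir/$f" ]; then
            echo "not executable: $f (try chmod +x $f)" >&2
            status=1
        fi
    done
    return $status
}

# Example: check_setup . && echo "folder is ready"
```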

SUBMIT.sh

SUBMIT.sh contains the bash code that submits the job with qsub. It specifies the qsub command and its parameters, the input file, the search script, and the parameters for that script. An example is shown below.

#!/bin/bash

# Arguments, in order (a comment after a trailing backslash would break the command):
#   qsub-mr            the qsub command
#   -l 5               the number of input lines handled by each task, here 5
#   -N test            the job name (shown as test-map in the qstat output)
#   input.txt          the file that lists the smi files to search
#   ./search_smi.sh    the search script to run
#   -q "..."           parameter for search_smi.sh: the query to search for
/nfs/soft/tools/utils/qsub-slice/qsub-mr \
    -l 5 \
    -N test \
    input.txt \
    ./search_smi.sh \
    -q "CS(=O)(=O)CCNCc1ccccc1"
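
The -l value determines how many tasks the job is split into. As a rough sketch, assuming each task receives up to that many consecutive lines of input.txt (the slicing itself is done by qsub-mr and is not shown in this tutorial), the task count can be estimated with a hypothetical helper, count_tasks:

```shell
#!/bin/bash
# count_tasks is a hypothetical helper, not part of qsub-mr.
# Assumption: each task handles up to LINES_PER_TASK consecutive input lines,
# so the number of tasks is the line count divided by that value, rounded up.
count_tasks() {
    local input=$1 lines_per_task=$2 total
    total=$(grep -c '' "$input")   # line count, including an unterminated last line
    echo $(( (total + lines_per_task - 1) / lines_per_task ))
}

# Example: count_tasks input.txt 5
```

A smaller -l value gives more, shorter tasks; a larger value gives fewer, longer ones.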


input.txt

input.txt lists the smi files to search, one path per line. An example of input.txt is shown below. You can generate this file with ls *.smi > input.txt, run in the directory that holds the smi files.

/nfs/home/jizhou/ex7/2D/CD/CDAA.smi
/nfs/home/jizhou/ex7/2D/CD/CDAB.smi
/nfs/home/jizhou/ex7/2D/CD/CDAC.smi
/nfs/home/jizhou/ex7/2D/CD/CDAD.smi
/nfs/home/jizhou/ex7/2D/CD/CDAE.smi
...
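
Note that ls *.smi writes bare file names, while the example above uses absolute paths. A minimal sketch for producing absolute paths is shown here; make_input_list is a hypothetical helper, not one of the tutorial's scripts.

```shell
#!/bin/bash
# make_input_list is a hypothetical helper.
# It prints the absolute path of every .smi file in DIR, one per line.
make_input_list() {
    local dir f
    dir=$(cd "$1" && pwd) || return 1   # resolve DIR to an absolute path
    for f in "$dir"/*.smi; do
        # Skip the unexpanded pattern when DIR contains no .smi files.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example (directory taken from the listing above):
# make_input_list /nfs/home/jizhou/ex7/2D/CD > input.txt
```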


search_smi.sh

search_smi.sh is the search script that qsub runs. Its core function is mol2img_trial, which is located in /nfs/home/jizhou/work/Projects/smi_index/dotmatics/. mol2img_trial generates an index for the smi file to speed up searching. search_smi.sh requires an input query, passed with -q. An example is shown below.

-q "CS(=O)(=O)CCNCc1ccccc1"
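
The query must be quoted because parentheses are special characters in the shell. The contents of search_smi.sh are not shown in this tutorial; as an illustration only, a script could read a -q option like this, where parse_query is a hypothetical function:

```shell
#!/bin/bash
# parse_query is a hypothetical function showing one way a script could
# read a -q option; it is not taken from the real search_smi.sh.
parse_query() {
    local opt query=""
    OPTIND=1                       # reset so the function can be called repeatedly
    while getopts "q:" opt; do
        case $opt in
            q) query=$OPTARG ;;
            *) return 2 ;;         # unknown option
        esac
    done
    if [ -z "$query" ]; then
        echo "usage: -q QUERY" >&2
        return 1
    fi
    printf '%s\n' "$query"
}

# Example: parse_query -q "CS(=O)(=O)CCNCc1ccccc1"
```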


Run SUBMIT.sh

Run SUBMIT.sh to submit the job to the cluster. The job runs in the background. When it finishes, a new directory named outputs is created in the current folder, and the results are stored there. Use qstat to check the status of the job. For more information, including how to start or stop a job, please refer to qstat.

qstat                         # check the status of jobs; an example is shown below

-bash-4.1$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
6511305 1.25000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-5-29.cluster.ucsf.bksl     1 1
6511305 0.75000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-20.cluster.ucsf.bksl     1 2
6511305 0.58333 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-1-132.cluster.ucsf.bks     1 3
6511305 0.50000 test-map   jizhou       r     07/19/2018 10:42:43 all.q@n-9-21.cluster.ucsf.bksl     1 4
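
To summarize a long qstat listing, the rows can be counted by job name and state. This is a minimal sketch that assumes the column layout shown above (name in column 3, state in column 5); count_tasks_in_state is a hypothetical helper.

```shell
#!/bin/bash
# count_tasks_in_state is a hypothetical helper.
# It reads qstat output on standard input and counts the rows whose
# name (column 3) and state (column 5) match the arguments.
# The state defaults to r (running).
count_tasks_in_state() {
    local name=$1 state=${2:-r}
    awk -v name="$name" -v state="$state" \
        '$3 == name && $5 == state { n++ } END { print n + 0 }'
}

# Example: qstat | count_tasks_in_state test-map r
```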


merge.sh

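When all jobs are completed, run merge.sh to check the outputs. Sample output is shown below. Each line holds a SMILES string followed by its ZINC ID and a score, separated by |; the query itself scores 100.0.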

CS(=O)(=O)CCNCc1ccncc1 ZINC000037491283|70.6
CS(=O)(=O)CCNCc1ccc(O)cc1 ZINC000037740328|70.6
CS(=O)(=O)CCNCCOc1ccccc1 ZINC000048777006|70.6
CS(=O)(=O)CCNCc1ccccc1 ZINC000037491280|100.0
CS(=O)(=O)CCNCCc1ccccc1 ZINC000037491281|75.0
...
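
The merged hits can be filtered and ranked by score. The sketch below assumes that merge.sh writes lines in the format shown above to standard output; top_hits is a hypothetical helper, and the threshold of 75 is only an example.

```shell
#!/bin/bash
# top_hits is a hypothetical helper.
# It reads hits on standard input, keeps those whose score (the text after
# the last |) is at least MIN_SCORE, and sorts them from highest to lowest.
top_hits() {
    local min_score=$1
    awk -F'|' -v min="$min_score" '$NF + 0 >= min + 0' |
        LC_ALL=C sort -t'|' -k2,2nr
}

# Example: ./merge.sh | top_hits 75
```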


Clean up

To clean up, run /nfs/soft/tools/utils/qsub-slice/qsub-mr --clean. The outputs directory and its files will be removed.

/nfs/soft/tools/utils/qsub-slice/qsub-mr --clean