Docking Analysis in DOCK3.8

From DISI

Revision as of 02:36, 14 April 2021

Location of new scripts/Install Instructions

/wynton/home/btingle/bin/top_poses

All programs described here are currently located in this directory. A GitHub link will be added soon.

Note the link to python3.8 in this directory. You need a similar link to a python3.8 executable in your personal bin directory. There are no pip requirements; a bare Python 3.8 install is sufficient.

Scripts Description

top_poses.py

Description

Main pose retrieval algorithm. It runs on multiple cores; 7 cores is the recommended number and also the default.

Input can be a directory or a file. If input is a directory, the script will use a find command to locate all test.mol2.gz* files residing in the directory structure.

If input is a file, each line in the file should be the path to a valid pose file, e.g.:

/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0000/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0001/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0002/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0003/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0004/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0005/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0006/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0007/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0008/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0009/test.mol2.gz
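If you would rather build such a list file yourself than pass a directory, the following is a minimal sketch of one way to do it. The function name write_pose_list and its arguments are hypothetical and not part of the top_poses scripts; it only mimics the test.mol2.gz* search described above.

```python
from pathlib import Path


def write_pose_list(root, list_path):
    """Write one absolute pose-file path per line (hypothetical helper)."""
    root = Path(root).resolve()
    # Match test.mol2.gz and any suffixed variants, i.e. test.mol2.gz*
    pose_files = sorted(
        str(path) for path in root.rglob("test.mol2.gz*") if path.is_file()
    )
    with open(list_path, "w") as handle:
        for path in pose_files:
            handle.write(path + "\n")
    return pose_files
```

The resulting list file can then be given as the input argument in place of a directory.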

Output is the path where the top 300K poses will be written once the script has finished, e.g. /scratch/top_poses.mol2.gz
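To illustrate what "top 300K poses" means in practice, here is a small conceptual sketch. It is not the code used by top_poses.py; it assumes each pose can be reduced to a (score, pose_id) pair and that a lower (more negative) score is better. The name keep_top_poses is hypothetical.

```python
import heapq


def keep_top_poses(scored_poses, limit=300_000):
    """Return the `limit` best poses from an iterable of (score, pose_id).

    Assumption: lower scores are better. heapq.nsmallest consumes the
    iterable lazily, so only `limit` entries are held in memory at once.
    """
    return heapq.nsmallest(limit, scored_poses, key=lambda pose: pose[0])
```

For example, calling keep_top_poses(poses, limit=2) on four scored poses returns the two with the lowest scores, best first.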

Usage

python3.8 top_poses.py <input> <output> <<ncores>>

run_top_poses.bash

Description

Wrapper script for top_poses.py that can be used to submit individual pose jobs. It runs with 7 cores allocated.

Usage

run_top_poses.bash <input> <output>

Typical qsub usage

qsub -wd $PWD run_top_poses.bash <input> <output>

run_top_poses_mr.bash

Description

Map-reduce script to submit a number of analysis jobs and combine their results. This is the preferred method for running large analysis workloads.

The input field is interpreted the same way as in the other scripts.

Staging directory should be an NFS directory writable by your user. This is where input/output will be stored by the script.

Final output will show up in <staging directory>/output_final.poses.mol2.gz

Batch size refers to how many pose files will be evaluated by each job. The default is 1000, though you may want to adjust it depending on the properties of your pose files and how many there are.
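As a rough guide to how batch size translates into job count, the sketch below splits a list of pose files into batches of at most batch_size entries. It illustrates the arithmetic only and is not taken from run_top_poses_mr.bash; the function name split_into_batches is hypothetical.

```python
def split_into_batches(pose_files, batch_size=1000):
    """Split a list of pose-file paths into consecutive batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        pose_files[start:start + batch_size]
        for start in range(0, len(pose_files), batch_size)
    ]
```

With 2500 pose files and the default batch size, this yields three batches of 1000, 1000, and 500 files, which under this assumption would correspond to three analysis jobs.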

Currently works only on SGE. Tested on Wynton.

Usage

run_top_poses_mr.bash <input> <staging directory> <<batch size>>

Checking Logs

After your jobs have finished, check the logs to see if anything went wrong. If everything went smoothly, the .err logs should be empty and each .out log should end with text that looks like this:

received all input!
joining threads...
done processing! writing out...
299900 / 300000

If you find an output file that doesn't end like this, you may wish to re-attempt that particular job. To do so, re-run run_top_poses_mr.bash with the same parameters as before; the script detects existing output and only re-submits jobs as necessary.
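When there are many jobs, the log check can be automated. The following is a minimal sketch, assuming the logs sit together in one directory and use the .err and .out extensions mentioned above; the directory layout, the function name find_suspect_logs, and the choice to look only at the last four lines are assumptions rather than documented behavior.

```python
from pathlib import Path

# Line expected near the end of a successful .out log (from the example above).
EXPECTED_MARKER = "done processing! writing out..."


def find_suspect_logs(log_dir):
    """Return logs that suggest a failed job (hypothetical helper)."""
    log_dir = Path(log_dir)
    suspects = []
    # Any non-empty .err log is worth a look.
    for err_log in sorted(log_dir.glob("*.err")):
        if err_log.stat().st_size > 0:
            suspects.append(err_log)
    # A healthy .out log should contain the marker in its last few lines.
    for out_log in sorted(log_dir.glob("*.out")):
        lines = out_log.read_text(errors="replace").splitlines()
        tail = [line.strip() for line in lines[-4:]]
        if EXPECTED_MARKER not in tail:
            suspects.append(out_log)
    return suspects
```

Any paths it returns point to jobs that may need to be re-attempted as described above.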