How to process results from a large-scale docking
Revision as of 22:48, 24 October 2017
Written by Jiankun Lyu, 20171018
This tutorial uses DOCK 3.7.1rc1.
Check for completion and resubmit failed jobs
In a large calculation such as this one with tens of thousands of separate processes, it is not unusual for some jobs to fail for some reason. Checking for successful completion, and re-starting failed processes, is a normal part of running such a large job.
Only do this if dirlist is the original and dirlist_ori does not exist:
cp dirlist dirlist_ori
Run the script below in your docking directory
csh /nfs/home/tbalius/zzz.github/DOCK/docking/submit/get_not_finished.csh /path/to/docking
For example:
csh /nfs/home/tbalius/zzz.github/DOCK/docking/submit/get_not_finished.csh ./
This script puts all the directories of failed jobs in a file called dirlist_new.
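get_not_finished.csh is the authoritative check, but its idea can be sketched in Python: walk every directory in dirlist, treat a job as finished only if its OUTDOCK exists and ends with a completion marker, and write the rest to dirlist_new. This is an illustrative sketch only; in particular, the "elapsed time" marker used below is an assumption and may differ from what the real script tests.

```python
import os

def find_unfinished(dirlist_path, out_path="dirlist_new"):
    """Write directories with missing or incomplete OUTDOCK files to out_path.

    Completion check (an assumption for illustration): a finished OUTDOCK
    contains an 'elapsed time' line near its end.
    """
    with open(dirlist_path) as fh:
        dirs = [line.strip() for line in fh if line.strip()]
    unfinished = []
    for d in dirs:
        outdock = os.path.join(d, "OUTDOCK")
        if not os.path.isfile(outdock):
            unfinished.append(d)
            continue
        with open(outdock, errors="replace") as fh:
            tail = fh.readlines()[-20:]  # only the last few lines matter
        if not any("elapsed time" in line for line in tail):
            unfinished.append(d)
    with open(out_path, "w") as fh:
        fh.write("\n".join(unfinished) + ("\n" if unfinished else ""))
    return unfinished
```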
mv dirlist dirlist_old
mv dirlist_new dirlist_new1
cp dirlist_new1 dirlist
Remove OUTDOCK, test.mol2.gz, and stderr from all of the directories that are incomplete.
foreach dir (`cat dirlist`)
echo $dir
rm -rf $dir/OUTDOCK $dir/test.mol2.gz $dir/stderr
end
This prevents issues during the rerun and avoids confusion over whether docking was actually rerun.
Then resubmit them
$DOCKBASE/docking/submit/submit.csh
Before processing the results with extract_all_blazing_fast: (1) you may want to re-check that all jobs finished; (2) you should copy dirlist_ori back to dirlist.
cp dirlist_ori dirlist
Combine results
When docking is complete, merge the results of each separate docking job into a single sorted file.
cd path/to/docking
python $DOCKBASE/analysis/extract_all_blazing_fast.py dirlist extract_all.txt energy_cutoff
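The merge step can be pictured with a minimal sketch: read per-job score lists, drop anything worse than the energy cutoff, and sort what remains by energy. The two-column "zinc_id energy" input format below is a simplification for illustration; the real extract_all_blazing_fast.py parses the full OUTDOCK format.

```python
def merge_scores(score_files, energy_cutoff, out_path):
    """Merge per-job score lists into one file sorted best-energy-first.

    Assumes (for illustration only) that each input file holds
    'zinc_id energy' pairs, one per line.
    """
    records = []
    for path in score_files:
        with open(path) as fh:
            for line in fh:
                zinc_id, energy = line.split()
                energy = float(energy)
                if energy <= energy_cutoff:  # keep only good (low) energies
                    records.append((energy, zinc_id))
    records.sort()  # most negative energies first
    with open(out_path, "w") as fh:
        for energy, zinc_id in records:
            fh.write(f"{zinc_id} {energy:.2f}\n")
    return records
```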
Cluster results
cd path/to/docking
csh ~jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering.csh N TC
Where:
N is the number of top molecules you want to cluster, e.g. 100,000
TC is the Tanimoto coefficient cutoff, e.g. 0.5
This will cluster the top 100,000 molecules from a docking run with a TC cutoff of 0.5. Read more here: http://wiki.docking.org/index.php/ECFP4_Best_First_Clustering
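The best-first idea itself is simple enough to sketch: walk the score-sorted list, make the best unassigned molecule a cluster head, and pull every remaining molecule within the Tanimoto cutoff into its cluster. The `similarity` function below is a stand-in; the real script compares ECFP4 fingerprints.

```python
def best_first_cluster(mols, similarity, tc_cutoff):
    """Best-first clustering of a score-sorted molecule list.

    `similarity` is any function returning a Tanimoto-like value in [0, 1]
    (the real workflow uses ECFP4 Tanimoto). Returns a dict mapping each
    cluster head to its member list.
    """
    clusters = {}
    assigned = set()
    for mol in mols:  # mols must already be sorted best-score-first
        if mol in assigned:
            continue
        clusters[mol] = [mol]  # best remaining molecule becomes a head
        assigned.add(mol)
        for other in mols:
            if other not in assigned and similarity(mol, other) >= tc_cutoff:
                clusters[mol].append(other)
                assigned.add(other)
    return clusters
```

Because heads are taken in score order, every cluster head is the best-scoring member of its own cluster, which is why the heads are the poses worth inspecting.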
Get poses of cluster heads
First, create an extract_all file containing only the cluster heads
cd best_first_clustering_N
awk -F"," '{print $3}' cluster_head.list > cluster_head.zincid
python ~jklyu/zzz.script/large_scale_docking/DOCK/rerank_extract_file.py ../extract_all.topN.sort.uniq.txt cluster_head.zincid .
mv extract_all.sort.uniq.re.txt extract_all.topN.cluster.head.txt
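In essence this step filters the big extract_all file down to lines whose ZINC ID is in cluster_head.zincid. A minimal sketch of that filtering is below; note that matching any whitespace-separated token is an assumption for illustration, and the real rerank_extract_file.py may match a fixed column and reorder output.

```python
def filter_by_ids(extract_path, id_set, out_path):
    """Keep only the extract_all lines mentioning an ID from id_set."""
    kept = []
    with open(extract_path) as fh:
        for line in fh:
            # assumption: the ZINC ID appears as a whitespace-separated
            # token somewhere on the line
            if any(tok in id_set for tok in line.split()):
                kept.append(line)
    with open(out_path, "w") as fh:
        fh.writelines(kept)
    return len(kept)
```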
Then, get poses of those cluster heads
python /nfs/home/tbalius/zzz.scripts_from_reed/getposesfast.py path/to/docking extract_all.topN.cluster.head.txt num_of_cluster_heads