How to process results from a large-scale docking

Written by Jiankun Lyu, 20171018

This tutorial uses DOCK 3.7.1rc1.

== Check for completion and resubmit failed jobs ==

In a large calculation such as this one, with tens of thousands of separate processes, it is not unusual for some jobs to fail. Checking for successful completion and restarting failed processes is a normal part of running a job of this size.

Only do this if dirlist is the original and dirlist_ori does not exist:

 cp dirlist dirlist_ori
 chmod -w dirlist_ori

By changing the permissions of dirlist_ori to be unwritable, you are protected from overwriting the file.

Run the script below in your docking directory

 csh /nfs/home/tbalius/zzz.github/DOCK/docking/submit/get_not_finished.csh /path/to/docking

For example:

 csh /nfs/home/tbalius/zzz.github/DOCK/docking/submit/get_not_finished.csh ./

This script puts all the directories of failed jobs in a file called dirlist_new.
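For illustration, here is a minimal Python sketch of the kind of check get_not_finished.csh performs. Treating a job as finished when its directory contains both OUTDOCK and test.mol2.gz is an assumption made here; the real script's criteria may differ.

 # Hedged sketch, not the real get_not_finished.csh: a job is assumed
 # finished when its directory contains both OUTDOCK and test.mol2.gz.
 import os
 with open("dirlist") as fh, open("dirlist_new", "w") as out:
     for line in fh:
         d = line.strip()
         if d and not (os.path.isfile(os.path.join(d, "OUTDOCK"))
                       and os.path.isfile(os.path.join(d, "test.mol2.gz"))):
             out.write(d + "\n")

Then swap the failed-job list in as the active dirlist: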

 mv dirlist dirlist_old1
 mv dirlist_new dirlist_new1
 cp dirlist_new1 dirlist

You may need to do this more than once; if so, add one to the number at the end of the filenames each time.
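If you end up repeating this step several times, a small Python helper can do the rotation. This is just a sketch built on the numbering scheme above:

 # Sketch: rotate the dirlist files using the next free suffix number.
 import os, shutil
 n = 1
 while os.path.exists("dirlist_old%d" % n):
     n += 1
 shutil.move("dirlist", "dirlist_old%d" % n)
 shutil.move("dirlist_new", "dirlist_new%d" % n)
 shutil.copy("dirlist_new%d" % n, "dirlist")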

Remove OUTDOCK, test.mol2.gz, and stderr from all of the directories that are incomplete.

 foreach dir (`cat dirlist`)
   echo $dir
   rm -rf $dir/OUTDOCK $dir/test.mol2.gz $dir/stderr
 end

This prevents issues during the rerun and avoids confusion over whether docking was rerun.

Then resubmit them

 $DOCKBASE/docking/submit/submit.csh

Before processing the results with extract_all_blazing_fast: (1) you might want to re-check that all jobs finished; (2) you should copy dirlist_ori back to dirlist:

 cp dirlist_ori dirlist

== Combine results ==

When docking is complete, merge the results of each separate docking job into a single sorted file.

 cd path/to/docking
 python $DOCKBASE/analysis/extract_all_blazing_fast.py dirlist extract_all.txt energy_cutoff

For example:

 python $DOCKBASE/analysis/extract_all_blazing_fast.py ./dirlist extract_all.txt 1000.0

NB: make sure you have enough space in /tmp; otherwise, point temporary files elsewhere, e.g. setenv TMPDIR "/scratch" (csh).
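To make the merge logic concrete, here is a hedged Python sketch. It is not the real extract_all_blazing_fast.py, which parses each job's OUTDOCK directly and is far faster; instead it assumes a hypothetical two-column "zinc_id energy" file named scores.txt in each job directory, purely to keep the cutoff-and-sort idea visible:

 # Hedged sketch of the combine step; scores.txt is a hypothetical
 # per-job file standing in for the OUTDOCK parsing the real script does.
 import os
 rows = []
 for d in open("dirlist").read().split():
     scores = os.path.join(d, "scores.txt")
     if not os.path.isfile(scores):
         continue
     for line in open(scores):
         zinc_id, energy = line.split()
         if float(energy) <= 1000.0:          # energy_cutoff
             rows.append((float(energy), zinc_id, d))
 rows.sort()                                  # lowest (best) energy first
 with open("extract_all.txt", "w") as fh:
     for energy, zinc_id, d in rows:
         fh.write("%s %.2f %s\n" % (zinc_id, energy, d))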

== Cluster results ==

 cd path/to/docking
 mkdir /scratch/yourlogin
 csh ~jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering_new.csh N TC

Where:
* N is the number of top molecules you want to cluster, e.g. 100,000.
* TC is the Tanimoto coefficient cutoff, e.g. 0.5.

This will cluster the top 100,000 molecules from a docking run with a TC cutoff of 0.5. Read more here: http://wiki.docking.org/index.php/ECFP4_Best_First_Clustering

For example:

 mkdir /scratch/tbalius
 csh ~jklyu/zzz.script/large_scale_docking/cluster_analysis/best_first_clustering_new.csh 300000 0.5
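For intuition, here is a minimal Python sketch of best-first clustering on ECFP4 fingerprints. It uses RDKit and an assumed (zinc_id, smiles, score) input format, so it illustrates the algorithm rather than reproducing the wiki script:

 # Hedged sketch of ECFP4 best-first clustering; the input format is an
 # assumption for illustration. Lower DOCK scores are better.
 from rdkit import Chem, DataStructs
 from rdkit.Chem import AllChem
 def best_first_cluster(molecules, tc_cutoff=0.5):
     recs = []
     for zinc_id, smiles, score in molecules:
         mol = Chem.MolFromSmiles(smiles)
         # ECFP4 corresponds to a Morgan fingerprint with radius 2
         fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
         recs.append((score, zinc_id, fp))
     recs.sort(key=lambda r: r[0])        # best (lowest) score first
     heads = []
     while recs:
         score, zinc_id, fp = recs.pop(0) # best remaining molecule = new head
         heads.append(zinc_id)
         # Drop every molecule within the TC cutoff of this head.
         recs = [r for r in recs
                 if DataStructs.TanimotoSimilarity(fp, r[2]) < tc_cutoff]
     return heads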

== Get poses of cluster heads ==

First, create an extract_all file just for the cluster heads:

 cd best_first_clustering_N
 awk -F"," '{print $3}' cluster_head.list > cluster_head.zincid
 python ~jklyu/zzz.script/large_scale_docking/DOCK/rerank_extract_file.py ../extract_all.topN.sort.uniq.txt cluster_head.zincid .
 mv extract_all.sort.uniq.re.txt extract_all.topN.cluster.head.txt
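Conceptually, this step keeps only the lines of the extract_all file whose ZINC ID is a cluster head. A rough Python equivalent follows; it is a sketch that assumes the ID appears as a whitespace-separated field on each line, and it ignores any reordering the real rerank_extract_file.py performs:

 # Hedged sketch of the filtering idea behind rerank_extract_file.py.
 wanted = {line.strip() for line in open("cluster_head.zincid") if line.strip()}
 with open("../extract_all.topN.sort.uniq.txt") as src, \
      open("extract_all.topN.cluster.head.txt", "w") as dst:
     for line in src:
         if wanted.intersection(line.split()):
             dst.write(line)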

Then, get poses of those cluster heads

 python /nfs/home/tbalius/zzz.scripts_from_reed/getposesfast.py path/to/docking extract_all.topN.cluster.head.txt num_of_cluster_heads
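For illustration, here is a rough Python sketch of the pose-grabbing idea, not the real getposesfast.py. Run from the docking directory (with cluster_head.zincid at hand), it scans each job's test.mol2.gz and keeps MOL2 records whose molecule name matches a cluster-head ID; that the name is the line after @<TRIPOS>MOLECULE is standard MOL2, but the real script may work differently:

 # Hedged sketch: pull poses for wanted IDs out of each test.mol2.gz.
 import gzip, os
 wanted = {line.split()[0] for line in open("cluster_head.zincid") if line.strip()}
 with open("cluster_head_poses.mol2", "w") as dst:
     for d in open("dirlist").read().split():
         path = os.path.join(d, "test.mol2.gz")
         if not os.path.isfile(path):
             continue
         record, keep = [], False
         for line in gzip.open(path, "rt"):
             if line.startswith("@<TRIPOS>MOLECULE") and record:
                 if keep:
                     dst.writelines(record)   # flush the previous molecule
                 record, keep = [], False
             record.append(line)
             if len(record) == 2 and record[0].startswith("@<TRIPOS>MOLECULE"):
                 keep = record[1].split()[0] in wanted   # molecule name line
         if keep:
             dst.writelines(record)           # flush the last molecule in file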