ELC: Difference between revisions
No edit summary |
No edit summary |
||
Line 2: | Line 2: | ||
Eddie's Legacy Code (ELC) consists an RMSD script, a rank script, several scripts to process DockBlaster's job folders, [[Eplop|a PLOP wrapper]], and [[MPose|a script to run single mode DOCK]]. | Eddie's Legacy Code (ELC) consists an RMSD script, a rank script, several scripts to process DockBlaster's job folders, [[Eplop|a PLOP wrapper]], and [[MPose|a script to run single mode DOCK]]. | ||
{{TOCright}} | |||
==RMSD Script== | ==RMSD Script== |
Latest revision as of 12:37, 21 March 2014
Overview
Eddie's Legacy Code (ELC) consists an RMSD script, a rank script, several scripts to process DockBlaster's job folders, a PLOP wrapper, and a script to run single mode DOCK.
RMSD Script
The RMSD script is a wrapper around the FORTRAN program rmsd
. When the FORTRAN program accepts only PDB input with atoms ordered properly across the two inputs, the wrapper performs necessary format transform using OEChem and makes sure the ordering of atoms is correct.
To run the script, invoke it with two molecule files in whatever format that is accepted by OEChem:
~ycao/eplop/rmsd molecule1.mol2 molecule2.sdf
The first argument can be a multi-molecule MOL2 file. In this case, each entry will be compared to the molecule referenced by the second argument.
Note:
- This script cannot detect problmes. For example, it is possible that if the molecules are not valid, a result will still be printed and no error message woudl appear.
- This script utilizes
convert.py
andfile2file.py
to perform format conversion. Make sure you have set up the PATH correctly if you encounter problem running this script.
Rank Script
The rank script takes two (or four, or six, or any even number) of score files, and calculate rank of the molecule(s) in the first file when compared against those in the second file.
The score file is a simple space- or tab-delimited file, whose first column is the compound ID, and the following coumns are scores. For example:
C02953194 -8.610 1.379 -33.057 7.692 C13059374 -8.129 0.155 -28.957 -9.227 C09926947 -9.664 -0.753 -32.827 -21.917 C13043202 -6.366 1.365 -25.151 -10.238 C08517892 -0.529 4.565 -10.501 22.875 C13218443 0.920 4.319 -4.919 -2.689 C13305090 0.702 5.530 -7.720 -0.875 C13305085 -6.405 2.963 -27.119 -3.966
Such a format is used by score.py
, the rescoring wrapper, and eplop
, the PLOP wrapper.
As an example, to rank the ligand against the decoys using rescoring scores, you can run
cd /raid0/tmp/dockgui/06/24676/ python ~ycao/mpose/rank.py run.ligand/1.A/rescore.out run.decoys/1.A/rescore.out
This will compare ligand in the first file against decoys in the second file. Because there is only one row (and therefore one ligand) in the first score file, the output will consist of one row; because the score files consists of four scores, the output will have four ranks per row. In this example, the output is:
0.0198 0.0198 0.0198 0.0198
When the same molecule ID appears multiple times in the first score file, only the best score (being the smallest in value) will be considered. This is useful when you have multiple poses and therefore multiple scores for one molecule.
When scores of different types are stored in separate files, you are allowed to put more score files. For example, in the case of DockBlaster, DOCK scores are in a.energy
, and rescoring results are in rescore.out
, then you can run the following:
cd /raid0/tmp/dockgui/06/24676/ python ~/mpose/rank.py run.ligand/1.A/a.energy run.ligand/1.A/rescore.out run.decoys/1.A/a.energy run.decoys/1.A/rescore.out
The script will do a join by compound name before calculating the ranks. The above will give you an output of one row (corresponding to one ligand) and five columns (corresponding to five scores per molecule):
0.0000 0.0198 0.0198 0.0198 0.0198
Note that you must put the score files for the ligands before the score files for the decoys, and the number of score files for the ligands must be the same as that for decoys.
When scores of the same types are stored in separate files, you might want to perform concatenation before calculating ranks. Use the -c
option in this case.
Note: The definition of rank is rank / (number of decoys + number of ligands)
, where rank
starts with 0.
Scripts to Process DockBlaster Job Folders
The scripts are located under ~ycao/dockenv/etc
A few scripts have been written to apply the SIMD-style calculation to a series of DockBlaster job folders. SIMD stands for Single Instruction, Multiple Data. It means to apply the same processing logic to multiple data. In the context of processing DockBlaster job folders, it means to process all the folders using the same script or program, either sequentially or in parallel with the help of qsub.
The scripts below depends on the availability of an index file, which contains of a list of DockBlaster job IDs. Each line of the index file starts with one job ID, while the remaining content of the line will be ignored. Therefore, the MEMO generated by running auto_dock.csh works perfectly for this purpose.
foralljobs.py
This script takes an index file and a command, and will invoke the command in each of the jobs whose IDs are listed in the index file. For example:
foralljobs.py DockBl_ID_MEMO_astex pwd
will run pwd
in each of the job folders for my astex DockBlaster jobs.
The command to run, given as the second argument, can be more complicated. I used the following command to back up job folders:
foralljobs.py ~/work/astex2/list 'pwd; A=`pwd`; B=`basename $A`; cd ..; tar -cjf ~/data/astex2/$B.tar.bz2 $B --exclude $B/run.similar'
To process jobs in parallel, just invoke the processing script with qsub:
foralljobs.py DockBl_ID_MEMO_astex 'qsub myprocessing.csh'
foralldirs.py
Similar to foralljobs.py
, this script also takes an index file and a command. Instead of running the command in each job folder, it runs the command under each parametered run folder, such as run.decoys/1.A
. To demonstrate the use, and at the same time see what folders are traversed, run the following command:
foralljobs.py DockBl_ID_MEMO_astex pwd
As a concrete example, you can perform rescoring with:
foralljobs.py DockBl_ID_MEMO_astex 'score.py a.new.mol2 > rescore.out'
This script allows you to consider only the run.ligand/*
folders:
foralljobs.py --ligand DockBl_ID_MEMO_astex pwd
or the run.decoys/*
folders only:
foralljobs.py --decoys DockBl_ID_MEMO_astex pwd
As a concrete example, you can perform enrichment analysis using rescoring result by first create the following simple script named /tmp/process.sh
:
DIR=`pwd` DIRN=`basename $DIR` SISTER=../../run.decoys/$DIRN ~/mpose/rank.py rescore.out $SISTER/rescore.out
and then run
foralljobs.py --ligand DockBl_ID_MEMO_astex 'bash /tmp/process.sh'
In case of DUD multi-ligands-multi-decoys experiments, two extra run folders are created, namely run.dudact
and run.duddec
. This script can also traverse them:
foralljobs.py --dud DockBl_ID_MEMO_astex pwd
NOTE: This script will check each traversed folder, and make sure DOCK has finished in that folder. Specifically, it makes sure the OUTDOCK file ends with a line starting with elapsed time (sec):
.
eplop
eplop, or Easy PLOP, is a wrapper script for PLOP.
MPose
MPose, or Multiple Pose, is a script to run single mode DOCK.