Another getposes.py

From DISI

10/5/2020 Ying

Script to run getposes.py, strainfilter.py, and interfilter.py in parallel (on Wynton):

 cd <path to chunk folders from LSD>
 cp ~yingyang/scripts/getposes_inter_strain.csh .

Edit the getposes_inter_strain.csh file to change the input to interfilter.py (http://wiki.bkslab.org/index.php/Interaction_Filtering):

- line 67: set the key residue

- line 68: set the path to rec.crg.pdb

Finally, run the script:

 csh getposes_inter_strain.csh <absolute path to extract_all.sort.uniq.txt>


5/8/2020 Ying

Getting more than one pose...

Example of getting 3 poses for the top scored 6k molecules:

 /nfs/home/yingyang/programs/miniconda3/envs/teachopencadd/bin/python \
 /nfs/home/yingyang/scripts/get_poses_multi.py -s extract_all.sort.uniq.txt -n 6000 -p 3 -o pose_top6k_x3.mol2
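
The "several poses per molecule" idea can be illustrated with a minimal sketch. This is not the code of get_poses_multi.py; the function name best_p_energies and the (name, energy) input pairs are hypothetical, standing in for whatever is parsed from the pose headers. It only shows one way to keep the p lowest-energy poses per molecule:

```python
import heapq
from collections import defaultdict


def best_p_energies(poses, p):
    """Return {name: [p lowest energies, best first]}.

    `poses` is any iterable of (name, energy) pairs (hypothetical input,
    e.g. parsed from pose headers). Lower energy is treated as better.
    """
    per_molecule = defaultdict(list)
    for name, energy in poses:
        per_molecule[name].append(energy)
    # heapq.nsmallest returns the p smallest values in ascending order
    return {name: heapq.nsmallest(p, energies)
            for name, energies in per_molecule.items()}


if __name__ == "__main__":
    demo = [("ZINC1", -40.0), ("ZINC1", -42.5), ("ZINC2", -35.5),
            ("ZINC1", -38.0), ("ZINC1", -30.0)]
    print(best_p_energies(demo, 3))
```

A molecule with fewer than p poses simply keeps all of them.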


4/20/2020 Ying

Calling the environment's python directly also works:

 /nfs/home/yingyang/programs/miniconda3/envs/teachopencadd/bin/python \
 /nfs/home/yingyang/scripts/get_poses.py -s extract_all.sort.uniq.txt -n 6000 -o pose_top6k.mol2


3/25/2020 Ying

Poses are needed for Shuo's interaction filter and strain filter, and sometimes we need to get poses before clustering. To meet this need, here is another get_poses.py script, modified from getposes_blazing_faster.py by Reed & Trent.

The idea is to get only one pose per zincid: the one with the best dock score. The script reads the extract_all.sort.uniq.txt file and stores the min_score for each zincid. When processing a mol2.gz file, it checks whether each molecule's zincid and score match the stored min_score; if not, it skips to the next molecule.
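
The logic described above can be sketched roughly as follows. This is an illustration, not the actual get_poses.py code. The column positions (id_col, score_col) and the simplified "Name:" / "Total Energy:" header lines are assumptions about the file layout, and all function names are hypothetical:

```python
def best_scores(score_lines, id_col=2, score_col=-1):
    """Map each molecule ID to its lowest (best) score.

    Column positions are assumptions; adjust them to the real score file.
    """
    best = {}
    for line in score_lines:
        fields = line.split()
        if not fields:
            continue
        zid, score = fields[id_col], float(fields[score_col])
        if zid not in best or score < best[zid]:
            best[zid] = score
    return best


def top_n(best, n):
    """Return the n best-scoring (id, score) pairs, lowest score first."""
    return sorted(best.items(), key=lambda kv: kv[1])[:n]


def header_value(record, key):
    """Return the text after `key` in a '#'-prefixed header line, or None."""
    for line in record:
        body = line.lstrip("#").strip()
        if line.startswith("#") and body.startswith(key):
            return body[len(key):].strip()
    return None


def split_records(lines):
    """Yield one list of lines per molecule; a '# ... Name:' line starts a record."""
    record = []
    for line in lines:
        starts_new = (line.startswith("#")
                      and line.lstrip("#").strip().startswith("Name:"))
        if starts_new and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record


def keep_best_poses(lines, wanted, tol=1e-3):
    """Yield only the records whose name and energy match the stored best score."""
    for record in split_records(lines):
        name = header_value(record, "Name:")
        energy = header_value(record, "Total Energy:")
        if name in wanted and energy is not None:
            if abs(float(energy) - wanted[name]) < tol:
                yield record  # otherwise: skip to the next molecule
```

For a real run, the lines could come from a file opened with gzip.open(path, "rt"), and wanted would be dict(top_n(best_scores(open_score_file), n)).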

First, set up the environment:

 source /nfs/home/yingyang/.cshrc_opencadd

Get help information:

 python /nfs/home/yingyang/scripts/get_poses.py -h
 usage: get_poses.py [-h] [-d DIR] [-s SCORE] [-n NUM] [-f FILE] [-o OUT]
                     [-z GZ_FILE]
 optional arguments:
   -h, --help  show this help message and exit
   -d DIR      path to where docking is located (default: )
   -s SCORE    path to where the extract all file is (default:
               extract_all.sort.uniq.txt)
   -n NUM      number of molecules (poses) to get. (default: 500)
   -f FILE     file contained ligand names to extract (default: None)
   -o OUT      file name for poses (default: poses.mol2)
   -z GZ_FILE  file name for input (default: test.mol2.gz)

Example 1: get the top 6k molecules from extract_all.sort.uniq.txt (in the docking directory). This is the usual getposes routine.

 python /nfs/home/yingyang/scripts/get_poses.py -s extract_all.sort.uniq.txt -n 6000 -o poses_top6k.mol2

Example 2: get only the molecules whose names are listed in a file (for example, zincids of cluster heads), cutting off at the top 100k.

 python /nfs/home/yingyang/scripts/get_poses.py -s extract_all.sort.uniq.txt -n 100000 -f <zincid.txt> -o poses_interested.mol2



Comparing the computation time:

[Figure: Runtime getposes.png]