Docking Analysis in DOCK3.8
Latest revision as of 04:34, 3 March 2023
Location of new scripts/Install Instructions
You can retrieve these scripts from the public "docktop" repository on GitHub.
git clone https://github.com/docking-org/docktop.git
Python 3.8+
Conda Environment
The simplest way to get Python 3.8+ is to install it via conda.
conda create -n py311 python==3.11
conda activate py311
No other packages are required!
Manual Install
On Wynton you can use the version installed at /wynton/group/bks/soft/python-versions/python-3.8-install
If you want to install Python 3.8 on your own, try the following:
wget https://www.python.org/ftp/python/3.8.8/Python-3.8.8.tgz
# MY_SOFT is the directory you want to install to
tar -C $MY_SOFT -xzf Python-3.8.8.tgz
pushd $MY_SOFT/Python-3.8.8
# install into the same directory that is added to PATH below
./configure --prefix=$MY_SOFT/python-3.8-install
make && make install
popd
# add the new python 3.8 executable to your path to use
export PATH=$PATH:$MY_SOFT/python-3.8-install/bin
# optional: clean up the extracted source directory and the downloaded archive
# rm -r $MY_SOFT/Python-3.8.8
# rm Python-3.8.8.tgz
top_poses.py
Description
Main pose retrieval algorithm, runs on multiple processes.
Input can be a directory or a file. If input is a directory, the script will use a recursive find command to locate all test.mol2.gz* files residing in the directory structure.
If input is a file, each line in the file should point to a valid pose file, for example:
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0000/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0001/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0002/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0003/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0004/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0005/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0006/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0007/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0008/test.mol2.gz
/wynton/group/bks/work/yingyang/5HT-1d/04_LSD/run_dock_es1.5_ld0.3/docked_chunks/chunk0009/test.mol2.gz
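If you would rather pass a file list than a directory, a list like the one above can be generated ahead of time. The following is a minimal sketch, not part of docktop: the helper names `find_pose_files` and `write_file_list` are hypothetical, and the `min_size` filter only imitates what the `--find-min-size` option is documented to do (skip test.mol2.gz* files below a given size in bytes).

```python
from pathlib import Path


def find_pose_files(root, min_size=0):
    """Return sorted paths of test.mol2.gz* files under root.

    Files smaller than min_size bytes are skipped (hypothetical helper,
    loosely mirroring the documented --find-min-size option).
    """
    matches = []
    for path in Path(root).rglob("test.mol2.gz*"):
        if path.is_file() and path.stat().st_size >= min_size:
            matches.append(str(path))
    return sorted(matches)


def write_file_list(paths, list_path):
    """Write one pose-file path per line, the format top_poses.py accepts as input."""
    with open(list_path, "w") as handle:
        for pose_path in paths:
            handle.write(pose_path + "\n")
```

The resulting file can then be given to top_poses.py as dockresults_path in place of the directory.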
The output prefix determines where the top N poses are written when the script finishes. For example, a prefix of /scratch/top_poses produces /scratch/top_poses.mol2.gz, along with a human-readable .scores file.
Usage
usage: top_poses.py [-h] [-n NPOSES] [-o OUTPREFIX] [-j NPROCESSES] [--id-file INPUT_ID_FILE]
                    [--verbose] [--quiet] [--log-interval LOG_INTERVAL]
                    [--find-min-size FIND_MIN_SIZE]
                    dockresults_path

Retrieve the top N poses from docking results

positional arguments:
  dockresults_path      Can be either a directory containing docking results, or a file where each
                        line points to a docking results file.

optional arguments:
  -h, --help            show this help message and exit
  -n NPOSES             How many top poses to retrieve, default of 150000
  -o OUTPREFIX          Output file prefix. Each run will produce two files, a mol2.gz containing
                        pose data, and a .scores file containing relevant score information.
                        Default is "top_poses"
  -j NPROCESSES         How many processes should be dedicated to this run, default is 2. If your
                        files are spread across multiple disks, increasing this number will
                        improve performance.
  --id-file INPUT_ID_FILE
                        Only retrieve poses matching ids specified in an external file.
  --verbose             write verbose logs to stdout
  --quiet               write minimum logs to stdout
  --log-interval LOG_INTERVAL
                        number of poses between log statements. Ignored if --quiet enabled
  --find-min-size FIND_MIN_SIZE
                        filter out test.mol2.gz* files below a minimum bytes size
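As an illustration of how these options combine, the sketch below assembles a command line from the documented flags. It makes several assumptions: the helper `build_top_poses_command` is hypothetical, top_poses.py is assumed to be in the current directory and run with `python3`, and the /scratch paths are placeholders.

```python
import shlex


def build_top_poses_command(dockresults_path, nposes=150000,
                            outprefix="top_poses", nprocesses=2):
    """Assemble a top_poses.py invocation from the documented options.

    Hypothetical helper; the defaults mirror those listed in the usage text.
    """
    return [
        "python3", "top_poses.py",   # assumed interpreter name and script location
        "-n", str(nposes),           # number of top poses to retrieve
        "-o", outprefix,             # output file prefix
        "-j", str(nprocesses),       # number of processes for this run
        dockresults_path,            # directory of results, or a file list
    ]


# Placeholder paths, for illustration only.
cmd = build_top_poses_command("/scratch/dock_files.txt", nposes=300000,
                              outprefix="/scratch/top_poses", nprocesses=8)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` if you want to launch the script from Python rather than from a shell.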
Note on Parallel Processing
By default, this script allocates two extra threads (-j 2) to read in files. This lets the main thread sort poses uninterrupted while the readers handle the grunt work of reading and annotating files. Increasing the number of reader threads beyond two does not guarantee better performance, but depending on the filesystem(s) your docking poses live on, it can help. For example, on Wynton it can be helpful to allocate up to 8 extra threads for reading files because of how the Wynton filesystem works. On the BKS cluster, going beyond two reader threads has a negligible (or even negative) impact unless your files happen to be striped across multiple servers.
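The division of labor described here, with readers loading files while the main thread keeps only the best poses, can be illustrated with a small sketch. This is not the actual top_poses.py implementation. It assumes that lower scores are better, uses in-memory lists of (score, name) pairs in place of real results files, and the names `read_scores` and `top_n_poses` are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def read_scores(chunk):
    """Stand-in for reading one results file: returns its (score, name) pairs."""
    return list(chunk)


def top_n_poses(chunks, n, readers=2):
    """Keep the n lowest-scoring poses while reader threads load chunks.

    Assumes lower scores are better. Scores are negated so that the root of
    the min-heap is always the worst pose currently kept.
    """
    heap = []
    with ThreadPoolExecutor(max_workers=readers) as pool:
        # Readers load chunks in the background; the main thread merges results.
        for poses in pool.map(read_scores, chunks):
            for score, name in poses:
                item = (-score, name)
                if len(heap) < n:
                    heapq.heappush(heap, item)
                elif item > heap[0]:
                    # New pose beats the worst kept pose, so swap it in.
                    heapq.heapreplace(heap, item)
    # Undo the negation and return poses from best to worst.
    return sorted((-neg, name) for neg, name in heap)
```

Because the heap never grows beyond n entries, the main thread does a bounded amount of work per pose, which is why adding readers only helps when file reading is the bottleneck.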
Checking Logs
If everything went smoothly, your log should end with a string of text that looks like this:
received all input! joining threads...
done processing! writing out...
299900 / 300000
You may also see a message that looks like this:
short timeout reached while retrieving pose... trying again! curr=...
This only indicates that file reading is slow. It is common at the beginning of a log or when the filesystem is under high load.
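When checking many logs, the two messages quoted above can be searched for automatically. The sketch below is a hypothetical helper, not part of docktop. It assumes the log has been saved to text, that the completion message appears within the last few non-empty lines, and that the message wording matches the examples on this page.

```python
DONE_MARKER = "done processing! writing out..."
TIMEOUT_MARKER = "short timeout reached while retrieving pose"


def summarize_log(log_text, tail=5):
    """Return (finished, timeouts) for a top_poses.py log.

    finished is True if the completion message appears in the last `tail`
    non-empty lines; timeouts counts the slow-read messages anywhere in the log.
    """
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    finished = any(DONE_MARKER in line for line in lines[-tail:])
    timeouts = sum(TIMEOUT_MARKER in line for line in lines)
    return finished, timeouts
```

A log that reports finished as False is worth inspecting by hand, while a nonzero timeout count on its own is not a sign of failure.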