Repackaging DB2 DOCK38
Latest revision as of 22:02, 27 March 2023
The following is a script for repackaging 3D pipeline results. First, here is the script:
#!/bin/bash
# make_tarballs.bash

# required parameters
TARBALL_SOURCE=$1
TARBALL_REPACK_DEST=$2
# check the arguments before resolving them, so a missing one gives a clear message
[ -z "$TARBALL_SOURCE" ] && echo "need to provide TARBALL_SOURCE as 1st arg!" && exit 1
[ -z "$TARBALL_REPACK_DEST" ] && echo "need to provide TARBALL_REPACK_DEST as 2nd arg!" && exit 1
# make sure the destination exists, otherwise mv would rename each package to this path
mkdir -p "$TARBALL_REPACK_DEST"
TARBALL_SOURCE=$(realpath "$TARBALL_SOURCE")
TARBALL_REPACK_DEST=$(realpath "$TARBALL_REPACK_DEST")

# optional parameters, overridable through environment variables
WORKING_DIRECTORY=${WORKING_DIRECTORY-/tmp/$(whoami)}
PACKAGES_PER_PACKAGE=${PACKAGES_PER_PACKAGE-100}
PACKAGE_TYPE=${PACKAGE_TYPE-db2.gz}
PACKAGE_TYPE_SHORT=$(echo $PACKAGE_TYPE | cut -d'.' -f1)

echo WORKING_DIRECTORY=$WORKING_DIRECTORY
mkdir -p $WORKING_DIRECTORY && cd $WORKING_DIRECTORY
mkdir -p output working tarball_split_list

# list every source tarball, then split the list into batches of PACKAGES_PER_PACKAGE
echo finding
find $TARBALL_SOURCE -name '*.tar.gz' > tarball_list.txt
echo splitting
split -l $PACKAGES_PER_PACKAGE tarball_list.txt tarball_split_list/

echo working
cd working
for f in ../tarball_split_list/*; do
    # extract the PACKAGE_TYPE files of each tarball in the batch, stripping directories
    # (--wildcards: recent GNU tar does not expand '*' in member names without it)
    for tb in $(cat $f); do
        ! [ -z "$VERBOSE" ] && echo tar --wildcards --transform='s/^.*\///' -xf $tb '*.'$PACKAGE_TYPE
        tar --wildcards --transform='s/^.*\///' -xf $tb '*.'$PACKAGE_TYPE 2>/dev/null
    done
    # bundle the extracted files into one package and move it to the destination
    ! [ -z "$VERBOSE" ] && echo tar -czf $(basename $f).$PACKAGE_TYPE_SHORT.tar.gz '*.'$PACKAGE_TYPE
    tar -czf $(basename $f).$PACKAGE_TYPE_SHORT.tar.gz *.$PACKAGE_TYPE
    mv $(basename $f).$PACKAGE_TYPE_SHORT.tar.gz $TARBALL_REPACK_DEST
    rm *.$PACKAGE_TYPE
    echo $(basename $f)
done
cd ..
# note: this removes the whole working directory
rm -r $WORKING_DIRECTORY
echo Done! Results in $TARBALL_REPACK_DEST
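The extraction step is the least obvious part of the script, so here is a minimal, self-contained sketch of what it does to a single archive. It assumes GNU tar. The directory layout and file names (src/a/b, mol1.db2.gz, readme.txt) are made up for illustration, and --wildcards is included on the assumption that your tar version does not glob member names on extraction without it.

```shell
#!/bin/bash
# Sketch of the extraction step on a throwaway archive (all paths here are made up).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Build a small nested archive with one matching and one non-matching member.
mkdir -p src/a/b work
echo "placeholder ligand data" > src/a/b/mol1.db2.gz
echo "not a ligand" > src/a/b/readme.txt
tar -czf src.tar.gz -C src a

# Extract only *.db2.gz members, dropping everything up to the last slash
# so the file lands directly in the current directory.
cd work || exit 1
tar --wildcards --transform='s/^.*\///' -xf ../src.tar.gz '*.db2.gz'

extracted=$(ls)
echo "extracted: $extracted"
```

Because every extracted file is flattened into one directory, two source files with the same base name would overwrite each other, which is worth keeping in mind if your file names are not unique across tarballs.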
Now, an example usage:
[user@gimel5 ~] bash make_tarballs.bash $PWD/H17P200_H19P400.smi.batch-3d.d/out $PWD/tarballs_repacked/H17P200_H19P400
finding
splitting
working
aa
ab
ac
ad
ae
af
ag
ah
ai
aj
Done! Results in $PWD/tarballs_repacked/H17P200_H19P400
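The names aa, ab, ... printed while the script works are the default suffixes that split gives to each batch of the tarball list. The following sketch shows that grouping in isolation. The list of 250 tarball paths is hypothetical, and PACKAGES_PER_PACKAGE is set to the script's default of 100.

```shell
#!/bin/bash
# Sketch: how split groups a list of tarballs into batches named aa, ab, ...
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
mkdir -p tarball_split_list

# Hypothetical list of 250 source tarballs (the paths are made up).
for i in $(seq 1 250); do
    echo "/data/out/$i.tar.gz"
done > tarball_list.txt

# Same call as in the script, with PACKAGES_PER_PACKAGE at its default.
PACKAGES_PER_PACKAGE=100
split -l "$PACKAGES_PER_PACKAGE" tarball_list.txt tarball_split_list/

# One file per batch; the last batch holds whatever is left over.
batch_names=$(ls tarball_split_list | tr '\n' ' ')
last_batch_size=$(wc -l < tarball_split_list/ac)
echo "batches: $batch_names"
echo "tarballs in last batch: $last_batch_size"
```

By the same arithmetic, the ten batches aa through aj in the example run would correspond to somewhere between 901 and 1000 source tarballs, assuming the default of 100 per batch was used.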
Note that this script is effective for fairly small batches of molecules, on the order of millions rather than billions. Talk to me (ben@tingle.org) or John Irwin for more information on how to repack very large ligand libraries.
For docking from ligands built using our pipeline with default options, running this script unmodified is sufficient to create appropriately sized packages. If running out of space on /tmp is a concern, set WORKING_DIRECTORY to a dedicated directory on /scratch or some other larger file system. Use a dedicated directory, because the script removes WORKING_DIRECTORY when it finishes. The /tmp directory typically holds only around 50G of data, which may not be enough for some workloads or environments.
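The optional parameters are read with the shell's ${VAR-default} expansion, so they can be overridden from the environment without editing the script. The sketch below only demonstrates that expansion. The /scratch path is a hypothetical example and not a location the script itself knows about.

```shell
#!/bin/bash
# Sketch: how ${VAR-default} picks up an override from the environment.

# With the variable unset, the default after the dash is used.
unset WORKING_DIRECTORY
default_value=${WORKING_DIRECTORY-/tmp/$(whoami)}

# With the variable set, its value wins (hypothetical dedicated scratch directory).
WORKING_DIRECTORY=/scratch/$(whoami)/repack
override_value=${WORKING_DIRECTORY-/tmp/$(whoami)}

echo "default:  $default_value"
echo "override: $override_value"
```

In practice this means a call such as `WORKING_DIRECTORY=/scratch/$(whoami)/repack PACKAGES_PER_PACKAGE=50 bash make_tarballs.bash <source> <dest>` should change both settings for that one run. Note that this form of the expansion only falls back to the default when the variable is unset, so an empty value is kept as is.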