New 3D Building On Wynton

From DISI

Revision as of 03:27, 16 July 2020

I have revamped the script for the second time, this time configuring it to use SGE on Wynton.
The script is located in ~/zinc-3d-build-3 on jji@wynton.
I had to reinstall/reconfigure some of the software because it was not working properly on the Wynton cluster.
This software has been installed in various places under $HOME.

Both the script results and the log files are organized in a similar fashion, which I explain below.

There is one script of interest for running jobs: submit-all-jobs.bash. It takes a source SMILES file and an output destination as input.
The script then submits a number of jobs that build 3D ligand data and save the results, in an organized fashion, to the output destination.

Each job submitted by the script works on 100 substances. A group of 10,000 substances, or 100 jobs, is called a "batch".
Each group of 100 SMILES read in by the script is assigned a batch no. based on its position in the source file.

Example:

smiles | ZINC ID | line no. | batch no.
=======================================
CCAA   | ZINC000 | 0        | 0
...
CCZZ   | ZINCaaa | 10,000   | 1
...
CCXX   | ZINCbbb | 20,000   | 2
...
CCYY   | ZINCccc | 30,000   | 3


In other words, BATCH_NO = LINE_NO / 10000, using integer division (line numbers start at 0).
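A minimal Python sketch of the job and batch numbering above, assuming line numbers are counted from 0 and the source file is split into consecutive 100-line jobs. The names JOB_SIZE, BATCH_SIZE, batch_no and plan_jobs are hypothetical and are not taken from submit-all-jobs.bash.

```python
# Hypothetical sketch of the job/batch layout, assuming:
#  - line numbers in the source SMILES file are counted from 0,
#  - each job covers 100 consecutive lines,
#  - a batch covers 10,000 lines (i.e. 100 jobs).
JOB_SIZE = 100        # substances per job
BATCH_SIZE = 10_000   # substances per batch (100 jobs)


def batch_no(line_no: int) -> int:
    """BATCH_NO = LINE_NO / 10000, using integer division."""
    return line_no // BATCH_SIZE


def plan_jobs(n_lines: int):
    """Yield (batch_no, start_line, end_line) for every job over a source
    file with n_lines SMILES; end_line is the last line of the job (inclusive)."""
    for start in range(0, n_lines, JOB_SIZE):
        end = min(start + JOB_SIZE, n_lines) - 1  # last job may be short
        yield batch_no(start), start, end


if __name__ == "__main__":
    jobs = list(plan_jobs(25_050))
    print(len(jobs))   # 251
    print(jobs[0])     # (0, 0, 99)
    print(jobs[100])   # (1, 10000, 10099)
    print(jobs[-1])    # (2, 25000, 25049)
```

Because 10,000 is an exact multiple of 100, a job never straddles two batches, so using the job's first line or its last line (the END_ID) gives the same batch no.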

Each job saves its results tarball to /wynton/scratch/jji/$SRC_FILENAME/$BATCH_ID/$END_ID.tar.gz
Each job saves its stdout and stderr logs to /wynton/home/shoichetlab/jji/zinc-3d-build-3/logs/$SRC_FILENAME/$BATCH_ID/$END_ID.*
These directories can be changed by setting the environment variables OUTPUT_DEST and LOG_BASE_DIR, respectively, before running submit-all-jobs.bash.

$SRC_FILENAME is the name of the source file this group of jobs was run from
$BATCH_ID is the batch no. of the SMILES in the job
$END_ID is the line no. of the last substance in the job
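The path layout can be sketched in Python as follows. This is an illustration, not the actual script: the function job_paths is hypothetical, and it assumes that OUTPUT_DEST and LOG_BASE_DIR replace the directory prefix in front of $SRC_FILENAME, that $SRC_FILENAME is the base name of the source file, and that $BATCH_ID equals END_ID / 10000 (integer division). Because the log extensions behind $END_ID.* are not listed, it returns only the shared log prefix.

```python
import os
from pathlib import PurePosixPath

# Defaults copied from the paths listed above. Assumption: OUTPUT_DEST and
# LOG_BASE_DIR override exactly these prefixes.
DEFAULT_OUTPUT_DEST = "/wynton/scratch/jji"
DEFAULT_LOG_BASE_DIR = "/wynton/home/shoichetlab/jji/zinc-3d-build-3/logs"


def job_paths(src_path: str, end_id: int, batch_size: int = 10_000, env=None):
    """Return (result_tarball, log_prefix) for the job whose last line is end_id.

    Hypothetical helper; log files are assumed to be log_prefix plus some
    extension (the $END_ID.* pattern).
    """
    env = os.environ if env is None else env
    src_filename = PurePosixPath(src_path).name   # assumed: base name only
    batch_id = end_id // batch_size                # same batch as the job's first line
    out_root = env.get("OUTPUT_DEST", DEFAULT_OUTPUT_DEST)
    log_root = env.get("LOG_BASE_DIR", DEFAULT_LOG_BASE_DIR)
    result = PurePosixPath(out_root, src_filename, str(batch_id), f"{end_id}.tar.gz")
    log_prefix = PurePosixPath(log_root, src_filename, str(batch_id), str(end_id))
    return str(result), str(log_prefix)
```

For example, with an empty environment (env={}), job_paths("/data/example.smi", 10099, env={}) returns /wynton/scratch/jji/example.smi/1/10099.tar.gz and the log prefix /wynton/home/shoichetlab/jji/zinc-3d-build-3/logs/example.smi/1/10099 (example.smi is a made-up file name). Passing env explicitly keeps the helper testable without touching the real environment.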