New 3D Building On Wynton
I have revamped the script for the second time, this time configured to use SGE on Wynton. The script lives in ~/zinc-3d-build-3 on jji@wynton. I had to re-install/reconfigure some of the software, as it was not working properly on the Wynton cluster; this software has been installed in various places in $HOME.

The script results and the log files are organized in a similar fashion, which I will explain below. There is one script of interest for running jobs: submit-all-jobs.bash. It takes a source SMILES file and an output destination, then submits a number of jobs to build 3D ligand data and saves the results in an organized fashion to the output destination.

Each job submitted by the script works on 100 substances. A group of 10,000 substances (100 jobs) is called a "batch". Each group of 100 SMILES read in by the script is assigned a batch no. based on its position in the source file, e.g.:

 smiles | ZINC ID | line no. | batch no.
 =======================================
 CCAA   | ZINC000 |      0   |    0
 ...
 CCZZ   | ZINCaaa | 10,000   |    1
 ...
 CCXX   | ZINCbbb | 20,000   |    2
 ...
 CCYY   | ZINCccc | 30,000   |    3

Basically, BATCH_NO = LINE_NO / 10000.

Each job saves its results tarball to /wynton/scratch/jji/$SRC_FILENAME/$BATCH_ID/$END_ID.tar.gz, and its stdout and stderr logs to /wynton/home/shoichetlab/jji/zinc-3d-build-3/logs/$SRC_FILENAME/$BATCH_ID/$END_ID.*. These directories can be re-configured by setting the environment variables OUTPUT_DEST and LOG_BASE_DIR, respectively, prior to running the submit-all-jobs script.

$SRC_FILENAME is the filename of the source file this group of jobs was run from.
$BATCH_ID is the batch no. of the SMILES.
$END_ID is the line no. of the last substance in the job.
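For concreteness, here is a minimal sketch of how a run might be set up. Only OUTPUT_DEST, LOG_BASE_DIR, the default paths, and the BATCH_NO arithmetic come from the description above; the exact argument order of submit-all-jobs.bash and the example source filename are assumptions.

 # Optional: override the default result/log locations before submitting
 export OUTPUT_DEST=/wynton/scratch/jji
 export LOG_BASE_DIR=/wynton/home/shoichetlab/jji/zinc-3d-build-3/logs

 cd ~/zinc-3d-build-3
 bash submit-all-jobs.bash example.smi $OUTPUT_DEST   # argument order is an assumption

 # Batch bookkeeping implied above: a substance on line LINE_NO of the
 # source file belongs to batch LINE_NO / 10000 (integer division)
 LINE_NO=20000
 BATCH_NO=$(( LINE_NO / 10000 ))   # -> 2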
Revised 3D Building On Wynton
The batch size has been changed to 50K substances, and batches are now submitted as job arrays of 1000 instead of one-by-one. Batches are now identified alphabetically instead of numerically, e.g. aaa instead of 0, aab instead of 1, and so on. Logs are saved to local /scratch during the runtime of the job and then moved to $OUTPUT_DIR/log once the job has completed; previously, logs were streamed directly to NFS, which was causing a lot of I/O strain. Result tarballs are saved to $OUTPUT_DIR/out, and input batches are saved to $OUTPUT_DIR/in.
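As a reference for the alphabetic naming, here is a minimal sketch of how a numeric batch index could map to a three-letter ID, assuming a straightforward base-26 scheme over a-z; the helper name and the mapping beyond the examples given above are illustrative assumptions, not taken from the actual submission script.

 letters=( {a..z} )
 batch_index_to_id() {
     # Map a 0-based batch index to a three-letter ID: 0 -> aaa, 1 -> aab, 26 -> aba, ...
     local n=$1
     echo "${letters[n / 676 % 26]}${letters[n / 26 % 26]}${letters[n % 26]}"
 }

 batch_index_to_id 0    # aaa
 batch_index_to_id 1    # aab
 batch_index_to_id 26   # aba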