How To Load New ZINC Databases: Difference between revisions

Revision as of 20:41, 11 June 2020

Start

log in to machine
check that /local2/load directory exists and is owned by xyz
do "netstat -plunt" and check ports 5434-54XX to see which tin databases are online
if tin databases are not online, look for postgres installation and creation scripts in /nfs/home/chinzo/code/installation-utilities/*
log in as xyz with sudo -i
go to ~/btingle/zinc_deploy/zinc-deploy-V2
this is where you will launch the zinc loading script from
the run_p39, run_p40, run_p41, run_p42 files are scripts to load individual partitions of the m tranches
open one of these in an editor
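
The port check in the steps above can be scripted. This is a minimal sketch, not part of the deploy scripts: `tin_ports` is a hypothetical helper that reads `netstat -plunt` output on stdin and prints the listening TCP ports from 5434 up that match 54XX. It assumes the usual net-tools column layout (local address in column 4, state in column 6).

```shell
# Hypothetical helper (not part of zinc-deploy-V2): reads `netstat -plunt`
# output on stdin and prints listening TCP ports from 5434 up that match 54XX.
tin_ports() {
    awk '$1 ~ /^tcp/ && $6 == "LISTEN" {
        n = split($4, parts, ":")   # local address is host:port; port is the last piece
        port = parts[n]
        if (port ~ /^54[0-9][0-9]$/ && port + 0 >= 5434) print port
    }' | sort -un
}

# Usage on the database machine:
#   netstat -plunt | tin_ports
#   stat -c '%U' /local2/load    # GNU stat; should print xyz
```

An empty result means no tin database is listening in that range, which is the case where you go looking for the installation scripts.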

Environment Variables

PARALLEL_JOBS

the number of concurrent slurm jobs for this load

the higher this number the faster certain parts of the load script will be

you want to make sure you are not queuing more parallel jobs than there are cpu cores on the machine
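
One way to enforce the core limit before launching is a small guard like the one below. This is a sketch under assumptions: `check_parallel_jobs` is a hypothetical name, and the core count comes from `nproc` (GNU coreutils) unless you pass it explicitly.

```shell
# Hypothetical guard: succeed only if the requested job count fits the core count.
# The second argument defaults to the output of nproc.
check_parallel_jobs() {
    want=$1
    cores=${2:-$(nproc)}
    if [ "$want" -gt "$cores" ]; then
        echo "PARALLEL_JOBS=$want exceeds $cores cpu cores" >&2
        return 1
    fi
    return 0
}

# Usage: check_parallel_jobs "$PARALLEL_JOBS" || exit 1
```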

PARTITION_NO

the # of the partition of the tranche database to load. this decides which molecules are loaded

these partitions are defined in the partitions.txt file

the partition numbers we deploy start at 39 and increase sequentially from there, up to a max of 134

(so far I have deployed through 42, so start at 43)
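
A range check catches typos in the partition number before a long load starts. This is a minimal sketch; `check_partition_no` is a hypothetical helper that only tests the 39 to 134 range stated above and does not read partitions.txt.

```shell
# Hypothetical guard: accept only whole numbers from 39 to 134 inclusive.
check_partition_no() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 39 ] && [ "$1" -le 134 ]
}

# Usage: check_partition_no "$PARTITION_NO" || { echo "bad PARTITION_NO" >&2; exit 1; }
```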

TRANCHE_SRC

location of the source tranche directory on the nfs

the different sources are in /mnt/nfs/exa/work/jyoung/phase2_tranche/*

it is ok to load databases with the same partition number if they are from different tranche sources

CATALOG_SHORTNAME

m or s, depending on the source directory you choose

ZINC_HOST

the name of the machine you are running this script on

ZINC_PORT

the port number of the postgres database

if you're loading a database from a different source but the same partition number as another database, you must use the same port number as that database
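
Since a load needs all six variables, it can help to fail early when one is missing. The helper below is a sketch, not something the run_pXX scripts are known to contain; `require_vars` is a hypothetical name.

```shell
# Hypothetical helper: report every named variable that is unset or empty.
require_vars() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"    # indirect lookup of the variable called $name
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return $missing
}

# Usage at the top of a run script, after the variables are set:
#   require_vars PARALLEL_JOBS PARTITION_NO TRANCHE_SRC \
#                CATALOG_SHORTNAME ZINC_HOST ZINC_PORT || exit 1
```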

Running the Script

I run the run_pXX scripts like so:

   screen -S p_${PARTITION_NO}_${PORT}
   time bash run_p${PARTITION_NO}

I'd prefer that the zinc-deploy-V2 folder not be crammed with any more of these one-off script files, so put this header in your own script files:

   RUN_DIR=~/btingle/zinc_deploy/zinc-deploy-V2
   cd $RUN_DIR
   ...

and store them in a different folder

before you load up a postgres database for the first time, make sure to run the tin_wipe script in ~/btingle/zinc_deploy/misc

WARNING: this will truncate all of the major tables in the database, so make sure you're not too attached to whatever's in there

   bash
   ./tin_wipe ${HOST_NAME} ${PORT_NUMBER}
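
Because the wipe is destructive, a confirmation step in front of it is cheap insurance. The following is a sketch only: `confirm_wipe` is a hypothetical wrapper that makes the operator retype the port, and it does not change what tin_wipe itself does.

```shell
# Hypothetical guard: ask the operator to retype the port before wiping.
# Reads one line from stdin; succeeds only on an exact match.
confirm_wipe() {
    host=$1
    port=$2
    printf 'About to truncate tables on %s:%s. Retype the port to continue: ' \
        "$host" "$port" >&2
    read -r answer || return 1
    [ "$answer" = "$port" ]
}

# Usage:
#   confirm_wipe "$HOST_NAME" "$PORT_NUMBER" && ./tin_wipe "$HOST_NAME" "$PORT_NUMBER"
```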

Logging

check batch_logs/<step name>/<job id>_<array id>.out for the log output of various jobs

run_catalog_load : main job output

pre_process : pre processing job output

post_process : post processing job output
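
To scan a step's logs quickly, something like the following can help. It is a sketch under assumptions: both helper names are hypothetical, and it assumes failures show up as lines containing "error", which may not hold for every job.

```shell
# Hypothetical helper: list the <job id>_<array id>.out logs for one step.
step_logs() {
    log_root=$1
    step=$2
    find "$log_root/$step" -type f -name '*_*.out' | sort
}

# Hypothetical helper: print the logs for a step that mention "error" (any case).
logs_with_errors() {
    step_logs "$1" "$2" | while IFS= read -r f; do
        if grep -qi 'error' "$f"; then
            echo "$f"
        fi
    done
    return 0
}

# Usage, from the directory the load was launched in:
#   step_logs batch_logs run_catalog_load
#   logs_with_errors batch_logs pre_process
```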
