How To Load New ZINC Databases

From DISI

Revision as of 22:59, 22 June 2020

Start

  • log in to machine
  • check that /local2/load directory exists and is owned by xyz
  • clear it out if it does exist
  • do "netstat -plunt" and check ports 5434-54XX to see which tin databases are online
  • if tin databases are not online, look for postgres installation and creation scripts in /nfs/home/chinzo/code/installation-utilities/*
  • log in as xyz through sudo -i
  • go to ~/btingle/zinc_deploy/zinc-deploy-V2
  • this is where you will launch the zinc loading script from
  • the run_p39, run_p40, run_p41, run_p42 files are scripts to load individual partitions of the m tranches
  • open one of these in an editor
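The first few checks above can be sketched as a pair of shell helpers. This is only an illustration: check_load_dir and list_tin_ports are hypothetical names, not part of zinc-deploy-V2. It assumes GNU stat, and it treats "54XX" as 5434 through 5499, which the page does not actually state.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the pre-load checks; not part of zinc-deploy-V2.

# Succeeds only if the directory exists and is owned by the expected user.
# Assumes GNU stat (the -c '%U' form).
check_load_dir() {
    local dir="$1" user="$2" owner
    [ -d "$dir" ] || { echo "missing: $dir" >&2; return 1; }
    owner=$(stat -c '%U' "$dir") || return 1
    [ "$owner" = "$user" ] || { echo "$dir is owned by $owner, not $user" >&2; return 1; }
}

# Reads `netstat -plunt` output on stdin and prints the listening ports
# between 5434 and 5499 (the upper bound is an assumption).
list_tin_ports() {
    awk '{ n = split($4, a, ":"); p = a[n];
           if (p ~ /^54[0-9][0-9]$/ && p >= 5434) print p }' | sort -un
}

# Usage:
#   check_load_dir /local2/load xyz
#   netstat -plunt 2>/dev/null | list_tin_ports
```

If list_tin_ports prints nothing, no tin databases are listening in that range and the installation scripts mentioned above are the next place to look.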

Environment Variables

PARALLEL_JOBS

the number of concurrent slurm jobs for this load

the higher this number the faster certain parts of the load script will be

you want to make sure you are not queuing more parallel jobs than there are cpu cores on the machine

PARTITION_NO

the # of the partition of the tranche database to load. this decides which molecules are loaded

these partitions are defined in the partitions.txt file

the # of the partitions we deploy should start at 39 and increase sequentially from there, up to a max of 134

(so far I have deployed through 42, so start at 43)

TRANCHE_SRC

location of the source tranche directory on the nfs

the different sources are in /mnt/nfs/exa/work/jyoung/phase2_tranche/*

it is ok to load databases with the same partition number if they are from different tranche sources

CATALOG_SHORTNAME

m or s, depending on the source directory you choose

ZINC_HOST

the name of the machine you are running this script on

ZINC_PORT

the port number of the postgres database

if you're loading a database from a different source but the same partition number as another database, you must use the same host and port number as that database
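Before launching a load it may help to sanity-check these variables. The sketch below is a hypothetical helper (check_load_env is not part of zinc-deploy-V2) that applies only the constraints stated above: all six variables exported, PARALLEL_JOBS no larger than the core count, PARTITION_NO between 39 and 134, and CATALOG_SHORTNAME either m or s. Treating TRANCHE_SRC as a directory that must already exist is an assumption, and so is using nproc for the core count.

```shell
#!/usr/bin/env bash
# check_load_env is a hypothetical helper, not part of zinc-deploy-V2.
# It reports the problems it finds on stderr and returns non-zero if any exist.
check_load_env() {
    local bad=0 name cores
    for name in PARALLEL_JOBS PARTITION_NO TRANCHE_SRC CATALOG_SHORTNAME ZINC_HOST ZINC_PORT; do
        # printenv only sees exported variables, which is what a child script needs
        if [ -z "$(printenv "$name")" ]; then
            echo "not exported: $name" >&2
            bad=1
        fi
    done
    [ "$bad" -eq 0 ] || return 1

    case "$PARALLEL_JOBS$PARTITION_NO" in
        *[!0-9]*) echo "PARALLEL_JOBS and PARTITION_NO must be whole numbers" >&2; return 1 ;;
    esac

    cores=$(nproc)
    if [ "$PARALLEL_JOBS" -gt "$cores" ]; then
        echo "PARALLEL_JOBS=$PARALLEL_JOBS exceeds $cores cpu cores" >&2
        bad=1
    fi
    if [ "$PARTITION_NO" -lt 39 ] || [ "$PARTITION_NO" -gt 134 ]; then
        echo "PARTITION_NO=$PARTITION_NO is outside 39-134" >&2
        bad=1
    fi
    case "$CATALOG_SHORTNAME" in
        m|s) ;;
        *) echo "CATALOG_SHORTNAME must be m or s" >&2; bad=1 ;;
    esac
    if [ ! -d "$TRANCHE_SRC" ]; then
        echo "TRANCHE_SRC is not a directory: $TRANCHE_SRC" >&2
        bad=1
    fi
    return "$bad"
}
```

Calling check_load_env right after the export lines in a run script, and stopping if it fails, would catch the kind of misconfigured variables that are much slower to diagnose once slurm jobs are queued.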

Running the Script

I run the run_pXX scripts like so:

   screen -S p_${PARTITION_NO}_${PORT}
   time bash run_p${PARTITION_NO}

I'd prefer if the zinc-deploy-V2 folder was not crammed with any more of these one-off script files, so you should put this header in your own script files:

   RUN_DIR=~/btingle/zinc_deploy/zinc-deploy-V2
   cd $RUN_DIR
   ...

and store them in a different folder

before you load up a postgres database for the first time, make sure to run the tin_wipe script in ~/btingle/zinc_deploy/misc

WARNING: this will truncate all of the major tables in the database, so make sure you're not too attached to whatever's in there

   bash
   ./tin_wipe ${HOST_NAME} ${PORT_NUMBER}

Logging

check batch_logs/<step name>_<job id>_<array id>.out for the log output of various jobs

<step name> : <description>

run_catalog_load : main job output, loading and resolution script output

pre_process : pre processing job output

post_process : post processing job output
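A quick way to look through these logs is sketched below. step_logs and step_errors are hypothetical helper names. The sketch assumes that .err files sit next to the .out files in batch_logs and follow the same naming pattern, which this page does not state explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for browsing batch_logs; not part of zinc-deploy-V2.

# Prints the .out logs for one step, newest first.
step_logs() {
    local step="$1" dir="${2:-batch_logs}"
    ls -t "$dir"/"$step"_*_*.out 2>/dev/null
}

# Prints the non-empty .err files for one step.
# Assumes .err files use the same <step name>_<job id>_<array id> naming.
step_errors() {
    local step="$1" dir="${2:-batch_logs}"
    find "$dir" -maxdepth 1 -type f -name "${step}_*_*.err" -size +0c | sort
}

# Usage, from the zinc-deploy-V2 directory:
#   step_logs run_catalog_load
#   step_errors pre_process
```

An empty result from step_logs for a stage that should have run is itself a hint that no jobs were launched for that stage.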

Email to Khanh, 6/15

Khanh
The error was caused by misconfigured environment variables, but I've fixed them now. Your script seems to have succeeded in pre-processing the molecules, but failed on the following stages. No big deal, you can just re-run the script with some changes.

If the pre-processing stage went through but there was an error in any of the following stages, I added the option to skip the stage(s) that already went through in the following run. You can also skip the resolution and post processing stages if something goes wrong there. To see if there was an error with any of the stages you can check .err, but you can also check .out to see what the main script output is. If you don't see any jobs pop up for a particular stage, or if the output indicates it processed zero entries for that stage, you can be sure something went wrong. You can re-run the script with these environment variables to skip stages. Only export these environment variables if there was an error during a previous run of the script; for example, you would not want to export any of them when running the script for a particular partition the first time.

export SKIP_PRE_PROCESS="TRUE"
export SKIP_RESOLUTION="TRUE"
export SKIP_POST_PROCESS="TRUE"
export SKIP_LOADING="TRUE"

(I have no idea why you would ever want to use that last one, but it's there for completeness's sake)
Lmk if you have questions.
Ben

...

to be clear, you would only want to export any of the variables for a particular stage if that stage succeeded the previous time you ran the script. It saves time to not have to re-do any work.
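As a sketch of how a re-run might be set up, the helper below exports the skip variables for the stages that already succeeded. The SKIP_* names and the "TRUE" value come from the email above. The skip_stages function and its lowercase stage labels are hypothetical conveniences, not part of the loading script.

```shell
#!/usr/bin/env bash
# skip_stages is a hypothetical helper, not part of zinc-deploy-V2.
# Pass it the stages that SUCCEEDED on the previous run; it exports the
# matching SKIP_* variable for each one.
skip_stages() {
    local stage
    for stage in "$@"; do
        case "$stage" in
            pre_process)  export SKIP_PRE_PROCESS="TRUE" ;;
            resolution)   export SKIP_RESOLUTION="TRUE" ;;
            post_process) export SKIP_POST_PROCESS="TRUE" ;;
            loading)      export SKIP_LOADING="TRUE" ;;
            *) echo "unknown stage: $stage" >&2; return 1 ;;
        esac
    done
}

# Example: pre-processing went through last time but a later stage failed.
#   skip_stages pre_process
#   time bash run_p${PARTITION_NO}
```

On the first run for a partition, skip_stages should not be called at all, so that none of the SKIP_* variables are set.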

Email to Abhinav and Khanh, 6/22

Abhinav and Khanh,

It turns out that the loading on n-1-17 was being polluted by other scripts John was running on the same machine.
Also, there was a bug in the script which caused it to fail when loading multiple files onto the same database. This only affected the loads on n-1-17.
I've started a test run of partition 134 s on that machine, to test a fix I wrote for the script.
If you started a load that had multiple files in it, then before you run again you will need to:

1. wipe the database for that port
2. purge the loading source directory (I'll explain)
3. re-run all scripts for that database

The loading script produces a number of intermediate files when preparing them for loading into the database. The root directory for these files is the /local2/load directory.
You will see a number of directories named after a tranche range. Here are the important locations:

/local2/load
    H??P???_H??P???/ - the root folder for an individual partition database. partitions can contain one or more tranches
        src/ - the important output of scripts is stored here. This folder should usually not be messed with
            H??P???/ - the final table data for individual tranches in this partition. Also contains archives that record what has been added to the database
        stage/ - this is where we do our "calculations" for each step. you are safe to delete files in this directory once you're sure you don't need them anymore
            preprocessing/
            postprocessing/ - (check the wiki page on when and how to skip certain steps of the script)
            resolve/
            loading/
        tmp/ - source files are copied here temporarily
        config.txt - contains the port number for this partition database

The script was assigning incorrect id numbers to new entries in the database when the partition had more than one tranche (input file), causing numerous indexes to fail on rebuilding. 
In this case, we need to wipe the database and completely delete the contents of the source directory for each database because the final output was corrupted. 
Further runs of the script depend on this final output being correct.
You are safe to skip all steps except resolution and loading when re-trying a script that had this problem.

Hope this has cleared some things up.

Ben
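
The three recovery steps in the email above could be sketched as follows. purge_src is a hypothetical helper. It reads "the loading source directory" as the src/ folder under a partition root, which is an interpretation of the email rather than something it spells out. It only prints what it would remove unless --yes is passed, and it relies on the -mindepth and -delete options of GNU find.

```shell
#!/usr/bin/env bash
# purge_src is a hypothetical helper, not part of zinc-deploy-V2.
# It empties <partition root>/src and leaves stage/, tmp/ and config.txt alone.
# Without "--yes" as the second argument it only lists what would be removed.
purge_src() {
    local root="$1" confirm="${2:-}"
    [ -d "$root/src" ] || { echo "no src/ under $root" >&2; return 1; }
    if [ "$confirm" = "--yes" ]; then
        find "$root/src" -mindepth 1 -delete
    else
        find "$root/src" -mindepth 1 -print
    fi
}

# Possible sequence for one affected database. The partition directory is a
# placeholder, and passing ZINC_HOST and ZINC_PORT to tin_wipe is an assumption:
#   (cd ~/btingle/zinc_deploy/misc && ./tin_wipe "$ZINC_HOST" "$ZINC_PORT")   # 1. wipe
#   purge_src "/local2/load/<partition dir>"                                  # dry run
#   purge_src "/local2/load/<partition dir>" --yes                            # 2. purge
#   export SKIP_PRE_PROCESS="TRUE" SKIP_POST_PROCESS="TRUE"                   # keep resolution and loading
#   time bash run_p${PARTITION_NO}                                            # 3. re-run
```

Running the dry run first gives a chance to confirm that the listed paths belong to the partition that was actually affected before anything is deleted.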