ZINC15 curator's notes


Here are the curator's notes for ZINC 15.

protomer (re)build

As user xyz. Example 1: by catalog.

python ~/code/ZINC/contrib/fetch.py -t substance -f 'smiles,zinc_id' 'catalog.short_name = sial' | split -l 1000

Example 2: by physical property tranche.

python ~/code/ZINC/contrib/fetch.py -t substance -f 'smiles,zinc_id' 'structure.mwt < 200' 'protomer.prot_id is null' | split -a 4  -l 1000


Count the split files, then submit one array-job task per file (the count here was 340):

ls | wc -l    # 340 split files
qsub -t 1-340 ~ts/Code/DOCK/ligand/generate/example-protomer-wrapper.sh ~/work/protomer

delta (incremental) load of SMILES from ZINC8

As user jji. Based on SMILES,sub_id pairs exported from ZINC8 since the last update.

python ~/code/ZINC/zinc/load/create_substances.py xaa.ism --progress
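The command above loads a single split chunk; assuming the split files were renamed with an .ism extension (as xaa.ism suggests), the whole batch can be driven with a loop like this (the loop itself is a sketch, not part of the original notes):

```shell
# Load every split chunk in order; --progress is the flag shown above.
for f in x??.ism; do
    python ~/code/ZINC/zinc/load/create_substances.py "$f" --progress
done
```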

database export by property tranche

As user xyz, following the scripts in ~/zinc14/byproperty/:

step1.csh 
step2.csh
step3.csh 

don't forget

psql> CLUSTER each table by its primary key (or whatever index makes sense)
psql> VACUUM VERBOSE ANALYZE each fast-moving table
backups
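In script form, the two psql reminders above might look like the following; the table list is illustrative and the database name zinc15 is an assumption:

```shell
# Maintenance sketch: CLUSTER re-orders each table on its clustering
# index (set once, e.g. the primary key), and VACUUM VERBOSE ANALYZE
# reclaims dead rows and refreshes planner statistics.
for t in substance protomer catalog_item; do
    psql zinc15 -c "CLUSTER $t;"
    psql zinc15 -c "VACUUM VERBOSE ANALYZE $t;"
done
```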

cleaning up

create temporary table blah (id integer);  -- substance ids to remove
\copy blah from 'ji4.txt'
begin;
-- save any retired rows pointing at these substances before deleting
\copy (select * from retired where replacement_fk in (select id from blah)) to 'ji4.retired.save'
delete from substance where sub_id in (select id from blah);
-- sanity check: any catalog items still referencing the deleted substances?
select * from catalog_item where sub_id_fk in (select id from blah);
commit;


create table target_class2 (like target_class);  -- clone the column layout of target_class (no data)
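The notes do not say what target_class2 is for, but a clone like this is often the first step of a rebuild-and-swap; purely as a hypothetical sketch (database name zinc15 assumed):

```shell
# Hypothetical rebuild-and-swap: populate the clone offline, then swap
# the names inside one transaction so readers never see a gap.
psql zinc15 <<'SQL'
CREATE TABLE target_class2 (LIKE target_class);
-- ... repopulate target_class2 here ...
BEGIN;
ALTER TABLE target_class RENAME TO target_class_old;
ALTER TABLE target_class2 RENAME TO target_class;
COMMIT;
SQL
```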


on postgres vacuuming

When I am doing major revisions of the protomer, substance, and chembl21a.sea tables, where "major revisions" means adding, modifying, or deleting millions of rows per week (sometimes per day), I need to VACUUM ANALYZE each table at least once a week to keep autovacuum from kicking in. Autovacuum is slower than a manually executed VACUUM VERBOSE ANALYZE and does not do as good a job. For tables that change fewer than one million rows a week, vacuuming once a month is probably good enough.
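To decide which tables are due for a manual vacuum, PostgreSQL's standard pg_stat_user_tables view reports dead-tuple counts and the last (auto)vacuum times; a quick check might look like this (database name zinc15 is an assumption):

```shell
# Tables with the most dead tuples are the best vacuum candidates.
psql zinc15 -c "
  SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 10;"
```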