Here are the curator's notes for ZINC 15.
as xyz.
example 1. export by catalog
python ~/code/ZINC/contrib/fetch.py -t substance -f 'smiles,zinc_id' 'catalog.short_name = sial' | split -l 1000
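split writes its chunks into the current directory as xaa, xab, and so on. To keep chunks from different exports apart, a prefix argument helps; a minimal variant (the sial_ prefix is just an illustration, not part of the original recipe):

# same export, but with an explicit chunk prefix (sial_ is illustrative)
python ~/code/ZINC/contrib/fetch.py -t substance -f 'smiles,zinc_id' \
    'catalog.short_name = sial' | split -l 1000 - sial_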
example 2. export by physical property tranche
python ~/code/ZINC/contrib/fetch.py -t substance -f 'smiles,zinc_id' 'structure.mwt < 200' 'protomer.prot_id is null' | split -a 4 -l 1000
ls | wc -l    (340 chunk files in this example; note that ls -l would overcount by one because of its "total" line)
qsub -t 1-340 ~ts/Code/DOCK/ligand/generate/example-protomer-wrapper.sh ~/work/protomer
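The wrapper script is not reproduced in these notes; a minimal sketch of what an SGE array-job wrapper along these lines might look like, assuming the chunks are the five-character x???? files from split -a 4 (the real example-protomer-wrapper.sh may differ):

#!/bin/bash
# hypothetical sketch, not the real example-protomer-wrapper.sh
# $1 is the work directory passed on the qsub command line
cd "$1" || exit 1
# map the array task id (1..340) to the Nth chunk from split -a 4
chunk=$(ls x???? | sed -n "${SGE_TASK_ID}p")
echo "task ${SGE_TASK_ID}: generating protomers for ${chunk}"
# ...invoke the actual protomer generation on ${chunk} here...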
delta (incremental) load of SMILES from ZINC8
as jji. Based on smiles,sub_id exported from ZINC8 since the last update.
python ~/code/ZINC/zinc/load/create_substances.py xaa.ism --progress
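create_substances.py is shown loading a single chunk; assuming the remaining chunks follow the same xaa.ism naming (an assumption about how the export was split), a simple loop loads them all:

# hedged sketch: load every chunk in order (x??.ism naming assumed)
for f in x??.ism; do
    python ~/code/ZINC/zinc/load/create_substances.py "$f" --progress
done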
database export by property tranche
as xyz, following the layout of ~/zinc14/byproperty/
in psql, CLUSTER each table on its primary key (or whatever index makes sense),
then VACUUM VERBOSE ANALYZE each one.
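Concretely, for the substance table it might look like the following; the index name substance_pkey and the database name zinc15 are assumptions about the schema, not taken from these notes:

# hedged sketch: cluster one table on its (assumed) primary key index,
# then vacuum it; check \d substance for the real index name
psql zinc15 <<'SQL'
CLUSTER substance USING substance_pkey;
VACUUM VERBOSE ANALYZE substance;
SQL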
for each fast-moving table:
-- stage the list of ids to remove (ji4.txt: one id per line)
create temporary table blah (id integer);
\copy blah from 'ji4.txt'
-- save the retired rows that point at these ids before deleting
\copy (select * from retired where replacement_fk in (select id from blah)) to 'ji4.retired.save'
delete from substance where sub_id in (select id from blah);
-- check which catalog items still reference the deleted substances
select * from catalog_item where sub_id_fk in (select id from blah);
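Because the delete is destructive, the staging, save, and delete steps can be wrapped in one transaction so a bad id list can be rolled back before commit; a hedged sketch, assuming zinc15 as the database name:

# hedged sketch: the same workflow in a single transaction
psql zinc15 <<'SQL'
begin;
create temporary table blah (id integer);
\copy blah from 'ji4.txt'
\copy (select * from retired where replacement_fk in (select id from blah)) to 'ji4.retired.save'
delete from substance where sub_id in (select id from blah);
commit;
SQL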
-- separately: clone a table's structure without its data
create table target_class2 (like target_class);
on postgres vacuuming
I am noticing that when I am doing major revisions of the protomer, substance, and chembl21a.sea tables (where "major revisions" means adding, modifying, or deleting multiple millions of rows per week, sometimes per day), I need to vacuum analyze each table at least once a week to keep autovacuum from kicking in. Autovacuum is slower than a manually executed vacuum verbose analyze, and it does not do as good a job. For tables changing fewer than one million rows a week, vacuuming once a month is probably good enough.
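A sketch of putting that weekly manual vacuum on a schedule for the fast-moving tables named above (the database name and the cron timing are assumptions):

# hedged sketch: weekly manual vacuum of the fast-moving tables,
# e.g. run from cron:  0 3 * * 0  bash weekly_vacuum.sh  (hypothetical script name)
for t in protomer substance chembl21a.sea; do
    psql zinc15 -c "vacuum verbose analyze $t;"
done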