Continuous curation

From DISI
Revision as of 03:56, 15 November 2016 by Frodo

This is the continuous curation page. It communicates the current status of ZINC curation, both among the curators and to the users of ZINC.

2D (catalog) loading

  • now loading: molport and enamine-v
  • queued and ready for loading: molport-v
  • awaiting post-loading curation: DONE5 directory (count unique, update filtered, counts)
  • current sub_id max is 525,926,658 (Nov 14)

3D (protomer) loading

  • protomer building and loading is currently on hold until new disk space is ready (we expect to resume loading on Nov 20)
  • we are currently building Ellman in 3D
  • current prot_id max is 255,762,865 (Nov 14)

2D exporting

We export 2D by property (for the tranche browser) over a three-week period beginning on the first of each month. You can see the date each tranche was last updated at files.docking.org/2D. Currently, the oldest tranche is less than 30 days old, and we intend to maintain this level of currency.
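The 2D export schedule can be expressed as simple date arithmetic. This is a minimal sketch, assuming the "3 week period" means 21 calendar days counted inclusively from the 1st of the month; the function names `export_window` and `in_export_window` are hypothetical and not part of any ZINC tooling.

```python
from __future__ import annotations

from datetime import date, timedelta

# Assumption: the 2D export window is 21 calendar days, starting on the 1st.
EXPORT_WINDOW_DAYS = 21


def export_window(year: int, month: int) -> tuple[date, date]:
    """Return (first_day, last_day) of the 2D export window for a month."""
    start = date(year, month, 1)
    # The window includes its start day, so the last day is start + 20 days.
    end = start + timedelta(days=EXPORT_WINDOW_DAYS - 1)
    return start, end


def in_export_window(day: date) -> bool:
    """True if `day` falls inside that month's 2D export window."""
    start, end = export_window(day.year, day.month)
    return start <= day <= end
```

For example, `export_window(2016, 11)` gives November 1 through November 21, so a 2D export would be expected to be running on November 15 but finished by November 25.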

3D exporting

Exporting 3D for the tranche browser runs continuously, and a full cycle takes slightly longer than a month. You can see the date each tranche was last updated at files.docking.org/3D. Currently, the oldest tranche is less than 40 days old. We intend to keep every 3D tranche within 60 days of its last update, which we believe is achievable even as ZINC grows.
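Checking the currency targets amounts to comparing each tranche's last-update date against a maximum age: 30 days for 2D and 60 days for 3D, as stated on this page. The sketch below assumes the last-update dates have already been collected into a dictionary (for example, by hand from the files.docking.org listings); the tranche names and the helper functions are hypothetical. Treating a tranche as stale only when its age strictly exceeds the limit is also an assumption about how the boundary day is counted.

```python
from __future__ import annotations

from datetime import date

# Maximum tranche age in days, taken from the targets on this page.
MAX_AGE_DAYS = {"2D": 30, "3D": 60}


def stale_tranches(last_updated: dict[str, date], kind: str, today: date) -> list[str]:
    """Return the sorted names of tranches older than the target for `kind`."""
    limit = MAX_AGE_DAYS[kind]
    return sorted(
        name
        for name, updated in last_updated.items()
        # Assumption: a tranche is stale only when its age exceeds the limit.
        if (today - updated).days > limit
    )


def oldest_age_days(last_updated: dict[str, date], today: date) -> int:
    """Age in days of the least recently updated tranche."""
    return max((today - updated).days for updated in last_updated.values())
```

With hypothetical dates `{"tranche_A": date(2016, 10, 20), "tranche_B": date(2016, 9, 1), "tranche_C": date(2016, 10, 1)}` checked on November 15, 2016, the ages are 26, 75 and 45 days. The 3D check flags only `tranche_B`, while the 2D check flags `tranche_B` and `tranche_C`.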


Ring curation status

We compute rings at the end of the database, and we

Pattern curation status

Ring counts, pattern counts

Biological table counts

SEA prediction curation

Basic warehousing (recalculate purchasability, reactivity class)

We recalculate each catalog as it is loaded. We also recalculate the entire database continuously.


Vacuuming

We continuously and aggressively vacuum tables. The rotation order is:

  • substance**, substance_to_ecfp4_new, ecfp4_new, protomer, pattern, rings, subpat, hasring -> then back to the beginning.
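The vacuum rotation is a round-robin over a fixed list of tables. This is a minimal sketch, assuming the database is PostgreSQL (which the term "vacuum" suggests) and that each step is a plain `VACUUM ANALYZE`. The options the curators actually use are not stated on this page, and the `**` marker after `substance` is not explained, so it is left out here. The sketch only generates the statements and does not connect to a database.

```python
from __future__ import annotations

from itertools import cycle, islice

# Rotation order listed on this page; after the last table it starts over.
VACUUM_ROTATION = [
    "substance", "substance_to_ecfp4_new", "ecfp4_new", "protomer",
    "pattern", "rings", "subpat", "hasring",
]


def vacuum_statements(n: int, tables: list[str] = VACUUM_ROTATION) -> list[str]:
    """Return the next n VACUUM statements, cycling through the rotation."""
    # Assumption: plain "VACUUM ANALYZE <table>"; real options may differ.
    return [f"VACUUM ANALYZE {table};" for table in islice(cycle(tables), n)]
```

Asking for nine statements shows the wrap-around: the ninth is `VACUUM ANALYZE substance;` again. In PostgreSQL, `VACUUM` cannot run inside a transaction block, so a driver issuing these statements would need autocommit enabled.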