Continuous curation

From DISI
Revision as of 17:23, 15 November 2016 by Frodo

We continually curate ZINC. This page briefly describes the curation actions taken and their current status, with dates. It is used to keep track of the current state of curation, to communicate among the curators, and to inform users of what has been done and what remains to be done.

2D

catalog loading

  • Purpose:
    • To load new catalogs and catalog updates.
    • To deplete compounds no longer available.
    • To count unique, post text files, update filtered, update original

As of Nov 14: sub_id max is 525,926,658

  • loading: molport and enamine-v
  • queued for loading: molport-v
  • awaiting post-loading curation: DONE5

exporting

  • Purpose: We export 2D by property for the tranche browser over a 3-week period beginning on the first of the month.

As of Nov 14, the oldest is Oct 25, thus ca. 20 days old, which is within our goal of under 30 days.

  • CE Oct 25 oldest
  • 2 running

Currently, the oldest tranche is less than 30 days old. We intend to maintain this level of currency.
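
The age figure above is plain date arithmetic. A minimal sketch of the check follows, assuming the year 2016 from this page's revision date; the function names and the `goal_days` parameter are illustrative and not part of any ZINC tooling.

```python
from datetime import date


def tranche_age_days(oldest_export: date, as_of: date) -> int:
    """Age in days of the oldest exported tranche on the status date."""
    return (as_of - oldest_export).days


def within_goal(oldest_export: date, as_of: date, goal_days: int = 30) -> bool:
    """True when the oldest tranche is younger than the goal (hypothetical helper)."""
    return tranche_age_days(oldest_export, as_of) < goal_days


# 2D status from this page: oldest export Oct 25, status date Nov 14.
status_date = date(2016, 11, 14)
oldest_2d = date(2016, 10, 25)
print(tranche_age_days(oldest_2d, status_date))  # 20
print(within_goal(oldest_2d, status_date))       # True
```

The same helper works for other goals by passing a different `goal_days`.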

3D

protomer loading

  • Purpose: to generate 3D models and load them into ZINC.

As of Nov 14: prot_id max is 255,762,865

  • preparation: on hold
  • building: on hold
  • loading: on hold

Expect to resume Nov 20. Currently building Ellman.

exporting

  • Purpose:
    • Exporting 3D for the tranche browser.
    • Runs continuously; one full pass takes slightly longer than a month.

As of Nov 14, the oldest is Oct 13, which is just over 30 days old. You can see the date each tranche was last updated at files.docking.org/3D. Currently, the oldest tranche is less than 40 days old. We intend to keep 3D tranches within 60 days, which we feel is possible even as ZINC grows.

Rings and Patterns

Ring curation status

  • Purpose:
    • Compute rings for newly added compounds.
    • Compute rings when missing, e.g. recently returned to current status.
    • Count rings when rings stabilize.
    • Delete unused rings when counts refreshed.

As of Nov 14:

Pattern curation status

  • Purpose:
    • Compute patterns for newly added compounds.
    • Compute patterns when missing, e.g. recently returned to current status.

Other

Biological table counts

  • Purpose:
    • Maintain counts of compounds on biological resources.

SEA prediction curation

  • Purpose:
    • Identify compounds with no SEA prediction.
    • Run SEA prediction on compounds with no prediction and update the database.
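
The two steps above amount to finding the IDs that lack a prediction and filling them in. The sketch below illustrates only that logic with in-memory data; the dictionary standing in for the database, the `predict` callable standing in for SEA, and both function names are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List


def missing_predictions(compound_ids: Iterable[int],
                        predictions: Dict[int, str]) -> List[int]:
    """Return the compound IDs that have no stored prediction, in input order."""
    return [cid for cid in compound_ids if cid not in predictions]


def fill_predictions(compound_ids: Iterable[int],
                     predictions: Dict[int, str],
                     predict: Callable[[int], str]) -> int:
    """Run `predict` on each compound lacking a prediction and store the result.

    Returns the number of compounds that were newly predicted.
    """
    todo = missing_predictions(compound_ids, predictions)
    for cid in todo:
        predictions[cid] = predict(cid)
    return len(todo)


# Toy data: compound 1 already has a prediction, 2 and 3 do not.
stored = {1: "existing"}
added = fill_predictions([1, 2, 3], stored, lambda cid: "new")
print(added, stored)
```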

basic warehousing (recalculate purchasability, reactivity class)

We recalculate each catalog as it is loaded. We also recalculate the entire database continuously.

vacuuming

We continuously and aggressively vacuum tables. The rotation order is:

  • substance**, substance_to_ecfp4_new, ecfp4_new, protomer, pattern, rings, subpat, hasring -> then back to the beginning.
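
A rotation like this can be sketched as an endless cycle over the table list. The example below only generates statement strings and does not connect to a database; it assumes a plain SQL `VACUUM <table>;` statement and uses the table names from the list above (with `substance` written without its asterisks). The function name is illustrative.

```python
from itertools import cycle, islice
from typing import Iterable, Iterator

# Rotation order as listed on this page.
ROTATION = [
    "substance",
    "substance_to_ecfp4_new",
    "ecfp4_new",
    "protomer",
    "pattern",
    "rings",
    "subpat",
    "hasring",
]


def vacuum_statements(tables: Iterable[str]) -> Iterator[str]:
    """Yield VACUUM statements forever, wrapping to the first table after the last."""
    for table in cycle(tables):
        yield f"VACUUM {table};"


# Show ten steps: the ninth wraps back to the first table.
for statement in islice(vacuum_statements(ROTATION), 10):
    print(statement)
```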