Continuous curation
Latest revision as of 17:23, 15 November 2016
We continually curate ZINC. This page briefly describes the actions taken and their current status, with dates. It keeps track of the current state of curation, helps the curators communicate with one another, and tells users what has been done and what remains to be done.
2D
catalog loading
- Purpose:
  - To load new catalogs and catalog updates.
  - To deplete compounds no longer available.
  - To count unique, post text files, update filtered, and update original.
As of Nov 14, the maximum sub_id is 525,926,658.
- loading: molport and enamine-v
- queued for loading: molport-v
- awaiting post-loading curation: DONE5
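Depleting compounds that are no longer available amounts to a set difference between the previous copy of a catalog and its update. The minimal sketch below assumes each catalog has already been reduced to a set of supplier codes; `diff_catalog` and the codes are hypothetical illustrations, not ZINC's actual loader.

```python
def diff_catalog(previous: set[str], update: set[str]) -> tuple[set[str], set[str]]:
    """Compare supplier codes from the previous catalog and its update.

    Returns (added, depleted): codes that are new in the update, and codes
    that were offered before but are no longer available.
    """
    added = update - previous
    depleted = previous - update
    return added, depleted


if __name__ == "__main__":
    # Hypothetical supplier codes for one vendor catalog.
    old = {"MP-001", "MP-002", "MP-003"}
    new = {"MP-002", "MP-003", "MP-004"}
    added, depleted = diff_catalog(old, new)
    print(sorted(added))     # ['MP-004']
    print(sorted(depleted))  # ['MP-001']
```

Compounds in `depleted` would then be marked as no longer purchasable rather than loaded again.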
exporting
- Purpose: We export 2D by property for the tranche browser over a 3-week period beginning on the first of the month.
As of Nov 14, the oldest tranche dates from Oct 25, about 20 days ago, which is within our goal of less than 30 days.
- CE: Oct 25 (oldest)
- 2 running
Currently, the oldest tranche is less than 30 days old. We intend to maintain this level of currency.
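The currency figures above are simple date arithmetic: a tranche's age is the number of days since its last export, and the goal holds when the oldest tranche is younger than the limit. A minimal sketch, assuming last-export dates are available per tranche; the function names and the `AA` entry are hypothetical, and only the CE date comes from this page.

```python
from datetime import date

GOAL_DAYS = 30  # 2D goal stated on this page: oldest tranche < 30 days old


def tranche_age_days(last_export: date, today: date) -> int:
    """Days elapsed since a tranche was last exported."""
    return (today - last_export).days


def oldest_within_goal(last_exports: dict[str, date], today: date,
                       goal_days: int = GOAL_DAYS) -> tuple[str, int, bool]:
    """Return (tranche, age in days, within goal) for the oldest tranche."""
    name = min(last_exports, key=lambda k: last_exports[k])
    age = tranche_age_days(last_exports[name], today)
    return name, age, age < goal_days


if __name__ == "__main__":
    # CE's date is from this page; "AA" is a made-up second entry.
    exports = {"CE": date(2016, 10, 25), "AA": date(2016, 11, 3)}
    print(oldest_within_goal(exports, today=date(2016, 11, 14)))
    # ('CE', 20, True)
```

The same check applies to the 3D tranches by passing `goal_days=60`; for the Oct 13 tranche it reports an age of 32 days on Nov 14.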
3D
protomer loading
- Purpose: To generate 3D models and load them into ZINC.
As of Nov 14, the maximum prot_id is 255,762,865.
- preparation: on hold
- building: on hold
- loading: on hold
We expect to resume on Nov 20. We are currently building Ellman.
exporting
- Purpose:
  - To export 3D for the tranche browser.
  - Runs continuously; a full pass takes slightly longer than a month.
As of Nov 14, the oldest tranche dates from Oct 13, just over 30 days ago, so every 3D tranche is currently less than 40 days old.
The date each tranche was last updated is listed at files.docking.org/3D.
Our goal is to keep all 3D tranches within 60 days, which we believe is feasible even as ZINC grows.
Rings and Patterns
Ring curation status
- Purpose:
  - Compute rings for newly added compounds.
  - Compute rings when missing, e.g. for compounds recently returned to current status.
  - Count rings once they stabilize.
  - Delete unused rings when counts are refreshed.
As of Nov 14:
Pattern curation status
- Purpose:
  - Compute patterns for newly added compounds.
  - Compute patterns when missing, e.g. for compounds recently returned to current status.
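The ring and pattern tasks above share one scheme: fill in anything missing for current compounds, then recount and drop entries that no current compound uses. A minimal sketch with in-memory dicts and sets; the data layout, the function names, and the `compute` callable (a stand-in for whatever ring or pattern perception ZINC actually runs) are all assumptions.

```python
from collections import Counter
from typing import Callable, Dict, Set


def fill_missing(current: Set[str], table: Dict[str, Set[str]],
                 compute: Callable[[str], Set[str]]) -> int:
    """Compute entries for current compounds that have none yet.

    Covers newly added compounds and compounds recently returned to
    current status. Works for rings or patterns. Returns how many were filled.
    """
    missing = [cid for cid in current if cid not in table]
    for cid in missing:
        table[cid] = compute(cid)
    return len(missing)


def refresh_ring_counts(current: Set[str], rings: Dict[str, Set[str]],
                        ring_table: Set[str]) -> Counter:
    """Count how many current compounds use each ring, then drop unused rings."""
    counts = Counter(r for cid in current for r in rings.get(cid, set()))
    unused = {r for r in ring_table if counts[r] == 0}
    ring_table -= unused  # in-place: delete rings no current compound uses
    return counts


if __name__ == "__main__":
    # Toy data: compound IDs mapped to ring names (hypothetical).
    fake_perception = {"c3": {"benzene"}}
    current = {"c1", "c3"}
    rings = {"c1": {"benzene", "pyridine"}, "c2": {"indole"}}
    ring_table = {"benzene", "pyridine", "indole"}
    fill_missing(current, rings, lambda cid: fake_perception[cid])
    counts = refresh_ring_counts(current, rings, ring_table)
    print(sorted(ring_table))  # ['benzene', 'pyridine']
    print(counts["benzene"])   # 2
```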
Other
Biological table counts
- Purpose:
  - Maintain counts of compounds on biological resources.
SEA prediction curation
- Purpose:
  - Identify compounds with no SEA prediction.
  - Run SEA prediction on compounds with no prediction and update the database.
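Finding compounds with no SEA prediction and filling them in can be sketched as a membership check followed by batched prediction runs. The `predict` callable is a hypothetical stand-in for a real SEA run, and the batch size and identifiers are made up.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def update_missing_predictions(compounds: Iterable[str],
                               predictions: Dict[str, list],
                               predict: Callable[[List[str]], Dict[str, list]],
                               batch_size: int = 1000) -> int:
    """Find compounds with no stored prediction, predict them in batches,
    and store the results. Returns how many compounds were updated."""
    missing = sorted(c for c in compounds if c not in predictions)
    for batch in batched(missing, batch_size):
        predictions.update(predict(batch))
    return len(missing)


if __name__ == "__main__":
    preds = {"ZINC01": ["target-A"]}  # made-up IDs and target

    def fake_sea(batch: List[str]) -> Dict[str, list]:
        # Stand-in for a real SEA run: returns no targets for each compound.
        return {cid: [] for cid in batch}

    n = update_missing_predictions(["ZINC01", "ZINC02", "ZINC03"], preds,
                                   fake_sea, batch_size=1)
    print(n, sorted(preds))  # 2 ['ZINC01', 'ZINC02', 'ZINC03']
```

In this sketch a compound whose stored result is an empty list counts as already predicted, so it is not rerun on the next pass.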
basic warehousing (recalculate purchasability, reactivity class)
We recalculate each catalog as it is loaded. We also recalculate the entire database continuously.
vacuuming
We continuously and aggressively vacuum tables, in this rotation order:
- substance, substance_to_ecfp4_new, ecfp4_new, protomer, pattern, rings, subpat, hasring, then back to the beginning.
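The vacuum rotation is a round robin over a fixed table list. A minimal sketch using `itertools.cycle`; it only produces the order, and it assumes (as the term vacuuming suggests) a PostgreSQL database, where each step would issue a `VACUUM` on that table.

```python
from itertools import cycle, islice

# Rotation order as listed on this page; after hasring the cycle restarts.
VACUUM_ORDER = ["substance", "substance_to_ecfp4_new", "ecfp4_new",
                "protomer", "pattern", "rings", "subpat", "hasring"]


def vacuum_schedule(n: int) -> list[str]:
    """Return the next n tables to vacuum, wrapping around the rotation."""
    return list(islice(cycle(VACUUM_ORDER), n))


if __name__ == "__main__":
    for table in vacuum_schedule(10):
        # Assumption: on PostgreSQL each step would run VACUUM on this table.
        print(f"VACUUM {table};")
```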