Continuous curation

Latest revision as of 17:23, 15 November 2016

We continually curate ZINC. This page briefly describes the actions taken and the current status, with dates. It is used to track the current state of curation, to communicate among the curators, and to inform users of what has been done and what remains to be done.

2D

catalog loading

  • Purpose:
    • To load new catalogs and catalog updates.
    • To deplete compounds no longer available.
    • To count unique, post text files, update filtered, and update original.

As of Nov 14: sub_id max is 525,926,658

  • loading: molport and enamine-v
  • queued for loading: molport-v
  • awaiting post-loading curation: DONE5
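
The load and deplete steps listed under Purpose can be pictured as a set comparison between what is currently marked available and what a new catalog file lists. The following is only a minimal sketch under that assumption: the function name, the dictionary keys, and the supplier codes are hypothetical and are not taken from the ZINC loading scripts.

```python
def plan_catalog_update(current_ids, incoming_ids):
    """Split a catalog update into additions and depletions.

    current_ids:  supplier codes currently marked available (hypothetical input)
    incoming_ids: supplier codes listed in the newly received catalog (hypothetical input)
    """
    current, incoming = set(current_ids), set(incoming_ids)
    return {
        "to_load": sorted(incoming - current),     # new compounds to add
        "to_deplete": sorted(current - incoming),  # no longer offered by the supplier
        "unchanged": sorted(current & incoming),   # still available, nothing to do
    }


# Illustrative supplier codes only.
plan = plan_catalog_update(["A1", "A2", "A3"], ["A2", "A3", "A4"])
print(plan)
```

Sorting the results keeps the plan reproducible between runs, which may be convenient when posting the resulting text files.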

exporting

  • Purpose: We export 2D by property for the tranche browser over a 3-week period beginning on the first of the month.

As of Nov 14, the oldest tranche dates from Oct 25, i.e. it is ca. 20 days old, which is within our goal of < 30 days.

  • oldest tranche: CE (Oct 25)
  • 2 running

Currently, the oldest tranche is less than 30 days old. We intend to maintain this level of currency.
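
The age check above is simple date arithmetic. As a small sketch, assuming the year is 2016 (taken from the revision date of this page) and using hypothetical function names:

```python
from datetime import date


def tranche_age_days(last_export, today):
    """Number of days since a tranche was last exported."""
    return (today - last_export).days


def within_goal(last_export, today, goal_days=30):
    """True if the tranche is younger than the stated goal (< 30 days for 2D)."""
    return tranche_age_days(last_export, today) < goal_days


status_date = date(2016, 11, 14)    # "As of Nov 14"; year assumed
oldest_export = date(2016, 10, 25)  # oldest 2D tranche

age = tranche_age_days(oldest_export, status_date)
print(age, within_goal(oldest_export, status_date))
```

With these dates the age comes out at 20 days, matching the "ca. 20 days" stated above.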

3D

protomer loading

  • Purpose: to generate 3D models and load them into ZINC.

As of Nov 14: prot_id max is 255,762,865

  • preparation: on hold
  • building: on hold
  • loading: on hold

Expect to resume Nov 20. Currently building Ellman.

exporting

  • Purpose:
    • Exporting 3D for the tranche browser.
    • Runs continuously; a full pass takes slightly longer than a month.

As of Nov 14, the oldest tranche dates from Oct 13, which is just over 30 days ago. You can see the date we last updated each tranche at files.docking.org/3D. Currently, the oldest tranche is less than 40 days old. We intend to keep 3D tranches within 60 days, which we feel is possible even as ZINC grows.

Rings and Patterns

Ring curation status

  • Purpose:
    • Compute rings for newly added compounds.
    • Compute rings when missing, e.g. recently returned to current status.
    • Count rings when rings stabilize.
    • Delete unused rings when counts refreshed.
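
The last two steps, counting rings and then deleting unused ones, might look roughly like the sketch below. It assumes an in-memory mapping from compound to rings and counts each ring once per compound; both the data layout and the names are assumptions for illustration, not the actual ZINC schema.

```python
from collections import Counter


def refresh_ring_counts(compound_rings, known_rings):
    """Recount ring usage and list rings that no compound uses any more.

    compound_rings: dict mapping a compound id to the rings it contains
    known_rings:    every ring currently stored in the ring table
    """
    counts = Counter()
    for rings in compound_rings.values():
        counts.update(set(rings))  # assumption: count a ring once per compound
    # A Counter returns 0 for keys it has never seen, so unused rings are easy to spot.
    unused = sorted(r for r in known_rings if counts[r] == 0)
    return counts, unused


# Illustrative data only.
compound_rings = {"c1": ["benzene", "pyridine"], "c2": ["benzene"]}
known_rings = {"benzene", "pyridine", "furan"}

counts, unused = refresh_ring_counts(compound_rings, known_rings)
print(dict(counts), unused)
```

Deleting only after the counts are refreshed, as the list above specifies, avoids removing a ring that a recently returned compound still needs.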

As of Nov 14:

Pattern curation status

  • Purpose:
    • Compute patterns for newly added compounds.
    • Compute patterns when missing, e.g. recently returned to current status.

Other

Biological table counts

  • Purpose:
    • Maintain counts of compounds on biological resources.

SEA prediction curation

  • Purpose:
    • Identify compounds with no SEA prediction.
    • Run SEA prediction on compounds with no prediction and update the database.
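
The two steps above form a find-and-fill loop. A minimal sketch follows, in which a plain dictionary stands in for the database and `fake_predict` stands in for the SEA run; neither reflects the real SEA interface or the ZINC tables.

```python
def fill_missing_predictions(compound_ids, predictions, predict):
    """Run `predict` for every compound lacking a prediction and store the result.

    predictions: dict of compound id -> prediction (stand-in for the database)
    predict:     callable producing a prediction for one compound (stand-in for SEA)
    Returns the ids that were newly predicted.
    """
    missing = [cid for cid in compound_ids if cid not in predictions]
    for cid in missing:
        predictions[cid] = predict(cid)
    return missing


def fake_predict(cid):
    # Hypothetical placeholder; a real run would call SEA here.
    return ["target-for-" + cid]


store = {"c1": ["T1"]}
newly_predicted = fill_missing_predictions(["c1", "c2", "c3"], store, fake_predict)
print(newly_predicted, store)
```

Existing predictions are left untouched, so the loop can be rerun safely whenever new compounds arrive.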

basic warehousing (recalculate purchasability, reactivity class)

We recalculate each catalog as it is loaded. We also recalculate the entire database continuously.

vacuuming

We continuously and aggressively vacuum tables. The rotation order is:

  • substance**, substance_to_ecfp4_new, ecfp4_new, protomer, pattern, rings, subpat, hasring -> then back to the beginning.
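
The rotation above is a round-robin over a fixed table list. As a rough sketch, assuming a PostgreSQL-style `VACUUM <table>;` statement (the exact command and options used are not stated here) and dropping the trailing `**` from the first table name:

```python
from itertools import cycle, islice

# Rotation order as listed above; "substance**" is written as "substance" (assumption).
VACUUM_ORDER = [
    "substance",
    "substance_to_ecfp4_new",
    "ecfp4_new",
    "protomer",
    "pattern",
    "rings",
    "subpat",
    "hasring",
]


def vacuum_statements(n):
    """Return the next n statements of the rotation, wrapping back to the start."""
    return [f"VACUUM {table};" for table in islice(cycle(VACUUM_ORDER), n)]


statements = vacuum_statements(10)
for stmt in statements:
    print(stmt)
```

Asking for ten statements shows the wrap-around: after `hasring` the rotation returns to `substance`.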