ZINC15 cron jobs


The following curation tasks are automated and run continuously in the background in ZINC15.


Features

We update the feature vector set for each subset derived from catalog membership. Currently this takes too long, and Teague will rewrite it to be more efficient.

2D exports

There are 121 top-level 2D tranches (A-K by A-K, i.e. 11 x 11), and these are exported using 4 scripts that are balanced for run time based on the expected sizes of the tranches. Each of the four scripts takes about 4 days to run, and in practice we re-export the 2D tranches approximately once a month. This process can definitely be optimized, but this is not currently a priority.
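The balancing described above can be sketched as follows. This is a minimal illustration, not the actual export scripts: the function names `tranche_names` and `balance`, and the idea of feeding in a per-tranche size estimate, are assumptions. It enumerates the 121 two-letter tranche codes and assigns them to 4 groups with a greedy "largest first" heuristic, so that each group has a similar total expected size.

```python
import heapq
from itertools import product
from string import ascii_uppercase

LETTERS = ascii_uppercase[:11]  # 'A' through 'K'


def tranche_names():
    """Return the 121 two-letter top-level tranche codes, 'AA' .. 'KK'."""
    return [a + b for a, b in product(LETTERS, repeat=2)]


def balance(sizes, n_scripts=4):
    """Split tranches into n_scripts groups of similar total size.

    sizes maps tranche code -> expected size (hypothetical estimates).
    Greedy heuristic: take tranches from largest to smallest and give
    each one to the group with the smallest running total.
    """
    heap = [(0, i) for i in range(n_scripts)]  # (current load, group index)
    heapq.heapify(heap)
    groups = [[] for _ in range(n_scripts)]
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, i = heapq.heappop(heap)
        groups[i].append(name)
        heapq.heappush(heap, (load + size, i))
    return groups


if __name__ == "__main__":
    # Placeholder sizes: every tranche counted as 1 unit of work.
    example_sizes = {name: 1 for name in tranche_names()}
    for i, group in enumerate(balance(example_sizes), start=1):
        print(f"script {i}: {len(group)} tranches")
```

With real size estimates in place of the placeholder values, each resulting group would be the tranche list for one export script.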

3D exports

There are 121 top-level 3D tranches (A-K by A-K, i.e. 11 x 11), and these are exported using 4 scripts balanced for run time. Each script takes about a week, sometimes a little more. In practice, we end up re-exporting the 3D tranches about once every two months. This will only get worse as we build more protomers, and there are definitely ways to make this run a lot faster.

Catalog exports

We export 5 vendor classes (50, 40, 30, 20, 10) and expose three (stock, demand, boutique). We also export three annotated classes (Annotated, Onestep and Kit). Each class is exported by a separate script, and each script takes approximately 10 days.

Unichem exports

We export for Unichem in 8 tranches of up to 30M molecules each. Each tranche takes one day, and we start on the 20th of the month to finish by month end. We also export SMILES for curation purposes, which we discard before preparing the final Unichem image.
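The timing above can be checked with a short calculation. The sketch below assumes the tranches run one after another, one per day, starting on the 20th; the function name `unichem_schedule` is hypothetical and is not part of the actual export tooling.

```python
from datetime import date, timedelta


def unichem_schedule(year, month, n_tranches=8, start_day=20, days_per_tranche=1):
    """Return (tranche number, start date, finish date) for each tranche.

    Assumes tranches are exported sequentially with no gaps between them.
    """
    start = date(year, month, start_day)
    schedule = []
    for i in range(n_tranches):
        begin = start + timedelta(days=i * days_per_tranche)
        end = begin + timedelta(days=days_per_tranche)
        schedule.append((i + 1, begin, end))
    return schedule


if __name__ == "__main__":
    for number, begin, end in unichem_schedule(2023, 2):
        print(f"tranche {number}: {begin} -> {end}")
    # Upper bound on molecules covered: 8 tranches of up to 30M each.
    print("capacity:", 8 * 30_000_000)
```

Under these assumptions the last tranche finishes on the 28th, which fits inside every month including February, and the 8 tranches cover up to 240M molecules.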

Purchasability

We update 200 tranches of 1000 molecules each, 18 hours a day, every day. We update molecules with null purchasability at 9 am each morning.
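As a rough illustration of the batching above, the sketch below cuts a stream of molecule identifiers into at most 200 batches of 1000, i.e. up to 200,000 molecules per pass. The function name `batches` and the use of plain integers as identifiers are assumptions made for the example; the real job's selection logic and schedule are not shown here.

```python
from itertools import islice


def batches(ids, batch_size=1000, max_batches=200):
    """Yield up to max_batches lists of at most batch_size ids each."""
    it = iter(ids)
    for _ in range(max_batches):
        chunk = list(islice(it, batch_size))
        if not chunk:
            return  # input exhausted before reaching max_batches
        yield chunk


if __name__ == "__main__":
    # Placeholder ids; a real run would read these from the database.
    one_pass = list(batches(range(250_000)))
    print(len(one_pass), "batches,", sum(len(b) for b in one_pass), "molecules")
```

Identifiers beyond the first 200,000 are left for a later pass rather than processed in the same run.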

Fingerprints

This will be added to loading, but we still need to curate the fingerprint location. This is currently done manually in a screen session, but we intend to move it to a cron job.