Difference between revisions of "ZINC-22 Clean Up in Jan 2022"

(Provisional description of major changes in the ZINC-22 database.)
 
m (asdf)
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
We started builing ZINC-22 before we really had all of the catalogging apparatus ready. As a result, some tranches were damaged, have been lost, and must be rebuilt.
+
==Background ==
  
We cleaned up ZINC-22 in the first week of January 2022 as follows:  
+
We started building ZINC-22 before all of the cataloging apparatus was ready. As a result of mistakes we made, a few tranches were damaged, have been lost, and must be rebuilt.
 +
 
 +
== Actions taken ==
 +
We cleaned up ZINC-22 in the first week of January 2022, deleting 47.6 million 3D molecules that you have likely docked. We are sorry but this was unavoidable.
 +
 
 +
== What was affected ==
 +
The affected molecules are:  
 +
* H24P200-230 which have the prefix oq, or, os, ot after ZINC.  Only per the table below.
 +
* H22P320-380 which have the prefix mC, mD, mE, mF, mG, mH, mI, mJ after ZINC. per the table below.
  
 
  generation tranche.        count
 
Line 15: Line 23:
 
  zinc-22u | H22P320_390  | 1097048
 
  
A total of 47.6 M 3D molecules were deleted.  If you docked ZINC_22 before Jan 4, 2022, you may have docked these molecules.
+
== Impact ==
These molecules will not look up correctly in Cartblanche22.docking.org. However, you can still look for them using
+
Molecules in these tranches will NOT be findable by ZINC code in Cartblanche22.docking.org.  
the SMILES lookup feature.  
+
However, they CAN be found by searching the SMILES in the swp.docking.org service.
 +
So you can still likely buy them.
 +
 
 +
== Other news ==
 +
* We are building/updating new smallworld and arthor indexes based on ZINC-22 2d-12.  H04-H26
 +
* We are working to complete H27-H29. Maybe Feb 1. Until then, looking up molecules in cartblanche22 H27-H29 may fail.
 +
workaround is to look up in smallworld private.  
  
At the time of writing (Jan 4 noon) there are still problems with H26 to H29. These will be resolved asap.  
+
* We have found a bug in which a few thousand molecules - mostly N-oxides or other uncommon functional groups - have been incorrectly treated as radicals.  We are working to fix this. For now, ignore them if you see them.
  
We are currently investigating whether 2d-0? (1-8), representing the pre-2022 database, are complete and correct.  
+
* ZINC20 in stock in ZINC-22, called /zinc-22g/ has been completely re-created. The numbering of molecules may have changed.
 +
We regret the impact this may cause. We had generated several versions of zinc-22g "ZINC 20 in stock" in 2021. They are all gone.  
  
The current 2d database is /nfs/exb/zinc22/2d-12. We are currently investigating whether anything from 2d-0? is missing.  
+
We will try to minimize future disruption. The changes described here were essential to getting a high level of correctness in the database without delay.
  
There are a small number of molecules in the 3D production /nfs/exd/zinc-22? that are not present in 2d-0? and 2d-12.
+
Questions to jjiteam at googlegroups dot com.
We are working on figuring out how many, and what to do about them.  
 
  
If a molecule does not look up correctly, you can always use its SMILES code.
+
[[Category:ZINC-22-news]]

Latest revision as of 17:18, 1 February 2022

Background

We started building ZINC-22 before all of the cataloging apparatus was ready. As a result of mistakes we made, a few tranches were damaged, have been lost, and must be rebuilt.

Actions taken

We cleaned up ZINC-22 in the first week of January 2022, deleting 47.6 million 3D molecules that you have likely docked. We are sorry, but this was unavoidable.

What was affected

The affected molecules are:

  • H24P200-230, whose ZINC codes have the prefix oq, or, os, or ot after ZINC, but only as listed in the table below.
  • H22P320-380, whose ZINC codes have the prefix mC, mD, mE, mF, mG, mH, mI, or mJ after ZINC, as listed in the table below.

generation | tranche     |    count
-----------+-------------+---------
zinc-22x   | H24P200_230 | 38034845
zinc-22u   | H24P200_230 |  2105987
zinc-22k   | H24P200_230 |    81295
zinc-22l   | H24P200_230 |    39679
zinc-22m   | H24P200_230 |    52743
zinc-22o   | H24P200_230 |   751764
zinc-22p   | H24P200_230 |  2105987
zinc-22x   | H22P320_390 |  3895304
zinc-22u   | H22P320_390 |  1097048
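
If you have a list of docking hits and want to know which ones may be affected, the prefix rule above can be sketched in a few lines of Python. This is only a rough sketch: it assumes the two-character code sits directly after the literal "ZINC" in each ZINC code, the function names are made up for illustration, and the example codes are hypothetical. A matching prefix only flags a candidate, since the deletion applied only to the generations listed in the table.

```python
from typing import Iterable, List, Optional, Tuple

# Prefixes and tranche ranges as quoted in the list above.
AFFECTED_PREFIXES = {
    "H24P200-230": {"oq", "or", "os", "ot"},
    "H22P320-380": {"mC", "mD", "mE", "mF", "mG", "mH", "mI", "mJ"},
}


def affected_tranche(zinc_code: str) -> Optional[str]:
    """Return the affected tranche range for a ZINC code, or None.

    Assumption: the two characters right after "ZINC" are the prefix.
    The comparison is case-sensitive ("mC" is not the same as "mc").
    """
    if not zinc_code.startswith("ZINC") or len(zinc_code) < 6:
        return None
    prefix = zinc_code[4:6]
    for tranche, prefixes in AFFECTED_PREFIXES.items():
        if prefix in prefixes:
            return tranche
    return None


def split_hits(zinc_codes: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split codes into (possibly affected, not affected), keeping order."""
    affected, unaffected = [], []
    for code in zinc_codes:
        if affected_tranche(code) is None:
            unaffected.append(code)
        else:
            affected.append(code)
    return affected, unaffected


if __name__ == "__main__":
    # Hypothetical codes, for illustration only.
    hits = ["ZINCoq0000000001", "ZINCmJ0000000002", "ZINCab0000000003"]
    possibly_deleted, fine = split_hits(hits)
    print("look up by SMILES:", possibly_deleted)
    print("look up by ZINC code:", fine)
```

Codes that land in the first list are the ones to look up by SMILES rather than by ZINC code.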

Impact

Molecules in these tranches will NOT be findable by ZINC code in Cartblanche22.docking.org. However, they CAN be found by searching for their SMILES in the swp.docking.org service, so you can likely still buy them.

Other news

  • We are building and updating new smallworld and arthor indexes based on ZINC-22 2d-12, covering H04-H26.
  • We are working to complete H27-H29, possibly by Feb 1. Until then, looking up H27-H29 molecules in cartblanche22 may fail. The workaround is to look them up in smallworld private.

  • We have found a bug in which a few thousand molecules, mostly N-oxides or molecules with other uncommon functional groups, have been incorrectly treated as radicals. We are working to fix this. For now, ignore them if you see them.
  • The ZINC20 in-stock set in ZINC-22, called /zinc-22g/, has been completely re-created. The numbering of molecules may have changed.

We regret the impact this may cause. We had generated several versions of zinc-22g "ZINC 20 in stock" in 2021. They are all gone.

We will try to minimize future disruption. The changes described here were essential to getting a high level of correctness in the database without delay.

Questions to jjiteam at googlegroups dot com.