ZINC-22 Clean Up in Jan 2022: Difference between revisions

From DISI
(Provisional description of major changes in the ZINC-22 database.)
 
m (Update)
Line 1: Line 1:
We started building ZINC-22 before we really had all of the catalogging apparatus ready. As a result, some tranches were damaged, have been lost, and must be rebuilt.
We started building ZINC-22 before all of the cataloging apparatus was ready. As a result, some tranches were damaged, have been lost, and must be rebuilt.


We cleaned up ZINC-22 in the first week of January 2022 as follows:
We cleaned up ZINC-22 in the first week of January 2022, deleting 47.6 million 3D molecules that you have likely docked. We are sorry but this was unavoidable.


  generation tranche.        count
Line 15: Line 15:
  zinc-22u | H22P320_390  | 1097048


A total of 47.6 M 3D molecules were deleted.  If you docked ZINC_22 before Jan 4, 2022, you may have docked these molecules.
molecules in these tranches will NOT be findable by ZINC code in Cartblanche22.docking.org.  
These molecules will not look up correctly in Cartblanche22.docking.org. However, you can still look for them using
However, they CAN be found by searching the SMILES in the swp.docking.org service.  
the SMILES lookup feature.  


At the time of writing (Jan 4 noon) there are still problems with H26 to H29. These will be resolved asap.
* we still must build a new smallworld and arthor index based on ZINC-22 2d-12.  (JJ, Ben)


We are currently investigating whether 2d-0? (1-8), representing the pre-2022 database, are complete and correct.  
* there are remaining problems with H26-H29 that we are aware of and are working on.  


The current 2d database is /nfs/exb/zinc22/2d-12. We are currently investigating whether anything from 2d-0? is missing.  
* We have found a bug in which a few thousand molecules have been incorrectly treated as radicals.  
We are working to fix this.  For now, ignore.  


There are a small number of molecules in the 3D production /nfs/exd/zinc-22? that are not present in 2d-0? and 2d-12.
If a molecule does not look up correctly, you can always use its SMILES code.
We are working on figuring out how many, and what to do about them.  


If a molecule does not look up correctly, you can always use its SMILES code.
Thank you for reading.

Revision as of 01:01, 5 January 2022

We started building ZINC-22 before all of the cataloging apparatus was ready. As a result, some tranches were damaged or lost and must be rebuilt.

We cleaned up ZINC-22 in the first week of January 2022, deleting 47.6 million 3D molecules that you have likely docked. We are sorry, but this was unavoidable.

generation | tranche     | count
-----------------------------------
zinc-22x   | H24P200_230 | 38034845
zinc-22u   | H24P200_230 | 2105987
zinc-22k   | H24P200_230 | 81295
zinc-22l   | H24P200_230 | 39679
zinc-22m   | H24P200_230 | 52743
zinc-22o   | H24P200_230 | 751764
zinc-22p   | H24P200_230 | 2105987
zinc-22x   | H22P320_390 | 3895304
zinc-22u   | H22P320_390 | 1097048
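
If you want to check your own docking results against this list, the table can be parsed in a few lines. The following is a minimal sketch, not part of the ZINC-22 tooling: the function names (parse_rows, totals_by_tranche, was_deleted) are made up for illustration, and it assumes you already know the generation and tranche of each docked molecule.

```python
# Rows copied verbatim from the table above: generation | tranche | count.
DELETED_TABLE = """
zinc-22x | H24P200_230 | 38034845
zinc-22u | H24P200_230 | 2105987
zinc-22k | H24P200_230 | 81295
zinc-22l | H24P200_230 | 39679
zinc-22m | H24P200_230 | 52743
zinc-22o | H24P200_230 | 751764
zinc-22p | H24P200_230 | 2105987
zinc-22x | H22P320_390 | 3895304
zinc-22u | H22P320_390 | 1097048
"""


def parse_rows(text):
    """Turn the pipe-separated table into (generation, tranche, count) tuples."""
    rows = []
    for line in text.strip().splitlines():
        generation, tranche, count = (field.strip() for field in line.split("|"))
        rows.append((generation, tranche, int(count)))
    return rows


def totals_by_tranche(rows):
    """Sum the deleted-molecule counts for each tranche across generations."""
    totals = {}
    for _generation, tranche, count in rows:
        totals[tranche] = totals.get(tranche, 0) + count
    return totals


def was_deleted(rows, generation, tranche):
    """Return True if this (generation, tranche) pair appears in the table."""
    return any(g == generation and t == tranche for g, t, _count in rows)


rows = parse_rows(DELETED_TABLE)
per_tranche = totals_by_tranche(rows)
grand_total = sum(count for _generation, _tranche, count in rows)

print(per_tranche)
print(grand_total)
print(was_deleted(rows, "zinc-22x", "H24P200_230"))
```

As listed, the rows sum to 48,164,652, slightly more than the 47.6 million quoted above; the sketch simply reports what the table contains.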

Molecules in these tranches can NOT be found by ZINC code in Cartblanche22.docking.org. However, they CAN be found by searching for their SMILES in the swp.docking.org service.

  • We still must build a new SmallWorld and Arthor index based on ZINC-22 2d-12. (JJ, Ben)
  • There are remaining problems with H26-H29 that we are aware of and are working on.
  • We have found a bug in which a few thousand molecules have been incorrectly treated as radicals. We are working to fix this. For now, ignore these molecules.

If a molecule does not look up correctly, you can always use its SMILES code.

Thank you for reading.