ZINC-22 rearrangement of May-2024

Latest revision as of 20:49, 24 September 2024

A few things have happened recently, which we describe below.

Enamine Macrocycles

  • We have released a new layer /zinc-22w/, Enamine macrocycles. These are based on a private library of about 150K molecules from Enamine, as follows:
  • 104,060 in H19 to H39
  • 45,985 in H40 to H49
  • 654 in H50 to H54

We have built to H39. Next time (summer) we will build to H49.

The 104,060 expand to 144,978 with stereoisomers. These are dockable today in /zinc-22w/. To be clear, this number double-counts protonation states with different charges: if a molecule has an imidazole and appears in both a protonated and an unprotonated form, it counts as two. So the real number is maybe 140K.
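As a quick sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. The numbers are the ones quoted in this post; the variable names are only illustrative.

```python
# Counts quoted above for the Enamine macrocycle library, by heavy-atom range.
counts = {"H19-H39": 104_060, "H40-H49": 45_985, "H50-H54": 654}

# Total size of the library (the "about 150K" figure).
total = sum(counts.values())
print(f"library size: {total:,}")

# Expansion of the H19-H39 slice once stereoisomers (and differently
# charged protonation states) are enumerated.
built_2d = counts["H19-H39"]
built_3d = 144_978
print(f"expansion factor: {built_3d / built_2d:.2f}")
```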

More ZINC-22 3D structures for docking

  • For background information about layers, see ZINC22:Layers.
  • We have begun to release a new layer, /zinc-22y/.
  • This is an incremental update.
  • We took all the molecules registered in 2D in ZINC up to H24 and asked how many of these are not available in ready-to-dock 3D formats.
  • We found about 4 billion such molecules, just up to H24.
  • /zinc-22y/ is available to H19 as of May 23, 2024. We expect to get up to H24 fully updated by summer. Then we will turn to H25-H29.

Molecule counts in 2D and 3D tranche browser

  • We have updated 2D molecule counts in ZINC-22.
  • Thus the 2D browser is now a correct summary of what we have loaded.
  • ZINC-22 2D is now about 50% bigger. Old count was around 37B. Now around 55B.
  • We are updating the 3D molecule counts in ZINC-22. Work in progress.

Smallworld and Arthor databases

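We have rearranged the Smallworld and Arthor databases. There are now five servers of each: sw (public, no password), swp (private, password required, but available), swcc (chemistry commons), swbb (building blocks), and one more that is private to UCSF. Arthor follows the same pattern: arthor, arthorp, arthorcc, arthorbb, and a UCSF-only one. Details are on the Smallworld Databases and Arthor Databases pages.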

Cartblanche22.docking.org

  • There have been a lot of bug fixes in Cartblanche22.docking.org. It is much more reliable now than earlier versions. If you had trouble with it, please try again.

Freshly updated SDI files

  • New SDI files are in /zinc-22x/sets/ as of 2024-05-20.
  • They are also available on Wynton.
  • They will be on AWS in June 2024.
  • Paths: /wynton/group/bks/sets/ and /nfs/exd/zinc-22x/sets/

They contain lists of ZINC-22 tranches organized by charge-HAC-name.suffix where

  • charge: N=neutral, M= -1, O= +1 and so on.
  • HAC is H04 to H39
  • name is lead-like (HAC 17-25), frag-like (HAC 04-16), also big, greasy-leads, big-greasy
  • suffix is txt (our lab), wyn (wynton) and s3 (AWS)
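To make the naming scheme concrete, here is a minimal Python sketch that splits a name of the form charge-HAC-name.suffix into its fields. It rests on two assumptions that are not stated above: that the charge letters simply count up and down from N (M = -1, N = 0, O = +1, and so on), and that the fields are joined by hyphens exactly as written. The function name and the example file name are hypothetical; check them against the real files before relying on this.

```python
from typing import NamedTuple


class SdiName(NamedTuple):
    charge: int   # formal charge decoded from the charge letter
    hac: int      # heavy atom count, e.g. 17 for "H17"
    name: str     # subset name, e.g. "lead-like"
    suffix: str   # "txt", "wyn" or "s3"


def parse_sdi_name(filename: str) -> SdiName:
    """Split a charge-HAC-name.suffix file name into its fields."""
    stem, suffix = filename.rsplit(".", 1)
    # maxsplit=2 keeps hyphens inside the name part ("lead-like") intact.
    charge_letter, hac_token, name = stem.split("-", 2)
    if len(charge_letter) != 1 or not hac_token.startswith("H"):
        raise ValueError(f"unexpected file name: {filename!r}")
    # Assumption: letters count up and down from N (M=-1, N=0, O=+1, ...).
    charge = ord(charge_letter) - ord("N")
    return SdiName(charge, int(hac_token[1:]), name, suffix)


# Hypothetical file name, made up for illustration only.
print(parse_sdi_name("M-H17-lead-like.wyn"))
```

Splitting from the left with a fixed count, and taking the suffix from the right, is what lets hyphenated subset names such as greasy-leads or big-greasy pass through unchanged.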

/zinc-22c/ zwitterions

  • Recently updated to H19.

We updated /wynton/group/bks/2d/

There are now 54B SMILES in ZINC-22.

Synced to Wynton

We sync to AWS in June 2024.


SMILES available for 3D structures

We have recomputed SMILES files for each small tranche, e.g. H20/H20P200/*.smi.gz.
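For anyone who wants to read these files directly, the following Python sketch walks one tranche directory and yields the SMILES it finds. It assumes a local copy of the tranche at a path like H20/H20P200, and it assumes each line holds a SMILES string followed by an identifier separated by whitespace. Both the function name and the column layout are assumptions, so inspect a file first.

```python
import glob
import gzip
import os
from typing import Iterator, Tuple


def iter_smiles(tranche_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (smiles, identifier) pairs from every *.smi.gz file in tranche_dir."""
    pattern = os.path.join(tranche_dir, "*.smi.gz")
    for path in sorted(glob.glob(pattern)):
        # "rt" opens the gzip stream in text mode so lines come back as str.
        with gzip.open(path, "rt") as handle:
            for line in handle:
                fields = line.split()
                # Assumption: column 1 is the SMILES, column 2 the molecule ID.
                if len(fields) >= 2:
                    yield fields[0], fields[1]


if __name__ == "__main__":
    # Hypothetical local copy of one tranche; adjust the path to your mirror.
    for smiles, mol_id in iter_smiles("H20/H20P200"):
        print(mol_id, smiles)
```

Because the function is a generator, it streams one line at a time and never holds a whole tranche in memory.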

By layers, here is where we are (May 29, 2024)

DONE: a, b, c, i, k, l, q, r, t, w, y, z

Done up to the level shown, then stopped:

  • x: H18
  • n: H20
  • p: H20
  • m: H17

Almost done but still running:

  • d: H26
  • g: H28
  • h: H29
  • o: H25
  • s: H25
  • u: H25
  • v: H21

We will announce when this is finished. It should be mostly finished soon, except for x and n past H25. That last bit will take a while.