ZINC Biogenic Libraries
Biogenic and Biogenic-like libraries in ZINC.
We have created screening libraries based on molecules of biological origin. To be clear, our database of biogenic molecules includes both primary metabolites (often just called metabolites) and secondary metabolites (often called natural products). These libraries are inspired by the argument in Hert et al., NCB 2008. For the biogenic-like libraries, we then find all compounds that are similar to these biogenic molecules. We are also inspired by the Dortmund and Broad/Harvard groups working in the areas of natural products and nature-inspired compounds.
Assembly
- 1. Zbc (ZINC Biogenic compounds): all biogenic compounds from public sources. The purchasable version of this is subset 98.
- 2. Compounds with a Tanimoto similarity of 80% or better to any biogenic compound, based on RDKit path-based fingerprints (2048 bits).
- 3. We fragment biogenic compounds into Murcko scaffolds and ring systems (Ertl, via Molinspiration; type 2 and type 3 fragmentation). We retain only ring systems of 10 or more atoms, and then accept compounds having a Tanimoto similarity of 80% or better (RDKit path-based fingerprints, 2048 bits) to any biogenic fragment of 10 or more atoms thus calculated.
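The similarity filter used in steps 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the ZINC pipeline itself: fingerprints are represented as sets of on-bit indices, the toy bit sets below are invented, and the names `tanimoto` and `biogenic_like` are hypothetical. In practice the on-bits would presumably come from an RDKit path-based fingerprint (for example `Chem.RDKFingerprint(mol, fpSize=2048).GetOnBits()`), which this sketch does not call so that it runs with the standard library alone.

```python
from typing import Dict, FrozenSet, Iterable, List

# A fingerprint is modelled as the set of bit positions that are switched on
# (0..2047 for a 2048-bit fingerprint). Assumption: toy data, not real molecules.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def biogenic_like(candidates: Dict[str, Fingerprint],
                  references: Iterable[Fingerprint],
                  threshold: float = 0.8) -> List[str]:
    """Return names of candidates at or above the threshold to ANY reference."""
    refs = list(references)
    return [name for name, fp in candidates.items()
            if any(tanimoto(fp, ref) >= threshold for ref in refs)]


# Hypothetical reference set: biogenic compounds (step 2) or
# biogenic fragments of 10 or more atoms (step 3).
reference_fps = [frozenset({1, 2, 3, 4, 5}), frozenset({10, 11, 12, 13})]

candidate_fps = {
    "cand_a": frozenset({1, 2, 3, 4, 5, 6}),  # 5 shared / 6 total vs. first reference
    "cand_b": frozenset({1, 2, 3, 20, 21}),   # 3 shared / 7 total vs. first reference
    "cand_c": frozenset({10, 11, 12, 13}),    # identical to second reference
}

selected = biogenic_like(candidate_fps, reference_fps)
print(selected)
```

Because a compound is accepted as soon as it is close to any single reference, the cost grows with the size of the reference set; a real implementation would likely use a bulk similarity routine rather than this nested loop.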
Results
Both the biogenic and biogenic-like libraries are organized, as usual, into lead-like, fragment-like, drug-like, shard-like, and all subsets. The two libraries are called Zbc (ZINC Biogenic compounds) and Zni (ZINC Nature Inspired). We made these names deliberately different for clarity. Zbc compounds are produced by nature, and nature has been seeing them for evolutionary time. Zni (nature inspired) includes both compounds from nature and synthetic compounds that look natural when you have your Tanimoto 80% glasses on.
Inspiration
Hert, Dortmund Group, Broad/Harvard Group.
Argument
The argument is that screening libraries biased towards nature-like compounds should provide far richer and denser hits than one would expect from screening synthetic compounds alone. Indeed, the only reason HTS and virtual screening work as well as they do is that they are already heavily biased towards biogenic-like molecules. These libraries may also be useful for identifying protein function via docking: if a site recognizes a molecule from nature, then perhaps that molecule or a similar one is the endogenous ligand for that site.