ZINC15 patterns

Revision as of 19:36, 21 June 2015 by Frodo

We compute which molecules match which patterns. We use these for:

  • the "clean" feature
  • identifying covalent warheads
  • preventing some molecules from being built for 3D non-covalent docking (e.g. Si, B, Sn containing)

The tables are as follows:


This table holds the SMARTS patterns, a foreign key to the origin of the pattern, an English description, a foreign key to the type of pattern it is, and the max sub_id that has been processed (i.e. how up to date it is).
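As a rough illustration of the layout described above, here is a minimal sketch using Python's built-in sqlite3 module. The table names (`pattern`, `pattern_origin`, `pattern_type`), the column names, and the sample row are all hypothetical, chosen only to mirror the description. They are not the actual ZINC15 schema. The `stale_patterns` helper is likewise an assumption about how the max sub_id column might be used to find patterns that have not yet been run against the newest substances.

```python
import sqlite3

# Hypothetical schema mirroring the description above; not the real ZINC15 tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pattern_origin (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE pattern_type   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE pattern (
    id          INTEGER PRIMARY KEY,
    smarts      TEXT NOT NULL,                                   -- the SMARTS pattern
    origin_fk   INTEGER NOT NULL REFERENCES pattern_origin(id),  -- where it came from
    description TEXT,                                            -- English description
    type_fk     INTEGER NOT NULL REFERENCES pattern_type(id),    -- what it is for
    max_sub_id  INTEGER NOT NULL DEFAULT 0                       -- last sub_id processed
);
""")

# Illustrative row only: a disulfide pattern typed as "unstable in buffer".
conn.execute("INSERT INTO pattern_origin VALUES (1, 'clean')")
conn.execute("INSERT INTO pattern_type VALUES (10, 'unstable in buffer')")
conn.execute(
    "INSERT INTO pattern (smarts, origin_fk, description, type_fk, max_sub_id) "
    "VALUES (?, ?, ?, ?, ?)",
    ("SS", 1, "disulfide", 10, 500),
)


def stale_patterns(conn, newest_sub_id):
    """Return SMARTS of patterns not yet processed up to newest_sub_id."""
    rows = conn.execute(
        "SELECT smarts FROM pattern WHERE max_sub_id < ? ORDER BY id",
        (newest_sub_id,),
    )
    return [row[0] for row in rows]
```

Calling `stale_patterns(conn, 1000)` would list the patterns that still need to be matched against substances with sub_id above their recorded maximum.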


So far, we are keeping track of 4 pattern origins:

  • traditional "clean" filters we've used for 10 years to build ZINC.
  • PAINS (including a good number we disagree with, but we don't have the patience to argue about)
  • New filters that we noticed were missing in ZINC12
  • covalent warheads (functional groups). These could also be non-covalent warheads, like a hydroxamate. Watch out for protecting groups: this gets complicated!


These give you some idea of what we think the pattern is for. Frankly, it is kinda arbitrary. It is a bit of documentation for the future.

  • 1. weak electrophile (reasonable chance it won't react in an assay) (e.g. C=O, Michael acceptor)
  • 2. weak nucleophile (e.g. SH)
  • 14. moderate electrophiles (e.g. epoxide, alkyl halides)
  • 15. moderate nucleophiles (could be tolerated in an assay or drug) (e.g. ...)
  • 16. moderately reactive (and not 14 or 15). (e.g. peroxide)
  • 3. strong electrophile, incompatible with buffer conditions (e.g. boronic acid.)
  • 4. strong nucleophile (e.g. ...)
  • 5. Chromophore. (e.g. ...)
  • 6. known aggregator
  • 7. aggregator analog
  • 8. we doubt, but pains says so
  • 9. unclear, but we accept
  • 10. unstable in buffer (e.g. S-S bond)
  • 11. not cell penetrant (e.g. quats, i.e. quaternary ammoniums)
  • 12. too floppy for docking (entropy)
  • 13. too greasy / insoluble

Then we go on to calculate a "cleanness ranking" using the above as follows:

  • 0 - clean - molecule has none of these patterns, so may be boring too
  • 1 - pains-free - molecule hits no PAINS patterns (0 and 1 may be the same thing, not sure)
  • 2 - bksclean - allows 6, 7, 8, 9
  • 3 - standard - additionally allows 1, 2, 5, 10, 11
  • 4 - permissive - allows 12, 13, 14, 15, 16 but not 3 and 4, the chemistry that is too hot for buffer conditions
  • 5 - hot, but we don't allow this.
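The ranking above can be sketched as a small function over the set of pattern type numbers a molecule hits. This is only one reading of the rules, under two assumptions: that each level also allows everything the lower levels allow, and that a molecule's rank is the lowest level covering all of its matched types. Level 1 (pains-free) is left out because it depends on the pattern origin rather than the pattern type. The names `NEWLY_ALLOWED` and `cleanness_rank` are made up for this example.

```python
# Pattern type numbers newly allowed at each cleanness level, per the list above.
# Level 1 (pains-free) is omitted: it depends on pattern origin, not pattern type.
NEWLY_ALLOWED = {
    2: {6, 7, 8, 9},            # bksclean
    3: {1, 2, 5, 10, 11},       # standard
    4: {12, 13, 14, 15, 16},    # permissive
}
HOT = 5  # anything hitting types 3 or 4 (too hot for buffer conditions)


def cleanness_rank(matched_types):
    """Return the lowest cleanness level that tolerates all matched pattern types.

    Assumes levels are cumulative, which the source only states for level 3.
    """
    matched = set(matched_types)
    if not matched:
        return 0  # clean: no patterns at all
    allowed = set()
    for level in sorted(NEWLY_ALLOWED):
        allowed |= NEWLY_ALLOWED[level]
        if matched <= allowed:
            return level
    return HOT
```

For instance, a molecule hitting only type 6 (known aggregator) would rank 2 under this reading, while one hitting both type 6 and type 3 (strong electrophile) would rank 5.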