ZINC15 patterns

From DISI

We compute which molecules match which substructure patterns. We use the results for:

  • the "clean" feature
  • identifying covalent warheads
  • preventing some molecules from being built for 3D non-covalent docking (e.g. Si, B, Sn containing)

The tables are as follows:

pattern

This table holds the SMARTS patterns, a foreign key (fk) to the origin of the pattern, an English description, a foreign key to the type of pattern, and the maximum sub_id that has been processed (i.e. how up to date the pattern is).

pattern_origin

So far, we are keeping track of 4 pattern origins:

  • traditional "clean" filters we've used for 10 years to build ZINC.
  • PAINS (including a good number we disagree with, but we don't have the patience to argue about)
  • New filters that we noticed were missing in ZINC12
  • covalent warheads (functional groups). These could also include non-covalent warheads, such as a hydroxamate. Watch out for protecting groups, which complicate matters.

pattern_type

These give you some idea of what we think the pattern is for. Frankly, it is kinda arbitrary. It is a bit of documentation for the future.

  • 1. weak electrophile (reasonable chance it won't react in an assay) (e.g. C=O, Michael acceptor)
  • 2. weak nucleophile (e.g. SH)
  • 14. moderate electrophile (e.g. epoxide, alkyl halide)
  • 15. moderate nucleophile (could be tolerated in an assay or drug) (e.g. alkyl halide)
  • 16. moderately reactive (and not 14 or 15). (e.g. peroxide)
  • 3. strong electrophile, incompatible with buffer conditions (e.g. boronic acid.)
  • 4. strong nucleophile (e.g. ...)
  • 5. Chromophore. (e.g. ...)
  • 6. known aggregator
  • 7. aggregator analog
  • 8. we doubt, but PAINS says so
  • 9. unclear, but we accept
  • 10. unstable in buffer (e.g. S-S bond)
  • 11. not cell penetrant (e.g. quats, i.e. quaternary ammonium groups)
  • 12. too floppy for docking (entropy)
  • 13. too greasy / insoluble

subpat

  • links individual molecules to individual patterns
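As a rough illustration of how the four tables fit together, here is a minimal sketch using an in-memory SQLite database. The table names and the columns smarts, description, origin_fk and max_sub_id follow the text and URLs on this page; the key names (pattern_id, type_fk, pattern_fk), the sample rows and the substance id 42 are assumptions made up for the example, not the actual ZINC15 schema.

```python
import sqlite3

# Hypothetical schema: table names follow the text; key names such as
# pattern_id, type_fk and pattern_fk are assumptions for illustration.
SCHEMA = """
CREATE TABLE pattern_origin (origin_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pattern_type   (type_id   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pattern (
    pattern_id  INTEGER PRIMARY KEY,
    smarts      TEXT NOT NULL,
    origin_fk   INTEGER REFERENCES pattern_origin(origin_id),
    description TEXT,
    type_fk     INTEGER REFERENCES pattern_type(type_id),
    max_sub_id  INTEGER DEFAULT 0
);
CREATE TABLE subpat (
    sub_id     INTEGER NOT NULL,
    pattern_fk INTEGER NOT NULL REFERENCES pattern(pattern_id),
    PRIMARY KEY (sub_id, pattern_fk)
);
"""


def types_hit(conn, sub_id):
    """Return the set of pattern type ids matched by one substance."""
    rows = conn.execute(
        "SELECT DISTINCT p.type_fk FROM subpat s "
        "JOIN pattern p ON p.pattern_id = s.pattern_fk "
        "WHERE s.sub_id = ?",
        (sub_id,),
    )
    return {row[0] for row in rows}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up sample rows: one origin, one type, one thiol pattern, one match.
conn.execute("INSERT INTO pattern_origin VALUES (1, 'traditional clean filters')")
conn.execute("INSERT INTO pattern_type VALUES (2, 'weak nucleophile')")
conn.execute("INSERT INTO pattern VALUES (1, '[SX2H]', 1, 'thiol', 2, 0)")
conn.execute("INSERT INTO subpat VALUES (42, 1)")

print(types_hit(conn, 42))  # {2}
```

The point of the sketch is only that subpat is a plain link table: everything about a molecule's patterns (origin, type, description) is reached by joining through pattern.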


Then we go on to calculate a "cleanness ranking" for each ZINC ID using the above as follows:

  • 0 - clean - molecule has none of these patterns, so it may be boring too
  • 1 - pains-free - molecule hits no PAINS patterns (0 and 1 may be the same thing, not sure)
  • 2 - bksclean - allows pattern types 6, 7, 8, 9
  • 3 - standard - additionally allows 1, 2, 5, 10, 11
  • 4 - permissive - allows 12, 13, 14, 15, 16, but not 3 and 4, the chemistry that is too hot for buffer conditions
  • 5 - hot - 3D models of these molecules do not get built for docking (boronic acids, Sn-containing, Si-containing, because we don't have parameters)
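The ranking can be read as "the strictest level that still tolerates every pattern type the molecule hits". Below is a minimal sketch of that reading, not the actual ZINC15 code. It assumes the numbers in the ranking refer to pattern_type ids, that each level also allows everything the levels before it allow, and that "hot" corresponds to types 3 and 4. Level 1 (pains-free) is left out because it depends on the pattern origin rather than the type, and the text itself is unsure whether it differs from level 0. The function name cleanness_rank is hypothetical.

```python
# Lowest cleanness level that tolerates each pattern type (assumed mapping,
# derived from the ranking text above).
LEVEL_TYPES = {
    2: (6, 7, 8, 9),            # bksclean
    3: (1, 2, 5, 10, 11),       # standard
    4: (12, 13, 14, 15, 16),    # permissive
    5: (3, 4),                  # hot: too reactive for buffer conditions
}
MIN_LEVEL_FOR_TYPE = {
    type_id: level
    for level, type_ids in LEVEL_TYPES.items()
    for type_id in type_ids
}
LEVEL_NAMES = {0: "clean", 2: "bksclean", 3: "standard", 4: "permissive", 5: "hot"}


def cleanness_rank(type_ids):
    """Return the cleanness level for a molecule given the pattern types it hits.

    A molecule with no pattern hits is level 0 (clean). Otherwise the level is
    set by the least tolerated pattern type. Unknown type ids raise KeyError.
    """
    return max((MIN_LEVEL_FOR_TYPE[t] for t in type_ids), default=0)


for hits in (set(), {6, 8}, {1, 6}, {14}, {1, 3}):
    rank = cleanness_rank(hits)
    print(sorted(hits), rank, LEVEL_NAMES[rank])
```

For example, a molecule that hits a known-aggregator pattern (6) and a weak electrophile pattern (1) lands at level 3, because level 2 does not tolerate type 1.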


http://zinc15.docking.org/patterns/?description-startswith=imidazole
http://zinc15.docking.org/patterns/?description-contains=370
http://zinc15.docking.org/patterns/?description-contains=quinone

Drugs with a quinone pattern:

http://zinc15.docking.org/substances/subsets/fda?pattern.description-contains=quinone

Drugs with any SMARTS pattern:

http://zinc15.docking.org/substances/subsets/fda?pattern.origin_fk=2

Molecules with this pattern:

http://zinc15.docking.org/substances/?pattern.name=quinone_a_370_
http://zinc15.docking.org/substances/?substance.patterns-any-type_name=weak-electrophile
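The URLs above all follow the same shape: a resource path plus filters of the form field-operator=value. A small sketch of building such URLs programmatically is shown here; the helper name query_url is hypothetical, and only the filter names that already appear in the examples on this page are used. Whether other operators exist is not covered here.

```python
from urllib.parse import urlencode

BASE = "http://zinc15.docking.org"


def query_url(path, filters):
    """Build a ZINC15 query URL from a resource path and a dict of filters.

    Filter keys are passed through as written (e.g. "description-contains"),
    and values are URL-encoded.
    """
    return f"{BASE}{path}?{urlencode(filters)}"


# Reproduce two of the example URLs from this page.
print(query_url("/patterns/", {"description-contains": "quinone"}))
print(query_url("/substances/subsets/fda", {"pattern.origin_fk": 2}))
```

Using urlencode rather than string concatenation keeps the URL valid if a description contains spaces or other characters that need escaping.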

Checking a file of SMILES for PAINS and aggregators:

curl http://zinc15.docking.org/patterns/apps/checker/ -F upload=@fda.smi -F pains=y -F aggregators=y -F output_format=txt | tee results