ZINC15 patterns

From DISI

We compute which molecules match which patterns. We use the results for:

  • the "clean" feature
  • identifying covalent warheads
  • preventing some molecules from being built for 3D non-covalent docking (e.g. those containing Si, B, or Sn)

The tables are as follows:

pattern

This table holds the SMARTS patterns, a foreign key (fk) to the origin of the pattern, an English description, a foreign key to the type of pattern it is, and the max sub_id that has been processed (i.e. how up to date it is).

pattern_origin

So far, we are keeping track of four pattern origins:

  • Traditional "clean" filters we've used for 10 years to build ZINC.
  • PAINS (including a good number we disagree with, but we don't have the patience to argue about).
  • New filters that we noticed were missing in ZINC12.
  • Covalent warheads (functional groups). A warhead could also be non-covalent, like a hydroxamate. Watch out for protecting groups; this gets complicated.

pattern_type

These give you some idea of what we think the pattern is for. Frankly, the assignment is somewhat arbitrary. It mainly serves as a bit of documentation for the future.

  • 1. weak electrophile (reasonable chance it won't react in an assay) (e.g. C=O, Michael acceptor)
  • 2. weak nucleophile (e.g. SH)
  • 14. moderate electrophile (e.g. epoxide, alkyl halide)
  • 15. moderate nucleophile (could be tolerated in an assay or drug) (e.g. alkyl halide)
  • 16. moderately reactive, and not 14 or 15 (e.g. peroxide)
  • 3. strong electrophile, incompatible with buffer conditions (e.g. boronic acid)
  • 4. strong nucleophile (e.g. ...)
  • 5. chromophore (e.g. ...)
  • 6. known aggregator
  • 7. aggregator analog
  • 8. we doubt it, but PAINS says so
  • 9. unclear, but we accept it
  • 10. unstable in buffer (e.g. S-S bond)
  • 11. not cell penetrant (e.g. quats, i.e. quaternary ammonium compounds)
  • 12. too floppy for docking (entropy)
  • 13. too greasy / insoluble

subpat

  • links individual molecules to individual patterns
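
To make the relationships between the four tables concrete, here is a minimal sketch using an in-memory SQLite database. All column names other than sub_id, and all the rows inserted, are assumptions made for illustration; they are not taken from the actual ZINC15 schema.

```python
import sqlite3

# Hypothetical schema: column names are guesses based on the table
# descriptions above, not the real ZINC15 definitions.
SCHEMA = """
CREATE TABLE pattern_origin (origin_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pattern_type   (type_id   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pattern (
    pattern_id  INTEGER PRIMARY KEY,
    smarts      TEXT NOT NULL,
    description TEXT,
    origin_fk   INTEGER REFERENCES pattern_origin(origin_id),
    type_fk     INTEGER REFERENCES pattern_type(type_id),
    max_sub_id  INTEGER  -- highest sub_id processed for this pattern
);
CREATE TABLE subpat (
    sub_id_fk  INTEGER NOT NULL,
    pattern_fk INTEGER NOT NULL REFERENCES pattern(pattern_id),
    PRIMARY KEY (sub_id_fk, pattern_fk)
);
"""


def patterns_for_substance(conn, sub_id):
    """Return (description, type name) pairs for every pattern a substance hits."""
    rows = conn.execute(
        """
        SELECT p.description, t.name
        FROM subpat s
        JOIN pattern p ON p.pattern_id = s.pattern_fk
        JOIN pattern_type t ON t.type_id = p.type_fk
        WHERE s.sub_id_fk = ?
        ORDER BY p.pattern_id
        """,
        (sub_id,),
    )
    return rows.fetchall()


# Toy data, invented for this example only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO pattern_origin VALUES (1, 'toy origin')")
conn.execute("INSERT INTO pattern_type VALUES (8, 'we doubt it, but PAINS says so')")
conn.execute(
    "INSERT INTO pattern VALUES (1, 'O=C1C=CC(=O)C=C1', 'quinone (toy example)', 1, 8, 0)"
)
conn.execute("INSERT INTO subpat VALUES (42, 1)")
```

With this layout, subpat is a plain link table, so asking which patterns a molecule hits is a two-join lookup, e.g. patterns_for_substance(conn, 42).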


Then we go on to calculate a "cleanness ranking" for each ZINC ID using the above as follows:

  • 0 - clean - molecule has none of these patterns, so it may be boring too
  • 1 - pains-free - molecule hits no PAINS patterns (0 and 1 may be the same thing, not sure)
  • 2 - bksclean - allows 6, 7, 8, 9
  • 3 - standard - additionally allows 1, 2, 5, 10, 11
  • 4 - permissive - allows 12, 13, 14, 15, 16, but not 3 and 4, the chemistry that is too hot for buffer conditions
  • 5 - hot - 3D models of these molecules do not get built for docking (boronic acids, Sn-containing, Si-containing, because we don't have parameters)
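
The ranking can be read as "the strictest level that still tolerates every pattern type the molecule hits". The sketch below encodes that reading. It assumes the levels are cumulative (each level allows everything the previous one does), and it skips level 1 because "pains-free" depends on pattern origin rather than pattern type. The names LEVEL_FOR_TYPE and cleanness_level are made up for this example and are not the ZINC15 implementation.

```python
# Pattern-type numbers grouped by the lowest cleanness level that tolerates them.
# Assumption: levels are cumulative. Level 1 ("pains-free") is not modelled here
# because it is defined by pattern origin, not by pattern type.
TYPES_BY_LEVEL = {
    2: (6, 7, 8, 9),           # bksclean
    3: (1, 2, 5, 10, 11),      # standard
    4: (12, 13, 14, 15, 16),   # permissive
    5: (3, 4),                 # hot: too reactive for buffer conditions
}

LEVEL_FOR_TYPE = {
    pattern_type: level
    for level, pattern_types in TYPES_BY_LEVEL.items()
    for pattern_type in pattern_types
}


def cleanness_level(pattern_types):
    """Return the lowest level (0 = clean ... 5 = hot) that admits all given types.

    Raises KeyError for a pattern-type number not listed above.
    """
    return max((LEVEL_FOR_TYPE[t] for t in pattern_types), default=0)
```

For example, a molecule hitting only a type 8 pattern would land at level 2 under this reading, while one hitting types 13 and 3 would land at level 5.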


http://zinc15.docking.org/patterns/?description-startswith=imidazole
http://zinc15.docking.org/patterns/?description-contains=370
http://zinc15.docking.org/patterns/?description-contains=quinone

Drugs with a quinone pattern

http://zinc15.docking.org/substances/subsets/fda?pattern.description-contains=quinone

Drugs with any SMARTS pattern:

http://zinc15.docking.org/substances/subsets/fda?pattern.origin_fk=2

Molecules with this pattern:

http://zinc15.docking.org/substances/?pattern.name=quinone_a_370_
http://zinc15.docking.org/substances/?substance.patterns-any-type_name=weak-electrophile
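
The query URLs above all follow the same shape: a resource path plus filter=value pairs. A small helper for assembling them in scripts might look like the sketch below. The function name zinc_url is hypothetical, and the filter names used are only those that appear in the example URLs above; nothing here checks that the server accepts a given filter.

```python
from urllib.parse import urlencode

BASE = "http://zinc15.docking.org"


def zinc_url(path, filters=None):
    """Join a ZINC15 resource path with URL-encoded filter parameters.

    `path` is used verbatim (including any trailing slash), and `filters`
    is a dict such as {"description-contains": "quinone"}.
    """
    url = BASE + path
    query = urlencode(filters or {})
    return f"{url}?{query}" if query else url


# Rebuild two of the example queries shown above.
quinone_patterns = zinc_url("/patterns/", {"description-contains": "quinone"})
fda_with_origin = zinc_url("/substances/subsets/fda", {"pattern.origin_fk": 2})
```

Filters are passed as a dict rather than keyword arguments because the filter names contain dots and hyphens, which are not valid in Python identifiers.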

Checking a file of SMILES for PAINS and aggregators:

curl http://zinc15.docking.org/patterns/apps/checker/ -F upload=@fda.smi -F pains=y -F aggregators=y -F output_format=txt | tee results