ZINC15 patterns
We compute which molecules match which patterns. We use the results for:
- the "clean" feature
- identifying covalent warheads
- preventing some molecules from being built for 3D non-covalent docking (e.g. those containing Si, B, or Sn)
 
The tables are as follows:
pattern
This table holds the SMARTS pattern, a foreign key (fk) to the origin of the pattern, an English description, a foreign key to the type of pattern, and the maximum sub_id that has been processed (i.e. how up to date the pattern's matches are).
pattern_origin
So far, we keep track of 4 pattern origins:
- Traditional "clean" filters we've used for 10 years to build ZINC.
- PAINS (including a good number we disagree with, but we don't have the patience to argue about).
- New filters that we noticed were missing in ZINC12.
- Covalent warheads (functional groups). These could also be non-covalent warheads, such as a hydroxamate. Watch out for protecting groups, which make this complicated.
 
pattern_type
These give you some idea of what we think each pattern is for. Frankly, the assignment is somewhat arbitrary; it serves as a bit of documentation for the future.
- 1. weak electrophile (reasonable chance it won't react in an assay) (e.g. C=O, Michael acceptor)
- 2. weak nucleophile (e.g. SH)
- 14. moderate electrophile (e.g. epoxide, alkyl halides)
- 15. moderate nucleophile (could be tolerated in an assay or drug) (e.g. alkyl halide)
- 16. moderately reactive, and not 14 or 15 (e.g. peroxide)
- 3. strong electrophile, incompatible with buffer conditions (e.g. boronic acid)
- 4. strong nucleophile (e.g. ...)
- 5. chromophore (e.g. ...)
- 6. known aggregator
- 7. aggregator analog
- 8. we doubt, but PAINS says so
- 9. unclear, but we accept
- 10. unstable in buffer (e.g. S-S bond)
- 11. not cell penetrant (e.g. quats)
- 12. too floppy for docking (entropy)
- 13. too greasy / insoluble
 
subpat
- links individual molecules to individual patterns
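To make the relationships between these four tables concrete, here is a minimal sketch using an in-memory SQLite database. The table names, `origin_fk`, and `sub_id` appear on this page; every other column name, the `types_hit` helper, and the sample rows are hypothetical and are not the actual ZINC15 schema.

```python
import sqlite3

# Hypothetical layout: only the table names, origin_fk and sub_id come from
# this page. All other column names are assumptions made for illustration.
SCHEMA = """
CREATE TABLE pattern_origin (origin_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pattern_type   (type_id   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pattern (
    pattern_id  INTEGER PRIMARY KEY,
    smarts      TEXT,
    origin_fk   INTEGER REFERENCES pattern_origin(origin_id),
    description TEXT,
    type_fk     INTEGER REFERENCES pattern_type(type_id),
    max_sub_id  INTEGER
);
CREATE TABLE subpat (
    sub_id     INTEGER,
    pattern_fk INTEGER REFERENCES pattern(pattern_id)
);
"""


def types_hit(conn, sub_id):
    """Return the set of pattern-type ids that one molecule matches."""
    rows = conn.execute(
        "SELECT DISTINCT p.type_fk FROM subpat s "
        "JOIN pattern p ON p.pattern_id = s.pattern_fk "
        "WHERE s.sub_id = ?",
        (sub_id,),
    )
    return {row[0] for row in rows}


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up sample rows: one "unstable in buffer" pattern and one molecule hit.
conn.execute("INSERT INTO pattern_type VALUES (10, 'unstable in buffer')")
conn.execute("INSERT INTO pattern VALUES (1, 'SS', NULL, 'disulfide', 10, 0)")
conn.execute("INSERT INTO subpat VALUES (42, 1)")
```

With these sample rows, `types_hit(conn, 42)` collects the type ids of every pattern linked to molecule 42 through subpat, which is the kind of lookup the cleanness ranking relies on.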
 
Then we go on to calculate a "cleanness ranking" for each ZINC ID using the above as follows:
- 0. clean: molecule has none of these patterns, so it may be boring too
- 1. pains-free: molecule hits no PAINS patterns (0 and 1 may be the same thing, not sure)
- 2. bksclean: allows types 6, 7, 8, 9
- 3. standard: additionally allows types 1, 2, 5, 10, 11
- 4. permissive: allows types 12, 13, 14, 15, 16, but not 3 and 4, the chemistry that is too hot for buffer conditions
- 5. hot: 3D models of these molecules do not get built for docking (boronic acids, Sn-containing, Si-containing, because we don't have parameters)
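The ranking above can be sketched as a small function over the set of pattern-type ids a molecule hits. This is an illustration, not the code ZINC15 runs. It assumes the tiers are cumulative (each tier also allows everything the lower tiers allow), and it leaves out tier 1 (pains-free) because that tier is defined by pattern origin rather than pattern type. The names `NEWLY_ALLOWED` and `cleanness_tier` are hypothetical.

```python
# Pattern-type ids that each tier newly allows, as listed in the ranking.
# Assumption: tiers are cumulative. Tier 1 (pains-free) is omitted because
# it depends on pattern origin, not on pattern type.
NEWLY_ALLOWED = {
    2: {6, 7, 8, 9},           # bksclean
    3: {1, 2, 5, 10, 11},      # standard
    4: {12, 13, 14, 15, 16},   # permissive
}


def cleanness_tier(hit_types):
    """Return the lowest tier whose allowed types cover every type hit."""
    hits = set(hit_types)
    if not hits:
        return 0  # clean: no patterns matched at all
    allowed = set()
    for tier in sorted(NEWLY_ALLOWED):
        allowed |= NEWLY_ALLOWED[tier]
        if hits <= allowed:
            return tier
    # Types 3 and 4 (and any id not listed above) are never allowed: hot.
    return 5
```

For example, a molecule that hits only a known-aggregator pattern (type 6) lands in tier 2, while one that also hits a weak electrophile (type 1) moves to tier 3.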
http://zinc15.docking.org/patterns/?description-startswith=imidazole
http://zinc15.docking.org/patterns/?description-contains=370
http://zinc15.docking.org/patterns/?description-contains=quinone
Drugs with a quinone pattern:
http://zinc15.docking.org/substances/subsets/fda?pattern.description-contains=quinone
Drugs with any SMARTS pattern:
http://zinc15.docking.org/substances/subsets/fda?pattern.origin_fk=2
Molecules with this pattern:
http://zinc15.docking.org/substances/?pattern.name=quinone_a_370_
http://zinc15.docking.org/substances/?substance.patterns-any-type_name=weak-electrophile
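The example URLs above share one shape: a resource path followed by a filter expressed as a query parameter. A minimal sketch of building them programmatically follows. `zinc_url` is a hypothetical helper, and only the filter names that appear in the URLs on this page are known to be accepted by the server.

```python
from urllib.parse import urlencode

BASE = "http://zinc15.docking.org"


def zinc_url(path, filters):
    """Join a resource path and a dict of filters into a ZINC15 query URL.

    `path` is used verbatim (e.g. "/patterns/" or "/substances/subsets/fda"),
    so any trailing slash is up to the caller.
    """
    return f"{BASE}{path}?{urlencode(filters)}"


# Rebuild two of the URLs listed above.
quinone_patterns = zinc_url("/patterns/", {"description-contains": "quinone"})
fda_with_origin = zinc_url("/substances/subsets/fda", {"pattern.origin_fk": 2})
```

Passing the filters as a dict rather than as keyword arguments is deliberate: filter names such as `description-contains` and `pattern.origin_fk` are not valid Python identifiers.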
Check a SMILES file for PAINS and aggregators:
curl http://zinc15.docking.org/patterns/apps/checker/ -F upload=@fda.smi -F pains=y -F aggregators=y -F output_format=txt | tee results
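For scripted use, the same curl call can be wrapped in Python. This is a sketch only: it uses just the form fields shown in the command above, and it assumes that leaving out `pains` or `aggregators` turns that check off, which this page does not confirm. `checker_command` and `run_checker` are hypothetical helper names.

```python
import subprocess

CHECKER_URL = "http://zinc15.docking.org/patterns/apps/checker/"


def checker_command(smiles_path, pains=True, aggregators=True,
                    output_format="txt"):
    """Build the curl argument list for the pattern checker upload."""
    cmd = ["curl", CHECKER_URL, "-F", f"upload=@{smiles_path}"]
    # Assumption: omitting a field disables that check on the server side.
    if pains:
        cmd += ["-F", "pains=y"]
    if aggregators:
        cmd += ["-F", "aggregators=y"]
    cmd += ["-F", f"output_format={output_format}"]
    return cmd


def run_checker(smiles_path, **options):
    """Run curl and return the checker's response text.

    Needs curl on the PATH and network access to the ZINC15 server.
    """
    result = subprocess.run(
        checker_command(smiles_path, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

`run_checker("fda.smi")` would return the same text that the command above pipes into `tee results`.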