ZINC15 patterns
We compute which molecules match which SMARTS patterns. We use these matches for:
- the "clean" feature
- identifying covalent warheads
- preventing some molecules from being built for 3D non-covalent docking (e.g. those containing Si, B, or Sn)
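In ZINC the patterns flag these molecules. Purely to illustrate the element part of the third use, here is a crude sketch that looks for Si, B, or Sn atoms by tokenizing a SMILES string with a regular expression instead of SMARTS matching. The function names and the `UNBUILDABLE` set are assumptions for illustration, and the tokenizer ignores many SMILES corner cases.

```python
import re

# Elements assumed to lack docking parameters (taken from the examples above).
UNBUILDABLE = {"Si", "B", "Sn"}

# Either a bracket atom such as [SiH3] or [13CH3] (capture its element symbol),
# or an organic-subset atom written without brackets. "Br" and "Cl" are listed
# before "B" and "C" so that bromine is not mistaken for boron.
_ATOM = re.compile(
    r"\[\d*(?P<bracket>[A-Z][a-z]?|se|as|[bcnops])[^\]]*\]"
    r"|(?P<bare>Br|Cl|B|C|N|O|P|S|F|I|[bcnops])"
)


def elements_in_smiles(smiles: str) -> set[str]:
    """Return the element symbols that appear in a SMILES string (crude tokenizer)."""
    found = set()
    for match in _ATOM.finditer(smiles):
        symbol = match.group("bracket") or match.group("bare")
        # Aromatic atoms are lowercase in SMILES; normalize "c" -> "C", "se" -> "Se".
        found.add(symbol.capitalize())
    return found


def has_unbuildable_element(smiles: str) -> bool:
    """True if the molecule contains an element we assume cannot be built for docking."""
    return not UNBUILDABLE.isdisjoint(elements_in_smiles(smiles))


if __name__ == "__main__":
    for smi in ["CCO", "C[Si](C)(C)C", "OB(O)c1ccccc1", "BrCCBr", "C[Sn](C)(C)C"]:
        print(smi, has_unbuildable_element(smi))
```

A real pipeline would express the same rule as SMARTS patterns (e.g. `[Si]`) so that it lives in the pattern table with everything else.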
The tables are as follows:
pattern
This table holds the SMARTS patterns, each with a foreign key (fk) to the origin of the pattern, an English description, an fk to the type of pattern it is, and the maximum sub_id that has been processed (i.e. how up to date the pattern's matches are).
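Because each pattern row records the highest sub_id already processed, updating a pattern only requires matching substances added since then. A minimal sketch of that bookkeeping, assuming sub_ids grow as substances are added; the `Pattern` class, its field names, and the `matches` callback (a stand-in for a real SMARTS matcher) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Pattern:
    # Hypothetical field names mirroring the columns described above.
    pattern_id: int
    smarts: str
    origin_fk: int
    description: str
    type_fk: int
    max_sub_id: int = 0  # highest sub_id already checked against this pattern


def update_pattern(
    pattern: Pattern,
    substances: Iterable[tuple[int, str]],
    matches: Callable[[str, str], bool],
) -> list[tuple[int, int]]:
    """Check only substances newer than pattern.max_sub_id.

    Returns new (sub_id, pattern_id) link rows and advances max_sub_id,
    so a second run over the same substances does no new work.
    """
    links = []
    highest = pattern.max_sub_id
    for sub_id, smiles in substances:
        if sub_id <= pattern.max_sub_id:
            continue  # already processed in an earlier run
        if matches(smiles, pattern.smarts):
            links.append((sub_id, pattern.pattern_id))
        highest = max(highest, sub_id)
    pattern.max_sub_id = highest
    return links
```

Running `update_pattern` twice on the same list returns the links the first time and an empty list the second time, which is what keeps repeated updates cheap.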
pattern_origin
So far, we are keeping track of 4 pattern origins:
- traditional "clean" filters we've used for 10 years to build ZINC.
- PAINS (including a good number we disagree with but don't have the patience to argue about)
- new filters that we noticed were missing in ZINC12
- covalent warheads (functional groups). These could also be non-covalent warheads, such as a hydroxamate. Watch out for protecting groups; it gets complicated!
pattern_type
These give you some idea of what we think the pattern is for. Frankly, it is kinda arbitrary. It is a bit of documentation for the future.
- 1. weak electrophile (reasonable chance it won't react in an assay; e.g. C=O, Michael acceptor)
- 2. weak nucleophile (e.g. SH)
- 14. moderate electrophile (e.g. epoxides, alkyl halides)
- 15. moderate nucleophile (could be tolerated in an assay or drug)
- 16. moderately reactive (and not 14 or 15) (e.g. peroxide)
- 3. strong electrophile, incompatible with buffer conditions (e.g. boronic acid)
- 4. strong nucleophile (e.g. ...)
- 5. Chromophore. (e.g. ...)
- 6. known aggregator
- 7. aggregator analog
- 8. we doubt, but pains says so
- 9. unclear, but we accept
- 10. unstable in buffer (e.g. S-S bond)
- 11. not cell penetrant (e.g. quats, i.e. quaternary ammonium salts)
- 12. too floppy for docking (entropy)
- 13. too greasy / insoluble
subpat
- links each molecule (by sub_id) to each pattern it matches
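The four tables can be sketched as a small relational schema. Column names are assumptions except where the query URLs below use them (`name`, `description`, `origin_fk`, `type_name`). SQLite through Python's built-in `sqlite3` stands in for whatever database ZINC15 actually uses, and the inserted rows are toy data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pattern_origin (
    origin_id   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL            -- e.g. 'PAINS'
);
CREATE TABLE pattern_type (
    type_id     INTEGER PRIMARY KEY,     -- the numbers in the pattern_type list
    type_name   TEXT NOT NULL            -- e.g. 'weak-electrophile'
);
CREATE TABLE pattern (
    pattern_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    smarts      TEXT NOT NULL,
    origin_fk   INTEGER REFERENCES pattern_origin(origin_id),
    description TEXT,
    type_fk     INTEGER REFERENCES pattern_type(type_id),
    max_sub_id  INTEGER NOT NULL DEFAULT 0
);
-- subpat: one row per (molecule, pattern) match
CREATE TABLE subpat (
    sub_id      INTEGER NOT NULL,
    pattern_fk  INTEGER NOT NULL REFERENCES pattern(pattern_id),
    PRIMARY KEY (sub_id, pattern_fk)
);
"""


def patterns_for(conn: sqlite3.Connection, sub_id: int) -> list[str]:
    """Names of all patterns a given molecule matches."""
    rows = conn.execute(
        "SELECT p.name FROM subpat s JOIN pattern p ON p.pattern_id = s.pattern_fk "
        "WHERE s.sub_id = ? ORDER BY p.name",
        (sub_id,),
    )
    return [name for (name,) in rows]


def molecules_with_type(conn: sqlite3.Connection, type_name: str) -> list[int]:
    """sub_ids of molecules matching any pattern of the given type."""
    rows = conn.execute(
        "SELECT DISTINCT s.sub_id FROM subpat s "
        "JOIN pattern p ON p.pattern_id = s.pattern_fk "
        "JOIN pattern_type t ON t.type_id = p.type_fk "
        "WHERE t.type_name = ? ORDER BY s.sub_id",
        (type_name,),
    )
    return [sub_id for (sub_id,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Toy rows for illustration only; ids and type assignment are not ZINC's.
    conn.execute("INSERT INTO pattern_origin VALUES (1, 'PAINS')")
    conn.execute("INSERT INTO pattern_type VALUES (1, 'weak-electrophile')")
    conn.execute(
        "INSERT INTO pattern (pattern_id, name, smarts, origin_fk, description, type_fk) "
        "VALUES (1, 'quinone_a_370_', 'O=C1C=CC(=O)C=C1', 1, 'quinone', 1)"
    )
    conn.executemany("INSERT INTO subpat VALUES (?, ?)", [(10, 1), (42, 1)])
    print(patterns_for(conn, 42))
    print(molecules_with_type(conn, "weak-electrophile"))
```

Keeping subpat as a bare link table makes both directions of lookup (patterns of a molecule, molecules of a pattern or type) a simple join.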
Then we go on to calculate a "cleanness ranking" for each ZINC ID using the above as follows:
- 0. clean: molecule has none of these patterns, so it may also be boring
- 1. pains-free: molecule hits no PAINS patterns (0 and 1 may be the same thing, not sure)
- 2. bksclean: allows types 6, 7, 8, 9
- 3. standard: additionally allows types 1, 2, 5, 10, 11
- 4. permissive: allows types 12, 13, 14, 15, 16, but not 3 and 4, the chemistry that is too hot for buffer conditions
- 5. hot: 3D models of these molecules are not built for docking (e.g. boronic acids, Sn-containing, Si-containing), because we don't have parameters for them
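One way to read the tiers is that a molecule's ranking is the lowest tier whose allowed pattern types cover every type it hits. A minimal sketch, assuming each tier also allows everything the tiers below it allow (the text says "additionally" only for standard), and working from pattern type IDs only. It leaves out tier 1 (pains-free), whose relation to tier 0 is left open above, and the names `ALLOWED`, `cleanness` and `rank_all` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

# Pattern types each tier tolerates; assumed cumulative from tier to tier.
ALLOWED = {
    2: {6, 7, 8, 9},                        # bksclean
    3: {6, 7, 8, 9, 1, 2, 5, 10, 11},       # standard
    4: {6, 7, 8, 9, 1, 2, 5, 10, 11,
        12, 13, 14, 15, 16},                # permissive (never 3 or 4)
}
HOT = 5  # anything else, including types 3 and 4


def cleanness(type_ids: set[int]) -> int:
    """Lowest tier that tolerates every pattern type the molecule hits."""
    if not type_ids:
        return 0  # clean: no patterns at all
    for tier in sorted(ALLOWED):
        if type_ids <= ALLOWED[tier]:
            return tier
    return HOT


def rank_all(sub_ids: Iterable[int], hits: Iterable[tuple[int, int]]) -> dict[int, int]:
    """Rank molecules from (sub_id, pattern type) pairs, e.g. subpat joined to pattern."""
    types_by_sub = defaultdict(set)
    for sub_id, type_id in hits:
        types_by_sub[sub_id].add(type_id)
    # Molecules with no hits get an empty set and therefore rank 0.
    return {sub_id: cleanness(types_by_sub[sub_id]) for sub_id in sub_ids}
```

In this sketch a Si- or Sn-containing molecule only lands in tier 5 if some pattern of type 3 or 4 flags it; a separate element check would be needed otherwise.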
Searching patterns by description:
http://zinc15.docking.org/patterns/?description-startswith=imidazole
http://zinc15.docking.org/patterns/?description-contains=370
http://zinc15.docking.org/patterns/?description-contains=quinone
Drugs with a quinone pattern:
http://zinc15.docking.org/substances/subsets/fda?pattern.description-contains=quinone
Drugs matching any pattern from pattern origin 2 (pattern.origin_fk=2):
http://zinc15.docking.org/substances/subsets/fda?pattern.origin_fk=2
Molecules with the pattern quinone_a_370_:
http://zinc15.docking.org/substances/?pattern.name=quinone_a_370_
Molecules with any pattern of type weak electrophile:
http://zinc15.docking.org/substances/?substance.patterns-any-type_name=weak-electrophile
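The query URLs above share one shape: a page path plus `field=value` filters, where the field may carry a table prefix (`pattern.description`) and an operator suffix (`-contains`, `-startswith`). A small standard-library sketch that assembles such URLs; `zinc15_url` is a hypothetical helper, and only the field/operator combinations shown above are known to work.

```python
from urllib.parse import urlencode

BASE = "http://zinc15.docking.org"


def zinc15_url(path: str, filters: dict[str, object]) -> str:
    """Join the site root, a page path (kept verbatim) and URL-encoded filters."""
    query = urlencode(filters)  # '.', '-' and '_' in keys/values are left as-is
    return f"{BASE}{path}?{query}" if query else f"{BASE}{path}"


if __name__ == "__main__":
    print(zinc15_url("/patterns/", {"description-contains": "quinone"}))
    print(zinc15_url("/substances/subsets/fda", {"pattern.description-contains": "quinone"}))
```

Passing the path verbatim keeps the trailing-slash differences between the examples (`/patterns/` versus `/substances/subsets/fda`) intact.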
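pains and aggregators
To check your own SMILES file (here fda.smi) for PAINS and aggregators, upload it to the pattern checker; tee prints the results and also saves them to the file results: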
curl http://zinc15.docking.org/patterns/apps/checker/ -F upload=@fda.smi -F pains=y -F aggregators=y -F output_format=txt | tee results
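For scripting the same check without curl, each `-F name=value` becomes a text part and `-F upload=@fda.smi` becomes a file part of a multipart/form-data POST. A standard-library sketch that builds that request; the helper names `checker_request` and `run_check` are hypothetical, and the response is saved unparsed because its text format is not documented here.

```python
import urllib.request
import uuid
from pathlib import Path

CHECKER_URL = "http://zinc15.docking.org/patterns/apps/checker/"
# The same form fields the curl command sends with -F name=value.
FIELDS = {"pains": "y", "aggregators": "y", "output_format": "txt"}


def checker_request(smi_path: str) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the curl command above."""
    boundary = uuid.uuid4().hex  # random separator, very unlikely to occur in the data
    path = Path(smi_path)
    # File part: curl's -F upload=@fda.smi
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="upload"; filename="{path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + path.read_bytes() + b"\r\n"
    # Text parts: one per -F name=value
    for name, value in FIELDS.items():
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode()
    body += f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        CHECKER_URL,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def run_check(smi_path: str, out_path: str = "results") -> bytes:
    """Send the request and save the raw response, like `| tee results` (needs network)."""
    with urllib.request.urlopen(checker_request(smi_path)) as response:
        data = response.read()
    Path(out_path).write_bytes(data)
    return data
```

Usage: `run_check("fda.smi")` writes the checker's reply to `results` and returns it as bytes.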