ZINC Novelty Score

From DISI
Revision as of 00:58, 11 March 2014 by Frodo (talk | contribs)

The ZINC Novelty Score (ZNS) is a statistic to express how unusual a molecule is compared to what is in ZINC. It is calculated automatically following a ZINC search in the new interface. The score is calculated as follows:

ZNS = (1.0 - (Tc(ECFP4) + Tc(path)) / 2) * 100%

where Tc is the Tanimoto coefficient between the query molecule and the most similar molecule in ZINC, computed with ECFP4 fingerprints for Tc(ECFP4) and path-based fingerprints for Tc(path), as implemented in RDKit.

Thus a molecule that is already in ZINC has a Tc of 1.0 for both fingerprints and a ZNS of 0%. Molecules that are related to, but different from, molecules in ZINC will have small ZNS values, and the ZNS approaches 100% as a molecule shares fewer and fewer features with any molecule in ZINC.
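The calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ZINC implementation: fingerprints are modelled as plain sets of feature IDs rather than RDKit fingerprint objects, the function names (tanimoto, max_tanimoto, zns) are hypothetical, the best match is taken independently for each fingerprint type, and two empty fingerprints are given a Tc of 0.0 to avoid dividing by zero.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |a & b| / |a | b|."""
    union = a | b
    if not union:
        # Assumption: two empty fingerprints count as dissimilar.
        return 0.0
    return len(a & b) / len(union)


def max_tanimoto(query, library):
    """Tc of the most similar library fingerprint (0.0 for an empty library)."""
    return max((tanimoto(query, fp) for fp in library), default=0.0)


def zns(query_ecfp4, query_path, lib_ecfp4, lib_path):
    """ZNS in percent: (1 - mean of the two best Tc values) * 100."""
    tc_ecfp4 = max_tanimoto(query_ecfp4, lib_ecfp4)
    tc_path = max_tanimoto(query_path, lib_path)
    return (1.0 - (tc_ecfp4 + tc_path) / 2.0) * 100.0


# Toy library: each molecule has one ECFP4-like and one path-like feature set.
LIB_ECFP4 = [{1, 2, 3, 4}, {5, 6}]
LIB_PATH = [{10, 11, 12}, {13, 14}]

# A molecule already in the library scores 0%.
in_library = zns({1, 2, 3, 4}, {10, 11, 12}, LIB_ECFP4, LIB_PATH)

# A molecule sharing no features with the library scores 100%.
unrelated = zns({100}, {200}, LIB_ECFP4, LIB_PATH)

# A related molecule: best Tc values are 3/5 and 2/4, so ZNS is about 45%.
related = zns({1, 2, 3, 9}, {10, 11, 20}, LIB_ECFP4, LIB_PATH)

print(round(in_library, 1), round(unrelated, 1), round(related, 1))
```

With real molecules, the feature sets would be replaced by fingerprints generated with RDKit and compared with its Tanimoto similarity function; the arithmetic of the score stays the same.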

There are five variants:

  • ZNS(target) : The novelty of the compound with respect to known (annotated) compounds for that target.
  • ZNS(target-pattern) : The novelty of the compound with respect to known (annotated) compounds matching a particular target pattern.
  • ZNS(*) : A special case of the above. This statistic answers the question: how novel is the compound compared to any compound with any ChEMBL annotation (10 µM or better)?
  • ZNS(): Novelty compared to all molecules in ZINC, whether they are commercially available or not.
  • ZPNS(): Commercially available novelty, or the ZINC Purchasable Novelty Score. How novel is this compound compared to what is on the market, as reflected in ZINC? If a molecule is commercially available, its ZPNS() is 0%. A compound that is known, and even one that has been for sale in the past, may still have a high ZPNS() if nothing like it is currently on the market, as reflected in ZINC.
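The variants differ only in which reference set the compound is compared against. The sketch below illustrates that idea under stated assumptions: the record fields (ecfp4, path, purchasable, targets), the toy data and all function names are hypothetical, fingerprints are plain sets of feature IDs, and the targets field is assumed to list only annotations that already pass the 10 µM cutoff. The target-pattern variant would filter on a pattern match in the same way and is omitted.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novelty(query, reference):
    """Novelty in percent of `query` against an iterable of reference records."""
    reference = list(reference)
    tc_ecfp4 = max((tanimoto(query["ecfp4"], r["ecfp4"]) for r in reference),
                   default=0.0)
    tc_path = max((tanimoto(query["path"], r["path"]) for r in reference),
                  default=0.0)
    return (1.0 - (tc_ecfp4 + tc_path) / 2.0) * 100.0


# Toy stand-in for ZINC: A is purchasable and annotated, B is neither.
ZINC = [
    {"id": "A", "ecfp4": {1, 2, 3}, "path": {10, 11},
     "purchasable": True, "targets": {"T1"}},
    {"id": "B", "ecfp4": {4, 5, 6}, "path": {12, 13},
     "purchasable": False, "targets": set()},
]


def zns_all(query):
    """ZNS(): compare against everything in ZINC."""
    return novelty(query, ZINC)


def zpns(query):
    """ZPNS(): compare against purchasable molecules only."""
    return novelty(query, (r for r in ZINC if r["purchasable"]))


def zns_target(query, target):
    """ZNS(target): compare against compounds annotated for one target."""
    return novelty(query, (r for r in ZINC if target in r["targets"]))


def zns_any_annotated(query):
    """ZNS(*): compare against compounds with any annotation."""
    return novelty(query, (r for r in ZINC if r["targets"]))


# Query identical to B: known in ZINC, but not purchasable or annotated.
query = {"ecfp4": {4, 5, 6}, "path": {12, 13}}
print(zns_all(query), zpns(query), zns_target(query, "T1"))
```

In this toy example the query is already in ZINC, so its ZNS() is 0%, yet its ZPNS() is 100% because nothing similar is purchasable, which mirrors the case described in the last bullet.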