Revision as of 13:11, 8 October 2012 by Therese

DUD Errata

All reported errors are cataloged here. Feel free to add your own. We aim to fix all of these problems in the next release. Thank you to the helpful investigators who emailed us about these problems. We have kept you anonymous. You are welcome to add your name here if you like.


  • In the DUD paper (Huang, Shoichet, Irwin, J Med Chem 2006, jm0608356), the PDB id of ADA in Table 1 is wrong. It should be "1ndw", not "1stw". The final version we submitted was correct, but we failed to catch this in the galleys.
  • In the RCSB the 1XQ2 structure has been superseded/replaced by the 2ao6 structure. Thanks to Paul Hawkins for bringing this to our attention.
  • Q: Why is the ratio of decoys to annotated ligands described as 36 to 1 in the paper, when DUD contains on average only 33 to 1? A: This is due to overlap: the same decoy could be used for multiple targets, particularly in the kinase class, where there was a great deal of overlap.
  • Two DUD decoy compounds (ZINC154632 for RXR decoys and ZINC608655 for ER decoys) were structurally identical/similar to the crystal ligands of RXR and ER, respectively. This problem was caused by failing to include the crystallographic ligands in our annotated ligands set, and will be fixed in the next version of DUD. Thanks to Paul Hawkins of OpenEye for bringing this to our attention.
  • The PDB code for the COX-1 structure is given as 1P4G but should be 1Q4G. We regret this error, and thank alert reader Paul Hawkins of OpenEye for this information. Also, Hao Li of UCSF Pharm Chem points out that the PDB id of ADA in the paper is wrong. It should be 1ndw.
  • The VEGFr2 structure 1VR2 used in the paper is an apo structure. We will use 1FGI (FGFr1 kinase), which is a ligand-bound structure, in the next version of DUD. This is a provisional choice.
  • The structure of Thrombin used, 1BA8, has the ligand bound covalently. Moreover, the covalent connection looks wrong. We will use 3BIU, which has a non-covalently bound ligand, in the next version of DUD.
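The overlap effect behind the 36-to-1 versus 33-to-1 question above can be illustrated with a toy calculation. This is only a sketch: the target names, decoy IDs, and counts are made up, and it assumes that ligands are not shared between targets. When each target is counted separately, a shared decoy is counted once per target; when all targets are pooled, it is counted only once, so the pooled ratio is lower.

```python
# Toy illustration only: the target names and decoy IDs below are made up.
def decoy_ratios(ligand_counts, decoys_by_target):
    """Return (per-target ratio, pooled ratio) of decoys to ligands."""
    total_ligands = sum(ligand_counts.values())
    # Per-target view: a decoy shared by two targets is counted twice.
    assigned = sum(len(decoys) for decoys in decoys_by_target.values())
    # Pooled view: every distinct decoy is counted once.
    unique = len(set().union(*decoys_by_target.values()))
    return assigned / total_ligands, unique / total_ligands

ligand_counts = {"kinase_a": 1, "kinase_b": 1}
decoys_by_target = {
    "kinase_a": {"D1", "D2", "D3"},
    "kinase_b": {"D2", "D3", "D4"},  # D2 and D3 are shared with kinase_a
}
per_target, pooled = decoy_ratios(ligand_counts, decoys_by_target)
print(per_target, pooled)
```

In this toy case each target has 3 decoys per ligand, but the pooled set has only 4 distinct decoys for 2 ligands.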

Factor Xa

  • These each have amidine groups that have been over-protonated so that the amidine is no longer a planar Nsp2-Csp2-Nsp2 structure: ZINC03831927, 04629420, 04633281

  • This ligand has a piperazine group that is doubly protonated. In vivo either single protonation or (possibly) no protonation at all would be the norm. I don't believe the alternative structures are in the active set: ZINC03815578

  • These structures all have suspicious protonated dihydropyridine structures. Such an unstable substructure is most unlikely to be present in a stable Factor Xa ligand as it would very rapidly oxidatively aromatise to the protonated pyridine. Therefore I suspect that these substructures are all incorrect: ZINC03815848, 3814850, 4631018, 4631023, 4631034


  • For thrombin_decoys.mol2 and thrombin_ligands.mol2, there seems to be something wrong with the protonation of certain amidines: the groups are doubly protonated and hydrogenated as well, resulting in C([NH3+])[NH3+], with a now tetrahedral C atom. Found in the ligands set for ZINC03834109, ZINC03834111, ZINC03834112, ZINC03834113, ZINC03834114, ZINC03834115, ZINC04617937, ZINC04617938, ZINC04617939, ZINC04617940.
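A check for the C([NH3+])[NH3+] pattern described above could be sketched as follows. This is a minimal, assumption-laden example and not the procedure the reporter used: it reads one molecule in Tripos mol2 format, takes the element from the atom-type column, and flags any carbon bonded to two or more nitrogens that each carry three hydrogens. The function name and the toy record are hypothetical; a multi-molecule file would first need to be split on its @<TRIPOS>MOLECULE records.

```python
from collections import defaultdict

def overprotonated_carbons(mol2_text):
    """Return atom ids of carbons bonded to >= 2 nitrogens with >= 3 H each.

    Assumes a single-molecule mol2 record with explicit hydrogens.
    """
    element, neighbors = {}, defaultdict(set)
    section = None
    for line in mol2_text.splitlines():
        line = line.strip()
        if line.startswith("@<TRIPOS>"):
            section = line[len("@<TRIPOS>"):]
        elif not line:
            continue
        elif section == "ATOM":
            fields = line.split()
            # Column 6 is the Tripos atom type, e.g. "C.3", "N.4", "H".
            element[int(fields[0])] = fields[5].split(".")[0]
        elif section == "BOND":
            _, a, b = line.split()[:3]
            neighbors[int(a)].add(int(b))
            neighbors[int(b)].add(int(a))

    def h_count(atom):
        return sum(element[n] == "H" for n in neighbors[atom])

    flagged = []
    for atom, elem in element.items():
        if elem != "C":
            continue
        nh3 = [n for n in neighbors[atom]
               if element[n] == "N" and h_count(n) >= 3]
        if len(nh3) >= 2:
            flagged.append(atom)
    return flagged

# Hypothetical toy record (dummy coordinates): one C bearing two NH3 groups.
TOY_MOL2 = """@<TRIPOS>MOLECULE
toy
@<TRIPOS>ATOM
1 C1 0.0 0.0 0.0 C.3
2 N1 0.0 0.0 0.0 N.4
3 N2 0.0 0.0 0.0 N.4
4 H1 0.0 0.0 0.0 H
5 H2 0.0 0.0 0.0 H
6 H3 0.0 0.0 0.0 H
7 H4 0.0 0.0 0.0 H
8 H5 0.0 0.0 0.0 H
9 H6 0.0 0.0 0.0 H
10 H7 0.0 0.0 0.0 H
@<TRIPOS>BOND
1 1 2 1
2 1 3 1
3 2 4 1
4 2 5 1
5 2 6 1
6 3 7 1
7 3 8 1
8 3 9 1
9 1 10 1
"""
print(overprotonated_carbons(TOY_MOL2))
```

Counting attached hydrogens rather than relying on the atom-type label is a deliberate choice here, since the atom typing in an affected file may itself be unreliable.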

I found 124 names that occur multiple times. Most of them describe different tautomeric and protonation states (I'm still undecided whether I would prefer those to have different names...). However, for 18 such pairs I also found identical canonical SMILES, so I would count them as duplicates. Ligands: ZINC03815818, ZINC04617938, ZINC04617939, ZINC04617940. Decoys: ZINC03935806, ZINC03931773, ZINC03820759, ZINC03818898, ZINC03818733, ZINC03045673, ZINC02877078, ZINC02877076, ZINC02877075, ZINC02717771, ZINC01066121, ZINC00781033, ZINC00588653, ZINC00579389.

  • Furthermore, I found 20 pairs of compounds in the decoy set that have different names but identical canonical SMILES:
ZINC03998885, ZINC04467871
ZINC03974519, ZINC04469531
ZINC03890055, ZINC04464875
ZINC03889994, ZINC04464852
ZINC03889707, ZINC04464762
ZINC03867866, ZINC04465782
ZINC03867594, ZINC04465649
ZINC03867505, ZINC04465603
ZINC03859006, ZINC04468426
ZINC03857957, ZINC04469620
ZINC03857648, ZINC04464507
ZINC03857636, ZINC04464501
ZINC03857579, ZINC04464474
ZINC03857293, ZINC04464286
ZINC03857182, ZINC04464189
ZINC03857181, ZINC04464188
ZINC03817650, ZINC04628541
ZINC02686537, ZINC03892786
ZINC01548784, ZINC03820719
ZINC01040460, ZINC01040461

I checked a few of these duplicates manually and found identical 3D structures, though sometimes with different atom numbering. Compare, for example, ZINC01040460 and ZINC01040461.
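The two kinds of duplicate reported above (a repeated name with an identical canonical SMILES, and different names sharing one canonical SMILES) can be found with a simple grouping step. The sketch below assumes the canonical SMILES have already been generated by a cheminformatics toolkit of your choice; the function name, the placeholder names, and the SMILES strings are illustrative and are not taken from DUD.

```python
from collections import defaultdict
from itertools import combinations

def find_duplicates(records):
    """records: iterable of (name, canonical_smiles) pairs.

    Returns (same_name, cross_name):
      same_name  - names that occur more than once with an identical SMILES
      cross_name - pairs of different names that share one SMILES
    """
    names_by_smiles = defaultdict(list)
    for name, smiles in records:
        names_by_smiles[smiles].append(name)

    same_name, cross_name = set(), set()
    for names in names_by_smiles.values():
        for a, b in combinations(names, 2):
            if a == b:
                same_name.add(a)
            else:
                cross_name.add(tuple(sorted((a, b))))
    return sorted(same_name), sorted(cross_name)

# Placeholder names and made-up SMILES, used only to exercise the function.
records = [
    ("ZINC_A", "CCO"),
    ("ZINC_A", "CCO"),        # same name, same SMILES: a duplicate
    ("ZINC_A", "CC[OH2+]"),   # same name, other protonation state: not flagged
    ("ZINC_B", "c1ccccc1"),
    ("ZINC_C", "c1ccccc1"),   # different name, same SMILES
]
print(find_duplicates(records))
```

Entries that share a name but differ in tautomeric or protonation state have different SMILES and so are not flagged, which matches the distinction drawn above.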

  • Doubtful compounds in which the amidino group is not planar (ZINC names from thrombin dud_ligands2006): 03834109, 03834111, 03834112, 03834113, 03834114, 03834115, 04617937, 04617938, 04617939


  • In structures ZINC03815826, ZINC03834185, and ZINC04617941 (for example), there are C(-[N+])-[N+] substructures. My medicinal chemistry colleagues assure me that (a) these are not stable, and (b) if they were stable they wouldn't be double-protonated. Is it possible that these should actually be amidines and have been drawn incorrectly? The supplier for these is listed as 'DUD', so there's no definitive chemical supplier to check with. There are a number of these in the dataset, so this could significantly affect the results...

See also: ZINC:Errata