Revision as of 00:41, 7 December 2007 by JohnIrwin

Here are the errata reported for ZINC:

  • For Sigma propiophenone P51605, ZINC has entry 1671385, and its ring is not shown as aromatic.
  • Many molecules are reported under ZINC01278699. Sorry about this case; it will be removed in the next version.
  • The following pairs are not identical but are actually different protonation states of hydroxamic acids (it looks like PipelinePilot has a problem interpreting the mol2 files; I rechecked everything with the SDF files): ZINC03817650 and ZINC04628541; ZINC01548784 and ZINC03820719.
  • I downloaded the Asinex and Sigma-Aldrich databases from version 7 of ZINC in both SMILES and MOL2 formats. For both databases I found differences in the molecules present in the archives: some molecules are present in the multi-mol2 file but not in the SMILES file, and vice versa. Is this possible, or did I make an error in the comparison?

No, you are quite correct. I just ran:

> zcat sial_p0.smi.gz | awk '{print $2}' | sort -u > smiles_codes
> zcat sial_p0.?.mol2.gz | grep ZINC | sort -u > mol2_p0_codes
> wc -l smiles_codes mol2_p0_codes
114763 smiles_codes
112069 mol2_p0_codes
> diff smiles_codes mol2_p0_codes | wc -l

I agree: there are a little over 2,500 differences between the mol2 and SMILES files of Sigma-Aldrich in ZINC version 7, a little over 2% of the library.
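The `diff ... | wc -l` count in the transcript also includes diff's own change markers (lines such as `12a13` and `---`), so it is an upper bound on the number of differing IDs rather than an exact tally. The following is a minimal sketch that instead counts the IDs found in only one of the two sources, using `comm` on the same kind of sorted, de-duplicated lists. `compare_zinc_ids` is a hypothetical helper, not part of ZINC. It assumes, as the awk command above does, that the ZINC ID is the second column of the SMILES file. It also assumes that each mol2 molecule name is the line following `@<TRIPOS>MOLECULE`, as in the standard Tripos mol2 layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZINC): count ZINC IDs that appear in only
# one of a gzipped SMILES file and one or more gzipped multi-mol2 files.
# Usage: compare_zinc_ids FILE.smi.gz FILE.mol2.gz [MORE.mol2.gz ...]
compare_zinc_ids() {
    if [ "$#" -lt 2 ]; then
        echo "usage: compare_zinc_ids smiles.gz mol2.gz [mol2.gz ...]" >&2
        return 1
    fi
    smi_gz=$1
    shift                                  # "$@" now holds the mol2 files
    tmp=$(mktemp -d) || return 1

    # Assumption: the ZINC ID is the 2nd whitespace-separated column.
    gzip -dc "$smi_gz" | awk '{print $2}' | sort -u > "$tmp/smiles_codes"

    # Assumption: the molecule name (the ZINC ID) is the first field of the
    # line right after each @<TRIPOS>MOLECULE record header.
    gzip -dc "$@" |
        awk '/^@<TRIPOS>MOLECULE/ { if ((getline) > 0) print $1 }' |
        sort -u > "$tmp/mol2_codes"

    # comm needs both inputs sorted the same way (both come from sort above).
    # -23 keeps lines only in the first file, -13 lines only in the second.
    only_smiles=$(comm -23 "$tmp/smiles_codes" "$tmp/mol2_codes" | wc -l | tr -d ' ')
    only_mol2=$(comm -13 "$tmp/smiles_codes" "$tmp/mol2_codes" | wc -l | tr -d ' ')

    echo "only_in_smiles $only_smiles"
    echo "only_in_mol2 $only_mol2"
    rm -r "$tmp"
}
```

For the Sigma-Aldrich files in the transcript, this would be called as `compare_zinc_ids sial_p0.smi.gz sial_p0.?.mol2.gz`. To list the differing IDs instead of counting them, drop the `wc -l | tr -d ' '` stages.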