Difference between revisions of "ZINC:Errata"

From DISI
[[Category:Errata]]
 
 
[[Category:ZINC]]
 

Revision as of 22:35, 6 December 2007

Here are errata as reported for ZINC:


  • For Sigma propiophenone (P51605), ZINC has entry 1671385, and the ring in it is not shown as aromatic.
  • Many molecules are reported with ZINC01278699. Sorry about this case; it will be removed in the next version.


  • I downloaded the Asinex and Sigma-Aldrich databases from ZINC version 7 in both SMILES and MOL2 formats. For both databases, the molecules in the two archives differ: some molecules are present in the multi-mol2 file but not in the SMILES file, and vice versa. Is this possible, or did I make an error in the comparison?

No, you are quite correct. I just did:

>  zmore sial_p0.smi.gz | awk '{print $2}' | sort -u > smiles_codes  
>  zcat sial_p0.?.mol2.gz | grep ZINC | sort -u > mol2_p0_codes
>  wc -l smiles_codes mol2_p0_codes
114763 smiles_codes
112069 mol2_p0_codes
> diff smiles_codes  mol2_p0_codes  |wc -l
4265

I agree that there are a little over 2,500 differences between the mol2 and SMILES files for Sigma-Aldrich in ZINC version 7, a little over 2% of the library.
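Note that `diff ... | wc -l` also counts diff's own change-marker lines, so it is not a direct count of mismatched codes. As a rough sketch (not part of the original check), the two code lists could be compared with `comm` instead, which reports the codes unique to each file and those shared by both. The function name `compare_ids` is hypothetical, and the sketch assumes both inputs are already sorted and de-duplicated under the same locale, as `sort -u` output would be.

```shell
# Hypothetical helper: summarize how two sorted, de-duplicated ID lists differ.
# Assumes both files were sorted with the same collation (e.g. the same LC_ALL).
compare_ids() {
    # comm -23: lines only in the first file
    only_a=$(comm -23 "$1" "$2" | wc -l)
    # comm -13: lines only in the second file
    only_b=$(comm -13 "$1" "$2" | wc -l)
    # comm -12: lines common to both files
    both=$(comm -12 "$1" "$2" | wc -l)
    # $(( )) strips any padding that wc adds on some systems
    echo "only_in_first=$((only_a)) only_in_second=$((only_b)) in_both=$((both))"
}

# Example call on the files produced above:
#   compare_ids smiles_codes mol2_p0_codes
```

Dropping the `| wc -l` from any of the three `comm` calls lists the codes themselves, which is handy for spotting which molecules are missing from one format.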