ZINC:Problems


Revision as of 16:00, 7 August 2009

This is the ZINC problems page, which describes all the problems specific to ZINC. There are other problems pages:

  • DOCK:Problems
  • Problems - all other problems go here.

  • Molecule duplication
  • Incorrect representation
  • Missing representation
  • Out of date catalogs
  • Broken molecules
  • Probably incorrect enumeration/sampling of stereochemistry, both R/S and E/Z
  • Incorrect treatment of protonation and tautomerization in some cases
  • Search is far too slow
  • SEA index out of date
  • Activity annotations out of date
  • Links to supplier web sites sometimes broken
  • Tutorials
  • Protocols
  • Describe the pipeline online
  • Get the new paper out
  • yuck out of date.
  • missing molecules

Known Problems

I'm so glad you asked! There are a number of problems we know of, all of which we aim to fix one day. We hope you will agree that the benefits of ZINC as it stands outweigh the problems. Here are a few of the problems we are aware of:

  • Unreasonable tautomers - We generate some tautomers that we shouldn't. Among the ones we know about is CH3-C=NH -> CH2=C-N. Over half (40K+) were removed March 7. More processing to follow in April 2005. (Problem 5/3/7)
  • Aggressive protonation - We generate protonated forms that are probably unreasonable for most targets, such as protonated pyridines. This is an active area of research. Please be patient. (Problem 5/995)
  • Broken flexibase molecules - If you use the flexibase format files, we are aware of a number of broken molecules, including C1S(=O)(=O)CCC1 and molecules with aliphatic rigid fragments. We know about this, and are working to correct it. (Problem 5/996)
  • Corrupt files to download - We offer over 30 million distinct files to download from the ZINC web site. Our quality control is currently such that a few of these are corrupt. If you find one, would you kindly bring it to our attention? We will endeavor to fix it as soon as possible. (Problem 5/997)
  • Subsets & Uploads - The subsetting and uploading mechanism is somewhat brittle. We hope to spend time on this soon. (Problem 5/999)
  • Truncated Searches - We currently limit searches to 30 seconds of CPU time, to avoid overwhelming our servers, and to give you at least a partial answer in a timely fashion. We plan to add more servers soon and thereby offer quicker turnaround and more complete answers. Thanks for your patience. (Problem 5/998) [correction: We are at 45 seconds on a trial basis as of Feb 14.]
  • Duplicate backslashes in SMILES files - Fixed 3 March 2005. (Problem 5/1/3)
  • Name of molecule in mol2 file often incorrect - Being fixed currently. Reported by Gandhimathi and Federica Morandi. We consider this an annoying bug, but not a core one.
  • Ambiguous and syntactically incorrect SMILES for E/Z specification - Being fixed currently. Full solution expected in April. (Problem 5/3/10)
  • Wrong annotations - FIXED March 1, 2005. If you find incorrect annotations in files downloaded after March 1, 2005, please write databases at docking.org. (Problem 5/3/1)
  • Wrong molecules in subset - Subsets 1&2 FIXED March 1, 2005. Other subsets being released March 3-5. If you find molecules that do not belong in a subset for files downloaded AFTER March 5, 2005, please write comments at docking.org. (Problem 5/3/2)
  • Wrong charges in mol2 files - The current version of ZINC contains MMFF94 charges rather than AMSOL charges. We regret this error. The workaround is to run mol2 files through a program like molcharge, part of the QuacPac suite from OpenEye. There are many other fine programs that will assign partial atomic charges. The new version of ZINC now in preparation will have AMSOL partial atomic charges.
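
If you want to check files you have already downloaded for two of the problems above, something like the following may help. It is only a rough sketch and not part of ZINC: the function names are made up for illustration, and it assumes that a doubled backslash never occurs in a valid SMILES string and that each mol2 MOLECULE record follows the usual Tripos layout (name, counts, molecule type, charge type). The charge-type line only reports what the file claims, so it may not reflect how the charges were actually computed.

```python
def has_doubled_backslash(smiles: str) -> bool:
    """Return True if the SMILES string contains two consecutive backslashes.

    Assumption: a single backslash is a bond-direction symbol, so two in a
    row are treated here as a sign of the duplicate-backslash problem.
    """
    return "\\\\" in smiles


def mol2_charge_types(text: str) -> list:
    """Return the declared charge type of every MOLECULE record in mol2 text.

    Assumption: the four lines after "@<TRIPOS>MOLECULE" are the molecule
    name, the atom/bond counts, the molecule type and the charge type.
    """
    lines = text.splitlines()
    found = []
    for i, line in enumerate(lines):
        if line.strip() == "@<TRIPOS>MOLECULE":
            record = lines[i + 1 : i + 5]
            # Skip records that are cut short (for example a truncated file).
            if len(record) == 4:
                found.append(record[3].strip())
    return found


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(has_doubled_backslash(r"F/C=C\\F"))
    sample = (
        "@<TRIPOS>MOLECULE\n"
        "example_molecule\n"
        " 2 1 0 0 0\n"
        "SMALL\n"
        "MMFF94_CHARGES\n"
    )
    print(mol2_charge_types(sample))
```

A file whose records declare a charge type other than the one you expect is a candidate for re-assigning charges with the program of your choice.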

Missing Molecules