ZINC Database

From DISI
Revision as of 22:01, 20 May 2008 by Frodo (talk | contribs)

The ZINC Database of commercially available compounds for virtual screening is available for free from our website zinc.docking.org.

Revision History

  • ZINC5 was released 1/1/2005 and is still available at blaster.docking.org/zinc5. ZINC5 was a major release, and completely superseded ZINC4, which is no longer available.
  • ZINC6 was released 1/1/2006 and is still available at blaster.docking.org/zinc6. ZINC6 was a major release, in which all molecular geometries were re-created from SMILES.
  • ZINC7 was released 1/1/2007 and is the current default version. It is an incremental release compared to ZINC6. Indeed, it looks very much like ZINC6.
  • ZINC8 will be released in June 2008 and will probably be a major release.

Frequency of Updates

Our goal is to update ZINC regularly, at least annually. We have managed this, if only just, for four years now.

More About ZINC Versions

How to Cite ZINC

To cite ZINC, please reference Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82. This paper is also a good place to start to understand how ZINC has been assembled, and what it is good for. The rest of this document can be read as an update to that paper for things that have changed since going to press.

Frequently Asked Questions

  • duplication
  • filtering rules
  • missing molecules

Searching ZINC

The SMILES/SMARTS line below the JME applet in the ZINC Search page is interpreted as follows:

  • pattern - interpreted as SMARTS
  • pattern N - molecules within Tanimoto N (0<N<100) of this SMILES
  • pattern N X Y - molecules within Tversky threshold N (0<N<100) having alpha=X/100, beta=Y/100

Thus:

c1ncccc1 100 0 100

matches molecules containing exactly pyridine, and

n1ccccc1 100 100 0

would match molecules that are a subgraph of pyridine.
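The three forms of the search line, and the effect of the Tversky weights, can be sketched in a few lines of Python. This is only an illustration, not the code that runs on the ZINC server: the names SearchLine, parse_search_line and tversky are hypothetical, the feature sets are toy stand-ins for real molecular fingerprints, and the choice that alpha weights features found only in the database molecule (and beta those found only in the query) is an assumption made so that the two examples above come out as described. The sketch also accepts N=100, since the examples use it.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class SearchLine:
    pattern: str                       # SMILES or SMARTS text
    mode: str                          # "smarts", "tanimoto" or "tversky"
    threshold: Optional[float] = None  # N/100
    alpha: Optional[float] = None      # X/100
    beta: Optional[float] = None       # Y/100


def parse_search_line(line: str) -> SearchLine:
    """Split a search line into pattern, threshold and Tversky weights."""
    fields = line.split()
    if len(fields) == 1:
        return SearchLine(fields[0], "smarts")
    if len(fields) == 2:
        return SearchLine(fields[0], "tanimoto", float(fields[1]) / 100)
    if len(fields) == 4:
        n, x, y = (float(f) / 100 for f in fields[1:])
        return SearchLine(fields[0], "tversky", n, x, y)
    raise ValueError(f"expected 1, 2 or 4 fields, got {len(fields)}")


def tversky(db: Set[str], query: Set[str], alpha: float, beta: float) -> float:
    """Tversky index on feature sets; alpha = beta = 1 gives Tanimoto.

    Assumption: alpha penalises features only in the database molecule,
    beta penalises features only in the query.
    """
    common = len(db & query)
    denom = common + alpha * len(db - query) + beta * len(query - db)
    return common / denom if denom else 0.0


# Toy feature sets (hypothetical, not real fingerprints).
query = {"a", "b", "c"}
bigger = {"a", "b", "c", "d"}   # contains every query feature
smaller = {"a", "b"}            # contained in the query

print(parse_search_line("c1ncccc1 100 0 100"))
print(tversky(bigger, query, alpha=0.0, beta=1.0))   # superstructure-like hit
print(tversky(smaller, query, alpha=1.0, beta=0.0))  # substructure-like hit
print(tversky(bigger, query, alpha=1.0, beta=1.0))   # plain Tanimoto
```

With alpha=0 and beta=1, extra features in the database molecule cost nothing, so anything containing all the query features scores 1.0. Swapping the weights reverses the direction, and equal weights of 1 reduce to the symmetric Tanimoto coefficient.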

The numbering is strange in the search results

We know about this. Sorry. We will fix it one day.

Filtering Rules

We have loaded current catalogs from all our suppliers up to 400 Daltons. Selected catalogs, such as natural products and annotated databases, have been loaded to 500 and even 600 Daltons in some cases.

Molecules are not loaded if:

  • They do not pass our generous filtering criteria.
  • A failure occurs during processing.

Filtered compounds are now listed on the vendor page, with a justification.

Every compound in every supplier catalog has one of five fates:

  • It has been loaded into ZINC
  • It is depleted (no longer available) - and may still be in ZINC, marked depleted
  • It was filtered out and never loaded. These are listed as "filtered out" on the vendor pages.
  • It appeared in a newer catalog than has been loaded (version loaded is on vendors page)
  • It failed at some stage of processing.
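If you keep your own bookkeeping on a supplier catalog, the fates listed above map naturally onto an enumeration. The following is a minimal sketch only: the names Fate and may_be_in_zinc are hypothetical and do not correspond to anything in the ZINC software or its downloads.

```python
from enum import Enum


class Fate(Enum):
    """Possible fates of a catalog compound, as listed above."""
    LOADED = "loaded into ZINC"
    DEPLETED = "depleted (no longer available)"
    FILTERED_OUT = "filtered out and never loaded"
    NEWER_CATALOG = "in a newer catalog than has been loaded"
    FAILED = "failed at some stage of processing"


def may_be_in_zinc(fate: Fate) -> bool:
    """Loaded compounds are in ZINC; depleted ones may still be, marked depleted."""
    return fate in (Fate.LOADED, Fate.DEPLETED)


for fate in Fate:
    print(f"{fate.name:14s} may be in ZINC: {may_be_in_zinc(fate)}")
```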

If there is a catalog you would like to see loaded, please write to databases at docking.org.

How do subsets work?

We have pre-made a number of subsets (by vendor and by various other criteria) that we hope you will find useful. When you search, you may download individual molecules or even all matching molecules in a variety of formats. However, the number you can download this way is limited (typically to around 1000, but it varies), because downloading large arbitrary collections of molecules puts a strain on our server. To download more, you may create a subset by clicking on the "Create Subset" button in the "Search Results" browser. This will start an off-line subset preparation process, which will complete when resources allow (allow at least 1 hour per 1000 molecules). The finished subset will appear on the "download subsets" (option #4) page.
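As a rough guide to planning, the two figures quoted above can be turned into a small calculation. This is a sketch under stated assumptions: the function names are hypothetical, the default limit of 1000 is only the "typical" value mentioned above, and the result is a lower bound on the wait, not a promise.

```python
def needs_subset(n_hits: int, download_limit: int = 1000) -> bool:
    """True if the hit list exceeds the (assumed) direct-download limit."""
    return n_hits > download_limit


def min_wait_hours(n_molecules: int, hours_per_thousand: float = 1.0) -> float:
    """Lower bound on subset preparation time, at 1 hour per 1000 molecules."""
    return n_molecules / 1000 * hours_per_thousand


hits = 25000
if needs_subset(hits):
    print(f"Create a subset and allow at least {min_wait_hours(hits):.0f} hours")
```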

Another way to create a subset, which isn't necessarily a strict subset, is to upload molecules to our server (option #5). These will again be processed in batch mode as resources permit. These subsets are not available on the browse subsets page, but instead via a URL that you will be given when you upload.

The goal of subsets is to facilitate research. You are encouraged to use these free tools. However, please note that these services are rather demanding on our server, and may be limited (or simply not function) from time to time. Thank you for your understanding.

Known Problems

I'm so glad you asked! There are a number of problems we know of, all of which we aim to fix one day. We hope you will agree that the benefits of ZINC as it stands outweigh the problems. Here are a few of the problems we are aware of:

  • Unreasonable tautomers - We generate some tautomers that we shouldn't. Among the ones we know about is CH3-C=NH -> CH2=C-N. Over half (40K+) were removed March 7. More processing to follow in April 2005. (Problem 5/3/7)
  • Aggressive protonation - We generate protonated forms that are probably unreasonable for most targets, such as protonated pyridines. This is an active area of research. Please be patient. (Problem 5/995)
  • Broken flexibase molecules - If you use the flexibase format files, we are aware of a number of broken molecules, including C1S(=O)(=O)CCC1 and molecules with aliphatic rigid fragments. We know about this, and are working to correct it. (Problem 5/996)
  • Corrupt files to download - We offer over 30 million distinct files to download from the ZINC web site. Our quality control is currently such that a few of these are corrupt. If you find one, would you kindly bring it to our attention? We will endeavor to fix it asap. (Problem 5/997)
  • Subsets & Uploads - The subsetting and uploading mechanism is somewhat brittle. We hope to spend time on this soon. (Problem 5/999)
  • Truncated Searches - We currently limit searches to 30 seconds of CPU time, to avoid overwhelming our servers, and to give you at least a partial answer in a timely fashion. We plan to add more servers soon and thereby offer quicker turnaround and more complete answers. Thanks for your patience. (Problem 5/998) [correction: We are at 45 seconds on a trial basis as of Feb 14.]
  • Duplicate backslashes in SMILES files - Fixed 3 March 2005. (Problem 5/1/3)
  • Incorrect molecule names in mol2 files - The name of the molecule in the mol2 file is often incorrect. Being fixed currently. Reported by Gandhimathi and Federica Morandi. We consider this an annoying bug, but not a core one.
  • Ambiguous and syntactically incorrect SMILES for E/Z specification - Being fixed currently. Full solution expected in April. (Problem 5/3/10)
  • Wrong annotations - FIXED March 1, 2005. If you find incorrect annotations in files downloaded after March 1, 2005, please write databases at docking.org. (Problem 5/3/1)
  • Wrong molecules in subset - Subsets 1&2 FIXED March 1, 2005. Other subsets being released March 3-5. If you find molecules that do not belong in a subset for files downloaded AFTER March 5, 2005, please write comments at docking.org. (Problem 5/3/2)
  • Wrong charges in mol2 files - The current version of ZINC contains MMFF94 charges rather than AMSOL charges. We regret this error. The workaround is to run mol2 files through a program like molcharge, part of the QuacPac suite from OpenEye. There are many other fine programs that will assign partial atomic charges. The new version of ZINC now in preparation will have AMSOL partial atomic charges.

Missing Molecules

ZINC Protocols

Protocol 1. Searching ZINC

Protocol 2. Downloading from ZINC

Protocol 3. Searching for similar molecules

About ZINC Numbering

Mailing Lists

Subsets

Uploading

Using the XML RPC interface

History