ZINC Database

From DISI

Revision as of 16:14, 6 May 2010

The ZINC Database of commercially available compounds for structure-based virtual screening is free for everyone to use at zinc.docking.org.

What is ZINC designed for?

ZINC is designed for structure-based virtual screening, also known as docking screens. For this reason, we pay careful attention to how the molecules are represented.

ZINC may also be useful for other applications, such as chemical informatics, particularly in SMILES format.

What is ZINC not suitable for?

ZINC is not a comprehensive inventory of all purchasable compounds, since we filter out those that may be problematic in biochemical assays. For general-purpose purchasing, we suggest chemspider.com or emolecules.com. ZINC does not attempt to keep track of CAS numbers or biological activities. For the former, try CAS; for the latter, try ChemDB, PubChem, or drugbank.ca.

Recommended usage

We recommend that you download the "lead-like" or "fragment-like" subsets of ZINC in the format closest to the one used by your docking program (e.g. mol2, SDF, pdbqt). We also recommend downloading the supplier information at the same time, so that you have a permanent mapping from ZINC ID numbers to the original supplier codes. Molecules occasionally disappear from ZINC, usually due to depletion, so keeping a static copy of the supplier information means you do not depend on looking compounds up in the database in the future.
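As a minimal sketch of keeping that permanent mapping, the Python below reads a supplier-information file into a dictionary from ZINC ID to (supplier, supplier code) pairs. The tab-separated layout with columns zinc_id, supplier, supplier_code is an assumption for illustration, not the documented format of the ZINC download, and the IDs and codes in the sample are made up.

```python
import csv
import io
from collections import defaultdict


def load_supplier_mapping(lines):
    """Build a ZINC ID -> [(supplier, supplier_code), ...] mapping.

    Assumes a HYPOTHETICAL tab-separated layout with one row per
    (zinc_id, supplier, supplier_code); adapt to the file you actually download.
    """
    mapping = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines, comment lines and rows that are too short.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        zinc_id, supplier, code = row[0], row[1], row[2]
        mapping[zinc_id].append((supplier, code))
    return dict(mapping)


# Invented sample data in the assumed layout.
sample = io.StringIO(
    "ZINC00000001\tVendorA\tA-123\n"
    "ZINC00000001\tVendorB\tB-9\n"
    "ZINC00000002\tVendorA\tA-456\n"
)
mapping = load_supplier_mapping(sample)
print(mapping["ZINC00000001"])  # [('VendorA', 'A-123'), ('VendorB', 'B-9')]
```

Saving this dictionary alongside your docking results (for example as JSON) keeps the link to the supplier even after a compound is removed from ZINC.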

There are three variations of the "leads" and "fragments" subsets: the standard definition (subsets 1 and 2, respectively); "clean" leads and fragments (subsets 11 and 12, respectively); and "immediate availability" leads and fragments (subsets 21 and 22, respectively).
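The subset numbers above follow a regular pattern: leads end in 1 and fragments in 2, and the variant adds 0, 10 or 20. A small lookup table, sketched here in Python with descriptive labels of our own choosing (the labels are not ZINC identifiers), makes that explicit:

```python
# Subset numbers as listed on this page; the (kind, variant) labels are
# descriptive names chosen for this sketch, not official ZINC names.
SUBSETS = {
    1: ("leads", "standard"),
    2: ("fragments", "standard"),
    11: ("leads", "clean"),
    12: ("fragments", "clean"),
    21: ("leads", "immediate availability"),
    22: ("fragments", "immediate availability"),
}


def subset_number(kind, variant):
    """Return the subset number for a kind ("leads"/"fragments") and variant."""
    for number, key in SUBSETS.items():
        if key == (kind, variant):
            return number
    raise KeyError(f"no subset for {kind!r}/{variant!r}")


print(subset_number("fragments", "clean"))  # 12
```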

There are other applications and uses of ZINC, but this is in our view by far the most common and useful application.

How to Cite ZINC

Please cite Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82. This paper is also a good place to start to understand how ZINC was assembled and what it is good for. The rest of this page can be read as an update to that paper, covering what has changed since it went to press.

Filtering rules

For the filtering rules currently in effect in ZINC, see filtering.docking.org.

We filter both at load time and at subset preparation time. Molecules are not loaded if:

  • They do not pass our load-time filtering criteria.
  • An error occurs during processing.

The reason each compound was filtered out is available via the vendor page. Every compound in every supplier catalog we load has one of four fates:

  • It is loaded and visible in ZINC.
  • It is depleted (no longer available). It may still be in ZINC, marked depleted, but depleted compounds may not persist for long.
  • It was filtered out and never loaded. These are listed as "filtered out" on the vendor pages.
  • It failed at some stage of processing, and may appear nowhere except in the original vendor catalog.
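The four fates can be modelled as an enumeration. The sketch below is hypothetical: the record fields (`error`, `filter_reason`, `in_current_catalog`) and the order in which they are checked are assumptions made for illustration, not a description of ZINC's actual loading pipeline.

```python
from enum import Enum


class Fate(Enum):
    LOADED = "loaded and visible in ZINC"
    DEPLETED = "depleted; may still appear in ZINC, marked depleted"
    FILTERED = "filtered out and never loaded; reason shown on the vendor page"
    FAILED = "failed at some stage of processing"


def classify(record):
    """Assign one of the four fates to a supplier catalog entry.

    `record` is a HYPOTHETICAL dict; the field names and check order are
    illustrative assumptions only.
    """
    if record.get("error"):
        return Fate.FAILED
    if record.get("filter_reason"):
        return Fate.FILTERED
    if not record.get("in_current_catalog", True):
        return Fate.DEPLETED
    return Fate.LOADED


print(classify({"filter_reason": "example reason"}).name)  # FILTERED
```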

Which suppliers are included?

Please see the by-vendor page. If there is a catalog you would like to see loaded, please write to databases at docking.org.

Update frequency

  • New molecules are loaded into ZINC continuously. We foresee loading up to 10,000 new molecules every day, at least until 2012.
  • Old molecules are marked depleted in ZINC when they are removed from a supplier's catalog. For some suppliers this happens every day; for others, more like once a year. More than 50% of supplier catalogs are updated in ZINC at least twice a year.
  • Ready-to-download subsets are updated periodically. The date of the last export is shown with each subset. If you find a subset that is out of date, tell us and we will update it.
  • Other parts of ZINC are refreshed less regularly: the index for similarity searching, pre-calculated similarities, SEA- and literature-derived annotations, and yuck filters. At any given time, any of these may be weeks or even months out of date. One day we will figure out how to keep this data more current.
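Because depletion is triggered by a compound disappearing from a supplier's catalog, you can approximate the same check locally by comparing two saved snapshots of one supplier's codes. This is a sketch under that assumption; `diff_catalog` is a hypothetical helper and the supplier codes are invented.

```python
def diff_catalog(previous_codes, current_codes):
    """Compare two snapshots of one supplier's catalog codes.

    Returns (new, depleted): codes that appeared since the previous snapshot,
    and codes that disappeared from it. Treating a missing code as depleted
    mirrors the rule described above; this is not ZINC's own code.
    """
    previous, current = set(previous_codes), set(current_codes)
    return sorted(current - previous), sorted(previous - current)


new, depleted = diff_catalog(["A-1", "A-2", "A-3"], ["A-2", "A-3", "A-4"])
print(new, depleted)  # ['A-4'] ['A-1']
```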

Searching ZINC

The SMILES/SMARTS line below the JME applet on the ZINC Search page is interpreted as follows:

  • pattern - interpreted as a SMARTS pattern.
  • pattern N - molecules within Tanimoto similarity N (0<N<100) of this SMILES.
  • pattern N X Y - molecules within Tversky threshold N (0<N<100), with alpha=X/100 and beta=Y/100.

Thus:

c1ncccc1 100 0 100

matches molecules that contain all of pyridine, and

n1ccccc1 100 100 0

matches molecules that are subgraphs of pyridine.
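To make the three forms of the search line concrete, the Python sketch below parses a line into a query and computes Tanimoto and Tversky scores over sets of features. ZINC's actual fingerprint is not described on this page, so the feature sets here are toy stand-ins for fingerprint bits. The assignment of alpha to features present only in the database molecule, and beta to features present only in the query, is inferred from the two pyridine examples rather than documented. With that convention, alpha=0, beta=1 at threshold 1.0 accepts molecules containing every query feature, and alpha=1, beta=0 accepts molecules whose features all occur in the query.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Query:
    pattern: str
    mode: str                          # "smarts", "tanimoto" or "tversky"
    threshold: Optional[float] = None  # N / 100
    alpha: Optional[float] = None      # X / 100
    beta: Optional[float] = None       # Y / 100


def parse_search_line(line):
    """Interpret the search line: 'pattern', 'pattern N' or 'pattern N X Y'."""
    parts = line.split()
    if len(parts) == 1:
        return Query(parts[0], "smarts")
    if len(parts) == 2:
        return Query(parts[0], "tanimoto", threshold=float(parts[1]) / 100)
    if len(parts) == 4:
        n, x, y = (float(p) for p in parts[1:])
        return Query(parts[0], "tversky", threshold=n / 100,
                     alpha=x / 100, beta=y / 100)
    raise ValueError(f"unrecognized search line: {line!r}")


def tanimoto(query, target):
    """Tanimoto coefficient |A&B| / (|A| + |B| - |A&B|) on feature sets."""
    common = len(query & target)
    denom = len(query) + len(target) - common
    return common / denom if denom else 0.0


def tversky(query, target, alpha, beta):
    # ASSUMED convention (inferred from the pyridine examples): alpha weights
    # features only in the database molecule, beta features only in the query.
    common = len(query & target)
    denom = common + alpha * len(target - query) + beta * len(query - target)
    return common / denom if denom else 0.0


# Toy feature sets standing in for fingerprints.
pyridine = {"f1", "f2", "f3"}
superstructure = {"f1", "f2", "f3", "f4"}  # contains every pyridine feature
substructure = {"f1", "f2"}                # every feature is in pyridine

q = parse_search_line("c1ncccc1 100 0 100")
print(tversky(pyridine, superstructure, q.alpha, q.beta) >= q.threshold)  # True
print(tversky(pyridine, substructure, q.alpha, q.beta) >= q.threshold)    # False
```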

The numbering is strange in the search results

We know about this. Sorry. We will fix it one day.

Problems with ZINC

We have collected all the problems we are aware of on the problems page. If you have a problem with ZINC that is not listed there, please write to support at docking.org.

How do subsets work?

We have pre-made a number of subsets (by vendor and by various criteria) that we hope you will find useful. When you search, you may download individual molecules, or even all matching molecules, in a variety of formats. However, the number of molecules you can download this way is limited (typically to around 1000, though this varies), because downloading large arbitrary collections of molecules puts a strain on our server. To download more, create a subset by clicking the "Create Subset" button in the "Search Results" browser. This starts an off-line subset preparation process, which will complete when resources allow (allow at least 1 hr per 1000 molecules). The finished subset will appear on the "download subsets" page (option #4).
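The guidance above reduces to simple arithmetic: at or below the direct-download limit you can download immediately, and above it you should expect subset preparation to take at least 1 hour per 1000 molecules, so 25,000 molecules means at least 25 hours. In this sketch `plan_download` is a hypothetical helper, and the 1000-molecule default should be treated as approximate because the limit varies.

```python
def plan_download(n_molecules, direct_limit=1000, hours_per_1000=1.0):
    """Choose between direct download and creating a subset.

    direct_limit: approximate; the page says "typically around 1000, but it varies".
    hours_per_1000: the page says to allow at least 1 hr per 1000 molecules,
    so the returned figure is a lower bound, not a prediction.
    """
    if n_molecules <= direct_limit:
        return "direct download", 0.0
    min_hours = n_molecules / 1000 * hours_per_1000
    return "create subset", min_hours


print(plan_download(800))    # ('direct download', 0.0)
print(plan_download(25000))  # ('create subset', 25.0)
```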

Another way to create a subset (which need not be a strict subset of ZINC) is to upload molecules to our server (option #5). These are likewise processed in batch mode as resources permit. Such subsets are not listed on the browse subsets page; instead, they are available via a URL that you will be given when you upload.

The goal of subsets is to facilitate research. You are encouraged to use these free tools. However, please note that these services are rather demanding on our server, and may be limited (or simply not function) from time to time. Thank you for your understanding.

Versions

Please see ZINC:History.


FAQ: ZINC numbers, mailing lists, uploading, XML RPC,