ZINC Database

From DISI

Revision as of 00:33, 23 November 2011

ZINC is a database of commercially available compounds for structure-based virtual screening. It contains about 14 million compounds that can simply be purchased, provided in ready-to-dock 3D formats with molecules represented in biologically relevant forms. It is free for everyone to use and download at zinc.docking.org. ZINC is provided by the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California San Francisco (UCSF). To cite ZINC, please reference: Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82. We thank NIGMS for financial support (GM71896).

Purpose

ZINC was originally designed for target-based virtual screening (docking), and this remains its primary focus. However, ZINC is also useful for many other tasks, including:

  • finding a compound to purchase
  • downloading a library in SMILES format for ligand-based virtual screening
  • finding compounds by similarity to a starting compound (SAR-by-catalog)
  • finding compounds ANNOTATED for a particular target (via ChEMBL)
  • finding compounds PREDICTED for a particular target (via SEA / ChEMBL)
  • and many more...
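Similarity searching (SAR-by-catalog) ranks compounds by how closely they resemble a query. One widely used measure is the Tanimoto coefficient on molecular fingerprints, |A ∩ B| / |A ∪ B|. The following is a generic sketch, not a description of how ZINC's own similarity search is implemented: it assumes fingerprints have already been computed as sets of bit indices, and the fingerprints, IDs, and the 0.5 threshold are made up for illustration.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_by_similarity(query: set[int], library: dict[str, set[int]],
                       threshold: float = 0.5) -> list[tuple[str, float]]:
    """Return (id, score) pairs scoring at or above threshold, most similar first."""
    hits = [(mol_id, tanimoto(query, fp)) for mol_id, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints; real ones would come from a cheminformatics toolkit.
query = {1, 4, 7, 9, 12}
library = {
    "MOL_A": {1, 4, 7, 9, 13},  # shares 4 of 6 distinct bits with the query
    "MOL_B": {2, 3, 5},         # no bits in common
    "MOL_C": {1, 4, 7, 9, 12},  # identical to the query
}
print(rank_by_similarity(query, library))
```

The threshold simply discards weak matches; in practice the fingerprint type strongly affects which threshold is meaningful.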

Scope

ZINC includes molecules from over 130 commercial vendor catalogs and over 20 annotated databases. A list of all purchasable catalogs used is here. A list of all annotated (non-purchasable) catalogs is here. Purchasable bioactive compounds, that is, molecules from the annotated (non-purchasable) catalogs that can also be purchased, are listed here.

Access

ZINC may be accessed at zinc.docking.org and is freely available to everyone to use. However, significant portions of ZINC may not be redistributed without express written permission from John Irwin.

Curation

ZINC is curated by the ZINC Curators. This group works to improve and maintain the database, and to keep it as current as possible.

Updates

ZINC is updated continuously. Each week:

  • 40,000 new molecules are loaded
  • 10,000 molecules are repaired in some way
  • 30,000 catalog items are marked "depleted" due to their absence from the most current catalogs.
  • 5-6 vendor catalogs and 2 by-property subsets are updated.

Formats

ZINC is available in SMILES, mol2, SDF and flexibase formats.

What is ZINC not suitable for?

ZINC is not a comprehensive collection of every molecule for sale. It focuses on biologically relevant compounds. To achieve this focus, ZINC filters out molecules widely considered unsuitable for docking, such as peroxides, large insoluble molecules, large peptides, and highly reactive reagents. See our filtering rules. We also filter out molecules containing metals, boron, or silicon, because there are no MMFF94 parameters for these atoms.
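The metal/boron/silicon rule is an element-level check. As a rough illustration only, the sketch below tokenizes a SMILES string with a regular expression, collects its element symbols, and rejects any molecule containing an element outside a small allowlist. The allowlist is our own assumption and is stricter than the rule stated here (it would also reject selenium, for example); ZINC's actual rules are at filtering.docking.org, and a real pipeline would use a cheminformatics toolkit rather than this simplified tokenizer.

```python
import re

# Atom tokens in SMILES: bracket atoms such as [Si], [nH], [Na+], or
# organic-subset atoms written without brackets (Br/Cl tried before B/C).
ATOM_TOKEN = re.compile(r"\[([^\]]+)\]|Br|Cl|B|C|N|O|P|S|F|I|b|c|n|o|p|s")
# Element symbol at the start of a bracket atom, after an optional isotope.
BRACKET_SYMBOL = re.compile(r"\d*([A-Z][a-z]?|[a-z][a-z]?)")

# Hypothetical allowlist of common organic elements. This is an assumption
# for the sketch, not ZINC's real filter: it also rejects e.g. Se and As.
ALLOWED = {"H", "C", "N", "O", "P", "S", "F", "Cl", "Br", "I"}


def elements(smiles: str) -> set[str]:
    """Return the set of element symbols appearing in a SMILES string."""
    found = set()
    for match in ATOM_TOKEN.finditer(smiles):
        if match.group(1) is not None:  # bracket atom, e.g. "Si" or "nH"
            sym_match = BRACKET_SYMBOL.match(match.group(1))
            if sym_match is None:
                continue
            symbol = sym_match.group(1)
        else:
            symbol = match.group(0)
        # Aromatic atoms are written lowercase: "c" -> "C", "se" -> "Se".
        found.add(symbol[0].upper() + symbol[1:])
    return found


def passes_element_filter(smiles: str) -> bool:
    """True if every element in the molecule is on the allowlist."""
    return elements(smiles) <= ALLOWED


# Ethanol, phenylboronic acid, tetramethylsilane, cisplatin, indole.
for smi in ["CCO", "OB(O)c1ccccc1", "C[Si](C)(C)C", "N.N.Cl[Pt]Cl", "c1ccc2[nH]ccc2c1"]:
    print(smi, passes_element_filter(smi))
```

The platinum complex is rejected, which matches the point that some filtered compounds are real drugs.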

Some filtered compounds, such as cisplatin, are actually drugs or drug candidates. ZINC is therefore deliberately not a superset of all drugs. For general-purpose purchasing, we suggest chemspider.com or emolecules.com.

ZINC is single-minded in its focus on biologically relevant representations of molecules. ZINC is not, and does not aim to be, encyclopedic. It does not keep track of many other kinds of information, such as CAS numbers or even names. For these, please try ChemDB, PubChem, drugbank.ca, biocyc/metacyc, ChEMBL, chemspider.com, and many others.

Version

We distinguish between the version of the website software and the version of any particular subset that you download. The current version of the website software is 11. Version 12, a complete rewrite with many new features, is now in alpha testing (May 2011) and should be released later in 2011. Please see ZINC:History.

Each subset downloaded from ZINC (for instance, "lead like" or "fragment like") has a preparation date and a count of the unique molecules it contains; these counts are often only approximate, to within 1%. Subsets are static versions of a dynamically changing database.
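Because a downloaded subset is a static snapshot, it can help to record its name, preparation date, and stated molecule count alongside your results. The sketch below is one hypothetical way to do that; the class and field names are our own, the example values are illustrative rather than a real subset, and the default 1% tolerance simply mirrors the approximation described for subset counts when comparing the stated count with the number of molecules you actually find in the files.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SubsetSnapshot:
    """Hypothetical record of one downloaded ZINC subset."""
    name: str          # e.g. "lead like" or "fragment like"
    prepared: date     # preparation date shown for the subset
    stated_count: int  # number of unique molecules stated for the subset

    def count_is_consistent(self, observed: int, tolerance: float = 0.01) -> bool:
        """True if observed is within `tolerance` (default 1%) of stated_count."""
        return abs(observed - self.stated_count) <= tolerance * self.stated_count


# Illustrative values only, not a real subset.
snap = SubsetSnapshot("fragment like", date(2011, 5, 1), 200_000)
print(snap.count_is_consistent(198_500))  # 1,500 off, within 1% (2,000)
print(snap.count_is_consistent(195_000))  # 5,000 off, outside 1%
```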

Recommended usage

We recommend that you download the "lead like" or "fragment like" subset of ZINC in the format closest to the one used by your docking program (e.g. mol2, SDF, pdbqt). We also recommend downloading the supplier information at the same time, so that you have a permanent mapping from ZINC ID numbers to the original supplier codes. Molecules disappear from ZINC from time to time, usually due to depletion, so keeping a static copy of the supplier information means you do not depend on looking compounds up in the database in the future.
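A local copy of the supplier information lets you resolve ZINC IDs to vendor codes without going back to the database. The exact layout of ZINC's supplier files is not described here, so the sketch below assumes a hypothetical tab-separated file with one line per catalog entry (ZINC ID, supplier name, supplier code); the IDs and vendors in the sample are made up. Adjust the parsing to whatever columns your download actually contains.

```python
import csv
import io
from collections import defaultdict


def load_supplier_map(stream) -> dict[str, list[tuple[str, str]]]:
    """Map each ZINC ID to its (supplier, supplier_code) pairs.

    Assumes a hypothetical tab-separated layout: zinc_id, supplier, code.
    """
    mapping: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for row in csv.reader(stream, delimiter="\t"):
        if len(row) < 3 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        zinc_id, supplier, code = (field.strip() for field in row[:3])
        mapping[zinc_id].append((supplier, code))
    return dict(mapping)


# Made-up example data in the assumed format.
sample = io.StringIO(
    "# zinc_id\tsupplier\tcode\n"
    "ZINC_EXAMPLE_1\tVendorA\tA-0001\n"
    "ZINC_EXAMPLE_1\tVendorB\tB-17\n"
    "ZINC_EXAMPLE_2\tVendorA\tA-0002\n"
)
suppliers = load_supplier_map(sample)
print(suppliers["ZINC_EXAMPLE_1"])
```

Keeping a list per ID matters because one compound is often sold by several suppliers.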

There are three variations of "leads" and "fragments":

  • The standard definition (subsets 1 and 2 respectively)
  • "clean" leads and fragments (subsets 11 and 12 respectively), from which compounds that some people consider problematic have been removed
  • "immediate availability" leads and fragments (subsets 21 and 22 respectively)

There are other applications and uses of ZINC, but this is, in our view, by far the most common and useful one.
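The subset numbers listed for the three variations follow a regular pattern. If you script your downloads, a small lookup table keeps them straight. This sketch encodes only the numbers given in this article; the names `SUBSET_NUMBERS` and `subset_number` and the variant labels are our own.

```python
# Subset numbers as listed in this article: (variant, kind) -> subset number.
SUBSET_NUMBERS = {
    ("standard", "leads"): 1,
    ("standard", "fragments"): 2,
    ("clean", "leads"): 11,
    ("clean", "fragments"): 12,
    ("immediate", "leads"): 21,
    ("immediate", "fragments"): 22,
}


def subset_number(variant: str, kind: str) -> int:
    """Look up a subset number, raising a clear error for unknown combinations."""
    try:
        return SUBSET_NUMBERS[(variant, kind)]
    except KeyError:
        raise ValueError(f"no subset listed for variant={variant!r}, kind={kind!r}") from None


print(subset_number("clean", "fragments"))
```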

Which suppliers are included?

Please see the by-vendor page. If there is a catalog you would like to see loaded, please write to databases at docking.org.


Filtering rules

To see the filtering rules in effect in ZINC, please see filtering.docking.org. We filter both at load time and at subset preparation time. Molecules are not loaded if:

  • They do not pass our load-time filtering criteria
  • An error occurs during processing.

The reason each compound was filtered out is available via the vendor page. Every compound in every supplier catalog we load has one of four fates:

  • It is loaded and visible in ZINC.
  • It is depleted (no longer available), and may still be in ZINC, marked depleted. Depleted compounds may not persist for long.
  • It was filtered out and never loaded. These compounds are listed as "filtered out" on the vendor pages.
  • It failed at some stage of processing, and may appear nowhere except in the original vendor catalog.
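The four fates can be modelled as a simple decision, which may help when reconciling a vendor catalog against ZINC. This is a hypothetical sketch: the flags `passed_filters`, `processing_ok`, and `in_current_catalog`, and the order in which they are checked, are our own simplification rather than ZINC's internal pipeline.

```python
from enum import Enum


class Fate(Enum):
    LOADED = "loaded and visible in ZINC"
    DEPLETED = "depleted (no longer available)"
    FILTERED = "filtered out, never loaded"
    FAILED = "failed during processing"


def catalog_item_fate(passed_filters: bool, processing_ok: bool,
                      in_current_catalog: bool) -> Fate:
    """Assign one of the four fates; the order of checks is an assumption."""
    if not passed_filters:
        return Fate.FILTERED  # listed as "filtered out" on the vendor page
    if not processing_ok:
        return Fate.FAILED    # may appear only in the original vendor catalog
    if not in_current_catalog:
        return Fate.DEPLETED  # may remain in ZINC, marked depleted, for a while
    return Fate.LOADED


print(catalog_item_fate(True, True, False).value)
```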


Problems with ZINC

We have collected all the problems we are aware of on the problems page. If you have a problem with ZINC that is not listed there, please write to support at docking.org.

How do subsets work?

We have prepared a number of subsets in advance (by vendor and by various criteria) that we hope you will find useful. When you search, you may download individual molecules, or even all matching molecules, in a variety of formats. However, the number of molecules you can download this way is limited (typically to around 1000, but the limit varies), because downloading large arbitrary collections of molecules puts a strain on our server. To download more, you may create a subset by clicking the "Create Subset" button in the "Search Results" browser. This starts an off-line subset preparation process, which will complete when resources allow (allow at least 1 hr per 1000 molecules). The subset will then appear on the "download subsets" (option #4) page.
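These rules of thumb translate into a quick planning check. A minimal sketch, assuming a direct-download limit of 1000 molecules (the real limit varies) and the stated minimum of 1 hour per 1000 molecules for subset preparation, rounded up to whole hours; `plan_download` is a hypothetical helper, not part of any ZINC interface.

```python
import math


def plan_download(n_molecules: int, direct_limit: int = 1000,
                  hours_per_1000: float = 1.0) -> str:
    """Suggest direct download or subset creation for n_molecules search hits.

    direct_limit is an assumed value; the real limit varies.
    """
    if n_molecules <= direct_limit:
        return "download directly from the search results"
    # "Allow at least 1 hr / 1000 molecules", rounded up to whole hours.
    hours = math.ceil(n_molecules / 1000 * hours_per_1000)
    return f"create a subset; allow at least {hours} h for preparation"


print(plan_download(800))
print(plan_download(25_000))
```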

Another way to create a subset, which isn't necessarily a strict subset, is to upload molecules to our server (option #5). These will again be processed in batch mode as resources permit. These subsets are not available on the browse subsets page, but instead via a URL that you will be given when you upload.

The goal of subsets is to facilitate research. You are encouraged to use these free tools. However, please note that these services are rather demanding on our server, and may be limited (or simply not function) from time to time. Thank you for your understanding.

FAQ: ZINC numbers, mailing lists, uploading, XML RPC,