ZINC Database

From DISI

Revision as of 20:42, 23 September 2011

ZINC is a database of commercially available compounds for structure-based virtual screening. It contains about 20 million purchasable compounds, provided in ready-to-dock 3D formats. It is free for everyone to use and download at zinc.docking.org. ZINC is provided by the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California San Francisco (UCSF). To cite ZINC, please reference: Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82. We thank NIGMS for financial support (GM71896).

Purpose

ZINC is designed for target-based virtual screening (docking), but it is also used for ligand-based virtual screening and other computational drug discovery approaches.

Scope

ZINC includes molecules from commercially available catalogs as well as some annotated databases. A list of all catalogs used is at http://zinc.docking.org/vendor0/. A list of annotated (non-purchasable) catalogs is at http://zinc.docking.org/vendor0/index_nfs.shtml. Purchasable molecules drawn from non-purchasable catalogs are listed at http://zinc.docking.org/vendor3/.

Access

ZINC may be accessed at zinc.docking.org. It is freely available for everyone to use. However, significant portions of ZINC may not be redistributed without written permission of the ZINC Curators.

Curation

ZINC is curated by the ZINC Curators. This group works to improve and maintain the database, and to keep it as current as possible.

Updates

ZINC is updated continuously. Currently, about 40,000 new molecules are loaded each week, about 10,000 molecular representations are fixed in some way, and about 30,000 catalog items are removed from ZINC because they are absent from the most current catalogs. Usually such an absence means a vendor has run out of stock and does not intend to re-synthesize unless asked.

Formats

ZINC is available in SMILES, mol2, SDF and flexibase formats.

What is ZINC not suitable for?

ZINC filters out molecules thought to be unsuitable for docking, such as peroxides, big insoluble molecules, large peptides, and highly reactive reagents. Some of these can actually be drugs, and thus ZINC is consciously not a superset of all drugs. For general-purpose purchasing, we suggest chemspider.com or emolecules.com. ZINC is single-minded in its focus on biologically relevant representations of molecules. It does not keep track of many other kinds of information, such as CAS numbers, or even names. For these, please try ChemDB, PubChem, drugbank.ca, biocyc/metacyc, ChEMBL, or chemspider.com, among many others.

Version

We distinguish between the version of the website software and the version of any particular downloaded subset. The current version of the website software is 11. Version 12, a complete rewrite with many new features, is now in alpha test (May 2011) and should be released later in 2011. Please see ZINC:History.

When referring to a subset that is downloaded from ZINC (for instance, "lead like" or "fragment like"), note that each subset has a date of preparation and a count of the number of unique molecules it contains. The count is often only approximate, to within about 1%. Subsets are static versions of a dynamically changing database.

Recommended usage

We recommend you download the "lead like" or "fragment like" subsets of ZINC in the format closest to the one used by your docking program (e.g. mol2, SDF, pdbqt). We recommend that you download the supplier information at the same time, so that you have a permanent mapping from ZINC ID numbers to original supplier codes. From time to time molecules disappear from ZINC, usually due to depletion, so downloading static supplier information means you do not depend on looking up compounds on the database in the future.

There are three variations of "leads" and "fragments":

  • The standard definition (subsets 1 and 2 respectively)
  • "clean" leads and fragments (subsets 11 and 12 respectively) - from which compounds that some people consider problematic have been removed.
  • "immediate availability" leads and fragments (subsets 21 and 22 respectively).

There are other applications and uses of ZINC, but this is in our view by far the most common and useful application.
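
As a rough illustration of the permanent mapping recommended above, the downloaded supplier information can be loaded into a lookup table keyed by ZINC ID, so that supplier codes remain available even after a compound disappears from ZINC. This is only a sketch: the tab-separated layout (ZINC ID, supplier name, supplier code), the function name, and the sample rows are assumptions made for the example, not the actual ZINC file format.

```python
import csv
import io
from collections import defaultdict


def load_supplier_map(handle):
    """Build {zinc_id: [(supplier, code), ...]} from a tab-separated file.

    Assumed (hypothetical) column order: ZINC ID, supplier name, supplier code.
    """
    mapping = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        zinc_id, supplier, code = row[0], row[1], row[2]
        mapping[zinc_id].append((supplier, code))
    return dict(mapping)


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "ZINC00000001\tExampleVendorA\tA-123\n"
    "ZINC00000001\tExampleVendorB\tB-9\n"
    "ZINC00000002\tExampleVendorA\tA-456\n"
)
supplier_map = load_supplier_map(sample)
print(supplier_map["ZINC00000001"])
```

Keeping a list per ZINC ID allows for a compound that is sold by more than one supplier.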

Which suppliers are included?

Please see the by-vendor page. If there is a catalog you would like to see loaded, please write to databases at docking.org.


Filtering rules

To see the filtering rules in effect in ZINC, please see filtering.docking.org. We filter both at load time and also at subset preparation time. Molecules are not loaded if:

  • They do not pass our load-time filtering criteria
  • An error occurs during processing.

Each filtered-out compound is justified with a reason, available via the vendor page. Every compound in every supplier catalog we load has one of four fates:

  • It is loaded and visible in ZINC.
  • It is depleted (no longer available) - and may still be in ZINC, marked depleted. Depleted compounds may not persist for long.
  • It was filtered out and never loaded. These are listed as "filtered out" on the vendor pages.
  • It failed at some stage of processing, and may appear nowhere except in the original vendor catalog.
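
The four fates listed above can be sketched as a small classification. This is a reading of the text, not ZINC's actual code: the three boolean inputs, the enum, and the order in which the checks are applied are all assumptions made for illustration.

```python
from enum import Enum


class Fate(Enum):
    LOADED = "loaded and visible in ZINC"
    DEPLETED = "depleted (no longer available)"
    FILTERED_OUT = "filtered out and never loaded"
    FAILED = "failed at some stage of processing"


def catalog_item_fate(passed_filters, processed_ok, in_current_catalog):
    """Assign one of the four fates to a catalog item.

    All three flags are hypothetical inputs; the check order is an assumption.
    """
    if not passed_filters:
        return Fate.FILTERED_OUT  # rejected by load-time filtering
    if not processed_ok:
        return Fate.FAILED  # an error occurred during processing
    if not in_current_catalog:
        return Fate.DEPLETED  # loaded once, but gone from the latest catalog
    return Fate.LOADED


print(catalog_item_fate(True, True, False).value)
```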

External Links

Problems with ZINC

We have collected all the problems we are aware of on the problems page. If you have a problem with ZINC that is not included there, please write to support at docking.org.

How do subsets work?

We have pre-made a number of subsets (by vendor, by various criteria) that we hope you will find useful. When you search, you may download individual molecules or even all matching molecules in a variety of formats. However, direct downloads are limited to some number of molecules (typically around 1000, but it varies), because downloading large arbitrary collections of molecules puts a strain on our server. To download more, you may create a subset by clicking on the "Create Subset" button in the "Search Results" browser. This starts an off-line subset preparation process, which will complete when resources allow (but allow at least 1 hr per 1000 molecules). The finished subset will appear on the "download subsets" (option #4) page.
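
To make the numbers above concrete, the following sketch estimates the minimum wait for an off-line subset and checks whether a search result is too large for direct download. The function names are invented for the example, the default limit of 1000 is only the "typical" figure quoted above (the real limit varies), and the estimate is a lower bound since preparation runs only when resources allow.

```python
def needs_subset(n_matches, direct_download_limit=1000):
    """True if a result set exceeds the (assumed, typical) direct-download limit."""
    return n_matches > direct_download_limit


def min_subset_prep_hours(n_molecules, hours_per_thousand=1.0):
    """Lower bound on preparation time from the '1 hr per 1000 molecules' guideline."""
    if n_molecules < 0:
        raise ValueError("n_molecules must be non-negative")
    return n_molecules / 1000.0 * hours_per_thousand


n = 25000  # hypothetical number of search matches
print(needs_subset(n))           # True
print(min_subset_prep_hours(n))  # 25.0
```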

Another way to create a subset, which isn't necessarily a strict subset, is to upload molecules to our server (option #5). These will again be processed in batch mode as resources permit. These subsets are not available on the browse subsets page, but instead via a URL that you will be given when you upload.

The goal of subsets is to facilitate research. You are encouraged to use these free tools. However, please note that these services are rather demanding on our server, and may be limited (or simply not function) from time to time. Thank you for your understanding.

FAQ: ZINC numbers, mailing lists, uploading, XML-RPC