Category:ZINC: Difference between revisions

No edit summary
(asdf)
Line 1: Line 1:
The ZINC Database contains commercially available compounds for structure based virtual screening. It currently has about 35 million compounds that can simply be purchased.  It is provided in ready-to-dock, 3D formats with molecules represented in biologically relevant forms. It is available in subsets for general screening as well as target-, chemotype- and vendor-focused subsets.  ZINC is free for everyone to use and download at the website [http://zinc.docking.org zinc.docking.org].  This database and service is provided by the [[Shoichet Lab | Shoichet Laboratory]] in the [[Department of Pharmaceutical Chemistry]] at the [[University of California San Francisco]] (UCSF).   
The ZINC Database contains commercially available compounds for structure based virtual screening. It currently has about 90 million compounds that can simply be purchased.  It is provided in ready-to-dock, 3D formats with molecules represented in biologically relevant forms. It is available in subsets for general screening as well as target-, chemotype- and vendor-focused subsets.  ZINC is free for everyone to use and download at the website [http://zinc.docking.org zinc.docking.org].  This database and service is provided by the [[Shoichet Lab | Shoichet Laboratory]] in the [[Department of Pharmaceutical Chemistry]] at the [[University of California San Francisco]] (UCSF).   


To cite ZINC, please reference: Irwin, Sterling, Mysinger, Bolstad and Coleman,
To cite ZINC, please reference: Irwin, Sterling, Mysinger, Bolstad and Coleman,
J. Chem. Inf. Model. 2012, accepted for publication.
J. Chem. Inf. Model. 2012.
[http://pubs.acs.org/doi/abs/10.1021/ci3001277 | DOI: 10.1021/ci3001277 ]
[http://pubs.acs.org/doi/abs/10.1021/ci3001277 | DOI: 10.1021/ci3001277 ]
To cite the original ZINC paper, please reference: Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82 PDF, DOI. We thank [http://www.nigms.nih.gov/ NIGMS] for financial support (GM71896).
To cite the original ZINC paper, please reference: Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82 PDF, DOI. We thank [http://www.nigms.nih.gov/ NIGMS] for financial support (GM71896).


'''ZINC15 is in the process of being released as a new version of ZINC'''  The prior version was ZINC12.
* [[ZINC15]]
* [[ZINC12]]


{{TOCright}}
{{TOCright}}
Line 20: Line 24:


= Scope =  
= Scope =  
ZINC includes over 200 catalogs from over 100 vendors and over 40 annotated catalogs.  A list of all purchasable catalogs used is [http://zinc.docking.org/browse/catalogs/purchasable.php here].  A list of all annotated (non-purchasable) catalogs is [http://zinc.docking.org/browse/catalogs/annotated.php here].  Purchasable bioactive compounds, that is, molecules that are purchasable drawn from non-purchasable annotated catalogs are [http://zinc.docking.org/browse/catalogs/pbc.php here].  ZINC also contains catalogs of [http://zinc.docking.org/browse/catalogs/natural-products.php natural products], and natural derivatives.  
ZINC includes over 400 catalogs from over 300 vendors and over 100 annotated catalogs.  A list of all purchasable catalogs used is [http://zinc15.docking.org/catalogs/subsets/purchasable].  A list of all annotated (non-purchasable) catalogs is [http://zinc15.docking.org/catalogs/subsets/annotated].  Purchasable bioactive compounds, that is, molecules that are purchasable drawn from non-purchasable annotated catalogs are [http://zinc15.docking.org/catalogs/subsets/bioactive+purchasable].  ZINC also contains catalogs of [http://zinc15.docking.org/catalogs/subsets/biogenic+purchasable].


= Access =  
= Access =  
ZINC may be accessed at [http://zinc.docking.org/  zinc.docking.org].  ZINC is freely available to everyone to use.  Significant portions of ZINC may not be re-distributed without express written permission of John Irwin.
ZINC may be accessed at [http://zinc15.docking.org/  zinc15.docking.org].  ZINC is freely available to everyone to use.  Significant portions of ZINC may not be re-distributed without express written permission of John Irwin.


= Curation =  
= Curation =  
Line 31: Line 35:
ZINC is updated continuously. In a typical week:
ZINC is updated continuously. In a typical week:


* 40,000 new molecules are loaded
* 100,000 new molecules are loaded
* 10,000 molecules are repaired
* 10,000 molecules are repaired
* 30,000 catalog items are marked "depleted" due to their absence from the most current catalogs.
* 80,000 catalog items are marked "depleted" due to their absence from the most current catalogs.
* 5-6 vendor catalogs and 2 by-property subsets are updated.
* 3 new catalogs are added.
* 30 tranches of the 2D and 3D property subsets are updated.


= Use =  
= Use =  
ZINC is widely used. We receive an average of 5000 visits from over 500 unique visitors every day. Each month, ZINC is accessed from over 7000 unique IP addresses and between 1 and 2 TB of data are downloaded.  
ZINC is widely used. We receive over 500 unique visitors per day, 13,000 per month, and have over 50,000 "repeat customers".  


= Formats =  
= Formats =  
ZINC is available in [[SMILES]], [[mol2]], [[SDF]] and [[Flexibase Format | flexibase]] formats.  
ZINC is available in [[SMILES]], [[mol2]], [[SDF]], [[pdbqt]], and [[Flexibase Format | flexibase]] formats.  


= What is ZINC not? =
= What is ZINC not? =
Line 54: Line 59:


= Version =  
= Version =  
We differentiate the website software and the version of any particular subset that is downloaded.  The current version of the website software is 12, that became beta on Dec 1, 2011 and will be officially released on Jan 1, 2012.  Version 13, another major upgrade, is scheduled for beta testing in the second half of 2012.  For more information, please see [[ZINC:History]].
We differentiate the website software and the version of any particular subset that is downloaded.  The current version of the website software is 15, that became beta in June 2015 and will be officially released later in 2015.  For more information, please see [[ZINC:History]].


When referring to a subset that is downloaded from ZINC (for instance, "lead like" or "fragment like"), each subset has a date of preparation and a count of the number of unique molecules in the subset, which are often only approximate within about 1%.  Subsets are static version of a dynamically changing database.  Thus when referring to a ZINC subset in a publication, we recommend saying, "The lead-like subset containing 3,123,456 unique molecules generated on 20 Nov 2011 and downloaded on Dec 4, 2011"  (for instance).
When referring to a subset that is downloaded from ZINC (for instance, "lead like" or "fragment like"), each tranche has a date of preparation and a count of the number of unique molecules in the subset, which are often only approximate within about 1%.  Subsets are static version of a dynamically changing database.  Thus when referring to a ZINC subset in a publication, we recommend saying, "The lead-like subset containing 7,123,456 unique molecules was downloaded on 20 Nov 2015"  (for instance).


= Recommended usage =  
= Recommended usage =  
For most prospective docking projects, we recommend you download the "lead like" or "fragment like" subsets of ZINC in the format closest to the one used by your docking program (e.g. mol2, SDF, pdbqt).  We recommend that you download the supplier information at the same time, so that you have a permanent mapping from ZINC ID numbers to original supplier codes.  From time to time molecules disappear from ZINC, usually due to depletion, so downloading static supplier information means you do not depend on looking up compounds on the database in the future.
For most prospective docking projects, we recommend you download the "lead like" or "fragment like" subsets of ZINC in the format closest to the one used by your docking program (e.g. mol2, SDF, pdbqt).  We recommend that you download the supplier information at the same time, so that you have a permanent mapping from ZINC ID numbers to original supplier codes.  From time to time molecules disappear from ZINC, usually due to depletion, so downloading static supplier information means you do not depend on looking up compounds on the database in the future.
There are three variations of "leads" and "fragments":
*  The standard definition (subsets 1 and 2 respectively)
*  "clean" leads and fragments (subsets 11 and 12 respectively) - these have had compounds that some people think are problematic removed.
*  "immediate availability" leads and fragments, (subsets 21 and 22 respectively).


There are other applications and uses of ZINC, but this is in our view by far the most common and useful application.
There are other applications and uses of ZINC, but this is in our view by far the most common and useful application.


= Which suppliers are included? =  
= Which suppliers are included? =  
Please see the  
Please see the [http://zinc15.docking.org/catalogs] page. If there is a catalog you would like to see loaded, please write to databases at docking.org.
[http://zinc.docking.org/browse/catalogs/ Subsets->Catalog] page. If there is a catalog you would like to see loaded, please write to databases at docking.org.


= Filtering rules =  
= Filtering rules =  
Line 76: Line 75:


= External Links =  
= External Links =  
* [http://zinc.docking.org ZINC database]
* [http://zinc15.docking.org ZINC database]
* [http://zincpharmer.csb.pitt.edu ZINCPharmer] An online pharmacophore search tool from the University of Pittsburgh
* [http://zincpharmer.csb.pitt.edu ZINCPharmer] An online pharmacophore search tool from the University of Pittsburgh
* [http://facebook.com/zincdb ZINC Facebook page contains links to sites that use ZINC]
* [http://facebook.com/zincdb ZINC Facebook page contains links to sites that use ZINC]


= Problems with ZINC =  
= Problems with ZINC =  
We have collected all our problems we are aware of on the [[problems]] page.  If you have a problem with ZINC not included there, please write support at docking.org.
ZINC has many problems. Please see our [[feedback]] page on how to tell us about problems.  


= How do subsets work? =
= How do subsets work? =
We have pre-made subsets ([http://zinc.docking.org/browse/catalogs/all.php by vendor], by [http://zinc.docking.org/browse/subsets/ physical properties], and  
We have pre-made subsets ([http://zinc15.docking.org/catalogs], by [http://zinc15.docking.org/tranches], and  
[http://zinc.docking.org/browse/subsets/special.php special properties])  
[http://zinc.docking.org/browse/subsets/special.php special properties])  
that we hope you find useful and may well satisfy many if not most of your needs.
that we hope you find useful and may well satisfy many if not most of your needs.

Revision as of 06:57, 3 June 2015

The ZINC Database contains commercially available compounds for structure-based virtual screening. It currently has about 90 million compounds that can simply be purchased. It is provided in ready-to-dock 3D formats, with molecules represented in biologically relevant forms. It is available in subsets for general screening as well as target-, chemotype- and vendor-focused subsets. ZINC is free for everyone to use and download at the website zinc.docking.org. This database and service are provided by the Shoichet Laboratory in the Department of Pharmaceutical Chemistry at the University of California San Francisco (UCSF).

To cite ZINC, please reference: Irwin, Sterling, Mysinger, Bolstad and Coleman, J. Chem. Inf. Model. 2012. DOI: 10.1021/ci3001277

To cite the original ZINC paper, please reference: Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82. We thank NIGMS for financial support (GM71896).

ZINC15 is in the process of being released as a new version of ZINC. The prior version was ZINC12.

Purpose

ZINC was originally designed for target based virtual screening (docking), and this remains its primary focus. However, ZINC is also useful for many other things, including:

  • finding a compound to purchase
  • downloading a library in SMILES format for ligand based virtual screening
  • finding compounds by similarity to a starting compound (SAR-by-catalog)
  • finding compounds annotated for a particular target (via ChEMBL)
  • finding compounds predicted for a particular target (via SEA / ChEMBL or docking)
  • and much more...

Scope

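ZINC includes over 400 catalogs from over 300 vendors and over 100 annotated catalogs. A list of all purchasable catalogs used is available (http://zinc15.docking.org/catalogs/subsets/purchasable), as is a list of all annotated (non-purchasable) catalogs (http://zinc15.docking.org/catalogs/subsets/annotated). Purchasable bioactive compounds, that is, purchasable molecules that also appear in the non-purchasable annotated catalogs, are listed separately (http://zinc15.docking.org/catalogs/subsets/bioactive+purchasable). ZINC also contains catalogs of purchasable biogenic compounds (http://zinc15.docking.org/catalogs/subsets/biogenic+purchasable).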

Access

ZINC may be accessed at zinc15.docking.org. ZINC is freely available to everyone to use. Significant portions of ZINC may not be re-distributed without express written permission of John Irwin.

Curation

ZINC is curated by the ZINC Curators. This group works to improve and maintain the database, and to keep it as current as possible.

Updates

ZINC is updated continuously. In a typical week:

  • 100,000 new molecules are loaded
  • 10,000 molecules are repaired
  • 80,000 catalog items are marked "depleted" due to their absence from the most current catalogs.
  • 3 new catalogs are added.
  • 30 tranches of the 2D and 3D property subsets are updated.

Use

ZINC is widely used. We receive over 500 unique visitors per day, 13,000 per month, and have over 50,000 "repeat customers".

Formats

ZINC is available in SMILES, mol2, SDF, pdbqt, and flexibase formats.

What is ZINC not?

Not every molecule for sale

ZINC focuses on biologically relevant compounds. To achieve this focus, ZINC filters out molecules widely considered unsuitable for docking, such as peroxides, big insoluble molecules, large peptides, and highly reactive reagents including many building blocks. See our filtering rules. We also filter out molecules containing metals, boron, and silicon, because there are no MMFF94 parameters for these atoms.
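The element-based part of this filter can be illustrated with a small sketch. This is not ZINC's actual filtering code: the function names are hypothetical, the tokenizer is a simplified regular expression rather than a real SMILES parser, and the list of rejected elements is an illustrative, incomplete assumption (boron and silicon as named above, plus a few example metals).

```python
import re

# Assumption: an illustrative, incomplete set of elements to reject.
# The text names boron, silicon, and metals; the metals here are examples only.
REJECTED = {"B", "Si", "Pt", "Fe", "Zn", "Cu", "Na", "K", "Mg", "Ca"}

# One token per atom: bracket atoms first, then two-letter halogens,
# then single-letter aliphatic and aromatic atoms.
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]")

# Inside a bracket atom: optional isotope digits, then the element symbol.
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Za-z][a-z]?)")


def elements_in_smiles(smiles):
    """Return the set of element symbols found in a SMILES string."""
    found = set()
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_SYMBOL.match(token)
            if match is None:
                continue  # unrecognized bracket content, skip it
            symbol = match.group(1)
        else:
            symbol = token
        # Aromatic atoms are lowercase in SMILES; normalize to element case.
        found.add(symbol.capitalize())
    return found


def rejected_elements(smiles):
    """Return the elements in the molecule that the sketch would reject."""
    return elements_in_smiles(smiles) & REJECTED
```

For example, `rejected_elements("N[Pt](N)(Cl)Cl")` flags platinum for cis-platin, while a simple aryl bromide passes. A production filter would use a cheminformatics toolkit to parse the structure.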

Not all drugs

Some filtered compounds, such as cis-platin, are actually drugs or drug candidates. Our focus is on virtual screening and molecular modeling, something we do not know how to do with cis-platin, or boronic acids. For general purpose purchasing, we suggest chemspider.com or emolecules.com.

Not encyclopedic

ZINC is single-minded in its focus on biologically relevant representations of molecules. It does not keep track of many other kinds of information, such as CAS numbers, or even names. For these, please try ChemDB, PubChem, drugbank.ca, biocyc/metacyc, ChEMBL, or chemspider.com, among many others.

Version

We differentiate between the version of the website software and the version of any particular subset that is downloaded. The current version of the website software is 15, which entered beta in June 2015 and will be officially released later in 2015. For more information, please see ZINC:History.

When referring to a subset that is downloaded from ZINC (for instance, "lead like" or "fragment like"), note that each tranche has a date of preparation and a count of the number of unique molecules in the subset; these counts are often only approximate, to within about 1%. Subsets are static versions of a dynamically changing database. Thus when referring to a ZINC subset in a publication, we recommend saying, for instance, "The lead-like subset containing 7,123,456 unique molecules was downloaded on 20 Nov 2015".
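If you record the subset name, molecule count, and download date when you fetch a subset, the recommended sentence can be generated consistently. The following is a minimal sketch; the function name and arguments are hypothetical and not part of ZINC.

```python
from datetime import date

# Fixed English month abbreviations, so the output does not depend on locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def subset_citation(subset_name, molecule_count, downloaded_on):
    """Format the recommended sentence for citing a downloaded subset."""
    month = MONTHS[downloaded_on.month - 1]
    return (
        f"The {subset_name} subset containing {molecule_count:,} unique "
        f"molecules was downloaded on {downloaded_on.day} {month} "
        f"{downloaded_on.year}"
    )


print(subset_citation("lead-like", 7123456, date(2015, 11, 20)))
```

Storing these three values alongside the downloaded files keeps the citation reproducible even after the live database has changed.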

Recommended usage

For most prospective docking projects, we recommend you download the "lead like" or "fragment like" subsets of ZINC in the format closest to the one used by your docking program (e.g. mol2, SDF, pdbqt). We recommend that you download the supplier information at the same time, so that you have a permanent mapping from ZINC ID numbers to original supplier codes. From time to time molecules disappear from ZINC, usually due to depletion, so downloading static supplier information means you do not depend on looking up compounds on the database in the future.

There are other applications and uses of ZINC, but this is in our view by far the most common and useful application.
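The permanent mapping from ZINC IDs to supplier codes recommended above can be kept as a simple lookup table. The sketch below assumes a tab-separated file with three columns (ZINC ID, supplier name, supplier code); this layout is an assumption for illustration, so check the file you actually downloaded and adjust the column positions.

```python
import csv
import io
from collections import defaultdict


def load_supplier_map(handle):
    """Map each ZINC ID to a list of (supplier, supplier_code) pairs.

    Assumes tab-separated rows: ZINC ID, supplier name, supplier code.
    """
    mapping = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        zinc_id, supplier, code = row[0], row[1], row[2]
        mapping[zinc_id].append((supplier, code))
    return dict(mapping)


# Hypothetical example data; real IDs, vendors, and codes will differ.
sample = io.StringIO(
    "ZINC000000000001\tVendorA\tA-100\n"
    "ZINC000000000001\tVendorB\tB-7\n"
    "ZINC000000000002\tVendorA\tA-200\n"
)
suppliers = load_supplier_map(sample)
print(suppliers["ZINC000000000001"])
```

A molecule can be sold by several suppliers, which is why each ID maps to a list rather than a single code.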

Which suppliers are included?

Please see the catalogs page (http://zinc15.docking.org/catalogs). If there is a catalog you would like to see loaded, please write to databases at docking.org.

Filtering rules

See the Filtering Rules page.

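External Links

  • ZINC database (http://zinc15.docking.org)
  • ZINCPharmer (http://zincpharmer.csb.pitt.edu), an online pharmacophore search tool from the University of Pittsburgh
  • ZINC Facebook page (http://facebook.com/zincdb), which contains links to sites that use ZINC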

Problems with ZINC

ZINC has many problems. Please see our feedback page on how to tell us about problems.

How do subsets work?

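We have pre-made subsets (by vendor catalog at http://zinc15.docking.org/catalogs, by property tranche at http://zinc15.docking.org/tranches, and by special properties) that we hope you find useful and that may well satisfy many if not most of your needs.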

We also offer subsets by target, annotated by binding, functional activity, or ADME/T activity via ChEMBL.

We also allow you to download the results of any search, up to 1000 molecules. You can also collect molecules in the shopping cart, and then download those in any format we support.

You may also upload your own molecules, either as SMILES or ZINC IDs, into a shopping cart, and then download those in any format we support.

You may also create decoys based on the molecules in the shopping cart, and download those.
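Before uploading a mixed list to the shopping cart, it can help to separate ZINC IDs from SMILES strings. The sketch below assumes that a ZINC ID is the prefix "ZINC" followed only by digits and treats every other non-empty line as SMILES; both the pattern and the function name are assumptions for illustration, not a documented ZINC interface.

```python
import re

# Assumption: a ZINC ID is "ZINC" followed by one or more digits.
ZINC_ID = re.compile(r"^ZINC\d+$")


def split_upload(lines):
    """Split input lines into (zinc_ids, smiles), skipping blank lines."""
    zinc_ids, smiles = [], []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue
        if ZINC_ID.match(entry):
            zinc_ids.append(entry)
        else:
            smiles.append(entry)  # not validated as chemically correct SMILES
    return zinc_ids, smiles


ids, structures = split_upload(["ZINC000000000053", " CCO ", "", "c1ccccc1"])
print(ids, structures)
```

The sketch only sorts the entries; it does not check that the SMILES strings are valid.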

About ZINC subsets

See Also