ZINC:FAQ: Difference between revisions

From DISI
(5 intermediate revisions by 2 users not shown)
Line 1: Line 1:
Here are frequently asked questions about ZINC.
Frequently asked questions about ZINC. [[:Category:FAQ |Other FAQs]]
{{TOCright}}

* [[DOCK:FAQ]]
* [[DUD:FAQ]]
* [[DOCK Blaster:FAQ]]
* [[THC:FAQ]]
* [[FAQ]] for everything not covered by one of those products.

= What is the version (how to cite)? =
The interface is version 12, released in January 2012.  The next version will be in January 2013, and a beta will appear in the fall.

The subsets of ZINC are updated regularly.  Each subset lists a release date and its total number of molecules.  We recommend you quote these to specify which version you used.

Thus you might say:  We used ZINC version 12 (Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82).  We downloaded the standard lead-like subset of 4,554,059 molecules dated 2012-02-06 on February 20, 2012.

We recommend that you always download the "Purchasing Info" in addition to the actual molecule structure files every time you download ZINC. This provides you with static purchasing information, in case a molecule goes off the market and becomes unavailable in the meantime.

= What is the purpose of ZINC? =
The purpose of ZINC is to simplify ligand discovery for biology.  A primary aim is to provide screening libraries for molecular docking and for chemical informatics.  ZINC aims to stay current, with correct purchasing information, to enable research.

== How do I get arbitrary subsets of ZINC? ==

Q1. I am trying to generate a subset of your "drug-like" molecule subset for
virtual screening. I was thinking your 60% diversity group (about 12,000
molecules) would be a place to start, and I downloaded the .smi file. I am
relatively new to chemoinformatics and I was wondering if there is an
elegant way to separate the compounds listed in the .smi file from the
larger library containing the mol2 files from the 2,000,000 "usual" set that
I have downloaded from ZINC?

A1.
  # Download the SMILES file for the subset
  wget http://zinc8.docking.org/subset1/3/3_t60.smi
  # Keep column 2, the ZINC IDs (">!" is csh/tcsh syntax for a forced overwrite)
  awk '{print $2}' 3_t60.smi >! codes
  # Turn each ID into a relative fget2.pl request (f=m, l=0)
  sed -e 's/^/fget2.pl?f=m\&l=0\&z=/' codes  > codes2
  # Fetch every request into all.mol2, appending the log to "listing"
  wget -O all.mol2 -a listing  -B http://zinc8.docking.org/ -i codes2

In these requests, l indicates the pH model: 0=reference (pH 7), 1=mid (pH 5.75-8.25), 2=hi (pH 7-8.5), 3=lo (pH 4.5-6).
= What can I do with ZINC? =  
You can use ZINC to make your life easier. You can use it for docking, for cheminformatics, for looking up molecules, for finding analogs by similarity, and for simply exploring vendors and the compounds they sell.


= What sort of people is ZINC designed for? =
ZINC is aimed at professional researchers seeking new molecules to test in biological experiments. It is used by graduate students, postdoctoral fellows and established researchers in pharmaceutical companies, in universities, in government research labs, and in biotechs and startup companies.  ZINC is free to use for everyone.


= How do I download ZINC? on Windows? =
Every subset page and every vendor page has a special link for downloading the subset on Windows and on Unix, which includes Macs. We have a YouTube video which explains how to do this:  http://youtube.com/user/chemistry4biology

= Do I want usual or single or something else? =
If your docking program calculates protonation states and tautomers on the fly, then you want single. If you are doing cheminformatics, you probably want your SMILES as single. If you are doing molecular docking, you probably want usual, which includes relevant molecular forms between pH 6 and 8. If you are docking to metalloenzymes you want the metal subsets, which include additional high pH forms.

== I want a hierarchy format database based on ZINC IDs. ==

A2.
Create a file "hits.txt" containing one ZINC ID per row.
  sed -e 's/^/fget2.pl?f=h\&l=0\&z=/' hits.txt > ref.txt
  wget -O ref.db -a listing -B http://zinc.docking.org/ -i ref.txt
The previous line gets the "reference" (pH 7) models. For additional "usual" forms, use l=1.
  sed -e 's/^/fget2.pl?f=h\&l=1\&z=/' hits.txt > mid.txt
  wget -O mid.db -a listing -B http://zinc.docking.org/ -i mid.txt
Note: we recommend splitting hits.txt into sets of 1000 ZINC IDs each, thus:
  # split names the pieces xaa, xab, ...; foreach is csh/tcsh syntax
  split -l 1000 hits.txt
  foreach i (x??)
    sed ...
    wget ...
  end
Please let us know if this is not clear!
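
For readers who do not use csh, the splitting recipe above might be sketched in POSIX sh as follows. This is only an illustration, not the maintainers' script: the function name make_url_lists and the chunk_ and urls_ file prefixes are hypothetical, and the fget2.pl request format is copied from the commands above. The function only writes the URL lists; the wget step is shown as a comment because it needs network access.

```shell
#!/bin/sh
# Sketch only: build wget URL lists from a file of ZINC IDs, one list per chunk.
# The fget2.pl query format comes from the commands above; the function name
# and the chunk_/urls_ file prefixes are hypothetical.

make_url_lists() {
    ids_file=$1            # one ZINC ID per line
    format=$2              # f= value, e.g. h as in the example above
    level=$3               # l= value (pH model), 0 to 3
    chunk_size=${4:-1000}  # IDs per chunk

    case $level in
        0|1|2|3) ;;
        *) echo "pH model must be 0, 1, 2 or 3" >&2; return 1 ;;
    esac

    # split writes chunk_aa, chunk_ab, ... into the current directory
    split -l "$chunk_size" "$ids_file" chunk_ || return 1

    for chunk in chunk_??; do
        [ -f "$chunk" ] || continue   # no chunks if the input was empty
        # Prefix every ID with the relative fget2.pl request
        sed -e "s/^/fget2.pl?f=${format}\&l=${level}\&z=/" "$chunk" \
            > "urls_${chunk#chunk_}.txt"
    done
}

# Usage (requires network access, so left commented out):
# make_url_lists hits.txt h 0
# for list in urls_??.txt; do
#     wget -O "${list%.txt}.db" -a listing -B http://zinc.docking.org/ -i "$list"
# done
```

Keeping the list-building step separate from the download step makes it possible to inspect the generated requests before fetching anything.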


= Do I want lead-like or fragment-like or drug-like or something else? =
If the binding site is very small, or if you are using a biophysical assay such as NMR or SPR, then you may want fragment-like.  In most other cases, lead-like is the best choice for discovery projects.


= Do I want standard, clean or now subsets? =
Standard or clean: If you can tolerate a high false positive rate and you want more chemistry, you probably want a standard subset. If you are new to screening or unsure about what you are doing, a clean subset may be a better choice. Clean subsets provide a filtered subset of chemical space that is less prone to artifactual hits, at the small cost of sacrificing some chemistry.

Now: If you cannot wait 6-10 weeks for your compounds to arrive, you should choose the Now subsets.  If you can afford to wait that long, you are better off not using Now, because you can access far more chemistry with the standard subsets.

= Can I script ZINC? =
Yes. You can write URLs that deliver formatted results.  See the [[ZINC:Command language]].  See also the [[Quick Search Bar]].

== Is there a script that does all of this? ==
A3. Yes, Peter Kolb wrote one (thanks Peter). Download it here:
[http://zinc.docking.org/scripts/get.db.from.id.sh].

Here is how to use this script.

First, put the ZINC IDs you want to get into a file, say "list1".

Second, put the list of lists into a file; call this "masterlist"
(i.e. line 1 of masterlist contains 5 characters: list1).

Invoke the program:

 chmod a+rx get.db.from.id.sh  ; # the script downloaded above
 ./get.db.from.id.sh masterlist 0 mol2  ; # to get "reference" molecules (0) in mol2 format
 ./get.db.from.id.sh masterlist 1 mol2  ; # to get "additional physiological" molecules (mid, 1) in mol2 format

Good luck!
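
The two preparation steps above could be sketched in POSIX sh roughly like this. write_masterlist is a hypothetical helper, not part of get.db.from.id.sh, and the idea that several list names go one per line is an assumption extrapolated from the single-list example; the ZINC IDs in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch only: write a masterlist naming one or more ID-list files.
# write_masterlist is a hypothetical helper; one list name per line is an
# assumption extrapolated from the single-list example above.

write_masterlist() {
    out=$1
    shift
    : > "$out"                      # start with an empty masterlist
    for list in "$@"; do
        if [ ! -s "$list" ]; then   # refuse missing or empty ID lists
            echo "missing or empty ID list: $list" >&2
            return 1
        fi
        printf '%s\n' "$list" >> "$out"
    done
}

# Usage, following the steps above (the IDs are placeholders):
# printf '%s\n' ZINC00000001 ZINC00000002 > list1
# write_masterlist masterlist list1
# ./get.db.from.id.sh masterlist 0 mol2
```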
 
 
== How are partial charges computed in ZINC? ==


The partial charges are computed by AMSOL 6 using the SM5.42R solvation model in organic solvent. The AMSOL input parameters look like the following:
Line 77: Line 48:
(thanks to Michael Mysinger for this phrasing)


= I am a vendor: How can I get my catalog added to ZINC? =
Please contact John Irwin, who can help you do this.


-- John Irwin, Aug 2009
= Can I download everything? =
Yes, there is an everything subset. It is here: [http://zinc.docking.org/subsets/everything http://zinc.docking.org/subsets/everything].


= Can I use the old version? =
Old versions of ZINC still exist, but they are not recommended and have been partially disabled. If you feel you need to use an old version, we would like to hear why.


[[Category:FAQ]]
[[Category:ZINC]]

Latest revision as of 15:42, 11 March 2014

Frequently asked questions about ZINC. Other FAQs

What is the version (how to cite)?

The interface is version 12, released in January 2012. The next version will be in January 2013, and a beta will appear in the fall.

The subsets of ZINC are updated regularly. Each subset has a release date and the total number of molecules. We recommend you quote these to specify which version you used.

Thus you might say: We used ZINC version 12 (Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82). We downloaded the standard lead-like subset of 4,554,059 molecules dated 2012-02-06 on February 20, 2012.

We recommend that you always download the "Purchasing Info" in addition to the actual molecule structure files every time you download ZINC. This provides you with static purchasing information, in case a molecule goes off the market and becomes unavailable in the meantime.
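
One way to keep the details needed for such a citation is to write a short provenance note at download time. The sketch below is only a suggestion: record_download is a hypothetical helper, and it assumes the downloaded .smi file holds one molecule per line with no header, so the line count can be compared with the molecule count shown on the subset page. The release date has to be copied by hand from that page.

```shell
#!/bin/sh
# Sketch only: print a provenance note for a downloaded subset.
# record_download is a hypothetical helper. Assumes one molecule per line
# and no header line in the .smi file.

record_download() {
    smi_file=$1   # downloaded SMILES file
    subset=$2     # subset name, e.g. "standard lead-like"
    release=$3    # release date as shown on the subset page
    [ -r "$smi_file" ] || { echo "cannot read $smi_file" >&2; return 1; }
    count=$(wc -l < "$smi_file" | tr -d ' ')
    printf 'subset: %s\nrelease date: %s\nmolecules in file: %s\ndownloaded: %s\n' \
        "$subset" "$release" "$count" "$(date +%Y-%m-%d)"
}

# Usage (file name is a placeholder):
# record_download leads.smi "standard lead-like" 2012-02-06 > PROVENANCE.txt
```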

What is the purpose of ZINC?

The purpose of ZINC is to simplify ligand discovery for biology. A primary aim is to provide screening libraries for molecular docking and for chemical informatics. ZINC aims to stay current, with correct purchasing information, to enable research.

What can I do with ZINC?

You can use ZINC to make your life easier. You can use it for docking, for cheminformatics, for looking up molecules, for finding analogs by similarity, and for simply exploring vendors and the compounds they sell.

What sort of people is ZINC designed for?

ZINC is aimed at professional researchers seeking new molecules to test in biological experiments. It is used by graduate students, postdoctoral fellows and established researchers in pharmaceutical companies, in universities, in government research labs, and in biotechs and startup companies. ZINC is free to use for everyone.

How do I download ZINC? on Windows?

Every subset page and every vendor page has a special link for downloading the subset on Windows and on Unix, which includes Macs. We have a YouTube video which explains how to do this: http://youtube.com/user/chemistry4biology

Do I want usual or single or something else?

If your docking program calculates protonation states and tautomers on the fly, then you want single. If you are doing cheminformatics, you probably want your SMILES as single. If you are doing molecular docking, you probably want usual, which includes relevant molecular forms between pH 6 and 8. If you are docking to metalloenzymes you want the metal subsets, which include additional high pH forms.

Do I want lead-like or fragment-like or drug-like or something else?

If the binding site is very small, or if you are using a biophysical assay such as NMR or SPR, then you may want fragment-like. In most other cases, lead-like is the best choice for discovery projects.

Do I want standard, clean or now subsets?

Standard or clean: If you can tolerate a high false positive rate and you want more chemistry, you probably want a standard subset. If you are new to screening or unsure about what you are doing, a clean subset may be a better choice. Clean subsets provide a filtered subset of chemical space that is less prone to artifactual hits, at the small cost of sacrificing some chemistry.

Now: If you cannot wait 6-10 weeks for your compounds to arrive, you should choose the Now subsets. If you can afford to wait that long, you are better off not using Now, because you can access far more chemistry with the standard subsets.

Can I script ZINC?

Yes. You can write URLs that deliver formatted results. See the ZINC:Command language. See also the Quick Search Bar.

How are partial charges computed in ZINC?

The partial charges are computed by AMSOL 6 using the SM5.42R solvation model in organic solvent. The AMSOL input parameters look like the following:

CHARGE=$netchg AM1 1SCF CART TLIMIT=1 GEO-OK SM5.42R & SOLVNT=GENORG IOFR=1.4345 ALPHA=0.00 BETA=0.00 GAMMA=38.93 & DIELEC=2.06 FACARB=0.00 FEHALO=0.00

We only calculate the charges on one ligand conformation, but we use them for the whole ligand conformational ensemble. (thanks to Michael Mysinger for this phrasing)
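
To illustrate how the $netchg placeholder in the keyword line above gets filled in, here is a minimal sketch. amsol_keywords is a hypothetical helper, not part of ZINC or AMSOL; it only substitutes the net charge into the keyword string exactly as shown above, kept on one line, and does not run AMSOL.

```shell
#!/bin/sh
# Sketch only: print the AMSOL keyword line shown above for a given net charge.
# amsol_keywords is a hypothetical helper; the keywords are copied verbatim
# from the FAQ text and are printed on a single line, as they appear there.

amsol_keywords() {
    netchg=$1
    digits=${netchg#[+-]}           # drop one optional leading sign
    case $digits in
        ''|*[!0-9]*) echo "net charge must be an integer" >&2; return 1 ;;
    esac
    printf '%s\n' "CHARGE=${netchg} AM1 1SCF CART TLIMIT=1 GEO-OK SM5.42R & SOLVNT=GENORG IOFR=1.4345 ALPHA=0.00 BETA=0.00 GAMMA=38.93 & DIELEC=2.06 FACARB=0.00 FEHALO=0.00"
}

# Usage:
# amsol_keywords -1
```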

I am a vendor: How can I get my catalog added to ZINC?

Please contact John Irwin, who can help you do this.

Can I download everything?

Yes, there is an everything subset. It is here: http://zinc.docking.org/subsets/everything.

Can I use the old version?

Old versions of ZINC still exist, but they are not recommended and have been partially disabled. If you feel you need to use an old version, we would like to hear why.