ZINC:FAQ: Difference between revisions

(6 intermediate revisions by 2 users not shown)
Earlier revision (text replaced in the latest revision):

Here are frequently asked questions about ZINC.

* [[DOCK:FAQ]]
* [[DUD:FAQ]]
* [[DOCK Blaster:FAQ]]
* [[THC:FAQ]]
* [[FAQ]] for everything not covered by one of those products.

Q1. I am trying to generate a subset of your "drug-like" molecule subset for
virtual screening. I was thinking your 60% diversity group (about 12,000
molecules) would be a place to start, and I downloaded the .smi file. I am
relatively new to chemoinformatics and I was wondering if there is an
elegant way to separate the compounds listed in the .smi file from the
larger library containing the mol2 files from the 2,000,000 "usual" set that
I have downloaded from ZINC?

A1.
  wget http://zinc8.docking.org/subset1/3/3_t60.smi
  awk '{print $2}' 3_t60.smi >! codes
  sed -e 's/^/fget2.pl?f=m\&l=0\&z=/' codes  > codes2
  wget -O all.mol2 -a listing  -B http://zinc8.docking.org/ -i codes2

The l parameter indicates the pH model: 0=reference (pH 7), 1=mid (5.75-8.25), 2=hi (7-8.5), 3=lo (4.5-6).

Q2. I want a hierarchy format database based on ZINC IDs.

A2.
Create a file "hits.txt" containing one ZINC ID per row.
  sed -e 's/^/fget2.pl?f=h\&l=0\&z=/' hits.txt > ref.txt
  wget -O ref.db -a listing -B http://zinc.docking.org/ -i ref.txt
The previous line gets the "reference" (pH 7) models. For additional "usual" forms, use l=1.
  sed -e 's/^/fget2.pl?f=h\&l=1\&z=/' hits.txt > mid.txt
  wget -O mid.db -a listing -B http://zinc.docking.org/ -i mid.txt
Note: we recommend splitting hits.txt into sets of 1000 ZINC IDs each, thus:
  split -l 1000 hits.txt
  foreach i (x??)
    sed ...
    wget ...
  end
Please let us know if this is not clear!

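The request-list steps in A1 and A2 can be sketched as two small POSIX shell functions. This is only an illustration under stated assumptions: the function names `zinc_request_paths` and `zinc_prepare_chunks` and the `chunk_` file prefix are hypothetical, the meaning of `f=h` (hierarchy) and `f=m` (mol2) is inferred from the commands above, and the `wget` call is left commented out because the old `fget2.pl` interface may no longer respond.

```shell
# Hypothetical helper: read ZINC IDs (one per line) on stdin and print one
# fget2.pl request path per ID, mirroring the sed commands in A1 and A2.
#   $1 = format code as used above (h or m)
#   $2 = pH model (0=reference, 1=mid, 2=hi, 3=lo)
zinc_request_paths() {
    fmt=$1
    ph=$2
    # Drop blank lines first so they do not become empty requests.
    sed -e '/^[[:space:]]*$/d' -e "s/^/fget2.pl?f=${fmt}\&l=${ph}\&z=/"
}

# Hypothetical helper: split an ID file into pieces of at most 1000 IDs
# and write one request list (<piece>.req) per piece.
#   $1 = ID file, $2 = format code, $3 = pH model
zinc_prepare_chunks() {
    ids=$1
    fmt=$2
    ph=$3
    split -l 1000 "$ids" chunk_
    for part in chunk_??; do
        # If split produced nothing, the glob stays literal; skip it.
        [ -f "$part" ] || continue
        zinc_request_paths "$fmt" "$ph" < "$part" > "$part.req"
        # Download step from A2, not executed in this sketch:
        # wget -O "$part.db" -a listing -B http://zinc.docking.org/ -i "$part.req"
    done
}
```

For example, `zinc_prepare_chunks hits.txt h 0` would leave files such as `chunk_aa.req`, each holding at most 1000 request paths for the reference (pH 7) models.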

Q3. Is there a script that does all of this?

A3.  Yes, Peter Kolb wrote one (thanks Peter).  Download it here:
[http://zinc.docking.org/scripts/get.db.from.id.sh].

Here is how to use this script.

First, put the ZINC IDs you want to get into a file, say "list1".

Second, put the list of lists into a file, call this "masterlist"
(i.e. line 1 of masterlist contains 5 characters: list1).

Invoke the program:

  chmod a+rx get.db.from.id.sh  ; # as downloaded above
  ./get.db.from.id.sh masterlist 0 mol2  ; # to get "reference" molecules (0) in mol2 format
  ./get.db.from.id.sh masterlist 1 mol2  ; # to get "additional physiological" (mid, 1) in mol2

Good luck!

-- John Irwin, March 2009

[[Category:FAQ]]
[[Category:ZINC]]

Latest revision as of 15:42, 11 March 2014

Frequently asked questions about ZINC. Other FAQs

What is the version (how to cite)?

The interface is version 12, released in January 2012. The next version will be in January 2013, and a beta will appear in the fall.

The subsets of ZINC are updated regularly. Each subset has a release date and the total number of molecules. We recommend you quote these to specify which version you used.

Thus you might say: We used ZINC version 12 (Irwin and Shoichet, J. Chem. Inf. Model. 2005;45(1):177-82). We downloaded the standard lead-like subset of 4,554,059 molecules dated 2012-02-06 on February 20, 2012.

We recommend that you always download the "Purchasing Info" in addition to the actual molecule structure files every time you download ZINC. This gives you a static record of the purchasing information in case the molecules go off the market and become unavailable in the meantime.

What is the purpose of ZINC?

The purpose of ZINC is to simplify ligand discovery for biology. A primary aim is to provide screening libraries for molecular docking and for chemical informatics. ZINC aims to stay current, with correct purchasing information, to enable research.

What can I do with ZINC?

You can use ZINC to make your life easier. You can use it for docking, for cheminformatics, for looking up molecules, for finding analogs by similarity, and for simply exploring vendors and the compounds they sell.

What sort of people is ZINC designed for?

ZINC is aimed at professional researchers seeking new molecules to test in biological experiments. It is used by graduate students, postdoctoral fellows and established researchers in pharmaceutical companies, in universities, in government research labs, and in biotechs and startup companies. ZINC is free to use for everyone.

How do I download ZINC? On Windows?

Every subset page and every vendor page has a special link for downloading the subset on Windows and on Unix, which includes Macs. We have a YouTube video that explains how to do this: http://youtube.com/user/chemistry4biology

Do I want usual or single or something else?

If your docking program calculates protonation states and tautomers on the fly, then you want single. If you are doing cheminformatics, you probably want your SMILES as single. If you are doing molecular docking, you probably want usual, which includes relevant molecular forms between pH 6 and 8. If you are docking to metalloenzymes, you want the metal subsets, which include additional high pH forms.

Do I want lead-like or fragment-like or drug-like or something else?

If the binding site is very small, or if you are using a biophysical assay such as NMR or SPR, then you may want fragment-like. In most other cases, lead-like is the best choice for discovery projects.

Do I want standard, clean or now subsets?

Clean: If you can tolerate a high false positive rate and you want more chemistry, you probably want a standard subset. If you are new to screening or unsure about what you are doing, a clean subset may be a better choice. Clean subsets provide a filtered subset of chemical space that is less prone to artifactual hits, at the small cost of sacrificing some chemistry.

Now: If you cannot wait 6-10 weeks for your compounds to arrive, you should choose the Now subsets. If you can afford to wait that long, you are better off not using Now, because you can access far more chemistry with the standard subsets.

Can I script ZINC?

Yes. You can write URLs that deliver formatted results. See the ZINC:Command language. See also the Quick Search Bar.

How are partial charges computed in ZINC?

The partial charges are computed by AMSOL 6 using the SM5.42R solvation model in organic solvent. The AMSOL input parameters look like the following:

CHARGE=$netchg AM1 1SCF CART TLIMIT=1 GEO-OK SM5.42R
& SOLVNT=GENORG IOFR=1.4345 ALPHA=0.00 BETA=0.00 GAMMA=38.93
& DIELEC=2.06 FACARB=0.00 FEHALO=0.00

We only calculate the charges on one ligand conformation, but we use them for the whole ligand conformational ensemble. (thanks to Michael Mysinger for this phrasing)
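
As a rough illustration of how the keyword template above might be filled in per molecule, the sketch below substitutes a net charge for `$netchg` and prints the keyword lines. It assumes that `$netchg` stands for the molecule's net formal charge. The function name `amsol_keywords` is hypothetical, and the sketch only prints text; it does not run AMSOL or build a complete input file.

```shell
# Hypothetical helper: print the AMSOL keyword lines quoted above,
# with the given net charge substituted for $netchg.
#   $1 = net charge of the molecule (assumed meaning of $netchg)
amsol_keywords() {
    netchg=$1
    printf '%s\n' \
        "CHARGE=${netchg} AM1 1SCF CART TLIMIT=1 GEO-OK SM5.42R" \
        "& SOLVNT=GENORG IOFR=1.4345 ALPHA=0.00 BETA=0.00 GAMMA=38.93" \
        "& DIELEC=2.06 FACARB=0.00 FEHALO=0.00"
}
```

Calling `amsol_keywords 0` would print the keywords for a neutral molecule, and `amsol_keywords -1` those for a singly charged anion.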

I am a vendor: How can I get my catalog added to ZINC?

Please contact John Irwin, who can help you do this.

Can I download everything?

Yes, there is an everything subset. It is here: http://zinc.docking.org/subsets/everything.

Can I use the old version?

Old versions of ZINC still exist, but they are not recommended, and have been partially disabled. If you feel you need to use an old version, we would like to hear from you why.