ZINC:FAQ: Difference between revisions
Revision as of 23:46, 28 June 2011
Here are frequently asked questions about ZINC.
Note that other FAQs are also available.
- DOCK:FAQ
- DUD:FAQ
- DOCK Blaster:FAQ
- THC:FAQ
- FAQ for everything not covered by one of those products.
How do I get arbitrary subsets of ZINC?
Q1. I am trying to generate a subset of your "drug-like" molecule subset for virtual screening. I was thinking your 60% diversity group (about 12,000 molecules) would be a place to start, and I downloaded the .smi file. I am relatively new to chemoinformatics, and I was wondering if there is an elegant way to separate the compounds listed in the .smi file from the larger library of mol2 files in the 2,000,000-molecule "usual" set that I have downloaded from ZINC?
A1.
wget http://zinc8.docking.org/subset1/3/3_t60.smi
awk '{print $2}' 3_t60.smi >! codes
sed -e 's/^/fget2.pl?f=m\&l=0\&z=/' codes > codes2
wget -O all.mol2 -a listing -B http://zinc8.docking.org/ -i codes2
The l parameter in the URL selects the pH model: 0 = reference (pH 7), 1 = mid (pH 5.75-8.25), 2 = hi (pH 7-8.5), 3 = lo (pH 4.5-6).
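The URL-building step in A1 can be wrapped in a small function so that the pH model becomes a parameter. This is a minimal sketch in POSIX sh rather than the csh used above. The function name make_urls and the sample IDs are hypothetical, and it assumes, as the awk command in A1 does, that the ZINC ID is the second whitespace-separated column of the .smi file. The format codes f=m (mol2) and f=h (hierarchy) are taken from the commands in this FAQ. The sketch only writes the URL list; the download itself is still done with wget as shown above.

```shell
#!/bin/sh
# make_urls SMI_FILE PH_MODEL FORMAT
# Prints one relative fget2.pl URL per ZINC ID found in column 2 of SMI_FILE.
# PH_MODEL: 0=reference, 1=mid, 2=hi, 3=lo. FORMAT: m (mol2) or h (hierarchy).
make_urls() {
    awk -v l="$2" -v f="$3" \
        'NF >= 2 { printf "fget2.pl?f=%s&l=%s&z=%s\n", f, l, $2 }' "$1"
}

# Hypothetical two-line sample standing in for a downloaded .smi file.
printf 'CCO ZINC00000001\nc1ccccc1 ZINC00000002\n' > sample.smi

# Reference (pH 7) models in mol2 format, written to the same file name as in A1.
make_urls sample.smi 0 m > codes2
cat codes2
```

The resulting codes2 file can be passed to wget with -i and -B exactly as in A1.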
I want a hierarchy format database based on ZINC IDs.
A2.
Create a file "hits.txt" containing one ZINC ID per row.
sed -e 's/^/fget2.pl?f=h\&l=0\&z=/' hits.txt > ref.txt
wget -O ref.db -a listing -B http://zinc.docking.org/ -i ref.txt
The commands above get the "reference" (pH 7) models. For the additional "usual" forms, use l=1:
sed -e 's/^/fget2.pl?f=h\&l=1\&z=/' hits.txt > mid.txt
wget -O mid.db -a listing -B http://zinc.docking.org/ -i mid.txt
Note: we recommend splitting hits.txt into sets of 1000 ZINC IDs each, thus:
split -l 1000 hits.txt
foreach i (x??)
  sed ...
  wget ...
end
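The foreach loop above is csh syntax, with the sed and wget arguments elided. A minimal sketch of the same chunking idea in POSIX sh follows. The chunk_ prefix, the per-chunk output file names, and the generated sample IDs are assumptions for illustration only, and the wget line is left commented out so that the sketch runs without network access.

```shell
#!/bin/sh
# Generate a hypothetical hits.txt with 2500 made-up IDs, one per row.
awk 'BEGIN { for (i = 1; i <= 2500; i++) printf "ZINC%08d\n", i }' > hits.txt

# Split into files of at most 1000 IDs each: chunk_aa, chunk_ab, chunk_ac.
split -l 1000 hits.txt chunk_

for i in chunk_??; do
    # Turn each ID into a relative fget2.pl URL (hierarchy format, reference model).
    sed -e 's/^/fget2.pl?f=h\&l=0\&z=/' "$i" > "$i.urls"
    # Fetch this chunk into its own database file (uncomment to download):
    # wget -O "$i.db" -a listing -B http://zinc.docking.org/ -i "$i.urls"
done

# Show how many URLs ended up in each chunk.
wc -l chunk_??.urls
```

Writing each chunk to its own output file keeps one failed download from affecting the others; the per-chunk files can be concatenated afterwards if a single file is preferred.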
Please let us know if this is not clear!
Is there a script that does all of this?
A3. Yes, Peter Kolb wrote one (thanks, Peter). Download it here [1].
Here is how to use this script.
First, put the ZINC IDs you want to get into a file, say "list1".
Second, put the list of lists into a file, call this "masterlist" (i.e. line 1 of masterlist contains 5 characters: list1).
Third, invoke the program:
chmod a+rx get.db.from.id.sh ; # as downloaded above
./get.db.from.id.sh masterlist 0 mol2 ; # to get "reference" molecules (0) in mol2 format
./get.db.from.id.sh masterlist 1 mol2 ; # to get "additional physiological" molecules (mid, 1) in mol2 format
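The two input files described in the steps above could be prepared as follows. This is only a sketch: the ZINC IDs are made up, and it assumes one ID per line in list1, which this FAQ states for hits.txt but does not spell out for the script's input.

```shell
#!/bin/sh
# list1: the ZINC IDs to fetch, one per line (hypothetical IDs).
printf 'ZINC00000001\nZINC00000002\n' > list1

# masterlist: the list of lists; line 1 is the 5 characters "list1".
echo list1 > masterlist

cat masterlist
```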
Good luck!
How are partial charges computed in ZINC?
The partial charges are computed by AMSOL 6 using the SM5.42R solvation model in organic solvent. The AMSOL input parameters look like the following:
CHARGE=$netchg AM1 1SCF CART TLIMIT=1 GEO-OK SM5.42R &
SOLVNT=GENORG IOFR=1.4345 ALPHA=0.00 BETA=0.00 GAMMA=38.93 &
DIELEC=2.06 FACARB=0.00 FEHALO=0.00
We only calculate the charges on one ligand conformation, but we use them for the whole ligand conformational ensemble. (thanks to Michael Mysinger for this phrasing)
-- John Irwin, Aug 2009