ZINC:FAQ
Here are frequently asked questions about ZINC.
Q1. I am trying to generate a subset of your "drug-like" molecule subset for
virtual screening. I was thinking your 60% diversity group (about 12,000
molecules) would be a place to start, and I downloaded the .smi file. I am
relatively new to chemoinformatics and I was wondering if there is an
elegant way to separate the compounds listed in the .smi file from the
larger library containing the mol2 files from the 2,000,000 "usual" set that
I have downloaded from ZINC?
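A1. Extract the ZINC codes from the second column of the .smi file, turn each code into a download URL, and fetch them all in one pass (csh syntax):
    wget http://zinc8.docking.org/subset1/3/3_t60.smi
    awk '{print $2}' 3_t60.smi >! codes
    sed -e 's/^/fget2.pl?f=m\&l=0\&z=/' codes > codes2
    wget -O all.mol2 -a listing -B http://zinc8.docking.org/ -i codes2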
The l parameter in the URL selects the pH model: 0 = reference (pH 7), 1 = mid (pH 5.75-8.25), 2 = hi (pH 7-8.5), 3 = lo (pH 4.5-6).
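The awk and sed steps in A1 can also be combined so that the file format and pH model are passed as arguments. The following is a minimal sketch, not an official ZINC tool: make_url_list is a hypothetical helper name, it is written for sh rather than csh, and it assumes the ZINC code is the second whitespace-separated column of the .smi file, as in A1.

```shell
#!/bin/sh
# Build a list of relative fget2.pl URLs from a SMILES file.
# make_url_list is a hypothetical helper, not part of ZINC.
# Assumption: the ZINC code is column 2 of each line, as in A1.
make_url_list() {
    smi_file=$1    # input .smi file
    format=$2      # f parameter: m (mol2) or h (hierarchy), as used in this FAQ
    ph_model=$3    # l parameter: 0, 1, 2 or 3
    # Skip lines with fewer than two columns; print one URL per code.
    awk -v f="$format" -v l="$ph_model" \
        'NF >= 2 { printf "fget2.pl?f=%s&l=%s&z=%s\n", f, l, $2 }' "$smi_file"
}

# Example (needs network access, so it is left commented out):
# make_url_list 3_t60.smi m 0 > codes2
# wget -O all.mol2 -a listing -B http://zinc8.docking.org/ -i codes2
```

Changing the third argument to 1, 2 or 3 produces the URL list for one of the other pH models without editing the sed expression by hand.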
Q2. I want a hierarchy format database based on ZINC IDs.
A2. Create a file "hits.txt" containing one ZINC ID per row, then run:
    sed -e 's/^/fget2.pl?f=h\&l=0\&z=/' hits.txt > ref.txt
    wget -O ref.db -a listing -B http://zinc.docking.org/ -i ref.txt
The commands above fetch the "reference" (pH 7) models. For additional "usual" forms, use l=1:
    sed -e 's/^/fget2.pl?f=h\&l=1\&z=/' hits.txt > mid.txt
    wget -O mid.db -a listing -B http://zinc.docking.org/ -i mid.txt
Note: we recommend splitting hits.txt into sets of 1000 ZINC IDs each and fetching each set separately, thus (csh syntax):
    split -l 1000 hits.txt
    foreach i (x??)
        sed ...
        wget ...
    end
Please let us know if this is not clear!
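The batching recommendation in A2 can be sketched as a small sh function that splits the ID file and writes one URL list per chunk. This is an illustration under stated assumptions, not the script mentioned in Q3: make_batches and the chunk_ file prefix are hypothetical names, and the download step is left commented out because it needs network access.

```shell
#!/bin/sh
# Split a file of ZINC IDs into chunks and write one URL list per chunk.
# make_batches and the chunk_ prefix are hypothetical, chosen for this sketch.
make_batches() {
    ids_file=$1       # one ZINC ID per row (e.g. hits.txt)
    ph_model=$2       # l parameter: 0 = reference, 1 = mid, 2 = hi, 3 = lo
    out_dir=$3        # directory for chunk files and URL lists
    size=${4:-1000}   # IDs per chunk; 1000 is the size recommended in A2

    mkdir -p "$out_dir" || return 1
    # split names the chunks chunk_aa, chunk_ab, ...
    split -l "$size" "$ids_file" "$out_dir/chunk_"
    for chunk in "$out_dir"/chunk_??; do
        [ -f "$chunk" ] || continue
        # Same sed prefix as in A2, with the pH model filled in.
        sed -e "s/^/fget2.pl?f=h\&l=${ph_model}\&z=/" "$chunk" > "$chunk.urls"
    done
}

# Example (needs network access, so it is left commented out):
# make_batches hits.txt 0 batches
# for u in batches/chunk_??.urls; do
#     wget -O "${u%.urls}.db" -a listing -B http://zinc.docking.org/ -i "$u"
# done
```

Each chunk gets its own output file, so a failed download only needs to be repeated for that chunk.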
Q3. Is there a script that does all of this?
A3. Yes, Peter Kolb wrote one (thanks Peter). Download it here....
-- John Irwin, March 2009