ZINC processing pipeline

From DISI
Revision as of 17:26, 10 March 2009 by Frodo (talk | contribs)

Each molecule in ZINC is processed via our ZINC processing pipeline. This process is embodied in a set of scripts that we continue to refine as we discover problems.

Frankly, we hope people will simply use ZINC rather than trying to reproduce it. Still, in the interests of clarity, transparency, truth, justice and the Canadian Way (TM), here is our current protocol.

1. If you have 2D SDF, convert it to isomeric SMILES.

2. Replace the pattern N=S=N with nsn:

 sed -e 's/N=S=N/nsn/g' 2.ism > 2-out.ism

3. Use Molinspiration mitools/mib to eliminate broken SMILES:

 java -jar /raid1/soft/mitools/mib.jar -singlepart -onlyOrganic -normalizeCharges -f $1 -out smi

4. Use OEChem to remove molecules with problematic functional groups:

 filter.py rules.txt 4.ism 4-out.ism > filterlog.txt

See http://blaster.docking.org/filtering/rules_default.txt for the current rules.

5. Keep only 4 of the stereochemical expansions from the previous step. We just take the first 4, but you can imagine better ways of making the selection.

6. Get rid of bogus stereochemistry at nitrogen:

 sed -e 's/\[N@\]/N/g' -e 's/\[N@@\]/N/g' -e 's/\[N@H+\]/\[NH+\]/g' -e 's/\[N@@H+\]/\[NH+\]/g' -e 's/\[N@@+\]/\[N+\]/g' -e 's/\[N@+\]/\[N+\]/g' $1 > d.ism

7. If the molecule is already in ZINC, eliminate it from the list.

8. Generate a trial 3D structure with Corina:

 corina -d neu,wh,rc,mc=1,canon -i t=smiles -o t=sdf < 1a.ism > 2.sdf

9. Generate the reference pH state using Schrodinger's Epik:

 epik -ph 7.05 -ms 1 -imae A.mae -omae B.mae -WAIT

10. Generate the mid, hi and lo pH subsets. Set EPIK for the subset you want:

 mid: setenv EPIK "-ph 7.0 -pht 1 -tp 0.20"
 hi:  setenv EPIK "-ph 8.5 -pht 0.75 -tp 0.20"
 lo:  setenv EPIK "-ph 5.5 -pht 0.75 -tp 0.20"

then run:

 epik $EPIK -imae A.mae -omae B.mae -WAIT
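Steps 5 and 6 are plain text filters, so they can be chained in a pipe. The following is only a minimal sketch in POSIX shell, not the script ZINC actually uses. It assumes each line of the .ism file has the layout "SMILES ID" and that all stereochemical expansions of one molecule share the same ID; the function names first_four and strip_n_stereo and the file names in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. The "SMILES ID" line layout is an assumption.

# Step 5: keep at most the first 4 lines seen for each molecule ID (column 2).
first_four() {
    awk '++seen[$2] <= 4'
}

# Step 6: strip stereo markers from nitrogen, using the same substitutions
# as the protocol, but reading stdin and writing stdout.
strip_n_stereo() {
    sed -e 's/\[N@\]/N/g' -e 's/\[N@@\]/N/g' \
        -e 's/\[N@H+\]/[NH+]/g' -e 's/\[N@@H+\]/[NH+]/g' \
        -e 's/\[N@@+\]/[N+]/g' -e 's/\[N@+\]/[N+]/g'
}

# Usage (placeholder file names):
#   first_four < 4-out.ism | strip_n_stereo > d.ism
```

Taking the first 4 lines per ID mirrors the "just take the first 4" choice in step 5; a smarter selection would replace only first_four.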

For each subset (ref, mid, hi, lo) process as follows:

a. Use Corina to generate a 3D model of the relevant protonated state:

 corina -d rc,flapn,de=5,mc=2 -i t=mol2 -o t=mol2
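The per-subset Epik settings can be kept in one place rather than set by hand with setenv. The sketch below is a dry run in POSIX shell: it only prints the Epik command for each subset and does not execute it. The flags are copied from the protocol above; the function names and the per-subset output file names (ref.mae and so on) are illustrative assumptions, and the conversion from Epik's .mae output to the mol2 input that Corina reads is not covered here.

```shell
#!/bin/sh
# Dry run: prints the Epik command for each pH subset instead of running it.
# Function names and output file names are assumptions for illustration.

# Map a subset name to the Epik flags given in the protocol.
epik_flags() {
    case "$1" in
        ref) echo "-ph 7.05 -ms 1" ;;
        mid) echo "-ph 7.0 -pht 1 -tp 0.20" ;;
        hi)  echo "-ph 8.5 -pht 0.75 -tp 0.20" ;;
        lo)  echo "-ph 5.5 -pht 0.75 -tp 0.20" ;;
        *)   echo "unknown subset: $1" >&2; return 1 ;;
    esac
}

# Print one Epik command line per subset.
print_epik_commands() {
    for subset in ref mid hi lo; do
        echo "epik $(epik_flags "$subset") -imae A.mae -omae $subset.mae -WAIT"
    done
}

print_epik_commands
```

Once the printed commands look right, the echo can be dropped to run them for real.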

That's really it. There is more to do with loading ZINC, but to generate the models, that is what we think you need to know. Good luck!

-- John Irwin. March 2009.