ZINC processing pipeline

OBSOLETE.

Each molecule in ZINC is processed via our ZINC processing pipeline. This process is embodied in a set of scripts that we continue to refine as we discover problems.

Frankly, we hope people will simply use ZINC rather than trying to reproduce it. Still, in the interests of clarity, transparency, truth, justice and the Canadian Way (TM), here is our current protocol.

  • 1. If you have 2D SDF, convert it to isomeric SMILES.
  • 2. Rewrite N=S=N as aromatic nsn:
sed -e 's/N=S=N/nsn/g' 2.ism > 2-out.ism
  • 3. Use Molinspiration mitools/mib to eliminate broken SMILES:
java -jar /raid1/soft/mitools/mib.jar -singlepart -onlyOrganic -normalizeCharges -f $1 -out smi
  • 4. Use OEChem to remove molecules with problematic functional groups:
filter.py rules.txt 4.ism 4-out.ism  > filterlog.txt

See http://blaster.docking.org/filtering/rules_default.txt for the current rules.
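
Step 2 is a plain text substitution on the SMILES file. A minimal Python sketch of the same rewrite follows, for readers who prefer not to shell out to sed. The function name normalize_nsn and the sample record are hypothetical and are not part of the ZINC scripts; the sketch assumes one record per line.

```python
def normalize_nsn(smiles_line: str) -> str:
    """Replace every literal N=S=N with nsn, like sed 's/N=S=N/nsn/g'."""
    # str.replace substitutes all non-overlapping matches, left to right,
    # which matches the behaviour of the /g flag in the sed command.
    return smiles_line.replace("N=S=N", "nsn")


# Hypothetical input record: a SMILES string followed by an identifier.
print(normalize_nsn("c1ccc2c(c1)N=S=N2 mol1"))  # prints c1ccc2c(c1)nsn2 mol1
```

Lines without the N=S=N pattern pass through unchanged, so the function can be applied to every line of the file.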

  • 5. Select only 4 of the stereochemical expansions from the previous step. We just take the first 4, but you can imagine better ways of making the selection.
  • 6. Get rid of bogus stereochemistry at nitrogen:
sed -e 's/\[N@\]/N/g' -e 's/\[N@@\]/N/g' -e 's/\[N@H+\]/\[NH+\]/g' -e 's/\[N@@H+\]/\[NH+\]/g' -e 's/\[N@@+\]/\[N+\]/g' -e 's/\[N@+\]/\[N+\]/g' $1 >  d.ism
  • 7. If the molecule is already in ZINC, eliminate it from the list.
  • 8. Generate a trial 3D structure with Corina:
corina -d neu,wh,rc,mc=1,canon -i t=smiles -o t=sdf < 1a.ism > 2.sdf
  • 9. Generate the reference pH state using Schrödinger's Epik:
epik -ph 7.05 -ms 1 -imae A.mae -omae B.mae -WAIT
  • 10. Generate mid, hi and lo pH subsets:
mid: setenv EPIK "-ph 7.0 -pht 1 -tp 0.20"
hi: setenv EPIK "-ph 8.5 -pht 0.75 -tp 0.20"
lo: setenv EPIK "-ph 5.5 -pht 0.75 -tp 0.20"
epik $EPIK -imae A.mae -omae B.mae -WAIT
  • 11. For each subset (ref, mid, hi, lo), use Corina to generate a 3D model of the relevant protonation state:
corina -d rc,flapn,de=6,mc=4 -i t=mol2 -o t=mol2
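
Steps 5 and 6 are simple enough to sketch in Python. The sketch below is an illustration under stated assumptions, not the ZINC code itself: it assumes each record is a (SMILES, identifier) pair and that all stereochemical expansions of one molecule share the same identifier, which the protocol above does not specify. The names first_n_per_id and strip_nitrogen_stereo are hypothetical. The six rewrites are the same literal substitutions, in the same order, as the sed command in step 6.

```python
# Literal rewrites taken from the sed command in step 6, in the same order.
N_STEREO_REWRITES = [
    ("[N@]", "N"),
    ("[N@@]", "N"),
    ("[N@H+]", "[NH+]"),
    ("[N@@H+]", "[NH+]"),
    ("[N@@+]", "[N+]"),
    ("[N@+]", "[N+]"),
]


def strip_nitrogen_stereo(smiles):
    """Remove chirality marks on nitrogen; other atoms are left untouched."""
    for old, new in N_STEREO_REWRITES:
        smiles = smiles.replace(old, new)
    return smiles


def first_n_per_id(records, n=4):
    """Keep the first n (smiles, mol_id) records per mol_id, in input order.

    Assumption: expansions of the same molecule carry the same mol_id.
    """
    kept_count = {}
    kept = []
    for smiles, mol_id in records:
        count = kept_count.get(mol_id, 0)
        if count < n:
            kept.append((smiles, mol_id))
            kept_count[mol_id] = count + 1
    return kept


# Hypothetical input: six expansions of molecule "a" and one of molecule "b".
records = [("s%d" % i, "a") for i in range(6)] + [("C[N@@H+](C)CC", "b")]
selected = first_n_per_id(records)
cleaned = [(strip_nitrogen_stereo(smi), mol_id) for smi, mol_id in selected]
print(len(cleaned))   # prints 5
print(cleaned[-1])    # prints ('C[NH+](C)CC', 'b')
```

Taking the first 4 is the selection rule described in step 5; a different rule could be substituted inside first_n_per_id without changing the rest of the sketch.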

That's really it. There is more to do with loading ZINC, but to generate the models, that is what we think you need to know. Good luck!

-- John Irwin. March 2009.