ZINC processing pipeline

Latest revision as of 05:28, 14 February 2014

OBSOLETE.

Each molecule in ZINC is processed via our ZINC processing pipeline. This process is embodied in a set of scripts that we continue to refine as we discover problems.

Frankly, we hope people will simply use ZINC rather than trying to reproduce it. Still, in the interests of clarity, transparency, truth, justice and the Canadian Way (TM), here is our current protocol.

  • 1. If you have 2D SDF, convert it to isomeric SMILES.
  • 2. Rewrite N=S=N as aromatic nsn:
sed -e 's/N=S=N/nsn/g' 2.ism > 2-out.ism
  • 3. Use Molinspiration mitools/mib to eliminate broken SMILES:
java -jar /raid1/soft/mitools/mib.jar -singlepart -onlyOrganic -normalizeCharges -f $1 -out smi
  • 4. Use OEChem to remove molecules with problematic functional groups:
filter.py rules.txt 4.ism 4-out.ism  > filterlog.txt

See http://blaster.docking.org/filtering/rules_default.txt for the current rules.

  • 5. Select only 4 of the stereochemical expansions from the previous step. We just take the first 4, but you can imagine better ways of making the selection.
  • 6. Get rid of bogus stereochemistry at nitrogen:
sed -e 's/\[N@\]/N/g' -e 's/\[N@@\]/N/g' -e 's/\[N@H+\]/\[NH+\]/g' -e 's/\[N@@H+\]/\[NH+\]/g' -e 's/\[N@@+\]/\[N+\]/g' -e 's/\[N@+\]/\[N+\]/g' $1 >  d.ism
  • 7. If the molecule is already in ZINC, eliminate it from the list.
  • 8. Generate a trial 3D structure with Corina:
corina -d neu,wh,rc,mc=1,canon -i t=smiles -o t=sdf < 1a.ism > 2.sdf
  • 9. Generate the reference pH state using Schrödinger's Epik:
epik -ph 7.05 -ms 1 -imae A.mae -omae B.mae -WAIT
  • 10. Generate the mid, hi and lo pH subsets:
mid: setenv EPIK "-ph 7.0 -pht 1 -tp 0.20"
hi: setenv EPIK "-ph 8.5 -pht 0.75 -tp 0.20"
lo: setenv EPIK "-ph 5.5 -pht 0.75 -tp 0.20"
epik $EPIK -imae A.mae -omae B.mae -WAIT
  • 11. For each subset (ref, mid, hi, lo), use Corina to generate a 3D model of the relevant protonated state:
corina -d rc,flapn,de=6,mc=4 -i t=mol2 -o t=mol2
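
The text-only steps above (2, 5, 6 and the option selection in 10) can be sketched as small shell filters. This is an illustration, not the scripts ZINC actually runs. The function names are hypothetical. The sketch uses POSIX sh rather than the csh-style setenv shown in step 10, and it assumes each line of an .ism file holds a SMILES string followed by a molecule identifier, with all stereochemical expansions of one molecule sharing that identifier. The sed expressions and the Epik option strings are copied from the protocol.

```shell
# Hypothetical helper functions; each reads stdin and writes stdout.

# Step 2: rewrite N=S=N as aromatic nsn (same sed expression as the protocol).
fix_nsn() {
    sed -e 's/N=S=N/nsn/g'
}

# Step 5: keep at most the first 4 lines seen for each molecule.
# ASSUMPTION: lines are "SMILES ID" and expansions share the ID in column 2.
first_four() {
    awk '++seen[$2] <= 4'
}

# Step 6: drop stereo marks on nitrogen (same substitutions as the protocol).
strip_n_stereo() {
    sed -e 's/\[N@\]/N/g' -e 's/\[N@@\]/N/g' \
        -e 's/\[N@H+\]/[NH+]/g' -e 's/\[N@@H+\]/[NH+]/g' \
        -e 's/\[N@@+\]/[N+]/g' -e 's/\[N@+\]/[N+]/g'
}

# Step 10: print the Epik options for a pH subset (values from the protocol).
epik_opts() {
    case "$1" in
        mid) echo "-ph 7.0 -pht 1 -tp 0.20" ;;
        hi)  echo "-ph 8.5 -pht 0.75 -tp 0.20" ;;
        lo)  echo "-ph 5.5 -pht 0.75 -tp 0.20" ;;
        *)   echo "unknown subset: $1" >&2; return 1 ;;
    esac
}
```

With these defined, step 2 would read `fix_nsn < 2.ism > 2-out.ism`, and the Epik call in step 10 would read `epik $(epik_opts hi) -imae A.mae -omae B.mae -WAIT`, assuming Epik is installed and on the path.
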

That's really it. There is more to do with loading ZINC, but to generate the models, that is what we think you need to know. Good luck!

-- John Irwin. March 2009.