Loading ZINC12


Revision as of 16:34, 6 November 2013

This is the internal page for loading ZINC. If you are not a ZINC curator, this page will probably not be interesting.

YYZ protocol

1. Acquire catalog, often as SDF

  • This is often a manual step: the catalog arrives by email or is downloaded.
  • Run pc2unix to remove carriage returns (\r).
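For curators without pc2unix at hand, the carriage-return cleanup can be sketched in Python. This is a minimal sketch, assuming pc2unix performs a plain DOS-to-Unix line-ending conversion; the function names are hypothetical and are not part of the ZINC tooling.

```python
from pathlib import Path


def strip_carriage_returns(data: bytes) -> bytes:
    """Convert DOS line endings (\\r\\n) to Unix line endings (\\n)."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src: Path, dst: Path) -> None:
    """Write a copy of src to dst with DOS line endings converted.

    Hypothetical helper: assumed to approximate what pc2unix does.
    """
    dst.write_bytes(strip_carriage_returns(src.read_bytes()))
```

Working on bytes rather than text avoids guessing the vendor's character encoding.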


2. Parse SDF into ISM, harvesting data from SD tags into synonyms table

python parse_catalog.py ibsbb ibs2013oct_bb.sdf ibsbb.ism ibsbb.csv
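The internals of parse_catalog.py are not shown on this page. As an illustration of the tag-harvesting half of this step only, the sketch below reads the SD data items of each record into a dictionary. It assumes the standard SDF layout (a "> <TAG>" header line, value lines, a blank line, and "$$$$" closing each record); the function name is hypothetical, and generating the SMILES for the ISM file would need a cheminformatics toolkit that is not covered here.

```python
import re
from typing import Dict, Iterator

# A data-item header looks like "> <TAG>" (optionally with extra fields).
TAG_RE = re.compile(r"^>.*<([^>]+)>")


def iter_sd_records(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict of SD tag -> value for each record in SDF text."""
    tags: Dict[str, str] = {}
    current = None  # tag whose value lines are currently being read
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # End of record: emit what was collected and start afresh.
            yield tags
            tags, current = {}, None
            continue
        match = TAG_RE.match(line)
        if match:
            current = match.group(1)
            tags[current] = ""
        elif current is not None:
            if line.strip() == "":
                current = None  # a blank line ends the data item
            elif tags[current]:
                tags[current] += "\n" + line  # multi-line value
            else:
                tags[current] = line
```

Each yielded dictionary could then be written out as one row of the synonyms CSV, keyed by whichever tag holds the vendor's catalog code.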

3. Deplete

Previously: deplete.pl ibsbb < ibsbb.ism
Now: replacement not yet determined.

4. Desalt, filter out neverwants, stereochemical expansion, remove N stereochem, take 4 max

filter.py rules.txt ibsbb.ism ibsbbexp.ism > ibsbb.log
ignorenstereochem2.csh ibsbbexp.ism
sort -u ibsbbexp.ism | sort -k 2 | only4.pl > ibsbbexp4.ism 
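The last command in this step can be read as "deduplicate, group by identifier, keep at most four per identifier". The sketch below mirrors that reading in Python. It assumes each ISM line is "SMILES identifier" separated by whitespace and that only4.pl keeps the first four lines per identifier; the script itself is not shown here, so treat both as assumptions. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def take_max_per_id(lines: Iterable[str], limit: int = 4) -> List[str]:
    """Deduplicate ISM lines and keep at most `limit` lines per identifier."""
    # Equivalent of `sort -u`: unique lines in sorted order.
    unique = sorted({line.rstrip("\n") for line in lines})
    # Drop lines without both a SMILES and an identifier column.
    unique = [line for line in unique if len(line.split()) >= 2]
    # Equivalent of `sort -k 2`: a stable sort on the identifier column.
    unique.sort(key=lambda line: line.split()[1])

    counts: Dict[str, int] = defaultdict(int)
    kept: List[str] = []
    for line in unique:
        ident = line.split()[1]
        if counts[ident] < limit:
            counts[ident] += 1
            kept.append(line)
    return kept
```

Because the lines are sorted before the limit is applied, which four entries survive depends only on sort order, not on any chemical criterion.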

5. Protonation step with ChemAxon

cxcalc ... to generate a canonical form at pH 7.0

6. Look up whether each substance already exists in the database (eventually load catalog_item, substance tables)

python find_new_substances.py ibsbb ibsbb.ism  catalog-item.csv
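How find_new_substances.py queries the database is not documented on this page. The lookup idea can be sketched as follows, using an in-memory SQLite database as a stand-in for psql. The table and column names (`substance`, `smiles`) and the function signature are assumptions made for illustration only, and exact string matching is meaningful only if both sides store SMILES in the same canonical form.

```python
import sqlite3
from typing import Iterable, List, Tuple


def find_new_substances(
    conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return the (smiles, code) rows whose SMILES is not in `substance`.

    Hypothetical schema: a table `substance` with a text column `smiles`.
    """
    cur = conn.cursor()
    new_rows: List[Tuple[str, str]] = []
    for smiles, code in rows:
        # Parameterized query: never build SQL by string concatenation.
        cur.execute("SELECT 1 FROM substance WHERE smiles = ?", (smiles,))
        if cur.fetchone() is None:
            new_rows.append((smiles, code))
    return new_rows
```

Rows returned by such a lookup would need loading into the substance table, while every row, new or not, would still yield a catalog_item entry.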

7. Generate protomers

cxcalc ... (Dahlia's procedure)

8. Create db, db2, mol2, sdf, pdbqt, solv

Ryan's procedure

Load ChEMBL for ZINC, SEA export

Get latest ChEMBL

Load into psql

Export to files for SEA

Create links between substance and annotation via note

Target clustering

Update SEA

UCSF protocol (OLD)

Acquire databases

  • 1. Get the catalog from the vendor, usually in SDF. NB: we need to automate this step as much as possible.
  • 2. On nfshead2, in ~xyz/raid3/stage3/ or /raid6/tmp/xyz/, gunzip the SDF into a directory.
pc2unix '*.sdf' 
  • 3. Make ISM
foreach i (*.sdf)
  namesdf.pl '<TAG>' < $i > $i.sdf
  convert.py --i=$i.sdf --o=$i.ism
end
  • 4. Combine and move
cat vendoris*.ism > all 
mv all ~xyz/raid8/catalog/vendorid.in
ln -s !$ vendorid.ism
  • 5. (There is no step 5.)
  • 6. Mark depleted
deplete.pl vendorid < vendorid.ism 
  • 7. Process on sgehead2
mas.csh vendorid vendorid.ism   ; # NB: may run for a long time

NB: periodically delete the output.

  • 8. Update filter info, error info
  • 9. Export database on nfshead5
cd ~xyz/raid8/byvendor/.temp
mkdir vendorid
cd vendorid
callgr17.csh

(may take a long time)

  • 10. Export database on nfshead5
extractthis.csh vendorid nodb mol2

NB: use db (not nodb) if annotated; mol2 in all cases.


  • 11. If big, cluster on korn
kornit.csh vendorid `pwd`

  • 12. Finish off on wilco
./all.csh
updateit.pl < log
./all.csh
cd vendorid
dosubset4.pl vendorid vendor
  • 13. Email the vendor to tell them their catalog has been updated in ZINC.
  • 14. Write a tweet, etc., announcing the update, if appropriate.