Latest revision as of 23:23, 4 January 2019

This is the internal page for loading ZINC. If you are not a ZINC curator, this page will probably not be interesting.

YYZ protocol

1. Acquire catalog, often as SDF

This is often a manual step: the catalog arrives by email or is downloaded from the vendor.
Run pc2unix on the file to remove DOS carriage returns (\r).
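Where pc2unix is not at hand, the same clean-up can be sketched in Python. This is a hypothetical stand-in that simply deletes every \r byte, which matches the stated purpose of the step. pc2unix's exact behaviour is not documented here, and the function names are illustrative only.

```python
from pathlib import Path


def strip_cr(data: bytes) -> bytes:
    """Remove every carriage return, turning DOS (\\r\\n) line endings into Unix (\\n)."""
    return data.replace(b"\r", b"")


def strip_cr_in_place(path: str) -> int:
    """Hypothetical pc2unix stand-in: rewrite a file without \\r; return how many were removed."""
    p = Path(path)
    data = p.read_bytes()
    p.write_bytes(strip_cr(data))
    return data.count(b"\r")
```

For example, `for f in Path(".").glob("*.sdf"): strip_cr_in_place(str(f))` cleans every SDF in the current directory.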

2. Parse SDF into ISM, harvesting data from SD tags into the synonyms table

python parse_catalog.py ibsbb ibs2013oct_bb.sdf ibsbb.ism ibsbb.csv

Further steps: code to edit the field table, and code to load the CSV into the synonyms table. During parsing, multi-valued fields are split on ; or ,.
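The tag-harvesting part of this step can be sketched in plain Python with no cheminformatics toolkit. For that reason it does not produce the ISM; that conversion needs a toolkit. The tag names IDNUMBER and SYNONYMS, the CSV column layout and all function names are hypothetical. What parse_catalog.py actually does is not documented here.

```python
import csv
import re
from typing import Dict, Iterator, List, Optional

# SD data header lines look like ">  <IDNUMBER>" or "> 1 <IDNUMBER> (1)".
TAG_RE = re.compile(r"^>.*<([^>]+)>")


def iter_sd_tags(text: str) -> Iterator[Dict[str, str]]:
    """Yield one {tag: value} dict per SDF record; records end with a '$$$$' line."""
    for record in text.split("$$$$"):
        if not record.strip():
            continue
        tags: Dict[str, str] = {}
        current: Optional[str] = None
        value: List[str] = []
        for line in record.splitlines():
            match = TAG_RE.match(line)
            if match:
                current, value = match.group(1), []
            elif current is not None:
                if line.strip():
                    value.append(line)
                else:  # a blank line ends the data item
                    tags[current] = "\n".join(value)
                    current = None
        if current is not None:  # data item not followed by a blank line
            tags[current] = "\n".join(value)
        yield tags


def split_synonyms(value: str) -> List[str]:
    """Split a multi-valued field on ';' or ',' and drop empty pieces."""
    return [part.strip() for part in re.split(r"[;,]", value) if part.strip()]


def write_synonyms_csv(sdf_path: str, csv_path: str, id_tag: str, synonym_tags: List[str]) -> None:
    """Write one (vendor_id, tag, synonym) row per synonym found in the SDF."""
    with open(sdf_path) as sdf, open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        for tags in iter_sd_tags(sdf.read()):
            vendor_id = tags.get(id_tag)
            if not vendor_id:
                continue
            for tag in synonym_tags:
                for synonym in split_synonyms(tags.get(tag, "")):
                    writer.writerow([vendor_id, tag, synonym])
```

Splitting on , is a blunt rule. Systematic names such as 1,2-dichlorobenzene contain commas and would be broken up, so it may be worth restricting the split to tags known to hold lists.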

3. Deplete

Previously: deplete.pl ibsbb < ibsbb.ism
The current replacement for this step is still undecided.
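This sketch assumes that depleting means flagging catalog items that were loaded previously but no longer appear in the new catalog, i.e. items the vendor no longer sells. Under that assumption, the core of the step is a set difference on vendor IDs. The sketch also assumes ISM lines of the form SMILES ID, consistent with the sort -k 2 used in step 4. Where the previous IDs come from (in practice, presumably the database) is left open, and the function names are hypothetical.

```python
from typing import Iterable, Set


def ism_ids(lines: Iterable[str]) -> Set[str]:
    """Collect the ID (second column) from ISM lines of the form 'SMILES ID'."""
    ids: Set[str] = set()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            ids.add(parts[1])
    return ids


def depleted_ids(previous_ids: Iterable[str], new_ism_lines: Iterable[str]) -> Set[str]:
    """IDs present in the previously loaded catalog but missing from the new ISM file."""
    return set(previous_ids) - ism_ids(new_ism_lines)
```

For example, `with open("ibsbb.ism") as fh: gone = depleted_ids(previous, fh)` gives the IDs to mark as depleted.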

4. Desalt, filter out neverwants, expand stereochemistry, remove nitrogen stereochemistry, keep at most 4 per compound

filter.py rules.txt ibsbb.ism ibsbbexp.ism > ibsbb.log
ignorenstereochem2.csh ibsbbexp.ism
sort -u ibsbbexp.ism | sort -k 2 | only4.pl > ibsbbexp4.ism
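The last pipeline keeps at most 4 lines per compound ID after sorting on the ID column. Assuming that is what only4.pl does (its source is not shown here), the same logic in Python is a groupby over the sorted lines. The function name only_n is hypothetical.

```python
from itertools import groupby, islice
from typing import Iterable, List


def only_n(lines: Iterable[str], n: int = 4) -> List[str]:
    """Keep at most n lines per ID (second column).

    Assumes the input is already sorted by ID, as after `sort -k 2`;
    groupby only merges adjacent lines that share the same key.
    """
    rows = [line.rstrip("\n") for line in lines if len(line.split()) >= 2]
    kept: List[str] = []
    for _, group in groupby(rows, key=lambda row: row.split()[1]):
        kept.extend(islice(group, n))
    return kept
```

Because groupby only merges neighbouring lines, the sort on column 2 before the cap is what makes it work.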

5. Protonation step with ChemAxon

Run cxcalc ... to generate a canonical form at pH 7.0.

6. Look up whether each compound already exists in the database (eventually, load the catalog_item and substance tables)

python find_new_substances.py ibsbb ibsbb.ism catalog-item.csv
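The lookup can be sketched as a partition of the catalog's ISM lines into compounds already in the database and new ones. This assumes the database is keyed on the same canonical SMILES produced by the previous step. The in-memory dict stands in for a database query. The catalog-item.csv column order (catalog, vendor ID, substance ID) is a hypothetical layout, not necessarily what find_new_substances.py writes.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def partition_substances(
    ism_lines: Iterable[str], known: Dict[str, int]
) -> Tuple[List[Tuple[str, str, int]], List[Tuple[str, str]]]:
    """Split 'SMILES ID' lines into already-known substances and new ones.

    known maps canonical SMILES to an existing substance ID; here it stands in
    for a database lookup.
    """
    existing: List[Tuple[str, str, int]] = []
    new: List[Tuple[str, str]] = []
    for line in ism_lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        smiles, vendor_id = parts[0], parts[1]
        substance_id = known.get(smiles)
        if substance_id is None:
            new.append((smiles, vendor_id))
        else:
            existing.append((smiles, vendor_id, substance_id))
    return existing, new


def write_catalog_items(path: str, catalog: str, existing: List[Tuple[str, str, int]]) -> None:
    """Write hypothetical catalog_item rows: catalog, vendor ID, substance ID."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for _, vendor_id, substance_id in existing:
            writer.writerow([catalog, vendor_id, substance_id])
```

New substances would presumably need substance rows before their catalog_item rows can be written; that ordering is an assumption.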

7. Generate protomers

cxcalc ... (Dahlia's procedure)

8. Create db, db2, mol2, SDF, PDBQT and solv files

Ryan's procedure

Load ChEMBL for ZINC, SEA export

1. Get the latest ChEMBL release
2. Load it into PostgreSQL (psql)
3. Export to files for SEA
4. Create links between substance and annotation via note
5. Target clustering
6. Update SEA

UCSF protocol (OLD)

Acquire databases

1. Get the catalog from the vendor, usually as SDF. NB: this step should be automated as much as possible.
2. On nfshead2, gunzip the SDF into a directory under ~xyz/raid3/stage3/ or /raid6/tmp/xyz/.

pc2unix '*.sdf'

3. Make ISM files (replace <TAG> with the SD tag that holds the vendor's compound ID)

foreach i (*.sdf)
  namesdf.pl '<TAG>' < $i > $i.sdf
  convert.py --i=$i.sdf --o=$i.ism
end

4. Combine and move

cat vendorid*.ism > all
mv all ~xyz/raid8/catalog/vendorid.in
ln -s ~xyz/raid8/catalog/vendorid.in vendorid.ism

5. There is no step 5.

6. Mark depleted compounds

deplete.pl vendorid < vendorid.ism

7. Process on sgehead2

mas.csh vendorid vendorid.ism

NB: this may run for a long time. Periodically delete the output.

8. Update filter info and error info

9. Export database on nfshead5

cd ~xyz/raid8/byvendor/.temp
mkdir vendorid
cd vendorid
callgr17.csh

(This may take a long time.)

10. Export database on nfshead5

extractthis.csh vendorid nodb mol2

NB: use db instead of nodb if the catalog is annotated; mol2 is produced in all cases.

11. If the catalog is big, cluster on korn

kornit.csh vendorid `pwd`

12. Finish off on wilco

 ./all.csh 
 updateit.pl < log
 ./all.csh
 cd vendorid
 dosubset4.pl vendorid vendor

13. Email the vendor to tell them their catalog has been updated in ZINC.

14. If appropriate, write a tweet or other announcement.

Loading ZINC12: Difference between revisions
