Loading ZINC12
Latest revision as of 23:23, 4 January 2019
This is the internal page for loading ZINC. If you are not a ZINC curator, this page will probably not be interesting.
YYZ protocol
1. Acquire catalog, often as SDF
- This is often a manual step, by email, or download.
- pc2unix to remove \r
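Where pc2unix is not installed, the carriage-return cleanup can be approximated in Python. This is a minimal sketch and not the pc2unix tool itself; the names `strip_carriage_returns` and `convert_file` are hypothetical.

```python
def strip_carriage_returns(data: bytes) -> bytes:
    # Drop every carriage return so that DOS line endings become Unix ones.
    return data.replace(b"\r", b"")


def convert_file(src_path: str, dst_path: str) -> None:
    # Work on bytes so no text encoding has to be assumed for the vendor file.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_carriage_returns(data))
```

Writing to a separate output path keeps the vendor's original file untouched in case the conversion has to be repeated.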
2. Parse SDF into ISM, harvesting data from SD tags into synonyms table
python parse_catalog.py ibsbb ibs2013oct_bb.sdf ibsbb.ism ibsbb.csv
code to edit field table
code to load csv into synonyms
split on ; or , - parsing.
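The SD tag harvesting and the split on ; or , might look roughly like the sketch below. It is not the code in parse_catalog.py, which is not shown here; the function names, the regular expression, and the tag names in the usage example are assumptions made for illustration.

```python
import re

# Matches an SD data header such as "> <CAS>" or ">  <Synonyms> (1)".
TAG_HEADER = re.compile(r"^>.*<([^>]+)>")


def read_sd_tags(record: str) -> dict:
    # Collect {tag: value} for the data items of a single SDF record.
    tags = {}
    current = None
    for line in record.splitlines():
        match = TAG_HEADER.match(line)
        if match:
            current = match.group(1)
            tags[current] = []
        elif current is not None:
            if line.strip() == "" or line.startswith("$$$$"):
                current = None  # a blank line or "$$$$" ends the data item
            else:
                tags[current].append(line.strip())
    return {tag: "\n".join(lines) for tag, lines in tags.items()}


def split_synonyms(value: str) -> list:
    # Split on ';' or ',' and drop empty pieces.
    return [part.strip() for part in re.split(r"[;,]", value) if part.strip()]


if __name__ == "__main__":
    record = "M  END\n> <Synonyms>\nfoo; bar,baz\n\n> <CAS>\n50-00-0\n\n$$$$\n"
    tags = read_sd_tags(record)
    print(split_synonyms(tags["Synonyms"]))
```

Splitting on commas is lossy for systematic names that contain commas themselves (for example 1,2-dichloroethane), so the comma rule may need to be applied only to tags known to hold lists.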
3. Deplete
Previously: deplete.pl ibsbb < ibsbb.ism
Current command: not yet recorded (??)
4. Desalt, filter out neverwants, stereochemical expansion, remove N stereochem, take 4 max
filter.py rules.txt ibsbb.ism ibsbbexp.ism > ibsbb.log
ignorenstereochem2.csh ibsbbexp.ism
sort -u ibsbbexp.ism | sort -k 2 | only4.pl > ibsbbexp4.ism
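The final pipeline (sort -u, sort -k 2, only4.pl) can be pictured with the Python sketch below. It assumes that each .ism line is "SMILES identifier" separated by whitespace, and that only4.pl keeps the first four lines per identifier; neither assumption is confirmed by this page, and `take_max_per_id` is a hypothetical name.

```python
from collections import defaultdict


def take_max_per_id(lines, limit=4):
    # Deduplicate and sort whole lines, which mirrors `sort -u`.
    unique = sorted({line.strip() for line in lines if line.strip()})
    # Keep only lines that have both a SMILES and an identifier field.
    records = [line for line in unique if len(line.split()) >= 2]
    # Stable sort on the second field, which approximates `sort -k 2`.
    records.sort(key=lambda line: line.split()[1])

    kept = []
    counts = defaultdict(int)
    for line in records:
        ident = line.split()[1]
        if counts[ident] < limit:  # assumed behaviour of only4.pl
            counts[ident] += 1
            kept.append(line)
    return kept
```

Because the sort is stable, which four lines survive for an identifier depends only on the whole-line ordering, so repeated runs on the same input give the same subset.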
5. protonation step with chemaxon!
cxcalc ... to generate a canonical form at pH 7.0
6. Look up whether each substance already exists in the database (eventually load the catalog_item and substance tables)
python find_new_substances.py ibsbb ibsbb.ism catalog-item.csv
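The lookup in this step amounts to partitioning the catalog into substances already known and substances that are new. The sketch below illustrates that idea only; it is not find_new_substances.py, it replaces the database query with an in-memory set, and it assumes that both sides use the same canonical SMILES form. The name `partition_substances` is hypothetical.

```python
def partition_substances(ism_lines, known_smiles):
    # Return (new, existing) lists of (smiles, supplier_code) pairs.
    new, existing = [], []
    for line in ism_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        smiles, code = fields[0], fields[1]
        if smiles in known_smiles:
            existing.append((smiles, code))
        else:
            new.append((smiles, code))
    return new, existing
```

In this picture every line would yield a catalog item row, while only the new entries would need a substance row.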
7. generate protomers
cxcalc... dahlia procedure
8. create db, db2, mol2, sdf, pdbqt, solv
ryan's procedure
Load ChEMBL for ZINC, SEA export
- get latest chembl
- Load into psql
- Export to files for SEA
- Create links between substance, annotation via note
- target clustering
- Update SEA
UCSF protocol (OLD)
Acquire databases
- 1. Get the catalog from the vendor, usually in SDF. NB. we need to automate this step as much as possible
- 2. On nfshead2, in ~xyz/raid3/stage3/ or /raid6/tmp/xyz/, gunzip the SDF into a directory
pc2unix '*.sdf'
- 3. make ism
foreach i (*.sdf)
namesdf.pl '<TAG>' < $i > $i.sdf
convert.py --i=$i.sdf --o=$i.ism
end
- 4. Combine and move
cat vendoris*.ism > all
mv all ~xyz/raid8/catalog/vendorid.in
ln -s !$ vendorid.ism
- 5. There is no 5.
oh yeah?
- 6. Mark depleted
deplete.pl vendorid < vendorid.ism
- 7. Process on sgehead2
mas.csh vendorid vendorid.ism ; # nb may run for a long time!
nb periodically delete output
- 8. Update filter info, error info
- 9. export database on nfshead5
cd ~xyz/raid8/byvendor/.temp
mkdir vendorid
cd vendorid
callgr17.csh
(may take a long time)
- 10. export database on nfshead5
extractthis.csh vendorid nodb mol2
NB: db if annotated; mol2 in all cases
- 11. if big, cluster on korn
kornit.csh vendorid `pwd`
12. finish off on wilco
./all.csh updateit.pl < log
./all.csh
cd vendorid
dosubset4.pl vendorid vendor
13. email vendor telling them their catalog has been updated in ZINC
14. write tweet, etc. announcing, if appropriate.