Loading ZINC12
Latest revision as of 23:23, 4 January 2019
This is the internal page for loading ZINC. If you are not a ZINC curator, this page will probably not be interesting.
YYZ protocol
1. Acquire catalog, often as SDF
- This is often a manual step, by email, or download.
- pc2unix to remove \r
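Where pc2unix is not available, the same cleanup can be sketched in Python. This is only an illustration of the "remove \r" step: the function names are hypothetical, and pc2unix itself may handle more cases than this does.

```python
from pathlib import Path


def strip_carriage_returns(data: bytes) -> bytes:
    """Remove every carriage return so DOS line endings become Unix ones."""
    return data.replace(b"\r", b"")


def convert_file(path: str) -> None:
    """Rewrite one file in place without carriage returns (hypothetical helper)."""
    target = Path(path)
    target.write_bytes(strip_carriage_returns(target.read_bytes()))
```

Working on bytes rather than text avoids guessing the encoding of a vendor file.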
2. Parse SDF into ISM, harvesting data from SD tags into synonyms table
python parse_catalog.py ibsbb ibs2013oct_bb.sdf ibsbb.ism ibsbb.csv
code to edit field table
code to load csv into synonyms
split on ; or , - parsing.
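To make the tag harvesting concrete, here is a minimal sketch of reading SD tags from one SDF record and splitting a multi-valued field on ; or ,. It is not the code in parse_catalog.py; the function names and the example tag names are assumptions chosen for illustration.

```python
import re

# An SD data header looks like "> <TAG>" (optionally with text before the "<").
TAG_HEADER = re.compile(r"^>[^<]*<([^>]+)>")


def parse_sd_tags(record: str) -> dict:
    """Return {tag: value} for the data items of a single SDF record."""
    tags = {}
    current = None
    for line in record.splitlines():
        match = TAG_HEADER.match(line)
        if match:
            current = match.group(1)
            tags[current] = []
        elif current is not None:
            if line.strip() == "":
                current = None  # a blank line ends the data item
            else:
                tags[current].append(line.strip())
    return {tag: "\n".join(values) for tag, values in tags.items()}


def split_synonyms(value: str) -> list:
    """Split a field on ';' or ',' and drop empty pieces."""
    return [piece.strip() for piece in re.split(r"[;,]", value) if piece.strip()]
```

Splitting on commas will also break chemical names that contain commas (for example 1,2-dichloroethane), so the synonyms may need a per-vendor check before loading.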
3. Deplete
previously was: deplete.pl ibsbb < ibsbb.ism
now??
4. Desalt, filter out neverwants, stereochemical expansion, remove N stereochem, take 4 max
filter.py rules.txt ibsbb.ism ibsbbexp.ism > ibsbb.log
ignorenstereochem2.csh ibsbbexp.ism
sort -u ibsbbexp.ism | sort -k 2 | only4.pl > ibsbbexp4.ism
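The "take 4 max" part of this step can be sketched as follows. The sketch assumes each ISM line is "SMILES ID" separated by whitespace and that only4.pl keeps the first four lines per ID; neither assumption is confirmed by this page, and the function name is hypothetical.

```python
from collections import defaultdict


def take_max_per_id(lines, limit=4):
    """Deduplicate lines, order them by ID, and keep at most `limit` per ID."""
    # Roughly "sort -u": unique, non-empty lines in sorted order.
    unique = sorted({line.strip() for line in lines if line.strip()})
    # Roughly "sort -k 2": stable sort on the second field (the ID).
    unique.sort(key=lambda line: line.split()[1])
    counts = defaultdict(int)
    kept = []
    for line in unique:
        ident = line.split()[1]
        if counts[ident] < limit:
            counts[ident] += 1
            kept.append(line)
    return kept
```

Python's string ordering is not identical to the locale-dependent ordering of sort, so which four forms survive may differ from the shell pipeline.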
5. Protonation step with ChemAxon!
cxcalc ... to generate a canonical form at pH 7.0
6. Look up if exists in database (eventually load catalog_item, substance tables)
python find_new_substances.py ibsbb ibsbb.ism catalog-item.csv
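The lookup amounts to partitioning the catalog into substances already known and substances that are new. A minimal sketch, assuming "SMILES code" lines and a set of SMILES already fetched from the database; it does not reproduce find_new_substances.py, and all names here are hypothetical.

```python
def partition_catalog(ism_lines, known_smiles):
    """Split (smiles, code) pairs into those not yet known and those already known."""
    new, existing = [], []
    for line in ism_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip malformed lines
        smiles, code = fields[0], fields[1]
        if smiles in known_smiles:
            existing.append((smiles, code))
        else:
            new.append((smiles, code))
    return new, existing
```

A string comparison like this is only meaningful if both sides use canonical SMILES produced by the same toolkit and settings.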
7. generate protomers
cxcalc... dahlia procedure
8. create db, db2, mol2, sdf, pdbqt, solv
ryan's procedure
Load ChEMBL for ZINC, SEA export
1. Get latest ChEMBL
2. Load into psql
3. Export to files for SEA
4. Create links between substance, annotation via note
5. Target clustering
6. Update SEA
UCSF protocol (OLD)
Acquire databases
- 1. Get the catalog from the vendor, usually in SDF. NB. we need to automate this step as much as possible
- 2. On nfshead2 in ~xyz/raid3/stage3/ or /raid6/tmp/xyz/ gunzip SDF into a directory
pc2unix '*.sdf'
- 3. make ism
foreach i (*.sdf)
  namesdf.pl '<TAG>' < $i > $i.sdf
  convert.py --i=$i.sdf --o=$i.ism
end
- 4. Combine and move
cat vendoris*.ism > all
mv all ~xyz/raid8/catalog/vendorid.in
ln -s !$ vendorid.ism
- 5. There is no 5.
oh yeah?
- 6. Mark depleted
deplete.pl vendorid < vendorid.ism
- 7. Process on sgehead2
mas.csh vendorid vendorid.ism ; # nb may run for a long time!
nb periodically delete output
- 8. Update filter info, error info
- 9. export database on nfshead5
cd ~xyz/raid8/byvendor/.temp
mkdir vendorid
cd vendorid
callgr17.csh
(may take a long time)
- 10. export database on nfshead5
extractthis.csh vendorid nodb mol2
NB db if annotated
mol2 in all cases
- 11. if big, cluster on korn
kornit.csh vendorid `pwd`
- 12. finish off on wilco
./all.csh
updateit.pl < log
./all.csh
cd vendorid
dosubset4.pl vendorid vendor
- 13. email vendor telling them their catalog has been updated in ZINC
- 14. write tweet, etc. announcing, if appropriate.