Loading ZINC12
Revision as of 16:34, 6 November 2013
This is the internal page for loading ZINC. If you are not a ZINC curator, this page will probably not be interesting.
YYZ protocol
1. Acquire catalog, often as SDF
- This is often a manual step, by email, or download.
- Run pc2unix to remove carriage returns (\r).
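If pc2unix is not available, the line-ending cleanup can be sketched in Python. This is a minimal stand-in, not the pc2unix script itself: the function names are hypothetical, and converting a bare \r to \n (rather than dropping it) is an assumption.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Return data with CRLF and bare CR line endings converted to LF."""
    # Handle CRLF first so a Windows line ending does not become two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at path in place with Unix line endings."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_unix_newlines(data))
```

Working on bytes avoids guessing the vendor's text encoding.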
2. Parse SDF into ISM, harvesting data from SD tags into synonyms table
python parse_catalog.py ibsbb ibs2013oct_bb.sdf ibsbb.ism ibsbb.csv
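For orientation, harvesting SD tags can be sketched as below. This is not the code of parse_catalog.py, whose internals are not shown here. It relies only on the SDF conventions that records end with $$$$ and data items follow "M  END" as a "> <TAG>" header, value lines, and a blank line. The function names and the tag names in the usage note are hypothetical.

```python
from typing import Dict, Iterator, List


def iter_sdf_records(text: str) -> Iterator[List[str]]:
    """Yield each SDF record as a list of lines, split on the $$$$ delimiter."""
    record: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if record:
                yield record
            record = []
        else:
            record.append(line)
    # Tolerate a final record that lacks the closing delimiter.
    if any(line.strip() for line in record):
        yield record


def sd_tags(record: List[str]) -> Dict[str, str]:
    """Return the SD data items of one record as a {tag: value} dict."""
    # Data items start after the molblock terminator, if there is one.
    i = next((k + 1 for k, l in enumerate(record) if l.rstrip() == "M  END"), 0)
    tags: Dict[str, str] = {}
    while i < len(record):
        line = record[i]
        start = line.find("<")
        end = line.find(">", start + 1)
        if line.startswith(">") and start != -1 and end != -1:
            name = line[start + 1:end]
            values = []
            i += 1
            # Value lines run until the next blank line.
            while i < len(record) and record[i].strip():
                values.append(record[i].rstrip())
                i += 1
            tags[name] = "\n".join(values)
        i += 1
    return tags
```

Usage, assuming a vendor tag named CATALOG_ID: `for rec in iter_sdf_records(open("ibs2013oct_bb.sdf").read()): print(sd_tags(rec).get("CATALOG_ID"))`.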
3. Deplete
previously was: deplete.pl ibsbb < ibsbb.ism
now??
4. Desalt, filter out neverwants, stereochemical expansion, remove N stereochem, take 4 max
filter.py rules.txt ibsbb.ism ibsbbexp.ism > ibsbb.log
ignorenstereochem2.csh ibsbbexp.ism
sort -u ibsbbexp.ism | sort -k 2 | only4.pl > ibsbbexp4.ism
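The "take 4 max" part of this step can be sketched as follows. This is a guess at what only4.pl does, not its actual code: it assumes each ISM line is "SMILES ID" separated by whitespace and that the input is already grouped by ID (as sort -k 2 provides). The function name is hypothetical.

```python
from itertools import groupby, islice
from typing import Iterable, Iterator


def take_max_per_id(lines: Iterable[str], limit: int = 4) -> Iterator[str]:
    """Yield at most `limit` lines per identifier (second column).

    Input must already be sorted or grouped by the second column.
    """
    # Skip blank or malformed lines that lack an identifier column.
    rows = (fields for fields in (ln.split() for ln in lines) if len(fields) >= 2)
    for _ident, group in groupby(rows, key=lambda fields: fields[1]):
        for fields in islice(group, limit):
            yield " ".join(fields)
```

Which of the expanded stereoisomers are kept depends entirely on the input order, so the sort before this step matters.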
5. protonation step with ChemAxon!
cxcalc ... to generate a canonical form at pH 7.0
6. look up if exists in database (eventually load catalog_item, substance tables)
python find_new_substances.py ibsbb ibsbb.ism catalog-item.csv
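The lookup logic can be sketched as below. This is an illustration only, not find_new_substances.py: an in-memory set of SMILES stands in for the database query, the ISM layout "SMILES ID" is assumed, and the function names and CSV column names are hypothetical.

```python
import csv
from typing import Iterable, List, Set, TextIO, Tuple

Row = Tuple[str, str]  # (supplier code, SMILES)


def split_new_substances(
    ism_lines: Iterable[str], known_smiles: Set[str]
) -> Tuple[List[Row], List[Row]]:
    """Partition catalog rows into (new, existing) by SMILES membership."""
    new: List[Row] = []
    existing: List[Row] = []
    for line in ism_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        smiles, code = fields[0], fields[1]
        (existing if smiles in known_smiles else new).append((code, smiles))
    return new, existing


def write_catalog_items(rows: Iterable[Row], handle: TextIO) -> None:
    """Write rows as CSV with a header line (column names are assumptions)."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["supplier_code", "smiles"])
    writer.writerows(rows)
```

String comparison only works if both sides use the same canonical SMILES form, which is why this lookup comes after the protonation step.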
7. generate protomers
cxcalc... dahlia procedure
8. create db, db2, mol2, sdf, pdbqt, solv
ryan's procedure
Load ChEMBL for ZINC, SEA export
get latest chembl
load into psql
export to files for SEA
create links between substance, annotation via note
target clustering
update SEA
UCSF protocol (OLD)
Acquire databases
- 1. Get the catalog from the vendor, usually in SDF. NB. we need to automate this step as much as possible
- 2. On nfshead2 in ~xyz/raid3/stage3/ or /raid6/tmp/xyz/ gunzip SDF into a directory
pc2unix '*.sdf'
- 3. make ism
foreach i (*.sdf)
namesdf.pl '<TAG>' < $i > $i.sdf
convert.py --i=$i.sdf --o=$i.ism
end
- 4. Combine and move
cat vendoris*.ism > all
mv all ~xyz/raid8/catalog/vendorid.in
ln -s !$ vendorid.ism
- 5. There is no 5.
oh yeah?
- 6. Mark depleted
deplete.pl vendorid < vendorid.ism
- 7. Process on sgehead2
mas.csh vendorid vendorid.ism ; # nb may run for a long time!
nb periodically delete output
- 8. Update filter info, error info
- 9. export database on nfshead5
cd ~xyz/raid8/byvendor/.temp
mkdir vendorid
cd vendorid
callgr17.csh
(may take a long time)
- 10. export database on nfshead5
extractthis.csh vendorid nodb mol2
NB db if annotated
mol2 in all cases
- 11. if big, cluster on korn
kornit.csh vendorid `pwd`
- 12. finish off on wilco
./all.csh
updateit.pl < log
./all.csh
cd vendorid
dosubset4.pl vendorid vendor
- 13. email vendor telling them their catalog has been updated in ZINC
- 14. write tweet, etc. announcing, if appropriate.