Loading ZINC12
This is the internal page for loading ZINC. If you are not a ZINC curator, this page will probably not be interesting.
YYZ protocol
1. Acquire catalog, often as SDF
- This is often a manual step, by email or download.
- Run pc2unix to remove \r characters (a Python stand-in is sketched below).
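A minimal Python stand-in for pc2unix, for when the script is not at hand; the filename pattern handling is an assumption, not the real tool:

# strip_cr.py -- illustrative pc2unix replacement: drop \r from matching files in place
import glob
import sys

pattern = sys.argv[1] if len(sys.argv) > 1 else "*.sdf"
for path in glob.glob(pattern):
    with open(path, "rb") as fh:
        data = fh.read()
    with open(path, "wb") as fh:
        fh.write(data.replace(b"\r", b""))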
2. Parse SDF into ISM, harvesting data from SD tags into synonyms table
python parse_catalog.py ibsbb ibs2013oct_bb.sdf ibsbb.ism ibsbb.csv
code to edit field table
code to load csv into synonyms
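As a rough sketch of what this step does (not the actual parse_catalog.py): harvest SMILES plus a few SD tags into an ISM file and a CSV for the synonyms load. It assumes RDKit is available, and the SD tag names are hypothetical:

# parse_sketch.py -- illustrative only; assumes RDKit, and the SD tag names are guesses
from rdkit import Chem
import csv
import sys

sdf_in, ism_out, csv_out = sys.argv[1:4]
tags = ["ID", "SUPPLIER_CODE", "CAS"]               # hypothetical SD tags to harvest
with open(ism_out, "w") as ism, open(csv_out, "w", newline="") as cf:
    writer = csv.writer(cf)
    writer.writerow(["smiles", "name"] + tags)
    for mol in Chem.SDMolSupplier(sdf_in):
        if mol is None:                             # skip records RDKit cannot parse
            continue
        smi = Chem.MolToSmiles(mol)
        name = mol.GetProp("_Name") if mol.HasProp("_Name") else ""
        ism.write(f"{smi} {name}\n")
        writer.writerow([smi, name] + [mol.GetProp(t) if mol.HasProp(t) else "" for t in tags])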
3. Deplete
Previously this was: deplete.pl ibsbb < ibsbb.ism. What is the current command?
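For orientation, a sketch of the depletion idea under an assumed schema (a catalog_item table with catalog, supplier_code, and depleted columns); this is not deplete.pl:

# deplete_sketch.py -- illustrative; the table and column names are assumptions
import sys
import psycopg2

catalog = sys.argv[1]                                   # e.g. ibsbb
fields = (line.split() for line in open(sys.argv[2]))   # the new catalog .ism
current = {f[1] for f in fields if len(f) > 1}          # supplier codes still offered

conn = psycopg2.connect("dbname=zinc")
with conn, conn.cursor() as cur:
    cur.execute("SELECT supplier_code FROM catalog_item WHERE catalog = %s", (catalog,))
    stale = [(catalog, code) for (code,) in cur.fetchall() if code not in current]
    cur.executemany(
        "UPDATE catalog_item SET depleted = true WHERE catalog = %s AND supplier_code = %s",
        stale,
    )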
4. Desalt, filter out neverwants, stereochemical expansion, remove N stereochem, take 4 max
filter.py rules.txt ibsbb.ism ibsbbexp.ism > ibsbb.log
ignorenstereochem2.csh ibsbbexp.ism
sort -u ibsbbexp.ism | sort -k 2 | only4.pl > ibsbbexp4.ism
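The "take 4 max" part can be pictured as follows: a stand-in for only4.pl that keeps at most four expanded stereoisomers per parent ID, reading ISM lines ("SMILES ID") already sorted by ID:

# only4_sketch.py -- illustrative stand-in for only4.pl
import sys
from collections import defaultdict

counts = defaultdict(int)
for line in sys.stdin:
    fields = line.split()
    if len(fields) < 2:
        continue
    counts[fields[1]] += 1
    if counts[fields[1]] <= 4:          # cap at four stereoisomers per ID
        sys.stdout.write(line)

Used in place of only4.pl: sort -u ibsbbexp.ism | sort -k 2 | python only4_sketch.py > ibsbbexp4.ism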
5. Protonation step with ChemAxon!
cxcalc ... to generate a canonical form at pH 7.0
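One way this could be driven from Python; the calculation name majormicrospecies and its -H (pH) option are taken from ChemAxon's documentation, not from the elided command above, so treat them as assumptions and check your cxcalc version:

# protonate_sketch.py -- minimal cxcalc wrapper; plugin name and flags are assumed
import subprocess
import sys

inp, out = sys.argv[1:3]
with open(out, "w") as fh:
    subprocess.run(["cxcalc", "majormicrospecies", "-H", "7.0", inp],
                   stdout=fh, check=True)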
6. Look up whether each compound already exists in the database (eventually load the catalog_item and substance tables)
python find_new_substances.py ibsbb ibsbb.ism catalog-item.csv
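A sketch of the lookup (not the real find_new_substances.py): split the ISM into known and new substances by SMILES, under an assumed substance table with smiles and sub_id columns:

# find_new_sketch.py -- illustrative; schema and output layout are assumptions
import sys
import psycopg2

catalog, ism_in, csv_out = sys.argv[1:4]
conn = psycopg2.connect("dbname=zinc")
with conn.cursor() as cur, open(ism_in) as ism, open(csv_out, "w") as out:
    for line in ism:
        fields = line.split()
        if len(fields) < 2:
            continue
        smiles, code = fields[0], fields[1]
        cur.execute("SELECT sub_id FROM substance WHERE smiles = %s", (smiles,))
        row = cur.fetchone()
        status = row[0] if row else "NEW"            # existing substance id, or NEW
        out.write(f"{catalog},{code},{smiles},{status}\n")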
7. Generate protomers
cxcalc... dahlia procedure
8. Create db, db2, mol2, sdf, pdbqt, solv
Ryan's procedure
Load ChEMBL for ZINC, SEA export
Get the latest ChEMBL release.
Load it into PostgreSQL (psql).
Export to files for SEA (see the sketch below).
Create links between substance and annotation via note.
Target clustering.
Update SEA.
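A sketch of the SEA export against a local ChEMBL Postgres load; the standard ChEMBL table names are real, but the activity cutoff and output layout here are assumptions:

# chembl_sea_export_sketch.py -- illustrative export of target/ligand pairs for SEA
import csv
import psycopg2

QUERY = """
SELECT td.chembl_id, md.chembl_id, cs.canonical_smiles
FROM activities act
JOIN assays a               ON a.assay_id = act.assay_id
JOIN target_dictionary td   ON td.tid = a.tid
JOIN molecule_dictionary md ON md.molregno = act.molregno
JOIN compound_structures cs ON cs.molregno = act.molregno
WHERE act.standard_type IN ('IC50', 'Ki', 'Kd', 'EC50')
  AND act.standard_units = 'nM'
  AND act.standard_value <= 10000          -- assumed 10 uM cutoff
"""

conn = psycopg2.connect("dbname=chembl")
with conn.cursor(name="sea_export") as cur, open("chembl_for_sea.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["target_chembl_id", "compound_chembl_id", "smiles"])
    cur.execute(QUERY)
    for row in cur:
        writer.writerow(row)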
UCSF protocol (OLD)
Acquire databases
- 1. Get the catalog from the vendor, usually in SDF. NB: we need to automate this step as much as possible.
- 2. On nfshead2, in ~xyz/raid3/stage3/ or /raid6/tmp/xyz/, gunzip the SDF into a directory.
pc2unix '*.sdf'
- 3. Make ISM
foreach i (*.sdf)
    namesdf.pl '<TAG>' < $i > $i.sdf
    convert.py --i=$i.sdf --o=$i.ism
end
- 4. Combine and move
cat vendoris*.ism > all
mv all ~xyz/raid8/catalog/vendorid.in
ln -s !$ vendorid.ism
- 5. There is no 5.
oh yeah?
- 6. Mark depleted
deplete.pl vendorid < vendorid.ism
- 7. Process on sgehead2
mas.csh vendorid vendorid.ism ; # NB: may run for a long time!
NB: periodically delete output.
- 8. Update filter info, error info
- 9. Export database on nfshead5
cd ~xyz/raid8/byvendor/.temp
mkdir vendorid
cd vendorid
callgr17.csh
(may take a long time)
- 10. Export database on nfshead5
extractthis.csh vendorid nodb mol2
NB: db if annotated; mol2 in all cases.
- 11. If big, cluster on korn
kornit.csh vendorid `pwd`
- 12. Finish off on wilco
./all.csh updateit.pl < log
./all.csh
cd vendorid
dosubset4.pl vendorid vendor
- 13. Email the vendor telling them their catalog has been updated in ZINC.
- 14. Write a tweet, etc., announcing the update, if appropriate.