ZINC26:Developer
These are notes for the developers of ZINC26.
- root directory is /nfs/exk/newdbpublic
- Bioact - bioactives directory. These are sourced from public websites and curated manually.
- Purch - purchasable chemistry. Sourced from public sources and curated manually.
- We generate SmallWorld and Arthor indexes for all of the above directories. These are stored on /nfs/db5/newdb/
- Purchbioact - this directory is entirely computed
- first pass: for each molecule in each bioactive catalog, we search it in each purchasable catalog.
- if there are no hits, it gets deleted.
- files are of the form <mol_id>.<cat_id>.txt
- second pass: in each directory where molecules live (.smi files), for each molecule (e.g. HMDB0001881)
- we generate a report. For example, in /nfs/exk/newdbpublic/Purchbioact/invivo/hmdb/hmdbcosmetic/cc we run
- python /nfs/exk/newdbpublic/Purchbioact/bin/script2a.py HMDB0037790
- (also try script1 and script2. I am still working out the kinks.)
- it is these reports that I want to form the basis of a "molecule detail report"
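The second pass could be driven by something like the following. This is only a sketch: it assumes each line of a .smi file is a SMILES string followed by the molecule ID, separated by whitespace, and that script2a.py takes a single molecule ID and is run from inside the molecule directory, as in the command above. The helper names (read_mol_ids, report_command, run_reports) are hypothetical and not part of the existing scripts.

```python
import subprocess
from pathlib import Path

# Script path as given in these notes; the script itself is still in flux.
REPORT_SCRIPT = "/nfs/exk/newdbpublic/Purchbioact/bin/script2a.py"


def read_mol_ids(smi_path):
    """Return molecule IDs from a .smi file.

    Assumption: each non-empty line is '<SMILES> <mol_id>', whitespace-separated.
    Lines with fewer than two fields are skipped.
    """
    ids = []
    with open(smi_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:
                ids.append(fields[1])
    return ids


def report_command(mol_id, script=REPORT_SCRIPT):
    """Build the command line used to generate one molecule report."""
    return ["python", script, mol_id]


def run_reports(directory, dry_run=True):
    """Build (and optionally run) one report command per molecule in a directory.

    With dry_run=True nothing is executed; the commands are only returned.
    """
    directory = Path(directory)
    commands = []
    for smi_path in sorted(directory.glob("*.smi")):
        for mol_id in read_mol_ids(smi_path):
            cmd = report_command(mol_id)
            commands.append(cmd)
            if not dry_run:
                # Run from inside the molecule directory, as in the notes.
                subprocess.run(cmd, cwd=directory, check=True)
    return commands
```

Leaving dry_run=True makes it possible to inspect the list of commands for a directory before launching anything.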
- third pass: once we generate a molecule detail report for every bioactive molecule, I want to prepare summary reports that give overall answers for all bioactive catalogs.
- there is more, but this is a good start.
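As a starting point for the summary reports, the first-pass files named <mol_id>.<cat_id>.txt can be grouped by molecule. The sketch below assumes that neither ID contains a dot and that surviving files all represent hits. The function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from pathlib import Path


def parse_hit_filename(name):
    """Split '<mol_id>.<cat_id>.txt' into (mol_id, cat_id).

    Assumption: neither ID contains a dot. Returns None for any other name.
    """
    parts = name.split(".")
    if len(parts) != 3 or parts[2] != "txt" or not parts[0] or not parts[1]:
        return None
    return parts[0], parts[1]


def group_hits_by_molecule(directory):
    """Map each mol_id to the sorted list of cat_ids that have a hit file."""
    hits = defaultdict(list)
    for path in Path(directory).iterdir():
        if not path.is_file():
            continue
        parsed = parse_hit_filename(path.name)
        if parsed is not None:
            mol_id, cat_id = parsed
            hits[mol_id].append(cat_id)
    return {mol_id: sorted(cat_ids) for mol_id, cat_ids in hits.items()}
```

The resulting dictionary gives, per bioactive molecule, the purchasable catalogs in which it was found, which is roughly the shape a per-catalog summary would be computed from.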
- the calculations of the first pass are still running, but there is now tons of data to process
- I tried to break the sets of molecules into pieces small enough that we never have more than 5000 mols in one directory,
- but I have not been completely successful. I am still working on that.
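One simple way to enforce the 5000-molecule limit is to split each list of molecule IDs into fixed-size chunks before creating directories. This is a minimal sketch, not the method currently in use; the function name and the max_per_dir parameter are hypothetical.

```python
def split_into_chunks(mol_ids, max_per_dir=5000):
    """Split a list of molecule IDs into consecutive chunks of at most max_per_dir.

    Each chunk would correspond to one directory; order is preserved.
    """
    if max_per_dir < 1:
        raise ValueError("max_per_dir must be at least 1")
    return [
        mol_ids[start:start + max_per_dir]
        for start in range(0, len(mol_ids), max_per_dir)
    ]
```

For example, a catalog of 12001 molecules would be split into three chunks of 5000, 5000 and 2001 molecules.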
- In Bioactive, in addition to "inVitro", "inVivo", "natural", there is also "user"
- the idea is that people can upload a set of molecules by date and name to "user"
- then overnight, or on some other cron schedule, we compute for the compounds that users upload exactly the same reports as we do for bioactive compounds from the literature.
[[Category:ZINC26]]