ZINC26:Developer


These are notes for the developers of ZINC26.


  • root directory is /nfs/exk/newdbpublic
  • Bioact - bioactives directory. These are sourced from public websites and curated manually.
  • Purch - purchasable chemistry. Sourced from public sources and curated manually.
  • We generate SmallWorld and Arthor indexes for all of the above directories. They are stored on /nfs/db5/newdb/
  • Purchbioact - this directory is entirely computed
    • first pass: we search each molecule in each bioactive catalog against each purchasable catalog.
    • if a search returns no hits, its result is deleted.
    • result files are named <mol_id>.<cat_id>.txt
    • second pass: in each directory where molecules live (.smi files), we generate a report for each molecule, e.g. HMDB0001881
    • for example, in /nfs/exk/newdbpublic/Purchbioact/invivo/hmdb/hmdbcosmetic/cc we run
    • python /nfs/exk/newdbpublic/Purchbioact/bin/script2a.py HMDB0037790
    • (also try script1 and script2. still working out the kinks.)
    • these reports are what I want to use as the basis of a "molecule detail report"
    • third pass: once we have a molecule detail report for every bioactive molecule, I want to prepare summary reports that give overall answers across all bioactive catalogs.
    • there is more, but this is a good start.
    • the first-pass calculations are still running, but there is already a lot of data to process.
    • I tried to split the molecules into sets small enough that no directory ever holds more than 5000 mols, but I have not been completely successful. I am still working on that.
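
The bookkeeping described above (result file names, dropping searches with no hits, and the 5000-molecule cap per directory) could be sketched roughly as follows. This is only an illustration, not the code that is actually running: the helper names are made up, it assumes a search with no hits leaves a zero-byte file, and it assumes neither ID contains a dot.

```python
from pathlib import Path
from typing import Iterator, List, Sequence, Tuple

MAX_MOLS_PER_DIR = 5000  # cap mentioned in the notes above


def result_name(mol_id: str, cat_id: str) -> str:
    """Build a first-pass result file name of the form <mol_id>.<cat_id>.txt."""
    return f"{mol_id}.{cat_id}.txt"


def parse_result_name(name: str) -> Tuple[str, str]:
    """Split '<mol_id>.<cat_id>.txt' back into (mol_id, cat_id).

    ASSUMPTION: neither ID contains a dot.
    """
    parts = name.split(".")
    if len(parts) != 3 or parts[2] != "txt":
        raise ValueError(f"not a result file name: {name!r}")
    return parts[0], parts[1]


def prune_empty_results(directory: Path) -> List[Path]:
    """Delete result files that contain no hits and return what was removed.

    ASSUMPTION: a search with no hits leaves a zero-byte .txt file.
    """
    removed = []
    for path in sorted(directory.glob("*.txt")):
        if path.stat().st_size == 0:
            path.unlink()
            removed.append(path)
    return removed


def batches(mol_ids: Sequence[str],
            size: int = MAX_MOLS_PER_DIR) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` molecules, one per directory."""
    for start in range(0, len(mol_ids), size):
        yield mol_ids[start:start + size]
```

Splitting by position like this guarantees the cap, but it ignores any grouping the existing directory names are meant to carry, so it may need adapting to the real layout.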


  • In Bioactive, in addition to "inVitro", "inVivo", and "natural", there is also "user"
  • the idea is that people can upload a set of molecules, identified by date and name, to "user"
  • then overnight, or on some other cron schedule, we compute for the uploaded compounds exactly the same reports as we do for bioactive compounds from the literature.
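
Since uploaded sets are meant to get the same reports as the literature compounds, a scheduled job could walk an upload directory and issue the same per-molecule report command quoted earlier. The sketch below is a guess at how that might look. It assumes each .smi line is "<SMILES> <mol_id>" separated by whitespace, that the report script is run from the directory holding the .smi file with one molecule ID as its argument (as in the example command), and that script2a.py is the script to use, which is not settled. The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Iterator, List, Tuple

# Script path quoted in the notes; which script is final is still open.
REPORT_SCRIPT = "/nfs/exk/newdbpublic/Purchbioact/bin/script2a.py"


def read_mol_ids(smi_path: Path) -> List[str]:
    """Return molecule IDs from a .smi file.

    ASSUMPTION: each non-blank line is '<SMILES> <mol_id>'.
    """
    ids = []
    for line in smi_path.read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            ids.append(fields[1])
    return ids


def report_jobs(root: Path) -> Iterator[Tuple[Path, List[str]]]:
    """Yield (working_directory, command) for every molecule found under root."""
    for smi_path in sorted(root.rglob("*.smi")):
        for mol_id in read_mol_ids(smi_path):
            yield smi_path.parent, ["python", REPORT_SCRIPT, mol_id]


def run_reports(root: Path, dry_run: bool = True) -> int:
    """Run (or, by default, just print) one report command per molecule."""
    count = 0
    for workdir, cmd in report_jobs(root):
        if dry_run:
            print(f"cd {workdir} && {' '.join(cmd)}")
        else:
            subprocess.run(cmd, cwd=workdir, check=True)
        count += 1
    return count
```

A cron entry would then only need to call run_reports on the upload root with dry_run=False. The dry-run default makes it easy to check which commands would be issued before anything is computed.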


[[Category:ZINC26]]