Data Sources for ZINC15

From DISI
Revision as of 18:50, 18 June 2015 by Frodo (talk | contribs) (asdf)

ZINC incorporates data from many vendor catalogs and annotated databases. Some of these sources are used in particular ways, as noted for each one here.

HMDB

  • HMDB breaks out endogenous compounds and other levels of specificity, which we load into ZINC as separate catalogs.

ATCcodes

  • currently we get them from ChEMBL, but ChEMBL is not updated fast enough
  • another source is the DrugBank XML, which does seem to be updated regularly
  • a third option is to get them directly from WHO (Norway); we currently do not do this
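As a rough illustration of the DrugBank XML route, the sketch below pulls ATC codes per drug out of a small inline document. The namespace, the element and attribute names (`drug`, `drugbank-id`, `atc-codes`, `atc-code`, `code`), and the sample IDs and codes are all assumptions made up for this example; check them against the actual DrugBank release before relying on this.

```python
import xml.etree.ElementTree as ET

# Toy document. The layout is an assumption modelled on the DrugBank XML
# export; the IDs and ATC codes are invented placeholders.
SAMPLE_XML = """<drugbank xmlns="http://www.drugbank.ca">
  <drug>
    <drugbank-id primary="true">DBX0001</drugbank-id>
    <name>Example drug A</name>
    <atc-codes>
      <atc-code code="A00AA00"/>
    </atc-codes>
  </drug>
  <drug>
    <drugbank-id primary="true">DBX0002</drugbank-id>
    <name>Example drug B</name>
    <atc-codes/>
  </drug>
</drugbank>"""

# Assumed namespace URI, bound to the prefix "db" for the path queries.
NS = {"db": "http://www.drugbank.ca"}


def atc_codes_by_drug(xml_text):
    """Map each primary drug ID to the list of ATC codes found for it."""
    root = ET.fromstring(xml_text)
    result = {}
    for drug in root.findall("db:drug", NS):
        primary = drug.find("db:drugbank-id[@primary='true']", NS)
        if primary is None:
            continue  # no primary ID: skip rather than guess
        codes = [el.get("code")
                 for el in drug.findall("db:atc-codes/db:atc-code", NS)]
        result[primary.text] = codes
    return result


print(atc_codes_by_drug(SAMPLE_XML))
```

A full DrugBank file is large, so a streaming parser such as `ET.iterparse` would probably be a better fit than `ET.fromstring` in practice.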

DrugBank

  • we parse out the FDA-approved drugs separately
  • we parse out each of the subsets (street, experimental, etc.)

ChEMBL

  • target affinities of compounds, 10 µM or better
  • ATC codes
  • two levels of protein hierarchy classification (major class and sub class in ZINC15)
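The "10 µM or better" cutoff can be sketched as a simple filter. This is a minimal illustration, not the loader ZINC uses: the record layout, the unit labels, and the choice to treat exactly 10 µM as passing are all assumptions.

```python
THRESHOLD_NM = 10_000.0  # 10 µM expressed in nM

# Conversion factors to nM for the units handled in this sketch (assumed labels).
TO_NM = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}


def is_10um_or_better(value, units):
    """True when an affinity is at or below 10 µM (lower means tighter binding)."""
    factor = TO_NM.get(units)
    if factor is None:
        return False  # unknown units: skip rather than guess
    return value * factor <= THRESHOLD_NM


# Hypothetical activity records; names and values are placeholders.
records = [
    {"compound": "cmpd-1", "target": "tgt-A", "value": 50.0, "units": "nM"},
    {"compound": "cmpd-2", "target": "tgt-A", "value": 25.0, "units": "uM"},
    {"compound": "cmpd-3", "target": "tgt-B", "value": 10.0, "units": "uM"},
]

kept = [r["compound"] for r in records
        if is_10um_or_better(r["value"], r["units"])]
print(kept)
```

Normalizing everything to one unit before comparing keeps the threshold in a single place, which makes it easy to change if the cutoff is ever revised.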

Co-expression data

Proteins whose expression is highly correlated are more likely to be related. We get this data from Matt, who in turn gets it from collaborators. TODO: figure out how to cite this properly here.


UniProt

Why? Examples. We use UniProt to translate UniProt accession codes to gene names. Gene names are how we unify entries from different species, and also how we interface with ICGC.
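The idea of unifying across species by gene name can be sketched as follows. The accession strings and gene names are invented placeholders, the mapping table stands in for whatever is actually retrieved from UniProt, and upper-casing the gene name to match across species is an assumption of this sketch, not a documented ZINC rule.

```python
from collections import defaultdict

# Hypothetical accession -> gene name table (placeholder values).
ACCESSION_TO_GENE = {
    "ACC_HUMAN_1": "GENE1",
    "ACC_MOUSE_1": "Gene1",
    "ACC_HUMAN_2": "GENE2",
}


def group_by_gene(accessions, mapping):
    """Group accessions under an upper-cased gene name; collect unmapped ones."""
    groups = defaultdict(list)
    unmapped = []
    for acc in accessions:
        gene = mapping.get(acc)
        if gene is None:
            unmapped.append(acc)  # keep track instead of dropping silently
        else:
            groups[gene.upper()].append(acc)
    return dict(groups), unmapped


groups, unmapped = group_by_gene(
    ["ACC_HUMAN_1", "ACC_MOUSE_1", "ACC_HUMAN_2", "ACC_UNKNOWN"],
    ACCESSION_TO_GENE,
)
print(groups)
print(unmapped)
```

Returning the unmapped accessions separately makes it easier to spot codes that are obsolete or missing from the mapping.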


Protein-protein interactions with BioGRID

  • we got the file from Matt, who simply downloaded it.
  • URL goes here.
  • update frequency?

SEA predictions

  • we calculate these ourselves
  • script here.


BLAST of NR

  • we got the script from Matt
  • we adapted it to run on just those genes for which ligands are available.
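Restricting the input to genes with ligands might look something like the sketch below, which filters FASTA records before they are handed to BLAST. This is not Matt's script: the function name, the assumption that the first header token is the gene identifier, and the sample sequences are all hypothetical.

```python
def filter_fasta(fasta_text, wanted_ids):
    """Keep FASTA records whose first header token is in wanted_ids."""
    kept = []
    keep = False
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            # Decide once per record; sequence lines inherit the decision.
            keep = bool(tokens) and tokens[0] in wanted_ids
        if keep:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Placeholder records; the identifiers and sequences are made up.
SAMPLE_FASTA = """>GENE1 example protein one
MKTAYIAK
QRQISFVK
>GENE2 example protein two
MSHHWGYG
>GENE3 example protein three
MALWMRLL
"""

genes_with_ligands = {"GENE1", "GENE3"}
filtered = filter_fasta(SAMPLE_FASTA, genes_with_ligands)
print(filtered, end="")
```

Filtering the query set up front keeps the BLAST run proportional to the number of genes that actually have ligands.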

[[Category:ZINC15]]