Data Sources for ZINC15

From DISI

Latest revision as of 14:50, 21 June 2015

ZINC incorporates data from many vendor catalogs and annotated databases. This page lists the sources that we use in particular ways.

HMDB

  • breaks out endogenous and other levels of specificity, which we load into ZINC as separate catalogs.

ATC codes

  • currently we get them from ChEMBL, but ChEMBL is not updated often enough
  • another source is the DrugBank XML, which does seem to be regularly updated
  • a third way is directly from WHO (Norway). We currently do not do this.

DrugBank

  • we parse out FDA-approved drugs separately
  • we parse out each of the subsets (street, experimental, etc.)
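
A minimal sketch of splitting drugs into per-subset lists, assuming each drug record carries one or more group labels. The element names (`drug`, `drugbank-id`, `groups`, `group`), the IDs, and the toy document are assumptions made for illustration and should be checked against the actual DrugBank XML schema, which may also use an XML namespace.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Toy document: element names, IDs, and group labels are illustrative
# assumptions, not a verified copy of the DrugBank schema.
SAMPLE_XML = """
<drugbank>
  <drug>
    <drugbank-id>DB_A</drugbank-id>
    <groups><group>approved</group></groups>
  </drug>
  <drug>
    <drugbank-id>DB_B</drugbank-id>
    <groups><group>experimental</group><group>street</group></groups>
  </drug>
  <drug>
    <drugbank-id>DB_C</drugbank-id>
    <groups><group>approved</group><group>experimental</group></groups>
  </drug>
</drugbank>
"""


def split_by_group(xml_text):
    """Return {group label: [drug ids]}; a drug may land in several subsets."""
    root = ET.fromstring(xml_text.strip())
    subsets = defaultdict(list)
    for drug in root.findall("drug"):
        drug_id = drug.findtext("drugbank-id")
        for group in drug.findall("groups/group"):
            subsets[group.text.strip()].append(drug_id)
    return dict(subsets)


subsets = split_by_group(SAMPLE_XML)
for label in sorted(subsets):
    print(label, subsets[label])
```

Each resulting list could then be loaded as its own catalog, with the approved subset kept separate from the others.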

ChEMBL

  • target affinities of compounds, 10 µM or better
  • ATC codes
  • two levels of protein hierarchy classification (major class and sub class in ZINC15)
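
The "10 µM or better" cutoff can be sketched as a simple filter. This is only an illustration: the record layout, the field name `affinity_nm`, and the choice to treat exactly 10 µM as passing are assumptions, not the actual loading code.

```python
import math

THRESHOLD_NM = 10_000.0  # 10 uM expressed in nM


def is_good_enough(affinity_nm):
    """True when the affinity is 10 uM or better (lower value = tighter binding).

    Treating exactly 10 uM as passing is an assumption.
    """
    return affinity_nm <= THRESHOLD_NM


def to_p_value(affinity_nm):
    """Convert nM to the negative log10 of the molar value (10 uM -> 5)."""
    return 9.0 - math.log10(affinity_nm)


# Hypothetical records; compound names and values are made up for the example.
records = [
    {"compound": "cmpd_1", "target": "HTR1A", "affinity_nm": 12.0},
    {"compound": "cmpd_2", "target": "HTR1A", "affinity_nm": 10_000.0},
    {"compound": "cmpd_3", "target": "HTR1A", "affinity_nm": 25_000.0},
]

kept = [r for r in records if is_good_enough(r["affinity_nm"])]
print([r["compound"] for r in kept])
```

On the logarithmic scale the same cutoff corresponds to a value of 5 or higher.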

Co-expression data

Proteins whose expression is highly correlated are more likely to be related. We get this from Matt, who in turn gets it from the Gillis Lab as CoExpNet.csv. To do: figure out how to cite this properly. How often might it be updated?
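
To illustrate what "highly correlated" expression means, here is a minimal sketch using the Pearson correlation between two expression profiles. How the scores in CoExpNet.csv are actually computed is not documented here, so the choice of Pearson correlation, the gene names, and the numbers are all assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Made-up expression levels across four samples.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # rises with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]  # falls as gene_a rises

print(pearson(gene_a, gene_b))
print(pearson(gene_a, gene_c))
```

A pair like `gene_a` and `gene_b` would count as co-expressed, while `gene_a` and `gene_c` would be anti-correlated.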


UniProt

We use UniProt to translate the Swiss-Prot/UniProt accession codes in ChEMBL to UniProt gene symbols, so that, for example, 5HT1A_HUMAN becomes HTR1A. This is how we unify observations from different species, and also how we intersect with ICGC DCC.
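
A minimal sketch of this translation step, assuming a lookup table has already been built from UniProt. Only the 5HT1A_HUMAN to HTR1A pair comes from the text above; the rat entry, the compound names, and the upper-casing used to merge orthologs across species are assumptions for illustration.

```python
from collections import defaultdict

# Lookup table from UniProt codes to gene symbols.
ENTRY_TO_GENE = {
    "5HT1A_HUMAN": "HTR1A",  # example given in the text
    "5HT1A_RAT": "Htr1a",    # assumed entry, for illustration only
}


def gene_symbol(code):
    """Return the upper-cased gene symbol, or None if the code is unmapped."""
    symbol = ENTRY_TO_GENE.get(code)
    return symbol.upper() if symbol is not None else None


# Hypothetical (target code, compound) observations.
observations = [
    ("5HT1A_HUMAN", "cmpd_1"),
    ("5HT1A_RAT", "cmpd_2"),
    ("UNKNOWN_HUMAN", "cmpd_3"),
]

by_gene = defaultdict(list)
unmapped = []
for code, compound in observations:
    symbol = gene_symbol(code)
    if symbol is None:
        unmapped.append(code)  # keep track instead of silently dropping
    else:
        by_gene[symbol].append(compound)

print(dict(by_gene))
print(unmapped)
```

Keying on the gene symbol places the human and rat observations under the same gene, and the same key could then be used to join against other gene-centric data.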

Protein-protein interactions with BioGRID

  • we got the file from Matt, who simply downloaded it.
  • URL goes here.
  • update frequency?

SEA predictions

  • we calculate these ourselves
  • script here.


BLAST of NR

  • we got the script from Matt
  • we adapted it ourselves to cover just those genes for which ligands are available.

guidetopharmacology

Contains metabolites, drugs, and targets. We think it is already mostly incorporated into ChEMBL, but the metabolite and drug part is not clear to us, so we use it directly.


[[Category:ZINC15]]