ZINC15:examples

From DISI
Revision as of 23:41, 1 October 2015 by Frodo (talk | contribs) (asdf)
Jump to navigation Jump to search

Here are examples of what you can do with ZINC15, organized by

  • a) whether our solution is web accessible (public) or requires ssh access (private)

and

Please see also ZINC15:videos for discussions of various topics and ZINC15:examples:screening and ZINC15:examples:resources for more examples.

Public Access

ZINC15:examples:public


Private access. requires ssh access or a private ZINC copy

Level 4 - privileged access

Ask us if you require this level of access to ZINC.

Level 5 - command line ZINC shell

Level 6 - SQL

How to calculate results from the ZINC15 2015 paper:

Table 2 - Genes, Uniprot codes and annotated compounds by affinity bin ==

  • * select organism_fk, count(gene_id) from gene group by organism_fk;
zinc15.docking.org/genes.txt:count(*)?group-by=organism_fk
  • + select g.organism_fk, count(a.anno_id) from annotation a, gene g where a.gene_fk = g.gene_id group by g.organism_fk;
zinc15.docking.org/targets.txt:count(*)?group-by=gene.organism_fk
  • $ select count(distinct(n.sub_id_fk)),g.organism_fk from gene g, annotation a, note n where g.gene_id = a.gene_fk and a.anno_id = n.anno_id_fk and n.affinitynm >= 9 group by g.organism_fk;
zinc15.docking.org/  not sure we can do this!

Table 3 - Gene Target Classes

  • * select count(g.gene_id),tc.major_class from target_class tc, gene g where g.target_class_fk = tc.target_class_id group by tc.major_class order by tc.major_class
  • + select count(distinct(s.sub_id)),tc.major_class from gene g, substance s, note n, annotation a, target_class tc where g.gene_id = a.gene_fk and g.target_class_fk = tc.target_class_id and n.anno_id_fk = a.anno_id and s.sub_id = n.sub_id_fk and s.purchasability >20, 20, 10 group by tc.major_class
  • $ select count(distinct(g.gene_id)), tc.major_class from target_class tc, gene g, substance s, note n, annotation a where s.sub_id = n.sub_id_fk and a.anno_id = n.anno_id_fk and a.gene_fk = g.gene_id and g.target_class_fk = tc.target_class_id and s.purchasability > 9 group by tc.major_class;

Table 4 - Chemical Diversity by Gene

4) histogram of clusters per gene.

select c.gene_fk, g.name, count(c.cluster_id) from cluster c, gene g where g.gene_id = c.gene_fk and c.ordinal = 1 group by c.gene_fk,g.name order by count(c.cluster_id) desc;

for any one pick the smiles:

select smiles,sub_id from substance where sub_id in (select rep_sub_id_fk from cluster where gene_fk = 57387 and ordinal=1);

another

select c.gene_fk, g.name, count(c.cluster_id) from cluster c, gene g where g.gene_id = c.gene_fk and c.ordinal = 1 group by c.gene_fk,g.name order by count(c.cluster_id) desc;

another

select gene_fk, count(cluster_id) from cluster where ordinal = 1 group by gene_fk order by count(cluster_id) desc ;


Table 5 - Genes per compound as a function of affinity cutoff (polypharmacology) this is almost for sure wrong:

select sub_id  count(sub_id_fk) from note as n join annotation as a on n.anno_id_fk = a.anno_id join gene as g on g.gene_id = a.gene_fk group by n.sub_id_fk having min(n.affinitynM) >= 9;

NEW

Non purch:

select count(sub_id_fk) as num_compounds, cgi as having_num_genes from (select n.sub_id_fk as sub_id_fk , count(distinct(g.gene_id)) as cgi from gene g, annotation a, note n where g.gene_id = a.gene_fk and a.anno_id = n.anno_id_fk and n.affinitynm >= 9 group by n.sub_id_fk) as t group by cgi order by cgi;

Heavy hitters:

select n.sub_id_fk as sub_id_fk , count(distinct(g.gene_id)) as cgi from gene g, annotation a, note n where g.gene_id = a.gene_fk and a.anno_id = n.anno_id_fk and n.affinitynm >= 9 group by n.sub_id_fk order by cgi desc limit 10;

Purch:

select count(sub_id_fk) as num_compounds, cgi as having_num_genes from (select n.sub_id_fk as sub_id_fk , count(distinct(g.gene_id)) as cgi from gene g, annotation a, note n, catalog_item ci, catalog c  where c.cat_id=ci.cat_id_fk and ci.sub_id_fk = n.sub_id_fk and g.gene_id = a.gene_fk and a.anno_id = n.anno_id_fk and n.affinitynm >= 8 and c.purchasable>9 group by n.sub_id_fk) as t group by cgi order by cgi;

Heavy hitters (purch):

select n.sub_id_fk as sub_id_fk , count(distinct(g.gene_id)) as cgi from gene g, annotation a, note n, catalog_item ci, catalog c  where c.cat_id=ci.cat_id_fk and ci.sub_id_fk = n.sub_id_fk and g.gene_id = a.gene_fk and a.anno_id = n.anno_id_fk and n.affinitynm >= 8 and c.purchasable>9 group by n.sub_id_fk

Level 7 - Python+SQL

Level 8 - Questions we cannot answer (yet)

Examples mentioned in or implied by the 2015 paper

NEW TABLE 8

chemical warheads and the genes they hit as a function of affinity.

http://zincapi.ucsf.bkslab.org/v1/activities/txt:target.gene.name,substance.smiles,substance.zinc_id,substance.purchasability/activity.affinitynm ge 7&substance.structure%20contains%20cC(=O)[OD1]?count=all

Table 8B

http://zincapi.ucsf.bkslab.org/v1/substances/txt:smiles,zinc_id/substance.features%20contains%20211,
http://zincapi.ucsf.bkslab.org/v1/substances/txt:smiles,zinc_id/substance.features%20contains%20212,
http://zincapi.ucsf.bkslab.org/v1/substances/txt:smiles,zinc_id/substance.features%20contains%20213,

&substance.purchasability >20 =20, =10, >1.

Figure 1

Drawn. probably to delete.

Figure 2

Non-purch final:

psql -h samekh -U test zinc15 -c "select count(distinct(n.sub_id_fk)) as num_compds, g.name as gene_name from gene g, annotation a, note n where g.gene_id = a.gene_fk and a.anno_id = n.anno_id_fk and n.affinitynm >= 5 and g.organism_fk = 1 group by g.name " > 1-10um.txt

Purch final:

psql -h samekh -U test zinc15 -c "select count(distinct(n.sub_id_fk)) as num_compds, g.name as gene_name from gene g, annotation a, note n, catalog_item ci, catalog c  where c.cat_id=ci.cat_id_fk and ci.sub_id_fk = n.sub_id_fk and g.gene_id = a.gene_fk and a.anno_id = n.anno_id_fk and n.affinitynm >= 5 and g.organism_fk = 2 and c.purchasable > 9  group by g.name" > 2-10um-purch.txt

Unknown

http://api.docking.org/v1/activities/txt:target.gene.name,sub_id_fk,affinitynm,substance.best_purchasbility

New non-purch

select count(distinct(n.sub_id_fk)) as num_compds, g.name as gene_name from gene g, annotation a, note n where g.gene_id = a.gene_fk and a.anno_id = n.anno_id_fk and n.affinitynm >= 6 and g.organism_fk = 1 group by g.name; (by organism x 4)

Purch:

select count(distinct(n.sub_id_fk)) as num_compds, g.name as gene_name from gene g, annotation a, note n, catalog_item ci, catalog c  where c.cat_id=ci.cat_id_fk and ci.sub_id_fk = n.sub_id_fk and g.gene_id = a.gene_fk and a.anno_id = n.anno_id_fk and n.affinitynm >= 8 and g.organism_fk = 2 and c.purchasable > 9  group by g.name  limit 10;

Figure 3

Actions

Translate

zinc15.docking.org/actions/translate

Nearest Metabolite

zinc15.docking.org/actions/translate

Build dockable library

zinc15.docking.org/actions/build-library

One-step reactions

zinc15.docking.org/actions/reaction
alter user zincfree set search_path = "$user", free, public;


Get supplier_codes for chembl20 compounds that match PAINS

wget -O- 'http://zinc15.docking.org/catalogs/chembl20/substances.json:zinc_id+catalog_items+patterns?patterns-any-origin_name=pains&count=all' | jq -r '[.zinc_id, (.patterns[] | select(.origin_name == "pains") | .name), (.catalog_items[] | select(.catalog_short_name == "chembl20") | .supplier_code)] | @csv' | csv --to-tsv | tee chembl-patterns.txt

substance-oriented

gene-oriented

Compounds that are similar to dopamine and are endogenous human metabolites

http://zinc15.docking.org/substances/subsets/endogenous.txt:smiles+zinc_id+tanimoto_similarity?ecfp4_fp-tanimoto=c1cc(c(cc1CC[NH2])O)O

OR

http://zinc15.docking.org/substances/subsets/endogenous.txt:smiles+zinc_id+tanimoto_similarity?ecfp4_fp-tanimoto=33882

Similar to (drug) and has been in man

http://zinc15.docking.org/substances/subsets/in-man.txt:smiles+zinc_id+tanimoto_similarity?ecfp4_fp-tanimoto=c1cc2c(cc1O)C[C@H](CC2)N
http://zinc15.docking.org/genes

list of genes and their major classes

http://zinc15.docking.org/genes/txt:name%2Cdescription%2Ctarget_class.name%2Ctarget_class.major_target_class.name/

list of target classes

http://zinc15.docking.org/targetclasses.txt:name

list of major target classes

http://zinc15.docking.org/majorclasses.txt:name

enzyme genes

http://zinc15.docking.org/genes.txt:name%2Cdescription+target_class.name?majorclass.name=enzyme

http://zinc15.docking.org/targets.txt:gene.name+majorclass.name%20eq%20enzyme

what it does

.../substance.structure similarto c1ccccc1CCCOC within .7 using tanimoto

blah

/substance/txt:zinc_id/substance isa metabolite

asdfasdf

/substances/txt:zinc_id?sort=substance.structure+near+c1ccccc1CCNC+using+dice

The organism table (modified from ChEMBL)

http://zinc15.docking.org/organisms/txt:organism_id,code,description/

Table of Dopamine receptor genes

http://zinc15.docking.org/targets.txt:name,description,gene.description,uniprot,chembl,gene_fk,gene.name?gene.name-startswith=DRD

List of chemotype clusters (not too useful, but proof of concept)

http://zinc15.docking.org/clusters.txt:number+gene.name

All compound-target pairs with activity of 1nM or better

http://zinc15.docking.org/activities.txt:substance.zinc_id+substance.best_purchasability+substance.best_catalog+target.name+affinitynm?activity.affinitynm-gt=9

Search by inchikey prefix

translator:

http://zinc15.docking.org/apps/mol/convert?from=c1ccccc1&to=inchikey

blah

http://zinc15.docking.org/substances?references-any-pubmed_id=14695814


http://zinc15.docking.org/substances/?inchikey-startswith=LFQSCWFLJHTTHZ

Search by inchikey

http://zinc15.docking.org/substances/?inchikey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N

Endogenous and Drug metabolites (ZINC, a new method to find problems in other databases)

http://zinc15.docking.org/catalogs/hmdbendo/substances/?catalogs-any-short_name=hmdbdrug

FDA approved but not in drugstore, generally this is due to stereochemistry issues in drugbank.

http://zinc15.docking.org/catalogs/dbfda/substances/?~catalogs-any-short_name=drugstore


first one: http://zinc15.docking.org/catalogs/lipid/substances/?~catalogs-any-short_name=hmdb
second one: (correct): http://zinc15.docking.org/catalogs/lipid/substances/?catalogs-any-short_name=lipid

blah

http://zinc15.docking.org/catalogs/foodb/substances/subsets/endogenous

Question: what compounds are in the "lipidomics" catalog but are NOT in HMDB (as we currently have it loaded):

http://zinc15.docking.org/catalogs/lipid/substances/?~catalogs-any-short_name=hmdb

thus, compounds sold by sigma aldrich that hit DRD2 and not HRH1 pace ChEMBL,10uM.

http://zinc15.docking.org/catalogs/sial/substances/?genes-any-name=DRD2&~genes-any-name=HRH1

Only one catalog, and it is sial:

http://zinc15.docking.org/catalogs/sial/substances/?~catalogs-any-short_name-ne=sial

We would like a variant for no other catalog of type 50,40,20,10 only. That's a request.

Metabolites that look like ethanol:

http://zinc15.docking.org/substances/subsets/metabolites/?ecfp4_fp-tanimoto-50=CCO

(produced in the gut by the bacterial flora)

Get fingerprint

http://zinc15.docking.org/substances/ZINC000018011573.txt:ecfp4_fp.as_base64

How many molecules are in pubchem?

http://zinc15.docking.org/catalogs/pubchem/substances?special=count

blah

http://zinc15.docking.org/catalogs/pubchem/substances?special=count-wait

blah

http://zinc15.docking.org/majorclasses/?substances-any-purchasability=for-sale 

blah

http://zinc15.docking.org/majorclasses/?substances-any-purchasability=not-for-sale

blah

http://zinc15.docking.org/majorclasses/?substances-memberof=biogenic