Many people are interested in metabolites, including purchasable metabolites. We get emails such as this one:
We are looking into HMDB for GPCR deorphanization tasks, and for practical purposes want to focus on purchasable molecules only. I was wondering if it is possible to download a purchasable subset of HMDB (such that each molecule is purchasable, but without molecules that are metabolites of purchasable ones, but not purchasable on their own).
We begin by doing the best we can on the question as asked, and then broaden the question:
To get the purchasable compounds in HMDB, you would use:
To download these as SMILES, append .smi to the end of the URL and request all of them, thus:
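The pattern described here (a listing path, an optional format extension, and an optional request for all results) could be composed programmatically. The following is only a minimal sketch: the base address, the example catalog path, the `count` query parameter, and the helper name `build_zinc_url` are all assumptions made for illustration, so substitute the addresses you actually use.

```python
from urllib.parse import urlencode

# ASSUMPTION: base address of the ZINC web interface; adjust as needed.
ZINC_BASE = "http://zinc15.docking.org"

# ASSUMPTION: hypothetical path for the purchasable subset of the HMDB catalog.
HMDB_FOR_SALE = "catalogs/hmdb/substances/subsets/for-sale"


def build_zinc_url(path, fmt=None, count=None, base=ZINC_BASE):
    """Compose a listing URL as base + path [+ .fmt] [+ ?count=...].

    `fmt` is a file extension such as "smi" or "sdf".
    `count` is a number of results or "all" (assumed parameter name).
    """
    url = base.rstrip("/") + "/" + path.strip("/")
    if fmt:
        url += "." + fmt
    if count is not None:
        url += "?" + urlencode({"count": count})
    return url


if __name__ == "__main__":
    # Browsable listing, then the same listing as a SMILES download.
    print(build_zinc_url(HMDB_FOR_SALE))
    print(build_zinc_url(HMDB_FOR_SALE, fmt="smi", count="all"))
```

The same helper would cover other formats and result limits, for example `fmt="sdf"` with `count=1000`.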
Unfortunately, we do not keep track of which compounds are metabolites of other molecules. If you have access to such a mapping, let us know and we can help you do the filtering you request.
Turning to the broader questions raised in the email, we imagine our correspondent might be using "HMDB" as a proxy for "endogenous human metabolites". In fact, HMDB is a very powerful resource that also includes drug metabolites and plant and food metabolites, among other xenobiotics. Fortunately, HMDB does break these different classes out, and we have categorized them separately in ZINC.
Thus, to access only endogenous human metabolites from HMDB in ZINC, use:
We wondered whether our reader might be interested in endogenous human metabolites more generally, not just those in HMDB. Here, we suggest using the ZINC endogenous subset, thus:
To obtain a thousand of these in SDF form, use:
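After downloading an SDF file, a quick sanity check is to count the records it contains and confirm that the number matches what was requested. The sketch below relies only on the SDF convention that each record ends with a line containing `$$$$`; the helper name and the two toy records are placeholders invented for illustration, not real ZINC entries.

```python
def split_sdf_records(sdf_text):
    """Split SDF text into individual records.

    Each record in an SDF file is terminated by a line containing '$$$$'.
    The delimiter line itself is not included in the returned records.
    """
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


# Toy input: two empty molecules with placeholder titles (not real data).
SAMPLE_SDF = "\n".join([
    "PLACEHOLDER_ID_1",
    "  toy",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "$$$$",
    "PLACEHOLDER_ID_2",
    "  toy",
    "",
    "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END",
    "$$$$",
])

if __name__ == "__main__":
    records = split_sdf_records(SAMPLE_SDF)
    # The first line of each record is the molecule title.
    titles = [record.splitlines()[0] for record in records]
    print(len(records), titles)
```

For a real download, read the file contents into a string and pass it to `split_sdf_records`; a count below the number requested would suggest the subset is smaller than that number or that the download was truncated.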
To find out which catalogs contribute to the endogenous subset of ZINC, use:
If you believe a catalog has been incorrectly curated, please write to us!
Naturally, this analysis can be extended to other catalogs and catalog subsets.