ZINC subsets

From DISI
Revision as of 19:09, 17 February 2009 by Frodo

ZINC is big. Currently, ZINC has over 12M molecules, about 9M of which are commercially available. For most applications, users of ZINC will only want or need to download a fraction of it: a subset. This article describes the subsets we offer.

Subset classes

  • Standard subsets, numbers 1-10. These are our approximations of popular subsets that appear commonly in the literature.
  • Clean subsets, numbers 11-20. From these we have removed compounds that are known to cause problems in some assays. For information about the additional filtering used, please see [1].
  • Immediate-availability subsets, numbers 21-30. Normal subsets include compounds that are available from stock as well as compounds that can be made in 6-10 weeks. Some people only want compounds that can be obtained immediately, say in less than 2 weeks. If so, these subsets are for you.


Property Subsets

Subsets of ZINC by one-dimensional physical property (molecular weight, calculated logP) are the single most popular way to acquire ZINC. Of these, the first two subsets, "lead-like" (subset #1) and "fragment-like" (subset #2), are by far the most popular. There are good reasons for this.
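In essence, a property subset is a set of windows on precomputed properties: a compound is included when every property falls inside its window. The minimal Python sketch below shows the idea. This page does not list the cutoffs ZINC actually uses, so the windows, the `Compound` class and the compound IDs are hypothetical placeholders, not ZINC's subset definitions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Compound:
    zinc_id: str   # hypothetical identifier for illustration
    mw: float      # molecular weight, Da
    logp: float    # calculated logP


# property name -> (min, max), both inclusive.
Window = Dict[str, Tuple[float, float]]


def in_window(c: Compound, window: Window) -> bool:
    """True if every property named in `window` lies within its range."""
    for prop, (lo, hi) in window.items():
        if not (lo <= getattr(c, prop) <= hi):
            return False
    return True


def select_subset(compounds: Iterable[Compound], window: Window) -> List[Compound]:
    """Keep only the compounds that fall inside every window."""
    return [c for c in compounds if in_window(c, window)]


# Made-up compounds and placeholder windows (NOT ZINC's real cutoffs).
library = [
    Compound("cpd1", 180.2, 1.2),
    Compound("cpd2", 320.4, 2.9),
    Compound("cpd3", 480.6, 5.3),
]
small_window: Window = {"mw": (0, 250), "logp": (-5, 3)}
mid_window: Window = {"mw": (250, 350), "logp": (-5, 4)}

print([c.zinc_id for c in select_subset(library, small_window)])  # ['cpd1']
print([c.zinc_id for c in select_subset(library, mid_window)])    # ['cpd2']
```

Because windows are plain data, adding another 1D property (for example a rotatable-bond count) only needs a new field on `Compound` and a new key in the window.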

lead-like

Lead-like compounds are large enough to be detected in high-throughput spectrophotometric or other cheap assays, yet smaller than most drugs, which have been highly optimized for a specific application. Lead-like compounds will, in general, be more soluble than their bigger "drug-like" cousins, and thus more likely to actually be assayed.

fragment-like

Fragment-like compounds are even smaller than leads. The good news is that they sample chemical space more thoroughly than is possible with leads. The bad news is that they are often too small to be detected in a cheap assay, requiring direct biophysical measurement such as SPR, NMR, or X-ray crystallography.

Together, leads and fragments represent the dominant thinking in the field for screening. The remaining subsets can also be interesting. Here we give a brief explanation of why you might want each one.

drug-like

Drug-like (#3) captures Lipinski's famous rule of five, which itself is just a guideline with many exceptions. There will be times when you want to screen the "drug-like" subset of ZINC, but this would probably be later in the project, after you have had a good look at the leads, or in some unusual circumstance.
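Lipinski's rule of five flags compounds with molecular weight above 500, calculated logP above 5, more than 5 hydrogen-bond donors, or more than 10 hydrogen-bond acceptors. The minimal Python sketch below counts those violations for precomputed properties. Allowing one violation is a common convention, not part of this page; ZINC's drug-like subset may use different or additional criteria, and the `Props` class and function names are our own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Props:
    mw: float     # molecular weight, Da
    clogp: float  # calculated logP
    hbd: int      # hydrogen-bond donors (OH + NH count)
    hba: int      # hydrogen-bond acceptors (N + O count)


def ro5_violations(p: Props) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    return sum([
        p.mw > 500,
        p.clogp > 5,
        p.hbd > 5,
        p.hba > 10,
    ])


def passes_ro5(p: Props, max_violations: int = 1) -> bool:
    # One violation is commonly tolerated; use max_violations=0 to be strict.
    return ro5_violations(p) <= max_violations


example = Props(mw=520.0, clogp=4.1, hbd=2, hba=7)
print(ro5_violations(example))                # 1
print(passes_ro5(example))                    # True
print(passes_ro5(example, max_violations=0))  # False
```

Counting violations instead of returning a single yes/no lets you tighten or relax the filter without rewriting it, which matches the "guideline, not law" spirit of the rule.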

greasy

Greasy-leads (#4) and Big-n-greasy (#5) are deprecated. Frankly, these compounds are nothing but trouble, since they often do not dissolve. If you really want them back, write to me, but otherwise, they are gone.

everything subsets

All purchasable (#6) comes in third place for popularity. Advantage: you can buy these compounds. Disadvantage: for target based virtual screening, many of these compounds will be a waste of time, because they are too big, too specific, and too greasy (insoluble).

Subsets 7, 8 and 9 will return soon.

Everything (#10) comes in fourth place for popularity, since it is, well, everything we can let you have. We frankly don't think you really want this, but people keep asking for it, so, here it is.

Subsets 11-16 will return soon.

fragment variations

Neutral-fragments (#17) are what the name suggests: uncharged fragments. Why would you want these? Charged compounds often have a hard time getting into cells, and docking programs can have trouble weighing charged compounds against neutral ones. Wham: put those ideas together and you see why neutral fragments can be interesting.

Subsets 18-28 will return soon.

CNS-permeable (#29) compounds are of interest for projects where crossing the blood-brain barrier (BBB) is important. We have used well-known criteria for this subset.

Monoanions (#31) and monocations (#32): we don't know why you would want these. We created them for a particular project.
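Charge-based subsets such as neutral fragments (#17), monoanions (#31) and monocations (#32) come down to the formal charges in each molecule's protonated form. The minimal Python sketch below reads formal charges straight from a SMILES string, relying on the fact that SMILES only allows charges inside square-bracket atoms. It assumes the input is already the pH 7 representation and classifies by net charge. That is our assumption, not ZINC's documented rule: under it, zwitterions and nitro groups count as neutral. A cheminformatics toolkit (for example RDKit's `Chem.GetFormalCharge`) would be more robust for real work.

```python
import re
from typing import List

# Bracket atoms such as [NH3+], [O-], [Fe+2], [Fe++] or [N@@H+:1].
_BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")
# A charge is a run of '+' or '-' signs, optionally followed by a count.
_CHARGE = re.compile(r"([+-]+)(\d*)")


def atom_charges(smiles: str) -> List[int]:
    """Return the nonzero formal charge of each charged bracket atom."""
    charges: List[int] = []
    for content in _BRACKET_ATOM.findall(smiles):
        m = _CHARGE.search(content)
        if m is None:
            continue  # bracket atom without a charge, e.g. [C@@H] or [13CH3]
        signs, digits = m.groups()
        sign = 1 if signs[0] == "+" else -1
        charge = sign * (int(digits) if digits else len(signs))
        if charge != 0:
            charges.append(charge)
    return charges


def net_charge(smiles: str) -> int:
    return sum(atom_charges(smiles))


# Hypothetical labels keyed by net charge; anything else is "other".
_LABELS = {0: "neutral", -1: "monoanion", 1: "monocation"}


def classify_by_charge(smiles: str) -> str:
    return _LABELS.get(net_charge(smiles), "other")


for smi in ["c1ccccc1O", "CC(=O)[O-]", "C[NH3+]", "C[N+](=O)[O-]",
            "[NH3+]CC(=O)[O-]", "[O-]C(=O)CC(=O)[O-]"]:
    print(smi, classify_by_charge(smi))
```

If you want "uncharged" in the strictest sense (no charged atoms at all, so no zwitterions), test `not atom_charges(smi)` instead of the net charge.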

Goldilocks (#33) is yet another set that tries to "shoot for the middle" of the chemical space problem, balancing the competing advantages and disadvantages of bigger versus smaller molecules.

personal subsets

Piotr (#38), kerim-like (#42) and abram (#49) were all created for specific projects. We do not know why you might want these, but they are available should you need them.

research subsets

stiff-soluble (#50) and stiffs (#51) are for testing ideas about entropy loss of the ligand on binding. So they are for research, but you might want them too...

Vendor Subsets

We offer subsets by vendor.

User-created subsets or mini subsets

We offer the capability to create small subsets. You can do this from the ZINC results browser page after a search by clicking "create subset". If you have SMILES, another way to generate a custom subset is to upload the molecules.

How to create a subset for docking based on SMILES

  • 1. Browse to the upload page, http://zinc.docking.org/upload.shtml
  • 2. Select your file. It should contain one SMILES string per line, optionally followed by an identifier separated by whitespace.
  • 3. Check the box "click here if private (UCSF only)". This raises your limit from 1000 to 5000 molecules per transaction. If you are not inside UCSF or otherwise "special", you are stuck at 1000.
  • 4. Click "upload and build"
  • 5. Click the link next to "browse results here" (it should be a number).
  • 6. Wait about 10 minutes per 1000 molecules, refreshing the page, until you see "e_0.0.db.gz": this is the pH 7 representation of your compounds. "e_1.0.db.gz" is the "additional forms pH 5.75 - 8.25" file. Download or copy these files and dock them.
  • 7. Alternatively (UCSF only), run md4db.csh uploads 44249 (substituting your own upload number). This copies the files and sets them up for docking. md4db.csh stands for "make database for dock blaster".
  • 8. Next, run cd run.u44249
  • 9. Finally, on sgehead, run startdockbks3 `pwd` to start docking.
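Because each upload transaction is capped at 1000 molecules (5000 for private UCSF uploads), a larger set has to be split into several files before step 2. The minimal Python sketch below writes (SMILES, identifier) pairs in the one-molecule-per-line format described in step 2 and estimates the build time using the 10-minutes-per-1000-molecules rule of thumb from step 6. The function names, the `.smi` extension and the file naming scheme are our own hypothetical choices, not part of ZINC.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Optional, Tuple

# Per-transaction limits described on this page.
PUBLIC_LIMIT = 1000
PRIVATE_LIMIT = 5000  # "private (UCSF only)" uploads


def write_upload_batches(
    records: Iterable[Tuple[str, Optional[str]]],
    out_dir: Path,
    batch_size: int = PUBLIC_LIMIT,
    prefix: str = "upload",
) -> List[Path]:
    """Write 'SMILES [identifier]' lines, at most batch_size per file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    batch: List[str] = []

    def flush() -> None:
        path = out_dir / f"{prefix}_{len(paths) + 1:03d}.smi"
        path.write_text("\n".join(batch) + "\n")
        paths.append(path)
        batch.clear()

    for smiles, ident in records:
        smiles = smiles.strip()
        # Whitespace separates SMILES from identifier, so neither may contain it.
        if not smiles or any(ch.isspace() for ch in smiles):
            raise ValueError(f"bad SMILES field: {smiles!r}")
        if ident and any(ch.isspace() for ch in ident):
            raise ValueError(f"identifier contains whitespace: {ident!r}")
        batch.append(f"{smiles} {ident}" if ident else smiles)
        if len(batch) == batch_size:
            flush()
    if batch:
        flush()
    return paths


def estimated_build_minutes(n_molecules: int) -> float:
    # Rule of thumb from step 6: about 10 minutes per 1000 molecules.
    return 10.0 * n_molecules / 1000


mols = [("CCO", f"mol{i}") for i in range(2500)]
with tempfile.TemporaryDirectory() as tmp:
    files = write_upload_batches(mols, Path(tmp))
    print([len(f.read_text().splitlines()) for f in files])  # [1000, 1000, 500]
print(estimated_build_minutes(2500))  # 25.0
```

Pass `batch_size=PRIVATE_LIMIT` only if you qualify for private uploads; otherwise the default keeps every file within the public limit.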

User-uploaded subsets

We offer the capability to upload compounds for processing.

By Annotation

We offer compounds by annotation. More soon.

Synthesis on Request

Some vendors offer compounds that they will make if asked, usually within about 10 weeks. We like these compounds, because they greatly expand the region of chemical space one can sample without performing synthesis oneself.

-- John Irwin