ZINC subsets
Revision as of 19:59, 4 September 2008
ZINC is big. Currently, ZINC has over 12M molecules, about 9M of which are commercially available. For most applications, users will only want or need to download a fraction of ZINC: a subset. This article describes the available subsets.
Property Subsets
Subsets of ZINC by one-dimensional physical properties (molecular weight, calculated logP) are the single most popular way to acquire ZINC. Of these, the first two subsets, "lead-like" (subset #1) and "fragment-like" (subset #2), are by far the most popular, and there are good reasons for this.
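Conceptually, a property subset is a set of allowed ranges on precomputed one-dimensional properties. The minimal sketch below assumes each molecule already carries a molecular weight and a calculated logP from some cheminformatics toolkit; the `Molecule` record and the window bounds in the example are placeholders for illustration, not the cutoffs ZINC uses for any numbered subset.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Molecule:
    # Hypothetical record: an identifier plus precomputed 1D properties.
    zinc_id: str
    mol_weight: float  # molecular weight in daltons
    clogp: float       # calculated logP


def in_window(mol: Molecule, mw_range: tuple[float, float],
              logp_range: tuple[float, float]) -> bool:
    """True if both properties fall inside their closed [low, high] ranges."""
    lo_mw, hi_mw = mw_range
    lo_lp, hi_lp = logp_range
    return lo_mw <= mol.mol_weight <= hi_mw and lo_lp <= mol.clogp <= hi_lp


def property_subset(mols, mw_range, logp_range):
    """Return the molecules whose properties fall inside every window."""
    return [m for m in mols if in_window(m, mw_range, logp_range)]


if __name__ == "__main__":
    library = [
        Molecule("mol_a", 180.2, 1.2),
        Molecule("mol_b", 320.4, 2.9),
        Molecule("mol_c", 610.7, 6.1),
    ]
    # Placeholder windows for illustration only, not ZINC's real cutoffs.
    subset = property_subset(library, (250.0, 350.0), (-4.0, 3.5))
    print([m.zinc_id for m in subset])  # ['mol_b']
```

Adding another property (rotatable bonds, charge) is just another range check in `in_window`.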
lead-like
Lead-like compounds are large enough to be detected in high-throughput spectrophotometric or other cheap assays, yet smaller than most drugs, which have been highly optimized for a specific application. In general, lead-like compounds are more soluble than their bigger "drug-like" cousins, and thus more likely to actually be assayed.
fragment-like
Fragment-like compounds are even smaller than leads. The good news is that they sample chemical space more thoroughly than is possible with leads. The bad news is that they are often too small to be detected in a cheap assay and require direct biophysical measurement, such as SPR, NMR, or X-ray crystallography.
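Fragment libraries are often filtered with the so-called "rule of three": molecular weight below 300, calculated logP at most 3, and at most 3 hydrogen-bond donors and 3 acceptors, with at most 3 rotatable bonds as a commonly added extra criterion. The sketch below checks that rule on precomputed values; the `FragmentProps` record is hypothetical, and ZINC's own fragment-like cutoffs may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentProps:
    # Hypothetical precomputed properties for one molecule.
    mol_weight: float
    clogp: float
    h_donors: int
    h_acceptors: int
    rotatable_bonds: int


def passes_rule_of_three(p: FragmentProps, check_rotatable: bool = True) -> bool:
    """Check the commonly cited 'rule of three' for fragments."""
    core = (
        p.mol_weight < 300
        and p.clogp <= 3
        and p.h_donors <= 3
        and p.h_acceptors <= 3
    )
    # The rotatable-bond limit is often treated as an optional extra criterion.
    return core and (not check_rotatable or p.rotatable_bonds <= 3)


if __name__ == "__main__":
    small = FragmentProps(mol_weight=151.2, clogp=0.5, h_donors=2,
                          h_acceptors=2, rotatable_bonds=1)
    floppy = FragmentProps(mol_weight=240.3, clogp=2.1, h_donors=1,
                           h_acceptors=3, rotatable_bonds=6)
    print(passes_rule_of_three(small))                          # True
    print(passes_rule_of_three(floppy))                         # False
    print(passes_rule_of_three(floppy, check_rotatable=False))  # True
```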
Together, leads and fragments reflect the dominant thinking in the field about what to screen. The remaining subsets can also be interesting; here we give a brief explanation of why you might want each one.
drug-like
Drug-like (#3) captures the famous rule of five, which is itself just a guideline with many exceptions. There will be times when you want to screen the drug-like subset of ZINC, but this would probably be later in the project, after you have had a good look at the leads, or in some unusual circumstance.
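As usually stated, the rule of five flags poor absorption or permeation as more likely when a compound has a molecular weight over 500, a logP over 5, more than 5 hydrogen-bond donors or more than 10 hydrogen-bond acceptors. Since it is only a guideline, the sketch below counts violations instead of giving a hard yes/no. The `DrugProps` record and the `max_violations` knob are illustrative assumptions; this is not how ZINC builds subset #3, whose exact criteria are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrugProps:
    # Hypothetical record of precomputed properties for one molecule.
    mol_weight: float  # daltons
    clogp: float       # calculated logP
    h_donors: int      # hydrogen-bond donors (e.g. OH + NH count)
    h_acceptors: int   # hydrogen-bond acceptors (e.g. N + O count)


def rule_of_five_violations(p: DrugProps) -> int:
    """Count how many of the four rule-of-five limits the molecule exceeds."""
    return sum([
        p.mol_weight > 500,
        p.clogp > 5,
        p.h_donors > 5,
        p.h_acceptors > 10,
    ])


def is_drug_like(p: DrugProps, max_violations: int = 0) -> bool:
    """Strict by default; max_violations=1 gives a common, more lenient reading."""
    return rule_of_five_violations(p) <= max_violations


if __name__ == "__main__":
    heavy = DrugProps(mol_weight=612.0, clogp=4.2, h_donors=3, h_acceptors=11)
    borderline = DrugProps(mol_weight=520.0, clogp=3.0, h_donors=2, h_acceptors=8)
    print(rule_of_five_violations(heavy))              # 2
    print(is_drug_like(borderline))                    # False
    print(is_drug_like(borderline, max_violations=1))  # True
```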
greasy
Greasy-leads (#4) and Big-n-greasy (#5) are deprecated. Frankly, these compounds are nothing but trouble, since they often do not dissolve. If you really want them back, write me; otherwise, they are gone.
everything subsets
All purchasable (#6) comes in third place for popularity. Advantage: you can buy these compounds. Disadvantage: for target-based virtual screening, many of these compounds will be a waste of time, because they are too big, too specific, or too greasy (insoluble).
Subsets 7, 8 and 9 will return soon.
Everything (#10) comes in fourth place for popularity; it is, well, everything we can let you have. Frankly, we don't think you really want this, but people keep asking for it, so here it is.
Subsets 11-16 will return soon.
fragment variations
Neutral-fragments (#17) are what the name suggests: uncharged fragments. Why would you want this? Charged compounds often have a hard time getting into cells, and docking programs can have trouble weighing charged compounds against neutral ones. Put those two ideas together and you see why neutral fragments can be interesting.
Subsets 18-28 will return soon.
CNS permeable (#29) compounds are of interest for projects where getting through the blood-brain barrier (BBB) is important. We have used well-known criteria for this subset.
Monoanions (#31) and monocations (#32): we don't know why you would want these; we created them for a particular project.
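Whether a molecule is neutral, a monoanion or a monocation can be read off its formal charges. As a rough sketch, assuming plain SMILES input and no cheminformatics toolkit, the code below sums the charges written on bracket atoms (in SMILES, charges can only appear inside square brackets). It treats any molecule carrying a charged atom as non-neutral, so zwitterions and charge-separated nitro groups land in "other". The function names are hypothetical, and ZINC's actual protonation at pH 7 is not reproduced here.

```python
from __future__ import annotations

import re

BRACKET_ATOM = re.compile(r"\[([^\]]+)\]")  # contents of each [...] atom
CHARGE = re.compile(r"(\++|-+)(\d*)")        # '+', '++', '-2', '--', ...


def formal_charges(smiles: str) -> list[int]:
    """Return the non-zero formal charges written on bracket atoms."""
    charges = []
    for atom in BRACKET_ATOM.findall(smiles):
        m = CHARGE.search(atom)
        if m is None:
            continue  # bracket atom without a charge, e.g. [nH] or [C@@H]
        signs, digits = m.groups()
        magnitude = int(digits) if digits else len(signs)
        if magnitude:
            charges.append(magnitude if signs[0] == "+" else -magnitude)
    return charges


def charge_class(smiles: str) -> str:
    """Label a SMILES string as neutral, monoanion, monocation or other."""
    charges = formal_charges(smiles)
    if not charges:
        return "neutral"  # no charged atoms at all
    net = sum(charges)
    if net == -1:
        return "monoanion"
    if net == 1:
        return "monocation"
    return "other"  # zwitterions, multiply charged species, etc.


if __name__ == "__main__":
    for smi in ["c1ccccc1O", "CC(=O)[O-]", "C[NH3+]", "[NH3+]CC(=O)[O-]"]:
        print(smi, charge_class(smi))
```

This only reads the charges as drawn; a proper treatment would first protonate each molecule at the pH of interest.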
Goldilocks (#33) is yet another set that tries to "shoot for the middle" of chemical space, balancing the competing advantages and disadvantages of bigger versus smaller molecules.
personal subsets
Piotr (#38), kerim-like (#42) and abram (#49) were all created for specific projects. We do not know why you might want these, but they are available should that be the case.
research subsets
The stiff-soluble (#50) and stiffs (#51) subsets are for testing ideas about the ligand's entropy loss on binding. So they are for research, but you might want them too.
Vendor Subsets
We offer subsets by vendor.
User-created subsets or mini subsets
We offer the capability to create small subsets. You can do this from the ZINC results browser page: after a search, click "create subset". Alternatively, if you have the SMILES, you can generate a custom subset by uploading the molecules.
How to create a subset for docking based on SMILES
- 1. Browse to the upload page, http://zinc.docking.org/upload.shtml
- 2. Select your file(s): one SMILES string per line, optionally followed by an identifier separated by whitespace.
- 3. Check the box "click here if private (UCSF only)". This lets you upload 5000 instead of 1000 molecules per transaction. If you are not inside UCSF or otherwise "special", you are stuck at 1000.
- 4. Click "upload and build"
- 5. Click the "browse results here" link (it should be a number).
- 6. Wait about 10 minutes per 1000 molecules, refreshing the page, until you see "e_0.0.db.gz": this is the pH 7 representation of your compounds. "e_1.0.db.gz" contains the additional forms for pH 5.75 to 8.25. Download or copy these files and dock them.
- 7. OR, at UCSF only, run md4db.csh uploads 44249 (or whatever your number was), which copies the files and sets them up for docking. md4db.csh stands for "make database for DOCK Blaster".
- 8. Now run: cd run.u44249
- 9. On sgehead, run startdockbks3 `pwd` to start docking.
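Steps 2, 3 and 6 imply a little local bookkeeping before uploading: a file of `SMILES [identifier]` lines, split into batches that respect the per-transaction limit (1000 molecules, or 5000 at UCSF), plus a rough idea of how long the build will take. A minimal sketch, assuming you prepare the files in Python; the function names and the `batch_NNN.smi` naming are hypothetical, and nothing here talks to the ZINC server.

```python
import tempfile
from pathlib import Path


def parse_upload_lines(lines):
    """Yield (smiles, identifier or None) from 'SMILES [identifier]' lines."""
    for lineno, raw in enumerate(lines, start=1):
        fields = raw.split()  # any run of whitespace separates the fields
        if not fields:
            continue  # ignore blank lines
        if len(fields) > 2:
            raise ValueError(f"line {lineno}: expected 'SMILES [id]', got {raw!r}")
        yield fields[0], (fields[1] if len(fields) == 2 else None)


def write_batches(records, out_dir, batch_size=1000):
    """Write records to files of at most batch_size lines; return the paths.

    1000 is the public per-upload limit; UCSF users ticking the private box
    could pass batch_size=5000 instead.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(records), batch_size):
        path = out_dir / f"batch_{start // batch_size:03d}.smi"  # hypothetical naming
        with path.open("w") as fh:
            for smiles, ident in records[start:start + batch_size]:
                fh.write(f"{smiles} {ident}\n" if ident is not None else f"{smiles}\n")
        paths.append(path)
    return paths


def estimated_minutes(n_molecules, minutes_per_1000=10.0):
    """Rough build time from the '~10 minutes per 1000 molecules' rule of thumb."""
    return n_molecules / 1000 * minutes_per_1000


if __name__ == "__main__":
    text = "c1ccccc1O phenol\nCC(=O)[O-]\tacetate\n\nCCO\n"
    records = list(parse_upload_lines(text.splitlines()))
    print(records)  # [('c1ccccc1O', 'phenol'), ('CC(=O)[O-]', 'acetate'), ('CCO', None)]
    with tempfile.TemporaryDirectory() as tmp:
        print(len(write_batches(records, tmp, batch_size=2)))  # 2
    print(estimated_minutes(2500))  # 25.0
```

Each resulting batch file can then be uploaded as one transaction in step 2.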
User-uploaded subsets
We offer the capability to upload compounds for processing.
By Annotation
We offer compounds by annotation. More soon.
Synthesis on Request
Some vendors offer compounds that they will make if asked, usually within about 10 weeks. We like these compounds because they greatly expand the region of chemical space one can sample without performing synthesis oneself.
-- John Irwin