The Tranche Browser is a new feature of ZINC15 that allows you to download physical property subsets of ZINC easily.
It allows you to download subsets in 2D (for chemoinformatics, as SMILES or with properties in text format) and 3D (in ready-to-dock formats). Switch between 2D and 3D views using the 2D/3D button (top left).
The choices to be made are as follows:
The reactivity filter allows compounds to be filtered by the chemical functionality they possess. By default, "Mild" is selected, which was the default in ZINC12. This level allows molecules containing PAINS patterns as well as weakly reactive groups such as aldehydes, thiols and Michael acceptors. The user may choose to impose stricter or less strict chemistry filters. Thus, "Clean" filters out the same functional groups as the ZINC12 "clean" subsets, such as aldehydes, thiols, and Michael acceptors. Notably, it still passes PAINS patterns for which there is no clear mechanism. To eliminate PAINS compounds completely, the user should select the "Anodyne" reactivity level. At the other end of the spectrum, the user may be more permissive by allowing more reactive groups such as alkyl halides and ???. A final level, "Unstable", includes boronic acids and other groups that would be expected to react or disintegrate in buffer and thus are not expected to survive in an assay. They may still be interesting for other applications, including synthetic or general chemoinformatics work. The filtering is by default progressive, and thus if Mild is selected, then the Clean and Anodyne levels are also selected. To select explicitly only compounds that are at a particular level, toggle into "Exclusive" mode in the reactivity popup.
The purchasability filter allows compounds to be selected by their availability. By default, "Wait OK" is selected, meaning both in-stock and on-demand compounds are included, which was also the default in ZINC12. If the research is on a tight deadline, two stricter levels may be specified. "In stock" refers to compounds that exist as powders and are ready to ship, and can typically be expected within 2 weeks. The "Agent" level also includes compounds available via procurement agents, such as Molport or eMolecules. At the other extreme, "Boutique" includes compounds that are likely to be expensive, take longer to synthesize, or often both. The "Annotated" level also includes compounds that are in catalogs such as DrugBank, HMDB or ChEMBL, but may not be for sale. The filtering is by default progressive, thus the Annotated level also includes higher purchasability levels. To select explicitly only compounds that are at a particular level, toggle into "Exclusive" mode in the purchasability popup.
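The difference between progressive and Exclusive selection can be sketched in a few lines of Python. This is only an illustration, not ZINC15 code: the function name `selected_levels` is hypothetical, and the strictest-first ordering of the levels is inferred from the description above.

```python
# Purchasability levels, strictest first. ASSUMPTION: this ordering is
# inferred from the text above; it is not read from ZINC15 itself.
LEVELS = ["In stock", "Agent", "Wait OK", "Boutique", "Annotated"]


def selected_levels(level, exclusive=False):
    """Return the levels that are included when `level` is chosen.

    Progressive mode (the default) includes the chosen level and every
    stricter one. Exclusive mode includes only the chosen level.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    if exclusive:
        return [level]
    return LEVELS[: LEVELS.index(level) + 1]


print(selected_levels("Wait OK"))                  # progressive
print(selected_levels("Wait OK", exclusive=True))  # exclusive
```

With the default "Wait OK", progressive mode yields In stock, Agent and Wait OK, while Exclusive mode yields Wait OK alone. The same logic would apply to the reactivity levels.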
pH specifies the relevant pH at which 3D models are desired, and thus only applies to tranche selections in 3D. By default, Ref (pH 7.4) and Mid (other representations near physiological pH) are selected. Models at higher or lower pH may be explicitly included using the popup.
Charge specifies the net molecular charge on the molecule, and thus only applies to tranche selections in 3D. By default, all charge ranges are allowed. These may be individually toggled on or off as may be required by the project.
Each tranche may be selected or deselected by clicking on it. When a tranche is on, the number of molecules it contributes to the total set is summed horizontally to give totals by logP range and vertically to give totals by molecular weight range. Each range is up to and including that value. Thus the 425 column in molecular weight includes molecules having molecular weight greater than 400 and less than or equal to 425.
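The "up to and including" convention can be made concrete with a short Python sketch. The edge list and the function name `column_for` are hypothetical; only the 400 and 425 boundaries come from the text above.

```python
from bisect import bisect_left

# Upper edges of the molecular weight columns. ASSUMPTION: illustrative
# values only; the text above confirms just the 400 and 425 boundaries.
MW_EDGES = [300, 325, 350, 375, 400, 425, 450]


def column_for(value, edges=MW_EDGES):
    """Return the label (upper edge) of the column whose range contains value.

    Each column covers (previous edge, its own edge], so a value equal to
    an edge belongs to that edge's column. Returns None when the value is
    above the last edge.
    """
    i = bisect_left(edges, value)  # index of the first edge >= value
    return edges[i] if i < len(edges) else None


print(column_for(412.5))  # falls in the 425 column
print(column_for(425))    # upper bound is inclusive: still 425
print(column_for(400))    # lower bound is exclusive: this is the 400 column
```

In this sketch the first column has no lower bound, so anything at or below the first edge lands there.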
Popular choices of physical property subsets are available from the Presets dropdown menu. Among these, the most popular are "lead-like" and "fragment-like", which correspond to current opinion in the field. We take up each of these choices in turn.
- All or None - these choices allow all tranches to be selected or deselected.
- Lead-like. This is the most popular choice for projects seeking ligands that can be detected spectrophotometrically, as in an HTS assay.
- Fragment-like. This is the most popular choice for projects seeking ligands that can be detected biophysically, such as with SPR, NMR or X-ray crystallography.
- Drug-like. This is a popular notion introduced by Lipinski in 1997, based on the observation that orally bioavailable drugs tend to follow the "Rule of Five", i.e. mwt < 500, logP < 5, H-bond donors <= 5, H-bond acceptors <= 2 * 5. In fact, using these criteria still includes many compounds that make poor initial hits, because they are too big, too greasy, and too complex. And even if they do hit, they have a very low ceiling for hit-to-lead optimization, because such optimization usually involves adding mass.
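As a minimal sketch, the drug-like criteria listed above can be written as a single Python predicate. The function name and the example values are hypothetical, and the thresholds are copied as stated in the text rather than taken from the Tranche Browser itself.

```python
def passes_rule_of_five(mwt, logp, hbd, hba):
    """Check the four drug-like criteria exactly as stated in the text above.

    mwt  -- molecular weight
    logp -- calculated logP
    hbd  -- number of H-bond donors
    hba  -- number of H-bond acceptors
    """
    return mwt < 500 and logp < 5 and hbd <= 5 and hba <= 2 * 5


# Illustrative property values, not taken from any real ZINC entry.
print(passes_rule_of_five(mwt=350.0, logp=2.5, hbd=2, hba=5))   # small and polar enough
print(passes_rule_of_five(mwt=620.0, logp=6.1, hbd=3, hba=8))   # too big and too greasy
```

A compound can pass all four tests and still be a poor starting point, which is the caveat raised above: a molecule with mwt just under 500 leaves little room to add mass during optimization.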
Some projects call for something just a little different than one of these standard subsets. Of course, the user may select arbitrary tranches by clicking on them, but a few additional choices have also been provided.
- Flagments - a little bit bigger than fragments, a little bit smaller than leads.
- Lugs - a little bit bigger than leads, a bit smaller than drugs
- big-n-greasy - as unlikely as it may seem, there are projects where only big greasy monsters will do the trick. Thus we provide them.
- Shards - smaller than fragments, these compounds fit model system sites, such as L99A, M102Q, CCP W191G and so on.
- Goldilocks - my personal favorite. These are like leads in molecular weight, but have a tighter range in logP, based on the observation that compounds that are too greasy do not dissolve, and compounds that are too polar tend not to come out of solution. Vendors agree - the Goldilocks subset has nearly 3/4 (31 of 46 million) of the leads using other default choices.
When all selections have been made, the user clicks on Download. This does not download the files themselves, but a script with which they may be downloaded. Again, there are choices and defaults, which we take up in turn.