Synthesia

Synthesia is a command-line tool that uses an entire retrosynthetic route as a guide to generate optimized structural analogues of a lead compound without compromising the synthesizability of the structure. The user can steer the structural modifications in a desired direction by specifying structural constraints.

Original publication: https://pubs.acs.org/doi/10.1021/acs.jcim.2c00246

Website: https://software.zbh.uni-hamburg.de/customers/tools

Installation

To obtain a license, register on the website and wait for your account to be approved. Then log in, click on Synthesia and choose "Download the license file". Your license key is inside that file and looks like AAAAAAAliFQAAAAU2eM8ZjTTELGD3LzxBgt3/1DGaW4=. Copy the license key from the file, download and unpack the program, and run the command:

./synthesia --license <your_license_here>

My installation is in /mnt/nfs/exa/work/ak87/UCSF/SynthI/SYNTHESIA/synthesia_1.0.0

Running

To run the program, you need:

  1. a retrosynthetic tree and
  2. a library of building blocks ("SMILES Name", no preprocessing needed).
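Although the library needs no chemical preprocessing, it can be worth confirming that every line really has the "SMILES Name" shape before a long run. The sketch below is only a format check written for this page, not part of Synthesia; it assumes whitespace-separated fields, and the names BB-0001 and BB-0002 are made up for illustration.

```python
from typing import Iterable, List, Tuple


def parse_bb_lines(lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[int]]:
    """Split "SMILES Name" lines into (smiles, name) pairs.

    Returns the parsed records and the 1-based numbers of the lines that
    do not contain two whitespace-separated fields.
    """
    records, bad = [], []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        fields = line.split(maxsplit=1)
        if len(fields) != 2:
            bad.append(lineno)  # SMILES without a name (or vice versa)
            continue
        records.append((fields[0], fields[1]))
    return records, bad


# Hypothetical library content; the names are invented for this example.
example = [
    "Cn1nc(C(O)=O)c2ccccc12 BB-0001",
    "CN1C2CCCC1CC(N)C2 BB-0002",
    "",
    "c1ccccc1",
]
records, bad = parse_bb_lines(example)
print(len(records), "records; malformed lines:", bad)
```

For a real library, pass an open file handle instead of the example list.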

The tool returns analogs of the target molecule that can be synthesized by the given route from the given building blocks (BBs). The analogs may be filtered by 29 different parameters (see the SI or the README.md file):

  1. Extended-Connectivity Fingerprints (ECFP)
  2. Functional-Class Fingerprints (FCFP)
  3. Connected Subgraph Fingerprints (CSFP)
  4. Largest Ring
  5. Largest Ringsystem
  6. Molecular Weight
  7. Number of Hydrogen-Bond Acceptors
  8. Number of Anions
  9. Number of Aromatic Atoms
  10. Number of Aromatic Rings
  11. Number of Aromatic Ringsystems
  12. Number of Cations
  13. Number of Hydrogen-Bond Donors
  14. Number of Halogens
  15. Number of Non-Hydrogen Atoms
  16. Number of Hetero Atoms
  17. Number of Hydrophobic Points
  18. Number of Inorganic Atoms
  19. Number of Lipinski Donors
  20. Number of Nitrogens and Oxygens
  21. Number of Non-Hydrogen Bonds
  22. Number of Rings
  23. Number of Ringsystems
  24. Number of Rotatable Bonds
  25. LogP-Value
  26. Total Charge
  27. Topological Polar Surface Area (TPSA)
  28. Volume
  29. Matching SMARTS pattern (either inclusion or exclusion)

Retrosynthetic tree

This is a .json file that describes all the steps needed to synthesize the molecule in question, together with the starting reagents. The steps are encoded as reaction SMARTS. Each reaction node should contain the reaction SMARTS and the SMILES of the product. The tree for granisetron synthesis via amidation is shown below.

{
  "smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",
  "is_chemical": true,
  "children":
  [
    {
      "smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",
      "is_reaction": true,
      "smartsPattern": "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)",
      "children":
      [
        {
          "smiles": "Cn1nc(C(O)=O)c2ccccc12",
          "is_chemical": true,
          "children": []
        },
        {
          "smiles": "CN1C2CCCC1CC(N)C2",
          "is_chemical": true,
          "children": []
        }
      ]
    }
  ]
}
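A hand-written tree is easy to get subtly wrong, so a structural check before running Synthesia may save time. The sketch below rebuilds the granisetron tree as a Python dictionary and checks it. The rules it enforces (chemical and reaction nodes alternate, reaction nodes carry "smartsPattern" and at least one child) are inferred from the example on this page and are an assumption, not a documented specification of the format.

```python
import json


def check_tree(node, path="root"):
    """Return a list of structural problems found in a retrosynthetic tree."""
    problems = []
    if not isinstance(node.get("smiles"), str) or not node["smiles"]:
        problems.append(f"{path}: missing 'smiles'")
    is_chem = bool(node.get("is_chemical", False))
    is_rxn = bool(node.get("is_reaction", False))
    if is_chem == is_rxn:
        problems.append(f"{path}: must be exactly one of chemical or reaction")
    children = node.get("children", [])
    if is_rxn:
        if not node.get("smartsPattern"):
            problems.append(f"{path}: reaction without 'smartsPattern'")
        if not children:
            problems.append(f"{path}: reaction without reactants")
    for i, child in enumerate(children):
        child_path = f"{path}.children[{i}]"
        # Assumed rule: a chemical's children are reactions and vice versa.
        if bool(child.get("is_chemical", False)) == is_chem:
            problems.append(f"{child_path}: same node type as its parent")
        problems.extend(check_tree(child, child_path))
    return problems


def chemical(smiles, children=()):
    return {"smiles": smiles, "is_chemical": True, "children": list(children)}


def reaction(product_smiles, smarts, reactants):
    return {"smiles": product_smiles, "is_reaction": True,
            "smartsPattern": smarts, "children": list(reactants)}


target = "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12"
tree = chemical(target, [
    reaction(target, "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)", [
        chemical("Cn1nc(C(O)=O)c2ccccc12"),
        chemical("CN1C2CCCC1CC(N)C2"),
    ]),
])

problems = check_tree(tree)
print(problems or "tree looks structurally consistent")
tree_json = json.dumps(tree, indent=2)  # text that could be saved as amide_tree.json
```

The check says nothing about the chemistry itself; it only catches missing keys and misplaced nodes.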

How to create this tree?

Except for a one-step synthesis, I would recommend consulting a synthetic chemist before creating analogs.

Configuration file

All additional settings of Synthesia can be specified in a configuration file. This file is optional. If a setting is defined both in the configuration file and on the command line, the command-line value overrides the one from the configuration file. The configuration file has to be valid standard JSON. An example configuration file is bundled with Synthesia.
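The precedence rule can be pictured as a dictionary overlay. The sketch below is not Synthesia code, only an illustration of the rule stated above. The key names "threads" and "verbosity" are borrowed from the command-line flags, and whether the configuration file uses the same names is an assumption; the bundled example file is the place to check.

```python
import json


def effective_settings(config_text, cli_options):
    """Overlay command-line options on settings read from a JSON config."""
    # json.loads raises json.JSONDecodeError if the text is not valid JSON.
    config = json.loads(config_text)
    merged = dict(config)
    merged.update(cli_options)  # command-line values win on conflict
    return merged


# Hypothetical config content; key names are assumed, not taken from the README.
config_text = '{"threads": 2, "verbosity": 1}'
cli_options = {"threads": 4}

settings = effective_settings(config_text, cli_options)
print(settings)
```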

So far I've been using only command line parameters.

Running

./synthesia --inputStructures ../../Enamine-BB/2022-03_Chemspace_Building_Blocks_noRU_SMILES.smiles --retroSynTree amide_tree.json --output amide_out-ecfp.json --threads 4 --verbosity 5 --allLeaves --useECFP 2 tanimoto 0.6 1.0

--inputStructures -- a library of BBs

--retroSynTree -- the retrosynthetic tree (.json file, see above)

--output -- output .json file

--threads -- Number of threads used for parallelization.

--allLeaves -- very important: without it you will only get suitable BBs and not final structures. The README says: Set this parameter to true if all chemical leaf nodes should be open for exchange. Either this parameter or the option allChemicals must be set or the nodeId parameter must be specified.

--useECFP -- filter analogs by ECFP. Four parameter values are expected. The README lists them as <Integer> <String> <Integer> <Integer>, although the two thresholds in the example above are decimal values (0.6 and 1.0). The first number is the number appended to the ECFP name. The second parameter, a string, specifies the similarity measure used for the fingerprint comparison. Options are 'tanimoto', 'cosine', 'hamming', 'euclidean' and 'dice'. The third number specifies the minimum threshold for the fingerprint similarity and the fourth number specifies the maximum threshold.
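To make the threshold window concrete, here is a minimal sketch of a Tanimoto filter over fingerprints represented as sets of "on" bit indices. It is not how Synthesia computes ECFPs; the bit sets and candidate names are invented, and treating both bounds as inclusive is an assumption.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def within_window(similarity, minimum, maximum):
    # Assumption: both bounds are inclusive.
    return minimum <= similarity <= maximum


# Invented fingerprints, for illustration only.
reference = {1, 5, 9, 12, 20}
candidates = {
    "cand_A": {1, 5, 9, 12, 21},   # 4 shared bits out of 6 in total
    "cand_B": {1, 2, 3},           # 1 shared bit out of 7 in total
    "cand_C": {1, 5, 9, 12, 20},   # identical to the reference
}

kept = [name for name, fp in candidates.items()
        if within_window(tanimoto(reference, fp), 0.6, 1.0)]
print(kept)
```

With the window 0.6 to 1.0, cand_A (4/6) and cand_C (1.0) pass while cand_B (1/7) is rejected. Lowering the maximum below 1.0 would also exclude exact fingerprint matches.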

Executed on Gimel, this command returns 511 analogs of granisetron from 436K BBs in 1 min 56 s.
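When the same run is repeated with different thresholds or trees, it may help to assemble the command in a script. The sketch below only builds the argument list and uses exactly the flags shown on this page; the function name build_command and its parameter names are made up for illustration.

```python
import shlex


def build_command(bb_library, tree, output, threads=4, verbosity=5,
                  ecfp=(2, "tanimoto", 0.6, 1.0), executable="./synthesia"):
    """Assemble the Synthesia call from this page as an argument list."""
    ecfp_number, measure, minimum, maximum = ecfp
    return [
        executable,
        "--inputStructures", bb_library,
        "--retroSynTree", tree,
        "--output", output,
        "--threads", str(threads),
        "--verbosity", str(verbosity),
        "--allLeaves",
        "--useECFP", str(ecfp_number), measure, str(minimum), str(maximum),
    ]


cmd = build_command(
    "../../Enamine-BB/2022-03_Chemspace_Building_Blocks_noRU_SMILES.smiles",
    "amide_tree.json",
    "amide_out-ecfp.json",
)
print(shlex.join(cmd))  # the command line as it would be typed in a shell
```

To actually start the run, the list can be passed to subprocess.run(cmd, check=True) from the directory that contains the synthesia executable.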

Conclusion

Pros of Synthesia: flexibility, multistage reactions of virtually any complexity and number of stages, fine-tuning of analog properties (number of aromatic rings, halogens, cations, logP…).

Cons: closed source, licensing on a per-year basis, steep learning curve, need to create a retrosynthetic tree for each compound.

My summary for now: may be interesting to look into, given access to a synthetic chemistry group.