Synthesia is a command-line tool that uses an entire retrosynthetic route as a guide to generate optimized structural analogues of a lead compound without compromising the synthesizability of the structure. Users can steer the structural modifications in a desired direction by specifying structural constraints.
Original publication:
To obtain the license, you need to register and have your account approved. Then log in to the website, click on Synthesia and then on "Download the license file". Your license key is inside the file; it looks like AAAAAAAliFQAAAAU2eM8ZjTTELGD3LzxBgt3/1DGaW4=
Copy the license key from the file, download and unpack the program, and run the command:
./synthesia --license <your_license_here>
To run the program, you need:
- a retrosynthetic tree and
- a library of building blocks (BBs).
The tool returns analogs of a target molecule synthesizable by the given route from given BBs. The analogs may be filtered by 29 different parameters (see SI or file):
- Extended-Connectivity Fingerprints (ECFP)
- Functional-Class Fingerprints (FCFP)
- Connected Subgraph Fingerprints (CSFP)
- Largest Ring
- Largest Ringsystem
- Molecular Weight
- Number of Hydrogen-Bond Acceptors
- Number of Anions
- Number of Aromatic Atoms
- Number of Aromatic Rings
- Number of Aromatic Ringsystems
- Number of Cations
- Number of Hydrogen-Bond Donors
- Number of Halogens
- Number of Non-Hydrogen Atoms
- Number of Hetero Atoms
- Number of Hydrophobic Points
- Number of Inorganic Atoms
- Number of Lipinski Donors
- Number of Nitrogens and Oxygens
- Number of Non-Hydrogen Bonds
- Number of Rings
- Number of Ringsystems
- Number of Rotatable Bonds
- LogP-Value
- Total Charge
- Topological Polar Surface Area (TPSA)
- Volume
- Matching SMARTS pattern (either inclusion or exclusion)
Retrosynthetic tree
This is a .json file that describes all the steps needed to synthesize the molecule in question, together with the starting reagents. The steps are encoded as reaction SMARTS. Each reaction node should contain the reaction SMARTS and the SMILES of the product. The tree for granisetron synthesis via amidation is shown below.
{
  "smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",
  "is_chemical": true,
  "children": [{
    "smiles": "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",
    "is_reaction": true,
    "smartsPattern": "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)",
    "children": [{
      "smiles": "Cn1nc(C(O)=O)c2ccccc12",
      "is_chemical": true,
      "children": []
    }, {
      "smiles": "CN1C2CCCC1CC(N)C2",
      "is_chemical": true,
      "children": []
    }]
  }]
}
How to create this tree?
- If only one stage is needed, just write it manually.
- The authors used the open-source ML tool AiZynthFinder.
- Reaxys Retrosynthesis tool (it was not able to find a route for granisetron, though; it seems to use only published procedures).
- Sci-Finder Retrosynthesis. It exports results as .pdf only, but at least you can copy compound SMILES from the Retrosynthesis Plan: click on the structure and select "Substance Detail".
- IBM RXN. Based on machine-extracted patent reactions. You have to manually select reactions for each step.
- Spaya AI
In any case, except for one stage synthesis, I would recommend consulting a synthetic chemist before creating analogs.
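For the one-stage case, writing the tree by hand can be replaced with a few lines of Python. This is only a sketch: the key names are the ones from the granisetron example above, but the assumption that nodes are nested through a "children" list on every node (not only on the leaves) is my reading of that example, and the helper names chemical, reaction and one_step_tree are made up for illustration.

```python
import json

def chemical(smiles, children=None):
    """Chemical node; a leaf (starting reagent) gets an empty 'children' list."""
    return {"smiles": smiles, "is_chemical": True, "children": children or []}

def reaction(product_smiles, smarts, reactants):
    """Reaction node: reaction SMARTS plus the SMILES of the product."""
    return {
        "smiles": product_smiles,
        "is_reaction": True,
        "smartsPattern": smarts,
        "children": reactants,  # assumption: reactants hang off 'children'
    }

def one_step_tree(product, smarts, reactant_smiles):
    """Target molecule -> one reaction -> starting reagents."""
    leaves = [chemical(s) for s in reactant_smiles]
    return chemical(product, [reaction(product, smarts, leaves)])

# Granisetron via amidation, using the values from the example tree
tree = one_step_tree(
    "CN1C2CCCC1CC(C2)NC(=O)c1nn(C)c2ccccc12",
    "[#7:1].[#8]-[#6:2](=O)>>[#7:1]-[#6:2](=O)",
    ["Cn1nc(C(O)=O)c2ccccc12", "CN1C2CCCC1CC(N)C2"],
)
tree_json = json.dumps(tree, indent=2)
```

Writing tree_json to a file such as amide_tree.json gives something to pass to --retroSynTree. Compare the result against a tree that is known to work before relying on it.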
Configuration file
All additional settings of Synthesia can be specified in a configuration file. This file is optional and the user does not have to use it. If both the configuration file as well as command line parameters are used to define parameters, the settings parsed via command line overwrite settings defined in the configuration file. The configuration file has to be in valid standard JSON format. An example configuration file is bundled with Synthesia.
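Since the file has to be standard JSON, it is worth checking it before a long run. The sketch below uses only Python's json module and says nothing about which settings Synthesia accepts; the function name check_json_text and the key "a" in the usage lines are placeholders, not Synthesia options.

```python
import json

def _reject_constant(name):
    # Python's parser accepts NaN/Infinity, which standard JSON does not allow
    raise ValueError(f"non-standard JSON constant: {name}")

def check_json_text(text):
    """Return (True, parsed_value) for standard JSON, else (False, message)."""
    try:
        return True, json.loads(text, parse_constant=_reject_constant)
    except json.JSONDecodeError as err:
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"
    except ValueError as err:
        return False, str(err)

ok, parsed = check_json_text('{"a": 1}')       # valid JSON
bad, message = check_json_text('{"a": 1,}')    # trailing comma is rejected
```

For a real file, read it with open(path, encoding="utf-8").read() and pass the text to check_json_text.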
So far I've been using only command line parameters.
./synthesia --inputStructures ../../Enamine-BB/2022-03_Chemspace_Building_Blocks_noRU_SMILES.smiles --retroSynTree amide_tree.json --output amide_out-ecfp.json --threads 4 --verbosity 5 --allLeaves --useECFP 2 tanimoto 0.6 1.0
--inputStructures -- a library of BBs
--retroSynTree -- the retrosynthetic tree (self-explanatory)
--output -- output .json file
--threads -- number of threads used for parallelization
--allLeaves -- very important: without it you will only get suitable BBs and not final structures. The README says: "Set this parameter to true if all chemical leaf nodes should be open for exchange. Either this parameter or the option allChemicals must be set or the nodeId parameter must be specified."
--useECFP -- filter analogs by ECFP. Four parameter values are expected; the README lists them as <Integer> <String> <Integer> <Integer>, although the thresholds in the command above are decimals. The first number is the number appended to the fingerprint name (the README says FCFP here, presumably meaning ECFP). The second parameter, a string, specifies the similarity measure used for the fingerprint comparison; the options are 'tanimoto', 'cosine', 'hamming', 'euclidean' and 'dice'. The third number specifies the minimum threshold for the fingerprint similarity and the fourth number specifies the maximum threshold.
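To make the "tanimoto 0.6 1.0" part concrete, here is a small sketch of the Tanimoto coefficient on fingerprints represented as sets of on-bit indices, plus a min/max window check. The bit sets are made up and are not real ECFP output, and treating both thresholds as inclusive is my assumption, not something taken from the README.

```python
def tanimoto(a, b):
    """Tanimoto coefficient: shared on-bits divided by on-bits in the union."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union

def within_window(similarity, minimum, maximum):
    """True if the similarity lies between the thresholds (inclusive, by assumption)."""
    return minimum <= similarity <= maximum

# Toy fingerprints (hypothetical bit indices)
lead = {1, 4, 7, 9, 12}
analog = {1, 4, 7, 10, 12, 15}

sim = tanimoto(lead, analog)           # 4 shared bits / 7 bits in the union
keep = within_window(sim, 0.6, 1.0)    # False: 4/7 is below the 0.6 minimum
```

Lowering the minimum threshold lets more distant analogs through; a maximum of 1.0 puts no upper limit on the similarity.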
Executed on Gimel, this command returns 511 analogs of granisetron from 436K BBs in 1 min 56 s.
Pros of Synthesia: flexibility, multistage reactions of virtually any complexity and number of stages, fine-tuning of analog properties (number of aromatic rings, halogens, cations, logP, etc.).
Cons: closed source, licensing on a per-year basis, steep learning curve, need to create a retrosynthetic tree for each compound.
My summary for now: it may be interesting to look into, given access to a synthetic chemistry group.