SynthI
Revision as of 08:47, 13 March 2022
SynthI is an open-source tool for synthon-based library design, written by the Laboratoire de Chemoinformatique at the University of Strasbourg.
Setup
Anaconda3 and a copy of SynthI are already installed under /nfs/soft2/. To activate the environment:
$ source /nfs/soft2/anaconda3/bin/activate SynthI-env
Install Anaconda3 on your computer
$ wget https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh
$ sh Anaconda3-2021.11-Linux-x86_64.sh
Welcome to Anaconda3 5.3.1
In order to continue the installation process, please review the license agreement.
Please, press ENTER to continue
. . .
Anaconda3 will now be installed into this location:
/home/khtang/anaconda3
[/home/khtang/anaconda3] >>> /nfs/home/khtang/anaconda3
PREFIX=/mnt/nfs/home/khtang/anaconda3
. . .
Do you wish the installer to initialize Anaconda3
in your /nfs/home/khtang/.bashrc ? [yes|no]
[no] >>> no
Do you wish to proceed with the installation of Microsoft VSCode? [yes|no]
>>> no
$ source /<path to anaconda3>/bin/activate
Download and set up SynthI
Install dependencies
If you are on the cluster, SynthI is installed in /nfs/soft2/SynthI:
$ source /nfs/soft2/anaconda3/bin/activate SynthI-env
$ conda deactivate   # to exit the environment
Install stand-alone
$ git clone https://github.com/Laboratoire-de-Chemoinformatique/SynthI.git
$ cd SynthI
$ source /<path to anaconda3>/bin/activate
(base) $ conda env create -f SynthI_environment.yml -p /home/user/anaconda3/envs/synthI_env
(base) $ conda activate /home/user/anaconda3/envs/synthI_env
(synthI_env) $
Usage
The scripts can be used as a Python library inside a custom script. You can either package them with setuptools and install the package into your virtual environment, or copy them into the directory containing your script and import them as a module.
Please note: before BB synthonization, preprocess the SMILES and remove counterions and solvents. When processing a mixture SMILES, SynthI-BBs considers every component and generates synthons for each of them where possible, so take care of these components before synthonization.
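The note above can be checked before running SynthI_BBsBulkClassificationAndSynthonization.py. The following is a minimal sketch that assumes a whitespace-separated .smi file with the SMILES in the first column. It sets aside every entry whose SMILES contains '.' so that it can be cleaned by hand. In SMILES, '.' separates disconnected components, as in salts, solvates and mixtures. The function and output file names are hypothetical.

```python
from pathlib import Path


def split_mixtures(smi_path, clean_path, mixtures_path):
    """Copy single-component entries to clean_path and multi-component
    entries (SMILES containing '.') to mixtures_path for manual cleanup.

    Assumes one entry per line: SMILES first, optional name after it.
    Returns (number of clean entries, number of flagged entries).
    """
    clean, mixtures = [], []
    for line in Path(smi_path).read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # '.' separates disconnected components, e.g. a salt and its counterion
        (mixtures if "." in fields[0] else clean).append(line)
    Path(clean_path).write_text("".join(entry + "\n" for entry in clean))
    Path(mixtures_path).write_text("".join(entry + "\n" for entry in mixtures))
    return len(clean), len(mixtures)


# Example (hypothetical file names):
# split_mixtures("bb_analogs.smi", "bb_analogs_clean.smi", "bb_analogs_mixtures.smi")
```

The sketch only flags entries and does not decide which component to keep. For a salt that choice is usually easy, but for a genuine mixture it is a chemical decision.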
A detailed guide is available at https://github.com/Laboratoire-de-Chemoinformatique/SynthI.
Preparing analogs with SynthI
This example uses https://cartblanche22.docking.org/searchZinc/ZINCoT000006Aq87, and the commands below require bash. Prepare a .smi file listing the SMILES (and names) of the compounds you want to generate analogs for.
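The input file can also be written from Python. This minimal sketch assumes the "SMILES name" layout: space-separated, one compound per line, as in the first two columns of the fragmentation output shown on this page. The compound is the ZINC entry linked above. The helper name is hypothetical.

```python
from pathlib import Path


def write_smi(compounds, path):
    """Write {name: SMILES} pairs as 'SMILES name' lines.

    Whitespace inside a SMILES or a name would shift the columns that the
    SynthI scripts and the awk commands on this page rely on, so it is rejected.
    Returns the number of lines written.
    """
    lines = []
    for name, smiles in compounds.items():
        if any(ch.isspace() for ch in smiles + name):
            raise ValueError(f"whitespace in entry {name!r}")
        lines.append(f"{smiles} {name}\n")
    Path(path).write_text("".join(lines))
    return len(lines)


# The compound from the ZINC link above
ligands = {"ZINCoT000006Aq87": "C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12"}
# write_smi(ligands, "ligand.smi")
```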
First, we need to fragment our compounds:
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --nCores 5 --MaxNumberOfStages 1
The file ligand.smi_out lists the synthons obtained and the reactions applied to each molecule:
C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12 ZINCoT000006Aq87 c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 R3.1_0|R5.2_0 3 0 AvailableSynthons: NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1
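When many compounds are fragmented, the output lines can be read programmatically. The sketch below treats the column layout as an assumption inferred from the single example line above: the SMILES, the name, the '.'-joined synthons, the '|'-joined reactions, two integer columns (left uninterpreted) and the two AvailableSynthons fields. The numbers after ':' in bracket atoms (e.g. [NH2:20]) appear to be SynthI's synthon marks.

strip_marks only drops these labels. It does not cap the fragment, which still has to be done by hand as described in the next step. All names here are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FragmentationRecord:
    smiles: str
    name: str
    synthons: list
    reactions: list
    available: list = field(default_factory=list)
    not_available: list = field(default_factory=list)

    @property
    def reaction_classes(self):
        # "R3.1_0" -> "R3", the form passed to --reactionsToWorkWith on this page
        return sorted({r.split(".")[0] for r in self.reactions})


def _after_prefix(token, prefix):
    """Return the '|'-separated items after prefix, or [] if there are none."""
    rest = token[len(prefix):]
    return rest.split("|") if rest else []


def parse_out_line(line):
    """Parse one line of a *.smi_out file (layout inferred from one example)."""
    tokens = line.split()
    rec = FragmentationRecord(
        smiles=tokens[0],
        name=tokens[1],
        synthons=tokens[2].split("."),
        reactions=tokens[3].split("|"),
    )
    for tok in tokens[4:]:
        if tok.startswith("AvailableSynthons:"):
            rec.available = _after_prefix(tok, "AvailableSynthons:")
        elif tok.startswith("NotAvailableSynthons:"):
            rec.not_available = _after_prefix(tok, "NotAvailableSynthons:")
    return rec


def strip_marks(synthon):
    """Drop the ':<number>' labels, e.g. '[NH2:20]' -> '[NH2]'."""
    return re.sub(r":\d+\]", "]", synthon)


def marks(synthon):
    """Return the mark numbers carried by a synthon, in order of appearance."""
    return [int(m) for m in re.findall(r":(\d+)\]", synthon)]
```

For the line above, reaction_classes gives ["R3", "R5"], the same classes used later with --reactionsToWorkWith.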
Because the generated synthons are molecular fragments rather than complete BBs, you have to cap them manually according to the reactions listed. Then search SmallWorld for similar BBs. For each list of BBs found, run:
awk '{print $1 " " $2}' thioph-Cl.tsv | grep -v alignment > thioph-Cl.smi
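With several SmallWorld result files, the per-file conversion and the concatenation in the next step can be done in one pass. This minimal sketch roughly mirrors the awk/grep pipeline: it takes the first two whitespace-separated columns and drops lines whose output contains "alignment". One deviation is that blank lines are dropped rather than emitted as a lone space. Skipping repeated SMILES strings is an addition not in the original workflow, and the strings are compared exactly, not canonicalized. The function names are hypothetical.

```python
from pathlib import Path


def tsv_to_smi_lines(tsv_path):
    """Roughly: awk '{print $1 " " $2}' file.tsv | grep -v alignment"""
    for line in Path(tsv_path).read_text().splitlines():
        out = " ".join(line.split()[:2])
        if out and "alignment" not in out:
            yield out


def merge_tsvs(tsv_paths, out_path):
    """Convert every SmallWorld .tsv and write one combined .smi file,
    keeping only the first occurrence of each SMILES string."""
    seen, merged = set(), []
    for tsv in tsv_paths:
        for entry in tsv_to_smi_lines(tsv):
            smiles = entry.split()[0]
            if smiles not in seen:
                seen.add(smiles)
                merged.append(entry)
    Path(out_path).write_text("".join(e + "\n" for e in merged))
    return len(merged)


# Example: merge_tsvs(sorted(Path(".").glob("*.tsv")), "bb_analogs.smi")
```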
Then concatenate the resulting .smi files into one file (bb_analogs.smi) and generate synthons from the BBs found:
python /nfs/soft2/SynthI/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_analogs.smi
Keep only the SMILES and names:
awk '{print $1 " " $NF}' bb_analogs.smi_Synthmode.smi > bb_analogs_synth.smi
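Before enumerating, you can check that the synthon library contains synthons carrying the same marks as the parent's synthons. For this example those are 10, 20 and 21 in ligand.smi_out. This minimal sketch assumes the SMILES is the first column of bb_analogs_synth.smi and that marks are written as ':<number>]', as in the fragmentation output. A missing mark only suggests that part of the parent scaffold cannot be rebuilt from this library. The helper names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

MARK = re.compile(r":(\d+)\]")


def mark_counts(synth_smi_path):
    """Count how many synthons in the file carry each mark number."""
    counts = Counter()
    for line in Path(synth_smi_path).read_text().splitlines():
        fields = line.split()
        if fields:
            # set(): a synthon carrying the same mark twice is counted once
            counts.update({int(m) for m in MARK.findall(fields[0])})
    return counts


def missing_marks(synth_smi_path, parent_marks):
    """Return the parent's marks that no synthon in the library carries."""
    counts = mark_counts(synth_smi_path)
    return sorted(m for m in set(parent_marks) if counts[m] == 0)


# Example: missing_marks("bb_analogs_synth.smi", [10, 20, 21])
```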
A. Enumeration based on all found BBs
The directory given by -oD will contain either FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi or AnalogsForMol1.smi, listing the SMILES of the generated compounds.
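As the first file name says, duplicates can be present. This minimal sketch reads whichever of the two output files exists and keeps the unique SMILES in first-seen order. It assumes the SMILES is the first column and removes only exact string duplicates. Canonicalizing the SMILES with a cheminformatics toolkit first would catch more. The function name is hypothetical.

```python
from pathlib import Path

OUTPUT_NAMES = (
    "FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi",
    "AnalogsForMol1.smi",
)


def unique_products(out_dir):
    """Return unique product SMILES from the first output file found."""
    for name in OUTPUT_NAMES:
        path = Path(out_dir) / name
        if path.exists():
            seen = {}  # dict keeps insertion order
            for line in path.read_text().splitlines():
                fields = line.split()
                if fields:
                    seen.setdefault(fields[0], None)
            return list(seen)
    raise FileNotFoundError(f"no SynthI output file found in {out_dir}")


# Example: len(unique_products("ENUMERATED"))
```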
Using all available reactions:
mkdir ENUMERATED
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200
Morgan2 Tanimoto coefficient (Tc) of the obtained compounds to the parent (figure: Tanimoto.png).
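The Tc distributions can be recomputed from the output files. For two fingerprints given as sets of on-bits A and B, the Tanimoto coefficient is Tc = |A ∩ B| / |A ∪ B|. This minimal sketch implements that formula plus a simple histogram binning in pure Python.

It assumes the Morgan2 (radius 2) on-bits have already been computed elsewhere, for example with RDKit's AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048) followed by GetOnBits(). Returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tc_histogram(parent_bits, product_bits, n_bins=10):
    """Count products per equal-width Tc bin on [0, 1].

    A Tc of exactly 1.0 is placed in the last bin.
    """
    counts = [0] * n_bins
    for bits in product_bits:
        tc = tanimoto(parent_bits, bits)
        counts[min(int(tc * n_bins), n_bins - 1)] += 1
    return counts
```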
Using the same reactions that were used for initial fragmentation:
mkdir ENUMERATED-R3-R5
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED-R3-R5/ --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200 --fragmentationMode include_only --reactionsToWorkWith "R3, R5"
Morgan2 Tc of the obtained compounds to the parent (figure: Tanimoto-synthi-enum-r3-r5.png).
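The two enumeration runs above differ only in the output directory and the reaction restriction. This minimal sketch of a Python wrapper rebuilds the same command lines, creates the output directory and runs SynthI through subprocess. It uses only flags that appear on this page.

It assumes the wrapper itself runs inside SynthI-env, so sys.executable is that environment's Python, and it uses the cluster script path. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

SCRIPT = "/nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py"


def enumeration_command(synthons, out_dir, reactions=None, n_cores=10,
                        max_stages=5, mw_range=(200, 460)):
    """Build the argument list for an enumeration run as shown above.

    reactions: optional reaction classes such as ["R3", "R5"]; when given,
    --fragmentationMode include_only restricts the run to them.
    """
    cmd = [sys.executable, SCRIPT, "-i", synthons, "--nCores", str(n_cores),
           "-oD", out_dir, "--MaxNumberOfStages", str(max_stages),
           "--enumerationMode",
           "--MWupperTh", str(mw_range[1]), "--MWlowerTh", str(mw_range[0])]
    if reactions:
        # One argument "R3, R5", matching the quoted shell argument
        cmd += ["--fragmentationMode", "include_only",
                "--reactionsToWorkWith", ", ".join(reactions)]
    return cmd


def run_enumeration(synthons, out_dir, **options):
    """Create out_dir (the mkdir step) and run the enumeration."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(enumeration_command(synthons, out_dir, **options), check=True)


# Example, equivalent to the second command above:
# run_enumeration("bb_analogs_synth.smi", "ENUMERATED-R3-R5/", reactions=["R3", "R5"])
```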
B. Analogs from a synthon library
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --SynthLibrary bb_analogs_synth.smi --simTh 0.5 --analoguesLibGen --nCores 10 -oD ANALOGS --MaxNumberOfStages 5 --desiredNumberOfNewMols 1000 --enumerationMode --MWupperTh 460 --MWlowerTh 200
Morgan2 Tc of the obtained compounds to the parent (figure: Tanimoto-synthi-analogs.png).
Analog generation does not appear to apply the similarity threshold (--simTh). A large synthon library may be useful for analog generation, but its speed still needs to be tested.
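The observation about --simTh can be quantified by counting how many analogs fall below the threshold that was passed (0.5 in the command above). This minimal sketch again assumes fingerprints computed elsewhere as sets of on-bits, for example the Morgan2 bits used for the plots.

A nonzero count is consistent with the threshold not being enforced. However, SynthI may measure similarity with a different fingerprint or on different objects than the one used here, so this is a check, not proof.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def below_threshold(parent_bits, analog_bits, sim_th=0.5):
    """Return (number of analogs with Tc < sim_th, total number of analogs)."""
    tcs = [tanimoto(parent_bits, bits) for bits in analog_bits]
    return sum(tc < sim_th for tc in tcs), len(tcs)
```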