SynthI: Difference between revisions

From DISI

Revision as of 08:47, 13 March 2022

SynthI is an open-source tool for synthon-based library design, written by the Laboratoire de Chemoinformatique at the University of Strasbourg.

Setup

Anaconda3 and a copy of SynthI are already installed under /nfs/soft2/. To activate the environment:

$ source /nfs/soft2/anaconda3/bin/activate SynthI-env

Install Anaconda3 on your computer

$ wget https://repo.anaconda.com/archive/Anaconda3-2021.11-Linux-x86_64.sh 
$ sh Anaconda3-2021.11-Linux-x86_64.sh
  Welcome to Anaconda3 5.3.1
  In order to continue the installation process, please review the license
  agreement.
  Please, press ENTER to continue
  .
  .
  .
  Anaconda3 will now be installed into this location:
  /home/khtang/anaconda3
  [/home/khtang/anaconda3] >>> /nfs/home/khtang/anaconda3
  PREFIX=/mnt/nfs/home/khtang/anaconda3
  .
  .
  .
  Do you wish the installer to initialize Anaconda3
  in your /nfs/home/khtang/.bashrc ? [yes|no]
  [no] >>> no
  Do you wish to proceed with the installation of Microsoft VSCode? [yes|no]
  >>> no
$ source /<path to anaconda3>/bin/activate

Download and setup SynthI

Install dependencies

If you are on the cluster, SynthI is installed on /nfs/soft2/SynthI

$ source /nfs/soft2/anaconda3/bin/activate SynthI-env
$ conda deactivate  # exit the environment

Install stand-alone

$ git clone https://github.com/Laboratoire-de-Chemoinformatique/SynthI.git
$ cd SynthI
$ source /<path to anaconda3>/bin/activate
(base) $ conda env create -f SynthI_environment.yml  -p /home/user/anaconda3/envs/synthI_env
(base) $ conda activate /home/user/anaconda3/envs/synthI_env
(synthI_env) $

Usage

The scripts are meant to be used as a Python library inside your own scripts. You can either package SynthI with setuptools and install it into your virtual environment, or copy it into the directory containing your script and import it as a module.

Please note: before synthonizing building blocks (BBs), preprocess the SMILES and remove counterions and solvents. SynthI-BBs treats every molecule in a mixture SMILES separately and generates synthons for each of them where possible, so clean up mixtures before synthonization.
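The cleanup described above can be sketched in a few lines. This is only a rough illustration, not part of SynthI: the function names are hypothetical, and the "largest component" heuristic simply counts letters in each dot-separated part of the SMILES instead of parsing the chemistry. It also leaves charges untouched, so a proper standardization tool is still the safer choice for real libraries.

```python
def keep_largest_component(smiles: str) -> str:
    """Return the dot-separated component with the most letters.

    Crude stand-in for a real salt/solvent stripper: no chemistry is
    parsed, and charges on the kept component are not neutralized.
    """
    components = [part for part in smiles.split(".") if part]
    if not components:
        return smiles
    return max(components, key=lambda part: sum(ch.isalpha() for ch in part))


def preprocess_line(line: str) -> str:
    """Clean the SMILES column of one 'SMILES name' line, keeping the rest."""
    fields = line.split()
    if not fields:
        return ""
    fields[0] = keep_largest_component(fields[0])
    return " ".join(fields)


if __name__ == "__main__":
    # Hypothetical input line: a sodium salt written as a mixture SMILES.
    print(preprocess_line("CC(=O)Oc1ccccc1C(=O)[O-].[Na+] aspirin_sodium"))
```

Applying preprocess_line to every line of the BB list before running the synthonization script keeps one component per entry.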

A detailed guide is available at https://github.com/Laboratoire-de-Chemoinformatique/SynthI

Preparing analogs with SynthI

This example works with https://cartblanche22.docking.org/searchZinc/ZINCoT000006Aq87 and requires bash. Prepare a .smi file listing the SMILES (and names) of the compounds to generate analogs for.


First, we need to fragment our compounds:

  python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --nCores 5 --MaxNumberOfStages 1

In the file ligand.smi_out you will get a list of synthons and reactions that are applied to the molecule:

  C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12  ZINCoT000006Aq87 c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 R3.1_0|R5.2_0 3 0 AvailableSynthons: NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1
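To work with this output programmatically, a line like the one above can be split into its parts. The sketch below is a guess at the layout based only on this single example line (molecule SMILES, name, dot-separated synthons, pipe-separated reactions, two numeric columns, then the AvailableSynthons and NotAvailableSynthons lists); the class and function names are hypothetical, and the two numeric columns are skipped because their meaning is not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class FragmentationRecord:
    smiles: str
    name: str
    synthons: list
    reactions: list
    available: list = field(default_factory=list)
    not_available: list = field(default_factory=list)


def _split_pipe_list(text: str) -> list:
    """Split a '|'-separated list, dropping empty entries."""
    return [item for item in text.split("|") if item]


def parse_out_line(line: str) -> FragmentationRecord:
    """Parse one line of ligand.smi_out (layout assumed from one example)."""
    tokens = line.split()
    record = FragmentationRecord(
        smiles=tokens[0],
        name=tokens[1],
        synthons=tokens[2].split("."),
        reactions=tokens[3].split("|"),
    )
    for token in tokens[4:]:
        if token.startswith("NotAvailableSynthons:"):
            record.not_available = _split_pipe_list(token[len("NotAvailableSynthons:"):])
        elif token.startswith("AvailableSynthons:"):
            record.available = _split_pipe_list(token[len("AvailableSynthons:"):])
    return record


SAMPLE_LINE = (
    "C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12  ZINCoT000006Aq87 "
    "c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 "
    "R3.1_0|R5.2_0 3 0 AvailableSynthons: "
    "NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1"
)

if __name__ == "__main__":
    rec = parse_out_line(SAMPLE_LINE)
    print(rec.name, rec.reactions, len(rec.synthons))
```

The reactions field (here R3.1_0 and R5.2_0) is what guides the manual capping in the next step.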

Because the generated synthons are molecular fragments rather than complete BBs, you have to cap them manually according to the reactions listed. Then search for similar BBs in SmallWorld. For each resulting list of BBs, keep the first two columns and drop lines containing "alignment":

  awk '{print $1 " " $2}' thioph-Cl.tsv | grep -v alignment > thioph-Cl.smi
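Once every hit list has been converted this way, the per-list files have to be combined into a single input file. A plain cat works; the sketch below is one possible alternative that also drops repeated SMILES strings, which can occur when the same BB appears in several hit lists. The function names are hypothetical, and duplicates are detected by exact string match only (no canonicalization).

```python
from pathlib import Path


def merge_smi_lines(line_groups):
    """Merge groups of 'SMILES name' lines, keeping the first copy of each SMILES."""
    seen = set()
    merged = []
    for lines in line_groups:
        for line in lines:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            smiles, name = fields[0], fields[1]
            if smiles in seen:
                continue
            seen.add(smiles)
            merged.append(f"{smiles} {name}")
    return merged


def merge_smi_files(paths, out_path):
    """Read several .smi files, merge them, and write one combined file."""
    groups = [Path(p).read_text().splitlines() for p in paths]
    merged = merge_smi_lines(groups)
    Path(out_path).write_text("".join(f"{row}\n" for row in merged))
    return len(merged)
```

For example, merge_smi_files(["thioph-Cl.smi", "other-bb.smi"], "bb_analogs.smi") would write the combined list, where other-bb.smi is a made-up name standing in for the remaining hit lists.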

Then concatenate the lists into one file (bb_analogs.smi here) and prepare synthons from the BBs found:

  python /nfs/soft2/SynthI/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_analogs.smi

Keep only the SMILES and names (first and last columns):

  awk '{print $1 " " $NF}' bb_analogs.smi_Synthmode.smi > bb_analogs_synth.smi


A. Enumeration based on all found BBs

The directory given with "-oD" will contain either FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi or AnalogsForMol1.smi, a file listing the SMILES of the generated compounds.

Using all available reactions:

  mkdir ENUMERATED
  python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200

Morgan2 Tanimoto coefficient (Tc) of the obtained compounds to the parent (figure: Tanimoto.png).
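For reference, the Tc between two fingerprints is the number of shared on-bits divided by the number of on-bits set in either one. The sketch below shows only that arithmetic on plain Python sets of bit indices. Generating the Morgan fingerprints themselves requires a cheminformatics toolkit and is not shown; the function names and the choice to return 0.0 for two empty fingerprints are assumptions made for this illustration.

```python
def tanimoto(bits_a, bits_b) -> float:
    """Tanimoto coefficient of two fingerprints given as collections of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def rank_by_similarity(parent_bits, candidates):
    """Return (name, Tc) pairs sorted from most to least similar to the parent.

    `candidates` maps a compound name to its on-bit indices.
    """
    scored = [(name, tanimoto(parent_bits, bits)) for name, bits in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy bit sets, not real fingerprints.
    parent = {1, 2, 3, 4}
    library = {"cpd_a": {1, 2, 3, 4}, "cpd_b": {1, 2}, "cpd_c": {9}}
    for name, tc in rank_by_similarity(parent, library):
        print(name, round(tc, 2))
```

A histogram of the resulting Tc values gives the kind of distribution shown in the figures on this page.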

Using the same reactions that were used for initial fragmentation:

  mkdir ENUMERATED-R3-R5
  python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED-R3-R5/ --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200 --fragmentationMode include_only --reactionsToWorkWith "R3, R5"

Morgan2 Tanimoto coefficient (Tc) of the obtained compounds to the parent (figure: Tanimoto-synthi-enum-r3-r5.png).


B. Analogs from a synthon library

  python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --SynthLibrary bb_analogs_synth.smi --simTh 0.5 --analoguesLibGen --nCores 10 -oD ANALOGS --MaxNumberOfStages 5 --desiredNumberOfNewMols 1000 --enumerationMode --MWupperTh 460 --MWlowerTh 200

Morgan2 Tanimoto coefficient (Tc) of the obtained compounds to the parent (figure: Tanimoto-synthi-analogs.png).

Analog generation does not seem to use the similarity threshold value (--simTh). A large synthon library may be useful for analog generation, but the speed still needs to be tested.