Andrii's notes on SynthI
Preparing analogs with SynthI
Example compound: ZINCoT000006Aq87 (https://cartblanche22.docking.org/searchZinc/ZINCoT000006Aq87). A bash shell is needed. Prepare a .smi file with the SMILES (and names) of the compounds to prepare analogs for, one compound per line.
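A minimal Python sketch for writing ligand.smi. It assumes the usual space-separated "SMILES name" layout, which matches the first two columns of the ligand.smi_out line below. The helper name write_smi is hypothetical; a text editor does the same job.

```python
from pathlib import Path

# (SMILES, name) pairs; the compound below is the example used in these notes.
compounds = [
    ("C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12", "ZINCoT000006Aq87"),
]


def write_smi(path, pairs):
    """Write one space-separated 'SMILES name' line per compound."""
    lines = []
    for smiles, name in pairs:
        # Whitespace inside a SMILES or a name would shift the columns.
        if not smiles or not name or any(ch.isspace() for ch in smiles + name):
            raise ValueError(f"bad entry: {smiles!r} {name!r}")
        lines.append(f"{smiles} {name}")
    Path(path).write_text("\n".join(lines) + "\n")
```

Calling write_smi("ligand.smi", compounds) produces the input for the fragmentation step.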
First, we need to fragment our compounds:
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --nCores 5 --MaxNumberOfStages 1
The output file ligand.smi_out lists the synthons obtained and the reactions applied to each molecule:
C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12 ZINCoT000006Aq87 c1cn[nH:20]c1.c1cc(C[NH2:20])c[cH:21]c1.C[CH2:10]c1csc2ccccc12 R3.1_0|R5.2_0 3 0 AvailableSynthons: NotAvailableSynthons:C[CH2:10]c1csc2ccccc12|c1cc(C[NH2:20])c[cH:21]c1|c1cn[nH:20]c1
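A Python sketch for pulling the pieces out of such a line. The field layout is inferred from this single example (SMILES, name, "."-joined synthons, "|"-joined reaction IDs, two unlabeled numbers, then the AvailableSynthons:/NotAvailableSynthons: lists), so treat it as an assumption rather than SynthI's specification. It also lists the atom-map labels (e.g. [NH2:20]) that mark each synthon's attachment points, i.e. where a reactive group has to be added when capping. strip_labels only removes the labels; it does not choose the cap. All function names are hypothetical.

```python
import re

# Bracket atom carrying an atom-map label, e.g. [NH2:20] -> ("NH2", "20").
MAPPED_ATOM = re.compile(r"\[([^\]:]+):(\d+)\]")


def parse_fragmentation_line(line):
    """Split one ligand.smi_out line into named parts.

    Layout inferred from one example line; fields without a
    'Key:' prefix after the reactions are kept unlabeled in 'extra'.
    """
    fields = line.split()
    record = {
        "smiles": fields[0],
        "name": fields[1],
        "synthons": fields[2].split("."),
        "reactions": fields[3].split("|"),
        "available": [],
        "not_available": [],
        "extra": [],
    }
    for field in fields[4:]:
        key, sep, value = field.partition(":")  # splits at the first colon only
        items = value.split("|") if value else []
        if sep and key == "AvailableSynthons":
            record["available"] = items
        elif sep and key == "NotAvailableSynthons":
            record["not_available"] = items
        else:
            record["extra"].append(field)
    return record


def attachment_points(synthon):
    """(atom, map label) pairs for every labelled atom in a synthon SMILES."""
    return [(atom, int(label)) for atom, label in MAPPED_ATOM.findall(synthon)]


def strip_labels(synthon):
    """Drop the map labels, leaving H-terminated atoms, e.g. [NH2:20] -> [NH2]."""
    return MAPPED_ATOM.sub(r"[\1]", synthon)
```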
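The synthons are fragments whose attachment points are marked with atom-map labels (e.g. [NH2:20]), not real building blocks (BBs), so each one has to be capped manually with the reactive group required by the corresponding reaction. Then search SmallWorld for BBs similar to the capped ones. For each list of BBs found, keep the first two columns (SMILES and name) and drop the lines containing "alignment":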
awk '{print $1 " " $2}' thioph-Cl.tsv | grep -v alignment > thioph-Cl.smi
Then concatenate all resulting .smi files into one file (bb_analogs.smi) and generate synthons from the BBs found:
python /nfs/soft2/SynthI/SynthI_BBsBulkClassificationAndSynthonization.py -i bb_analogs.smi
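The awk/grep conversion and the concatenation can also be scripted for all hit lists at once. A Python sketch, assuming the SmallWorld hit lists are whitespace-separated .tsv files in the working directory; like the awk command, it keeps the first two columns and drops lines containing "alignment". Unlike a plain cat, it also drops exact duplicate lines, which is an addition and not part of the original procedure. The function names are hypothetical.

```python
from pathlib import Path


def tsv_to_smi_lines(tsv_path):
    """Python version of: awk '{print $1 " " $2}' file.tsv | grep -v alignment"""
    entries = []
    for line in Path(tsv_path).read_text().splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # unlike awk, skip blank or one-column lines
        entry = f"{fields[0]} {fields[1]}"
        if "alignment" not in entry:
            entries.append(entry)
    return entries


def merge_bb_lists(tsv_paths, out_path="bb_analogs.smi"):
    """Concatenate the converted hit lists into one .smi file, dropping exact duplicate lines."""
    seen = set()
    merged = []
    for tsv_path in tsv_paths:
        for entry in tsv_to_smi_lines(tsv_path):
            if entry not in seen:
                seen.add(entry)
                merged.append(entry)
    Path(out_path).write_text("".join(entry + "\n" for entry in merged))
    return merged


# Usage: merge_bb_lists(sorted(Path(".").glob("*.tsv")))
```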
Keep only the SMILES and names (first and last columns):
awk '{print $1 " " $NF}' bb_analogs.smi_Synthmode.smi > bb_analogs_synth.smi
A. Enumeration based on all found BBs
The directory given with -oD will contain either FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi or AnalogsForMol1.smi, listing the SMILES of the generated compounds.
Unlike analog generation, enumeration output does not include the reactions and synthons used to generate each compound.
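As the file name says, the enumeration output can contain duplicates. A small Python sketch for collecting unique SMILES, assuming one compound per line with the SMILES in the first column. It compares SMILES strings exactly, so the same molecule written two different ways is not merged; canonicalizing with RDKit first would catch that. The function name is hypothetical.

```python
from pathlib import Path


def unique_smiles(path):
    """SMILES (first column) of each line, keeping only the first occurrence of each string."""
    seen = set()
    unique = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] not in seen:
            seen.add(fields[0])
            unique.append(fields[0])
    return unique


# Usage: unique_smiles("ENUMERATED/FinalOut_allEnumeratedCompounds_DuplicatesCanBePresent.smi")
```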
Using all available reactions:
mkdir ENUMERATED
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200
Morgan2 Tc of obtained compds to the parent.
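A Python sketch for computing the Morgan2 Tc of each generated compound to the parent. It assumes "Morgan2" means a radius-2 Morgan fingerprint, folded here to 2048 bits (an arbitrary choice), and uses RDKit's rdFingerprintGenerator; the helper names are hypothetical. The Tanimoto coefficient is computed on sets of on-bits, Tc = |A∩B| / (|A| + |B| - |A∩B|), so that part runs without RDKit. The optional threshold can serve as a post-filter, e.g. 0.5 to mirror --simTh in section B.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two sets of on-bit indices (0.0 if both are empty)."""
    common = len(bits_a & bits_b)
    union = len(bits_a) + len(bits_b) - common
    return common / union if union else 0.0


def morgan2_bits(smiles, n_bits=2048):
    """On-bits of a radius-2 Morgan fingerprint (requires RDKit)."""
    # Imported lazily so that tanimoto() and rank_by_similarity() work without RDKit.
    from rdkit import Chem
    from rdkit.Chem import rdFingerprintGenerator

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"cannot parse SMILES: {smiles}")
    generator = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=n_bits)
    return set(generator.GetFingerprint(mol).GetOnBits())


def rank_by_similarity(parent_bits, candidates, threshold=0.0):
    """Return (Tc, smiles) pairs with Tc >= threshold, most similar first.

    candidates is an iterable of (smiles, on_bits) pairs.
    """
    scored = [(tanimoto(parent_bits, bits), smiles) for smiles, bits in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


# Usage (requires RDKit), with smiles_list read from an output file:
# parent = morgan2_bits("C[C@@H](NCc1cccc(-n2cccn2)c1)c1csc2ccccc12")
# ranked = rank_by_similarity(parent, [(s, morgan2_bits(s)) for s in smiles_list])
```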
Using the same reactions that were used for initial fragmentation:
mkdir ENUMERATED-R3-R5
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i bb_analogs_synth.smi --nCores 10 -oD ENUMERATED-R3-R5/ --MaxNumberOfStages 5 --enumerationMode --MWupperTh 460 --MWlowerTh 200 --fragmentationMode include_only --reactionsToWorkWith "R3, R5"
Morgan2 Tc of obtained compds to the parent.
B. Analogs from a synthon library
python /nfs/soft2/SynthI/SynthI_BulkFragmentationEnumerationAndAnaloguesDesign.py -i ligand.smi --SynthLibrary bb_analogs_synth.smi --simTh 0.5 --analoguesLibGen --nCores 10 -oD ANALOGS --MaxNumberOfStages 5 --desiredNumberOfNewMols 1000 --enumerationMode --MWupperTh 460 --MWlowerTh 200
Morgan2 Tc of obtained compds to the parent.
Analog generation does not seem to apply the similarity threshold (--simTh). Using a large synthon library may be useful for analog generation, but its speed still needs to be tested.
TODO
- Work with two synthon files instead of one.
- Output identifiers of enumerated compds: RXN_synt1_synt2
- Processing the end of synthon file
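For the identifier TODO, a hypothetical Python sketch that builds RXN_synt1_synt2 names for every pairing of two synthon lists (e.g. names read from the last column of two synthon .smi files). It only formats names; it does not check whether a pair is actually compatible with the reaction. Reaction IDs such as R3.1_0 already contain underscores, so these IDs cannot be split back unambiguously; a different separator may be safer.

```python
from itertools import product
from pathlib import Path


def read_names(smi_path):
    """Last whitespace-separated field of every non-empty line (the name column of a .smi file)."""
    return [line.split()[-1] for line in Path(smi_path).read_text().splitlines() if line.strip()]


def pair_ids(reaction, names_1, names_2):
    """Hypothetical RXN_synt1_synt2 identifiers for every pairing of two synthon lists."""
    return [f"{reaction}_{a}_{b}" for a, b in product(names_1, names_2)]
```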