Fine Tranching with RDKit using Heavy Atom Count and LogP

Sample Bash script for running on many SMILES files
 
If your SMILES file is large, split it into chunks of 1 million lines (or whatever size you prefer):
 
    split -l 1000000 <your_smiles>
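With its default settings, split names the chunks xaa, xab, xac, and so on. The x?? glob in the loop below picks these files up. The following Python sketch lists those names and checks them against the glob. The helper names are hypothetical, and the sketch models only split's default prefix and two-letter suffixes. Note that an output file such as xaa_hlogp does not match x??, so rerunning the loop will not feed earlier results back in as input.

```python
import fnmatch
import itertools
import string
from typing import List


def default_split_names(count: int, prefix: str = "x") -> List[str]:
    """Return the first `count` names split creates with its default prefix
    and two-letter alphabetic suffixes: xaa, xab, ..., xaz, xba, ...

    Only the 26 * 26 two-letter names are modelled; this sketch does not
    reproduce what split does once those run out.
    """
    if not 0 <= count <= 26 * 26:
        raise ValueError("count must be between 0 and 676")
    suffixes = ("".join(pair) for pair in itertools.product(string.ascii_lowercase, repeat=2))
    return [prefix + suffix for suffix in itertools.islice(suffixes, count)]


def matches_chunk_glob(name: str, pattern: str = "x??") -> bool:
    """True if `name` matches the shell-style glob used in the loop (x?? by default)."""
    return fnmatch.fnmatchcase(name, pattern)


if __name__ == "__main__":
    for name in default_split_names(3) + ["xaa_hlogp"]:
        print(name, matches_chunk_glob(name))
```

Running it prints True for xaa, xab and xac and False for xaa_hlogp. Keep in mind that the shell glob also matches any other three-character name starting with x in the same directory.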
 
 
Then run the following script, which is reproduced below:
 
    /nfs/home/jyoung/code/fine_tranche_hlogp_scripts/runall.sh
 
 
Change x?? to the pattern that matches your chunk files, and change the batch size (the second argument to the Python script, 10000 here) to the desired value.
 
 
    #!/usr/bin/env bash
    # Process every file matching x?? (e.g. split's xaa, xab, ...) with a batch size of 10000
    for i in x??;
    do
      source /mnt/nfs/home/devtest/anaconda3/bin/activate my-rdkit-env
      python /nfs/home/jyoung/code/fine_tranche_hlogp_scripts/rdkit_hlogp_batch.py "$i" 10000
    done
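The second argument in the loop (10000) is the batch size. The output file is written to after each batch of that many molecules is processed. The sketch below shows that pattern. It assumes the script reads one molecule per line and appends converted lines to the output. `batched` and `write_in_batches` are hypothetical names, not functions from rdkit_hlogp_batch.py.

```python
import itertools
from typing import IO, Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def write_in_batches(lines: Iterable[str], batch_size: int,
                     convert: Callable[[str], str], out: IO[str]) -> int:
    """Hypothetical driver: convert each input line, then write and flush the
    output after every batch so finished work is on disk if the job stops early.
    Returns the number of lines processed."""
    total = 0
    for batch in batched(lines, batch_size):
        out.write("".join(convert(line) for line in batch))
        out.flush()
        total += len(batch)
    return total
```

A larger batch size means fewer writes. A smaller one loses less work if a job is killed partway through a chunk.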
 

Latest revision as of 15:29, 17 April 2020

Written by Jennifer Young on April 14, 2020

Introduction

These scripts use RDKit to compute the heavy atom count and logP of each molecule. They then place the molecule in a fine tranche (bucket) named HxxPyyy when logP is positive (0 < logP) or HxxMyyy when logP is negative (logP < 0).
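This page does not spell out how xx and yyy are encoded, so the following is only a sketch of the labelling step under stated assumptions. It assumes xx is the heavy atom count and yyy is |logP| × 100 rounded to an integer. It also assumes logP is RDKit's Crippen estimate (`Descriptors.MolLogP`). `tranche_label` and `hlogp_from_smiles` are hypothetical names, not functions from the repository.

```python
from typing import Optional, Tuple


def tranche_label(heavy_atoms: int, logp: float) -> str:
    """Build a fine-tranche label such as H17P050 or H25M123.

    ASSUMPTIONS (not confirmed by this page):
      * xx is the heavy atom count, zero-padded to two digits;
      * yyy is |logP| * 100, rounded to the nearest integer, zero-padded to three digits;
      * logP == 0 gets the P prefix (the page only defines P for logP > 0).
    """
    sign = "M" if logp < 0 else "P"
    return f"H{heavy_atoms:02d}{sign}{round(abs(logp) * 100):03d}"


def hlogp_from_smiles(smiles: str) -> Optional[Tuple[int, float]]:
    """Return (heavy atom count, logP) for one SMILES, or None if RDKit cannot parse it.

    ASSUMPTION: logP is RDKit's Crippen estimate (Descriptors.MolLogP).
    """
    # Imported here so tranche_label() works even where RDKit is not installed.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return mol.GetNumHeavyAtoms(), Descriptors.MolLogP(mol)
```

In the activated RDKit environment, `tranche_label(*hlogp_from_smiles(smi))` labels one molecule. Check for None first, because `Chem.MolFromSmiles` returns None for SMILES it cannot parse.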

The scripts are available in the GitHub repository https://github.com/docking-org/ZINC21-Tools.

How to run

(If you are using our cluster) Source conda environment for RDKit

If you are using our cluster, a conda environment with RDKit already exists and you only need to source it. Sourcing requires bash, so start a bash shell first:

    bash
    source /mnt/nfs/home/devtest/anaconda3/bin/activate my-rdkit-env

If you need to create your own conda environment, follow the instructions at https://rdkit.org/docs/Install.html, in the section "How to install RDKit with Conda". Once the environment exists, activate it and install tqdm:

   conda activate my-rdkit-env
   conda install -c conda-forge tqdm

You are ready to run the Python script.
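Before launching a long run, you may want to confirm that the active interpreter can actually see RDKit and tqdm. The minimal check below uses the standard library's `importlib.util.find_spec`. `missing_modules` is a hypothetical helper name, not part of the scripts.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["rdkit", "tqdm"])
    if missing:
        print("Missing from this environment:", ", ".join(missing))
    else:
        print("RDKit and tqdm are available in", sys.prefix)
```

If anything is reported missing, the wrong environment is probably active. Re-run the activate command above.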

Run the Python script with the desired arguments

The input SMILES file should have the following two columns:

  • smiles
  • ID

Then run the script on the file:

   python rdkit_hlogp_batch_mp.py <smiles_file>
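The page does not say how the two columns are separated. The minimal reader below assumes whitespace-separated columns, with the SMILES first and the ID second. `read_smiles_records` is a hypothetical name, not part of the scripts.

```python
from typing import Iterable, Iterator, Tuple


def read_smiles_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (smiles, ID) pairs from an input SMILES file.

    ASSUMPTION: columns are whitespace-separated. Blank lines are skipped,
    and any columns after the second are ignored.
    """
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected 2 columns (smiles, ID), got {len(fields)}")
        yield fields[0], fields[1]
```

Raising on a one-column line, rather than silently skipping it, surfaces malformed input before a long run produces an incomplete output file.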

The output is written to a file named <smiles_file>_hlogp, which has the following three columns:

  • original smiles
  • original ID
  • tranche label (HxxPyyy or HxxMyyy)
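The output name follows directly from the text: append _hlogp to the input name. The sketch below also writes and reads records, for example to count how many molecules landed in each tranche. It assumes single-space-separated columns, which this page does not confirm. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def output_path(smiles_file: str) -> str:
    """Name of the output file: the input file name with _hlogp appended."""
    return smiles_file + "_hlogp"


def format_hlogp_line(smiles: str, mol_id: str, label: str) -> str:
    """One output record. ASSUMPTION: columns are separated by single spaces."""
    return f"{smiles} {mol_id} {label}\n"


def count_tranches(lines: Iterable[str]) -> Counter:
    """Count molecules per tranche label (third column); blank lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts
```

Because `count_tranches` splits on any whitespace, it also works if the real output turns out to be tab-separated.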