ZINC22:Fine Tranching with RDKit using Heavy Atom Count and LogP

Revision as of 20:59, 14 April 2020

Written by Jennifer Young on April 14, 2020

Introduction

These scripts perform fine tranching with RDKit: they compute the heavy atom count and logP of each molecule and assign the molecule to a bucket named HxxPyyy when logP is positive (logP > 0) or HxxMyyy when logP is negative (logP < 0). The scripts are located in

   /nfs/home/jyoung/code/fine_tranche_hlogp_scripts
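The bucket naming can be illustrated with a small sketch. The exact encoding is not spelled out on this page, so the following are assumptions: xx is the heavy atom count zero-padded to two digits, yyy is the absolute logP multiplied by 100 and rounded, and a logP of exactly 0 goes to the P bucket. The function name `tranche_name` is hypothetical. In RDKit the two inputs would typically come from `Mol.GetNumHeavyAtoms()` and `Crippen.MolLogP(mol)`, but whether the scripts use exactly those calls is not stated here.

```python
def tranche_name(heavy_atoms: int, logp: float) -> str:
    """Build a bucket name of the form HxxPyyy or HxxMyyy.

    Assumptions (not confirmed by the scripts themselves):
    - xx  = heavy atom count, zero-padded to two digits
    - yyy = round(|logP| * 100), zero-padded to three digits
    - logP == 0 is placed in the P bucket
    """
    sign = "M" if logp < 0 else "P"
    scaled = round(abs(logp) * 100)
    return f"H{heavy_atoms:02d}{sign}{scaled:03d}"


if __name__ == "__main__":
    print(tranche_name(17, 2.0))
    print(tranche_name(4, -1.0))
```

The real scripts may bin logP differently (for example by truncating rather than rounding), so check their output before relying on this encoding.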

How to run

Create and/or source a conda environment for RDKit

If you are using our cluster, a conda environment with RDKit already exists, and you only need to activate it with the commands below. The activation must be done from bash, so start a bash shell first.

   bash
   source /mnt/nfs/home/devtest/anaconda3/bin/activate my-rdkit-env

Run Python script with the desired arguments

The SMILES file and the batch size are command-line arguments. With a batch size of 10,000, results are written to the output file after each batch of 10,000 molecules has been processed.

   python /nfs/home/jyoung/code/fine_tranche_hlogp_scripts/rdkit_hlogp_batch.py <smiles_file> <batch_size>
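The batching behaviour described above can be sketched as follows. This is not the code of rdkit_hlogp_batch.py; it is a minimal illustration that assumes one record per input line and takes the per-molecule computation as a caller-supplied function. The names `process_in_batches` and `compute` are hypothetical.

```python
import io
from typing import Callable, Iterable, TextIO


def process_in_batches(
    lines: Iterable[str],
    out: TextIO,
    batch_size: int,
    compute: Callable[[str], str],
) -> int:
    """Apply `compute` to each non-empty line and write results in batches.

    Returns the number of write operations performed on `out`.
    """
    batch = []
    writes = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch.append(compute(line))
        if len(batch) >= batch_size:
            out.write("\n".join(batch) + "\n")
            out.flush()  # make finished batches visible on disk
            batch.clear()
            writes += 1
    if batch:  # write the final, possibly smaller, batch
        out.write("\n".join(batch) + "\n")
        out.flush()
        writes += 1
    return writes


if __name__ == "__main__":
    buffer = io.StringIO()
    n = process_in_batches(["a", "b", "c", "d", "e"], buffer, 2, str.upper)
    print(n, repr(buffer.getvalue()))
```

Writing once per batch keeps memory use bounded and means that an interrupted run loses at most one batch of results.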

Sample Bash script for running on many SMILES files

If your SMILES file is large, split it into chunks of 1 million lines (or whatever size you prefer).

   split -l 1000000 <your_smiles>
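For readers who want to see what the split step produces, here is a rough Python equivalent. It is only a sketch: it mimics the default naming of split (prefix x followed by two lowercase letters: xaa, xab, ...) and, unlike the real tool, simply stops with an error after 676 chunks. The function name `split_lines` and its parameters are hypothetical.

```python
import itertools
import string
from pathlib import Path


def split_lines(src, lines_per_chunk, out_dir="."):
    """Split `src` into files of at most `lines_per_chunk` lines each.

    Chunks are named xaa, xab, ... like the defaults of `split -l`.
    Returns the list of chunk paths in the order they were written.
    """
    # Two-letter suffixes aa, ab, ..., zz (676 names; more chunks raise an error).
    suffixes = ("".join(p) for p in itertools.product(string.ascii_lowercase, repeat=2))
    written = []
    out = None
    with open(src) as fh:
        for i, line in enumerate(fh):
            if i % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                path = Path(out_dir) / ("x" + next(suffixes))
                out = open(path, "w")
                written.append(path)
            out.write(line)
    if out is not None:
        out.close()
    return written
```

On the cluster the split command itself is the simpler choice; the sketch mainly shows why the later loop uses the pattern x??.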

Then run

   /nfs/home/jyoung/code/fine_tranche_hlogp_scripts/runall.sh 

Change x?? to the pattern that matches your chunk files (x?? matches the default names produced by split: xaa, xab, and so on) and set the batch size to the desired value.

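   #!/usr/bin/env bash
   # Process every chunk produced by split (xaa, xab, ...).
   for i in x??
   do
      # Activate the RDKit conda environment.
      source /mnt/nfs/home/devtest/anaconda3/bin/activate my-rdkit-env
      # Arguments: SMILES chunk file, then batch size.
      python /nfs/home/jyoung/code/fine_tranche_hlogp_scripts/rdkit_hlogp_batch.py "$i" 10000
   done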
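The same loop could also be driven from Python, which can be convenient if you want to add logging or error handling. The sketch below assumes that the RDKit conda environment is already activated, so that `sys.executable` is the interpreter from that environment. The helper names `build_commands` and `run_all` are hypothetical; the script path is the one given above.

```python
import glob
import subprocess
import sys

SCRIPT = "/nfs/home/jyoung/code/fine_tranche_hlogp_scripts/rdkit_hlogp_batch.py"


def build_commands(chunk_files, batch_size, script=SCRIPT):
    """Return one argument list per chunk file, in sorted order."""
    return [
        [sys.executable, script, name, str(batch_size)]
        for name in sorted(chunk_files)
    ]


def run_all(pattern="x??", batch_size=10000):
    """Run the tranching script on every file matching `pattern`."""
    for cmd in build_commands(glob.glob(pattern), batch_size):
        # check=True stops at the first chunk that fails.
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the directory that holds the chunks processes them one after another, like the Bash loop.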