ZINC22:Fine Tranching with RDKit using Heavy Atom Count and LogP

From DISI
Revision as of 18:38, 14 March 2022 by Khtang (talk | contribs)

Written by Jennifer Young on April 14, 2020. Updated by Khanh Tang on March 14, 2022

Introduction

These scripts perform fine tranching: they use RDKit to compute the heavy atom count and logP of each molecule, then assign the molecule to a bucket named HxxPyyy if its logP is positive (logP > 0) or HxxMyyy if its logP is negative (logP < 0).

See the GitHub repository: https://github.com/docking-org/ZINC21-Tools
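
The bucketing described above can be sketched in a few lines of Python. This is only an illustration, not the code from the repository, and it rests on several assumptions that the page does not confirm: that logP is the Crippen estimate from RDKit's `Crippen.MolLogP`, that `xx` is the heavy atom count zero-padded to two digits, that `yyy` is the absolute logP multiplied by 100 and rounded, and that a logP of exactly 0 goes into a P bucket. The function names `bucket_name` and `bucket_for_smiles` are hypothetical.

```python
def bucket_name(heavy_atoms: int, logp: float) -> str:
    """Build a bucket label from a heavy atom count and a logP value.

    Assumed encoding (not confirmed by the source): Hxx is the
    zero-padded heavy atom count, P/M is the sign of logP, and yyy is
    round(|logP| * 100), zero-padded to three digits.
    """
    sign = "M" if logp < 0 else "P"  # assumption: logP == 0 counts as P
    scaled = int(round(abs(logp) * 100))
    return f"H{heavy_atoms:02d}{sign}{scaled:03d}"


def bucket_for_smiles(smiles: str) -> str:
    """Compute the bucket label for one SMILES string using RDKit."""
    # Imported here so bucket_name stays usable without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Crippen

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    # Assumption: Crippen logP is the logP the real script uses.
    return bucket_name(mol.GetNumHeavyAtoms(), Crippen.MolLogP(mol))
```

Under these assumptions, `bucket_name(25, 2.5)` gives `H25P250` and `bucket_name(7, -1.25)` gives `H07M125`. Check the script in the repository for the actual encoding before relying on this layout.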

How to run

(If you are using our cluster) Source conda environment for RDKit

If you are using our cluster, a conda environment with RDKit is already available; you only need to activate it with the following commands. The activation must be done from bash.

   bash
   source /mnt/nfs/home/devtest/anaconda3/bin/activate my-rdkit-env

If you need to create a conda environment, follow the instructions at https://rdkit.org/docs/Install.html

Read the section "How to install RDKit with Conda". Once the environment exists, activate it and install tqdm:

   conda activate my-rdkit-env
   conda install -c conda-forge tqdm

You are ready to run the Python script.
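
Before launching the script, it can help to confirm that the active environment actually provides the two packages mentioned above. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and is not part of the repository.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both packages are importable in this environment.
print(missing_packages(["rdkit", "tqdm"]))
```

If the list is not empty, re-activate the conda environment or install the missing package before continuing.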

Run Python script with the desired arguments

The input SMILES file should have the following 2 columns:

  • SMILES
  • ID

Run the script on that file:

   python rdkit_hlogp_batch_mp_2.py <smiles_file>

The output file is named <smiles_file>_hlogp and has the following 3 columns:

  • original SMILES
  • original ID
  • bucket name (HxxPyyy or HxxMyyy)
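
The input and output layout described above can be illustrated with a short sketch. It is not the repository script: it assumes the columns are separated by whitespace, skips lines with fewer than two fields, and takes the bucket computation as a caller-supplied function. The names `write_hlogp` and `bucket_for` are hypothetical.

```python
from typing import Callable


def write_hlogp(smiles_path: str, bucket_for: Callable[[str], str]) -> str:
    """Write <smiles_path>_hlogp with columns: SMILES, ID, bucket name.

    Assumption: input columns are whitespace-separated. `bucket_for`
    maps a SMILES string to its HxxPyyy / HxxMyyy label.
    """
    out_path = smiles_path + "_hlogp"
    with open(smiles_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            smiles, mol_id = fields[0], fields[1]
            dst.write(f"{smiles} {mol_id} {bucket_for(smiles)}\n")
    return out_path
```

Keeping the bucket computation as a parameter makes the file handling easy to test on its own, without RDKit installed.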