Zoey's way of filtering LSD

From DISI

This is my current method (2024-09) of filtering LSD results, and the culmination of discussions with several other lab members about the "best" way to filter LSD results. The procedures below are adapted from various other lab sources.

Step 0: Source the environment and copy over relevant files

Source JK's environment with the following csh commands, available on gimel:

source /mnt/nfs/home/jklyu/anaconda3/etc/profile.d/conda.csh
conda activate bioinfo-env2
if ( ! $?PYTHONPATH ) setenv PYTHONPATH ""
setenv PYTHONPATH $PYTHONPATH\:/mnt/nfs/ex5/work/jklyu/IFP/package/ifp/scripts\:
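If you want to confirm the setup took effect before launching jobs, a small Python check can look for the IFP scripts directory among the PYTHONPATH entries. This is a hedged sketch and not one of the lab scripts; the function name `on_pythonpath` is hypothetical, and it assumes a Unix host where PYTHONPATH entries are separated by `:`.

```python
import os

# Directory appended to PYTHONPATH by the csh setup above.
IFP_SCRIPTS = "/mnt/nfs/ex5/work/jklyu/IFP/package/ifp/scripts"


def on_pythonpath(directory, pythonpath=None):
    """Return True if `directory` is one of the PYTHONPATH entries (hypothetical helper)."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    # Normalise trailing slashes; empty entries (e.g. from a trailing ':') are skipped.
    entries = [os.path.normpath(p) for p in pythonpath.split(os.pathsep) if p]
    return os.path.normpath(directory) in entries


if __name__ == "__main__":
    status = "OK" if on_pythonpath(IFP_SCRIPTS) else "MISSING"
    print(f"{status}: {IFP_SCRIPTS}")
```

Run it inside the activated environment (e.g. `python check_env.py`); a `MISSING` result usually means the `setenv` line was not run in the current shell.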

Next, copy over my collection of modified scripts by running the (yet-to-be-written) bash script that builds the required directory structure and copies in all the necessary scripts.

script here

Step 1: Interaction fingerprinting

Copy your initially extracted poses to prep_poses. These poses need to be split into multiple files so they can be processed in parallel, so run the splitting script. More files is better, since each file can run as a separate parallel job. The splitting script currently caps output at 156 files, an arbitrary limit; Zoey will rewrite it at some point to remove this cap.
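For reference, the splitting step can be sketched in Python without a fixed file cap. This is a hypothetical stand-in, not the lab's splitting script: it assumes each pose begins at a `@<TRIPOS>MOLECULE` line, optionally preceded by `#` comment header lines (such as score headers), and the function name `split_mol2` and the `chunk_NNNN.mol2` naming are illustrative only.

```python
from pathlib import Path


def split_mol2(mol2_path, out_dir, n_chunks):
    """Split a multi-molecule mol2 file into up to n_chunks files, round-robin.

    '#' comment lines directly before a molecule stay attached to it.
    Non-comment text before the first molecule and trailing comment lines
    after the last molecule are dropped. Returns the list of files written.
    """
    records, current, pending = [], None, []
    with open(mol2_path) as fh:
        for line in fh:
            if line.startswith("#"):
                pending.append(line)  # hold comments until we know where they belong
            elif line.startswith("@<TRIPOS>MOLECULE"):
                current = pending + [line]  # header comments belong to this molecule
                pending = []
                records.append(current)
            elif current is not None:
                current.extend(pending)  # comments inside a record stay in place
                pending = []
                current.append(line)

    if not records:
        return []
    n_chunks = max(1, min(n_chunks, len(records)))  # never write empty chunks
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].extend(rec)  # round-robin keeps chunk sizes balanced

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for i, lines in enumerate(chunks):
        path = out_dir / f"chunk_{i:04d}.mol2"
        path.write_text("".join(lines))
        written.append(path)
    return written
```

Round-robin assignment means each chunk gets a mix of high- and low-ranked poses, which is harmless here because each pose is fingerprinted independently.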

Once your mol2 files are split, reformat them and move them to your working directory, and place your receptor PDB file there as well. Then submit the jobs from the parent directory:

cd ..
csh /scripts/submit.csh [rec name]

Note: the submit script still needs to be tweaked before the commands above will work as written.
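Until the submit script is fixed, a quick pre-flight check of the working directory can catch missing inputs before anything is queued. This is a hypothetical Python sketch: it assumes the split poses sit directly in the working directory as `.mol2` files alongside exactly one receptor `.pdb` file, which may not match the final layout the submit script expects.

```python
from pathlib import Path


def check_workdir(workdir):
    """Sanity-check a working directory before submitting (hypothetical layout).

    Assumes split pose files end in .mol2 and the receptor is a single .pdb
    file in the same directory. Returns a list of problems (empty if OK).
    """
    workdir = Path(workdir)
    if not workdir.is_dir():
        return [f"{workdir} is not a directory"]
    problems = []
    mol2s = sorted(workdir.glob("*.mol2"))
    pdbs = sorted(workdir.glob("*.pdb"))
    if not mol2s:
        problems.append("no .mol2 files found")
    if len(pdbs) != 1:
        problems.append(f"expected exactly one .pdb file, found {len(pdbs)}")
    return problems
```

For example, `check_workdir("my_workdir")` returning `[]` means the expected inputs are present; anything else lists what to fix.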

Step 2: Novelty filtering

I've been making use of Olivier's method for novelty, which can be found here.
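Olivier's method itself is not reproduced here. As a generic illustration of the idea behind most novelty filters, the sketch below keeps a hit only if its nearest-neighbour Tanimoto coefficient (Tc) to every reference compound is below a cutoff. It assumes fingerprints have already been computed (for example ECFP4-style bits from RDKit) and stored as sets of on-bit indices; the 0.35 cutoff and the function names are hypothetical placeholders.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def novel_hits(hits, references, max_tc=0.35):
    """Keep hits whose nearest-neighbour Tc to the references is below max_tc.

    hits, references: dicts mapping compound ID -> set of on-bit indices.
    Returns (hit_id, nearest_ref_id, tc) tuples for hits that pass, in input
    order; nearest_ref_id is None when no reference shares any bits.
    """
    kept = []
    for hid, hfp in hits.items():
        best_id, best_tc = None, 0.0
        for rid, rfp in references.items():
            tc = tanimoto(hfp, rfp)
            if tc > best_tc:
                best_id, best_tc = rid, tc
        if best_tc < max_tc:
            kept.append((hid, best_id, best_tc))
    return kept
```

Reporting the nearest reference alongside each kept hit makes it easy to eyeball borderline cases rather than trusting the cutoff blindly.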

It has been recommended to restrict the ChEMBL reference set for a target protein to compounds annotated with binding data (EC50, IC50, etc.). Additionally, for targets we have previously screened internally, it can be worthwhile to add a second novelty filter against compounds we have previously identified or purchased.
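Both recommendations can be sketched in Python under some loudly stated assumptions. The ChEMBL restriction below works on activity records using ChEMBL's activity field names (`molecule_chembl_id`, `target_chembl_id`, `standard_type`, `standard_value`); which activity types count as "binding" (here IC50, EC50, Ki, Kd) is an assumption you should adjust. The second stage compares surviving hits against an internal set of previously identified or purchased compounds. Fingerprints are assumed to be precomputed sets of on-bit indices, and the 0.35 cutoff and all function names are hypothetical.

```python
# Assumption: which activity types count as "binding" annotations.
BINDING_TYPES = {"IC50", "EC50", "Ki", "Kd"}


def annotated_ids(activities, target_chembl_id, types=BINDING_TYPES):
    """IDs of molecules with at least one measured activity of an accepted type on the target.

    activities: iterable of dicts keyed by ChEMBL activity field names.
    """
    return {
        act["molecule_chembl_id"]
        for act in activities
        if act["target_chembl_id"] == target_chembl_id
        and act["standard_type"] in types
        and act.get("standard_value") is not None
    }


def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def two_stage_novelty(hits, chembl_fps, internal_fps, max_tc=0.35):
    """Apply a ChEMBL novelty filter, then an internal (previously purchased) one.

    All fingerprint arguments map ID -> set of on-bit indices. Returns
    (novel_ids, removed) where removed maps hit ID -> the filter that removed it.
    """
    novel, removed = [], {}
    for hid, fp in hits.items():
        if any(tanimoto(fp, ref) >= max_tc for ref in chembl_fps.values()):
            removed[hid] = "chembl"
        elif any(tanimoto(fp, ref) >= max_tc for ref in internal_fps.values()):
            removed[hid] = "internal"
        else:
            novel.append(hid)
    return novel, removed
```

In use, you would build `chembl_fps` only from the molecules returned by `annotated_ids(...)`, so unannotated ChEMBL entries never count against a hit. Recording which filter removed each compound helps separate "known in the literature" from "already in our own hands".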


Step 3: Best first clustering

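Note that the input SMILES must be in rank order (best-ranked first), since each new cluster is seeded by the highest-ranked compound not yet assigned to a cluster.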
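To show why the input order matters, here is a minimal Python sketch of greedy best-first (leader-style) clustering. It is a generic illustration, not necessarily the lab's clustering script: it assumes the rank-ordered SMILES have already been converted to fingerprints stored as sets of on-bit indices, and the 0.5 Tanimoto threshold and function names are hypothetical. Each compound joins the first existing cluster (in creation order) whose head is similar enough; otherwise it starts a new cluster.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of on-bit indices."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def best_first_clusters(ranked, threshold=0.5):
    """Greedy best-first clustering of rank-ordered fingerprints.

    ranked: list of (compound_id, fingerprint) in rank order, best first.
    A compound joins the first existing cluster whose head has
    Tc >= threshold; otherwise it becomes the head of a new cluster.
    Because heads are created in input order, each head is the
    best-ranked member of its cluster.
    Returns a dict mapping head ID -> list of member IDs (head first).
    """
    clusters = {}
    heads = []
    for cid, fp in ranked:
        for head_id, head_fp in heads:
            if tanimoto(fp, head_fp) >= threshold:
                clusters[head_id].append(cid)
                break
        else:
            # No sufficiently similar head: this compound seeds a new cluster.
            heads.append((cid, fp))
            clusters[cid] = [cid]
    return clusters
```

Since Python dicts preserve insertion order, `list(best_first_clusters(ranked))` gives the cluster heads in rank order, which is typically the list you would inspect or pick from. Feeding the same compounds in a different order can produce different heads, which is exactly why the rank ordering above is required.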