Zoey's way of filtering LSD
This is my current method (as of 2024-09) for filtering LSD results. It is the culmination of discussions with several other lab members about the "best" way to filter LSD results, and the procedures below are adapted from various other lab sources.
Step 0: Source the environment and copy over relevant files
Source JK's environment with the following commands, accessible on gimel. They use csh syntax, so run them from a csh or tcsh shell:

    source /mnt/nfs/home/jklyu/anaconda3/etc/profile.d/conda.csh
    conda activate bioinfo-env2
    if ( ! $?PYTHONPATH ) setenv PYTHONPATH ""
    setenv PYTHONPATH $PYTHONPATH\:/mnt/nfs/ex5/work/jklyu/IFP/package/ifp/scripts\:
Next, copy over my collection of modified scripts. This will be done by a bash script (not yet written) that builds the required directory structure and copies over all necessary scripts.
script here
Step 1: Interaction fingerprinting
Copy your initially extracted poses to prep_poses. The poses need to be split into multiple files so they can be processed in parallel, so run the splitting script. More files (i.e., more parallel jobs) is better. Zoey will rewrite the splitting script at some point, because it currently has an arbitrary cap of 156 output files.
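The splitting step amounts to cutting one multi-pose mol2 file into several smaller files. The Python sketch below is a hypothetical stand-in, not the lab's splitting script. It assumes each pose begins at a @<TRIPOS>MOLECULE line, optionally preceded by # comment lines that belong to that pose (e.g. the ########## header lines in DOCK-extracted poses), and it writes contiguous, near-equal groups so that rank order is preserved within each file. The function names, the chunk_NNNN.mol2 naming scheme, and the absence of a file cap are all assumptions.

```python
from pathlib import Path


def split_mol2_records(lines):
    """Group mol2 lines into one list of lines per molecule (pose).

    Assumes each record starts at '@<TRIPOS>MOLECULE', optionally preceded by
    '#' comment/header lines that belong to that same record.
    """
    records, current, in_body = [], [], False
    for line in lines:
        is_comment = line.startswith("#")
        is_molecule = line.startswith("@<TRIPOS>MOLECULE")
        # A comment line or a new MOLECULE tag seen after a molecule body has
        # started marks the beginning of the next record.
        if in_body and (is_comment or is_molecule):
            records.append(current)
            current, in_body = [], False
        current.append(line)
        if is_molecule:
            in_body = True
    if current:
        records.append(current)
    return records


def chunk(records, n_chunks):
    """Split records into at most n_chunks contiguous, near-equal groups."""
    if not records:
        return []
    n_chunks = max(1, min(n_chunks, len(records)))
    size, extra = divmod(len(records), n_chunks)
    groups, start = [], 0
    for i in range(n_chunks):
        # The first `extra` groups take one additional record each.
        end = start + size + (1 if i < extra else 0)
        groups.append(records[start:end])
        start = end
    return groups


def split_mol2_file(in_path, out_dir, n_chunks):
    """Write the poses in in_path to up to n_chunks files in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(in_path) as handle:
        records = split_mol2_records(handle)
    paths = []
    for i, group in enumerate(chunk(records, n_chunks)):
        out_path = out_dir / f"chunk_{i:04d}.mol2"  # hypothetical naming scheme
        with open(out_path, "w") as out:
            for record in group:
                out.writelines(record)
        paths.append(out_path)
    return paths
```

For example, split_mol2_file("prep_poses/poses.mol2", "prep_poses/split", 500) would write up to 500 files (paths hypothetical). Contiguous groups were chosen over round-robin so each file stays in rank order; the downstream reformatting step may expect different file names.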
Once your mol2 files are split, reformat them and move them into your working directory, and place your PDB file there as well. The submit script is run from the parent of the working directory:
    cd ..
    csh /scripts/submit.csh [rec name]
Note: the submit script still needs to be tweaked before the commands above will work as written.
Step 2: Novelty filtering
I've been making use of Olivier's method for novelty, which can be found here.
When pulling ChEMBL compounds for a target protein, it is recommended to restrict them to compounds annotated with binding data (EC50, IC50, etc.). For targets we have previously screened internally, it can also be worthwhile to add a second novelty filter against compounds we have already identified or purchased.
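A generic version of this step can be sketched in Python. This is not Olivier's method; it is a common nearest-neighbour novelty check, swapped in for illustration. It assumes each compound is represented as a set of fingerprint on-bits (for example set(fp.GetOnBits()) from an RDKit Morgan fingerprint) and that ChEMBL activities have been exported as rows carrying a standard_type field. The set of binding types and the 0.35 Tanimoto cutoff are hypothetical defaults, not lab-sanctioned values.

```python
# Activity types counted as binding annotations (assumption; adjust per target).
BINDING_TYPES = {"IC50", "EC50", "Ki", "Kd"}


def binding_annotated(activities, types=BINDING_TYPES):
    """Keep ChEMBL activity rows (dicts) whose standard_type is in `types`."""
    return [row for row in activities if row.get("standard_type") in types]


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(fp, references):
    """Highest Tanimoto similarity between fp and any reference fingerprint."""
    return max((tanimoto(fp, ref) for ref in references), default=0.0)


def novelty_filter(candidates, references, cutoff=0.35):
    """Keep (name, fp) pairs whose nearest reference is below the cutoff.

    The cutoff of 0.35 is a hypothetical default.
    """
    return [(name, fp) for name, fp in candidates
            if max_similarity(fp, references) < cutoff]
```

The second, internal filter is the same check run against a different reference set, e.g. novelty_filter(novelty_filter(hits, chembl_fps), internal_fps), where all three names are hypothetical.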
Step 3: Best first clustering
Note that the input SMILES need to be in rank order (best-ranked compound first).
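Rank order matters because best-first clustering, in its usual form, is a greedy pass down the ranked list: each compound either joins the cluster of the first (best-ranked) cluster head it resembles or becomes a new head, so the best-ranked member of each cluster ends up as its representative. The Python sketch below illustrates that general technique rather than the specific lab script; it assumes fingerprints are sets of on-bits and uses a hypothetical Tanimoto threshold of 0.5.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def best_first_clusters(ranked, threshold=0.5):
    """Greedy best-first clustering of (name, fp) pairs given best rank first.

    Each compound joins the cluster of the first (i.e. best-ranked) head it is
    at least `threshold` similar to; otherwise it becomes a new cluster head.
    Returns a list of clusters, each a list of names with the head first.
    The default threshold of 0.5 is a hypothetical value.
    """
    clusters = []  # list of member-name lists
    heads = []     # (head fingerprint, index into clusters), in rank order
    for name, fp in ranked:
        for head_fp, idx in heads:
            if tanimoto(fp, head_fp) >= threshold:
                clusters[idx].append(name)
                break
        else:
            # Not similar to any existing head: start a new cluster.
            heads.append((fp, len(clusters)))
            clusters.append([name])
    return clusters
```

The cluster representatives are then [c[0] for c in best_first_clusters(ranked)]. Feeding an unsorted list would still produce clusters, but the heads would no longer be guaranteed to be the best-ranked compounds, which is why the input must be rank-ordered.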