IFP Filtering on Wynton
Seth Vigneron Oct 2024
IFP filtering on Wynton works almost exactly as it does on our gimel cluster. Note that the conda environment that is sourced uses the same older versions of Python and LUNA as JK's original scripts on gimel, so Wynton does not use any updated software and runs the same IFP protocol.
1. After your screen, run the getposes script to produce the large mol2 files you want to run IFP on.
2. Make a dirlist of your split getposes mol2 files for later use:
ls $PWD/poses_extract_for_getposes_parallel_*mol2 > mol2_dirlist_ifp
3. Prep your IFP directory
mkdir ifp
cd ifp
cp -r /wynton/group/bks/work/shared/svigneron/IFP_wynton_scrips/scripts .
vim scripts/ifp_interactions.py
Add your desired interaction filters to the filters list. Each entry pairs an interaction type with a residue written as RESNAME-RESNUM, for example:
filters = [['Hydrogen bond','GLY-333'],['Hydrogen bond','ALA-353'],['Hydrogen bond','TYR-368']]
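Since ifp_interactions.py is a Python file, the filters list can also be generated instead of typed out, which helps when many residues share one interaction type. A minimal sketch, assuming filters is a plain list assignment in that script; hbond_residues and RESIDUE_PATTERN are hypothetical names, not part of the lab scripts, and the format check only mirrors the example above (it does not know what LUNA itself accepts):

```python
import re

# Residues follow the RESNAME-RESNUM pattern from the example, e.g. "GLY-333".
# Assumption: three uppercase letters and a plain integer (no chain IDs or insertion codes).
RESIDUE_PATTERN = re.compile(r"[A-Z]{3}-\d+")

# One H-bond filter per residue; this yields the same list as the hand-written example.
hbond_residues = ["GLY-333", "ALA-353", "TYR-368"]
filters = [["Hydrogen bond", residue] for residue in hbond_residues]

# Stop early on entries that don't follow the example's format (e.g. "GLY333")
# rather than finding out after submitting jobs.
for interaction, residue in filters:
    if not RESIDUE_PATTERN.fullmatch(residue):
        raise ValueError(f"Residue {residue!r} does not match RESNAME-RESNUM")
```

Filters with other interaction types can still be appended as extra [interaction, residue] pairs.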
Continue preparing your working directory:
mkdir working
cd working
cp /path/to/rec.crg.pdb .
vim rec.crg.pdb
Change any HIE or HID residues to HIS, and revert the names of any tarted residues to their standard names, since LUNA will not recognize them. If you want to filter on interactions with water molecules, change ATOM to HETATM on the water lines of the PDB file; the water residue name can be HOH or WAT.
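The HIE/HID and water edits can also be scripted rather than done by hand in vim. A minimal sketch that relies on the fixed-column PDB layout (record name in columns 1 to 6, residue name in columns 18 to 20); fix_pdb_line and fix_pdb are hypothetical helpers, and tarted residues are not handled because their names depend on your setup, so revert those manually:

```python
def fix_pdb_line(line):
    """Apply the LUNA-related renames to one fixed-column PDB line."""
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave REMARK, TER, END, etc. untouched
    resname = line[17:20]  # residue name occupies columns 18-20
    if resname in ("HIE", "HID"):
        line = line[:17] + "HIS" + line[20:]
    elif resname in ("HOH", "WAT") and line.startswith("ATOM"):
        # "ATOM  " and "HETATM" are both 6 characters, so all columns stay aligned.
        line = "HETATM" + line[6:]
    return line


def fix_pdb(in_path, out_path):
    """Write a copy of in_path with every line passed through fix_pdb_line."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(fix_pdb_line(line))
```

For example, fix_pdb("rec.crg.pdb", "rec.crg.fixed.pdb") writes a corrected copy (the output name is only an illustration); diff the two files before replacing rec.crg.pdb.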
4. Split the poses into filter-XXX.mol2 files of 2000 molecules each so IFP can run on them in parallel. You can use fewer than 2000 molecules per file if you like, but there is no need to go higher.
while read line; do python ../scripts/lc_blazing_fast_separate_mol2_into_smaller_files_called_filter-XXX.py $line 2000; done < ../../mol2_dirlist_ifp
ls *.mol2 > dirlist
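Conceptually, the splitter reads molecules one at a time and writes every 2000 of them to a new file. The sketch below is not the lab's lc_blazing_fast_... script; it assumes each molecule starts at an @<TRIPOS>MOLECULE record and that any '#' comment lines just before it (such as docking score headers) belong to that molecule. The helper names, the filter-NNN.mol2 numbering, and the start parameter are illustrative choices:

```python
from pathlib import Path

MOLECULE_TAG = "@<TRIPOS>MOLECULE"


def iter_molecules(lines):
    """Yield molecules as lists of lines.

    '#' comment lines (and anything before the first molecule) are held back
    and attached to the molecule that follows them.
    """
    body, pending = [], []
    for line in lines:
        if line.startswith(MOLECULE_TAG):
            if body:
                yield body
            body, pending = pending + [line], []
        elif line.startswith("#") or not body:
            pending.append(line)
        else:
            # Comments inside a molecule body stay in their original position.
            body.extend(pending)
            pending = []
            body.append(line)
    if body or pending:
        yield body + pending


def write_chunk(molecules, out_path):
    with open(out_path, "w") as out:
        for molecule in molecules:
            out.writelines(molecule)


def split_mol2(path, per_file=2000, out_dir=".", start=0):
    """Write chunks of per_file molecules to filter-NNN.mol2 files.

    Returns the next unused index, so several inputs can be split into the
    same directory without overwriting each other.
    """
    index, chunk = start, []
    with open(path) as handle:
        for molecule in iter_molecules(handle):
            chunk.append(molecule)
            if len(chunk) == per_file:
                write_chunk(chunk, Path(out_dir) / f"filter-{index:03d}.mol2")
                index, chunk = index + 1, []
    if chunk:
        write_chunk(chunk, Path(out_dir) / f"filter-{index:03d}.mol2")
        index += 1
    return index
```

Streaming molecule by molecule keeps memory use flat even for very large getposes files, since only one chunk is held at a time.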
5. Submit IFP to the cluster
csh ../scripts/submit.csh /path/to/working /path/to/scripts <name-of-receptor.pdb-without-.pdb-at-the-end>
6. Check if all runs finished
ls -d --color=never [0-9]* > dirlist_combine
python ../scripts/check_finished_notfinished_ifp.py dirlist_combine
Any jobs that failed to run are listed in NOT-FINISHED_dirlist. To resubmit them:
csh ../scripts/resubmit.csh /path/to/working /path/to/scripts <name-of-receptor.pdb-without-.pdb-at-the-end> NOT-FINISHED_dirlist
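To see roughly what the completeness check amounts to, here is a minimal sketch. It assumes a run directory counts as finished when its <dir>/<dir>.interactions.csv (the naming seen in 000/000.interactions.csv) exists and is non-empty; check_finished_notfinished_ifp.py may use different criteria, and find_unfinished and write_unfinished are hypothetical helpers:

```python
from pathlib import Path


def find_unfinished(names, base="."):
    """Return run directories whose <name>/<name>.interactions.csv is missing or empty.

    Assumption: a run is finished once that CSV exists and is non-empty.
    """
    unfinished = []
    for name in names:
        csv_path = Path(base) / name / f"{name}.interactions.csv"
        if not csv_path.is_file() or csv_path.stat().st_size == 0:
            unfinished.append(name)
    return unfinished


def write_unfinished(dirlist="dirlist_combine", out="NOT-FINISHED_dirlist", base="."):
    """Read run names from dirlist and write the unfinished ones, one per line."""
    names = Path(dirlist).read_text().split()
    unfinished = find_unfinished(names, base)
    Path(out).write_text("".join(f"{name}\n" for name in unfinished))
    return unfinished
```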
7. Combine Results
python ../scripts/combine_ifp.py dirlist_combine combined
To combine all of the SMILES from your IFP run into a single file, combined.zincid.smiles, run:
python ../scripts/concatinate_smiles.py dirlist_combine
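Combining the results is essentially concatenating the per-directory <dir>/<dir>.interactions.csv files. A minimal sketch of that idea, not the lab's combine_ifp.py; combine_csv_lines and combine_runs are hypothetical, and since this guide does not say whether combined.interactions.csv keeps a header line, the sketch makes that a parameter:

```python
from pathlib import Path


def combine_csv_lines(per_run_lines, keep_header=False):
    """Concatenate per-run CSV contents (each a list of lines, header first).

    With keep_header=True the first header is kept once; otherwise every
    header line is dropped and only data rows remain.
    """
    combined = []
    for lines in per_run_lines:
        if not lines:
            continue  # skip empty files
        if keep_header and not combined:
            combined.append(lines[0])
        combined.extend(lines[1:])
    return combined


def combine_runs(dirlist="dirlist_combine", out="combined.interactions.csv", keep_header=False):
    """Read <name>/<name>.interactions.csv for every run listed in dirlist and concatenate them."""
    per_run = []
    for name in Path(dirlist).read_text().split():
        csv_path = Path(name) / f"{name}.interactions.csv"
        if csv_path.is_file():
            per_run.append(csv_path.read_text().splitlines())
    rows = combine_csv_lines(per_run, keep_header)
    # Re-add newlines ourselves so a file missing its final newline cannot merge two rows.
    Path(out).write_text("".join(row + "\n" for row in rows))
```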
8. Collect Filtered Molecules
The combined.interactions.csv file lists each molecule's interactions in the following columns:
$1 : ZINC ID
$2 : # of H-bond donors
$3 : # of H-bond acceptors
$4 : # of unsatisfied H-bond donors
$5 : # of unsatisfied H-bond acceptors
$6 onwards : the additional interactions specified in ifp_interactions.py
If you need a reminder of the filters and their column order in combined.interactions.csv, check the header of any per-directory file:
head -1 000/000.interactions.csv
The lab's typical protocol is to remove any compound that has unsatisfied H-bond donors or more than 5 unsatisfied H-bond acceptors:
awk -F "," '$4==0 && $5<=5 && $6==1' combined.interactions.csv > ifp_filtered.interactions.csv
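Here $6==1 keeps only molecules that satisfy the first additional filter from ifp_interactions.py; add similar conditions on $7, $8, and so on for any further filters. Then extract the ZINC IDs of the molecules that pass: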
awk -F "," '{print $1}' ifp_filtered.interactions.csv > ifp_filtered.interactions.zincid
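The same filter can be expressed in Python if you prefer to keep the logic in a script. A minimal sketch: awk fields are 1-based while Python list indices are 0-based, so $4, $5 and $6 become row[3], row[4] and row[5]. passes_filters and filter_zinc_ids are hypothetical helpers, and required_columns holds the 0-based indices of the custom filter columns that must equal 1:

```python
import csv


def passes_filters(row, max_unsat_acceptors=5, required_columns=(5,)):
    """Python version of: $4==0 && $5<=5 && $6==1 (awk $N is row[N-1])."""
    try:
        unsat_donors = float(row[3])
        unsat_acceptors = float(row[4])
        required_ok = all(float(row[i]) == 1 for i in required_columns)
    except (IndexError, ValueError):
        return False  # skip header or malformed lines
    return unsat_donors == 0 and unsat_acceptors <= max_unsat_acceptors and required_ok


def filter_zinc_ids(path="combined.interactions.csv", **kwargs):
    """Return column $1 (ZINC ID) for every row that passes the filters."""
    with open(path, newline="") as handle:
        return [row[0] for row in csv.reader(handle) if passes_filters(row, **kwargs)]
```

For example, filter_zinc_ids(required_columns=(5, 6)) would additionally require the second custom filter ($7) to equal 1.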