IFP Filtering on Wynton

Seth Vigneron Oct 2024

IFP filtering on Wynton proceeds almost exactly as it does on our gimel cluster. The conda environment that is sourced uses the same older versions of Python and LUNA as JK's original scripts on gimel, so Wynton runs the same IFP protocol with no updated software.

1. Run the getposes script after your screen to produce the large mol2 files that you want to run IFP on.

2. Make a dirlist of your split getposes mol2 files for use later.

  ls $PWD/poses_extract_for_getposes_parallel_*mol2 > mol2_dirlist_ifp

3. Prep your IFP directory

  mkdir ifp
  cd ifp
  cp -r /wynton/group/bks/work/shared/svigneron/IFP_wynton_scrips/scripts .
  vim scripts/ifp_interactions.py 

Add your desired interaction filters to the filters list. Each entry pairs an interaction type with a residue name and number, for example:

  filters = [['Hydrogen bond','GLY-333'],['Hydrogen bond','ALA-353'],['Hydrogen bond','TYR-368']]
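A typo in the filters list is easy to miss until the jobs have already run. The following is a minimal sketch of a format check, assuming each entry is a two-item list of an interaction name and a residue written as a three-character name, a hyphen, and a number, as in the example above. The function name `check_filters` and the pattern are illustrative and are not part of the lab scripts.

```python
import re

# Assumed residue format, inferred from the example: three characters, '-', residue number.
RESIDUE_PATTERN = re.compile(r"^[A-Z0-9]{3}-\d+$")


def check_filters(filters):
    """Return a list of problems found; an empty list means the format looks fine."""
    problems = []
    for i, entry in enumerate(filters):
        if not (isinstance(entry, (list, tuple)) and len(entry) == 2):
            problems.append(f"entry {i}: expected [interaction, residue], got {entry!r}")
            continue
        interaction, residue = entry
        if not isinstance(interaction, str) or not interaction.strip():
            problems.append(f"entry {i}: interaction name is empty or not a string")
        if not isinstance(residue, str) or not RESIDUE_PATTERN.match(residue):
            problems.append(f"entry {i}: residue {residue!r} is not of the form 'GLY-333'")
    return problems
```

This only checks the shape of each entry. It does not check that the interaction name is one LUNA reports or that the residue exists in rec.crg.pdb.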

Continue prepping your working directory:

  mkdir working
  cd working
  cp /path/to/rec.crg.pdb .
  vim rec.crg.pdb

Be sure to change any HIE or HID to HIS, and revert the names of any tarted residues to their standard names, as LUNA will not recognize them. If you want to filter on interactions with water molecules, change ATOM to HETATM on the water lines of the pdb file. The water residue name can be HOH or WAT.
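For receptors with many histidines or waters, the renaming can be scripted instead of done by hand in vim. Below is a minimal sketch, assuming standard fixed-column PDB formatting (record name in columns 1-6, residue name in columns 18-20). The function names are illustrative. Tarted residues are not handled, because their names differ from receptor to receptor and still need to be reverted manually.

```python
def prep_pdb_line(line):
    """Apply the HIE/HID and water edits described above to one PDB line."""
    record = line[:6]
    if record not in ("ATOM  ", "HETATM"):
        return line  # leave REMARK, TER, END, etc. untouched
    resname = line[17:20]
    if resname in ("HIE", "HID"):
        # Rename protonation-specific histidine names to plain HIS.
        line = line[:17] + "HIS" + line[20:]
    elif resname in ("HOH", "WAT") and record == "ATOM  ":
        # Waters must be HETATM records to be used in filters.
        line = "HETATM" + line[6:]
    return line


def prep_pdb_text(text):
    """Apply prep_pdb_line to every line of a PDB file's contents."""
    return "".join(prep_pdb_line(l) for l in text.splitlines(keepends=True))
```

Reading rec.crg.pdb, passing its contents through `prep_pdb_text`, and writing the result back gives the same edits as the manual procedure for these two cases.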

4. Split the poses into filter-XXX.mol2 files of 2000 molecules each so that IFP can run on them in parallel. You can go lower than 2000 if you like, but there is no need to go higher.

  while read line; do python ../scripts/lc_blazing_fast_separate_mol2_into_smaller_files_called_filter-XXX.py $line 2000 ; done<../../mol2_dirlist_ifp
  
  ls *.mol2 > dirlist

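As a sanity check on the split, you can estimate how many chunks to expect from the number of molecules in the input. This is a rough sketch, assuming each pose begins with an `@<TRIPOS>MOLECULE` record line. It is not the logic of the lab's splitting script, which may count or group poses differently.

```python
import math


def count_mol2_molecules(lines):
    """Count molecule records in an iterable of mol2 lines."""
    return sum(1 for line in lines if line.startswith("@<TRIPOS>MOLECULE"))


def expected_chunks(n_molecules, chunk_size=2000):
    """Number of files needed to hold n_molecules at chunk_size molecules per file."""
    return math.ceil(n_molecules / chunk_size)
```

For one input file, open it and pass the file handle to `count_mol2_molecules`, then compare `expected_chunks` of that count against the number of filter-XXX.mol2 files produced for that input.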

5. Submit IFP to the cluster

  csh ../scripts/submit.csh /path/to/working /path/to/scripts <name-of-receptor.pdb-without-.pdb-at-the-end>

6. Check if all runs finished

  ls -d --color=never [0-9]* > dirlist_combine
  python ../scripts/check_finished_notfinished_ifp.py dirlist_combine

Any jobs that failed to run are listed in NOT-FINISHED_dirlist. To re-run them:

  csh ../scripts/resubmit.csh /path/to/working /path/to/scripts <name-of-receptor.pdb-without-.pdb-at-the-end> NOT-FINISHED_dirlist
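If you want to inspect by hand which runs look incomplete, the idea can be sketched as below. This assumes a finished run leaves a non-empty `<dir>/<dir>.interactions.csv` (for example 000/000.interactions.csv). That criterion is an assumption, and check_finished_notfinished_ifp.py may use a different one, so treat the script's NOT-FINISHED_dirlist as authoritative.

```python
from pathlib import Path


def find_unfinished(dirnames, root="."):
    """Return the run directories with a missing or empty interactions CSV."""
    unfinished = []
    for name in dirnames:
        out = Path(root) / name / f"{name}.interactions.csv"
        if not out.is_file() or out.stat().st_size == 0:
            unfinished.append(name)
    return unfinished
```

The directory names can be read from dirlist_combine, one per line, with surrounding whitespace stripped.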

7. Combine Results

  python ../scripts/combine_ifp.py dirlist_combine combined

You can also combine all of the SMILES from your IFP run using the script below, which writes the file combined.zincid.smiles:

  python ../scripts/concatinate_smiles.py dirlist_combine

8. Collect Filtered Molecules

The combined.interactions.csv file lists each molecule's interactions in the following columns:

  $1 : ZINC ID
  $2 : # of H-bond donors
  $3 : # of H-bond acceptors
  $4 : # of unsatisfied H-bond donors
  $5 : # of unsatisfied H-bond acceptors

Columns $6 and onward are the additional interactions specified in ifp_interactions.py. If you need a reminder of the filters and their order in combined.interactions.csv, look at the header of one of the per-run files:

  head -1 000/000.interactions.csv
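To see which awk field number each filter landed in, the header line can be turned into a numbered map. This is a small sketch, assuming the header is a single comma-separated line. The function name `column_map` is illustrative.

```python
import csv


def column_map(header_line):
    """Map 1-based column numbers (awk's $N) to the header names."""
    fields = next(csv.reader([header_line.rstrip("\n")]))
    return {i: name.strip() for i, name in enumerate(fields, start=1)}
```

Reading the first line of 000/000.interactions.csv and passing it to `column_map` gives a dictionary whose keys 6 and above are your filters in order.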

The typical lab protocol is to remove any compound that has unsatisfied H-bond donors or more than 5 unsatisfied H-bond acceptors:

  awk -F "," '$4==0 && $5<=5 && $6==1' combined.interactions.csv > ifp_filtered.interactions.csv

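Here $6 and any later columns are your additional filters; add one condition for each filter you want to require. Then extract the ZINC IDs of the molecules that pass: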

  awk -F "," '{print $1}' ifp_filtered.interactions.csv > ifp_filtered.interactions.zincid
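When several filters are required at once, the awk condition gets long. The sketch below is a Python equivalent, assuming the column layout listed above and that a value of 1 in a filter column means the interaction is present, as the `$6==1` test implies. The function names are illustrative and are not part of the lab scripts.

```python
import csv


def filter_rows(rows, required_columns=(6,), max_unsat_acceptors=5):
    """Yield rows that pass the lab filter. Column numbers are 1-based, like awk's $N."""
    for row in rows:
        try:
            unsat_donors = float(row[3])      # $4
            unsat_acceptors = float(row[4])   # $5
            flags = [float(row[c - 1]) for c in required_columns]
        except (ValueError, IndexError):
            continue  # skip header lines and malformed rows
        if (unsat_donors == 0
                and unsat_acceptors <= max_unsat_acceptors
                and all(f == 1 for f in flags)):
            yield row


def filtered_ids(path, required_columns=(6,), max_unsat_acceptors=5):
    """Return the ZINC IDs ($1) of the rows in a combined CSV that pass the filter."""
    with open(path, newline="") as fh:
        rows = filter_rows(csv.reader(fh), required_columns, max_unsat_acceptors)
        return [row[0] for row in rows]
```

For example, `filtered_ids("combined.interactions.csv", required_columns=(6, 7))` would require both the first and second additional filters.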