Tutorial on running Molecular Dynamics for GIST grid generation with scripts

From DISI
Jump to navigation Jump to search

Tutorial written by Trent Balius (Jan. 9, 2017).

Here are more GIST related tutorials: DOCK_3.7_with_GIST_tutorials

Disclaimer

This is foremost for training Shoichet lab members. But we hope that the community finds this useful.

To use this tutorial you will need the following:

  • AMBER12 or higher. Here I used AMBER14.
  • AMBERTOOLS.
  • GPUs to run AMBER on. In our case we uses an SGE queuing system to manage these jobs.
  • It is helpful if you have DOCK3.7 the latest version.

Introduction

Grid inhomogeneous Solvation Theory (GIST) is a method for calculating thermodynamic properties of water on GIST lattice.

Here, we will use these grids for DOCKing (although, there are many other uses).

Set up environment

you can add the following to your environment file (.cshrc) for just issue them on the command line.

source python with bio_python package.

source /nfs/soft/python/envs/complete/latest/env.csh

set DOCKBASE

setenv DOCKBASE "/nfs/home/tbalius/zzz.github/DOCK"

set AMBERHOME

 setenv AMBERHOME /nfs/soft/amber/amber14

Put msms in your path

set path = ( /nfs/home/tbalius/zzz.programs/msms $path )

Prepare for AMBER

write the following script, named 001md.be_balsti_reduce.csh, using your favorate text editor (like vim).

#!/bin/csh 

# TEB and MF comments
# This script runs beblasti, creates dirs and splits pdb into rec and lig file for running amber MD.

# Setting up necessary environments
source /nfs/soft/python/envs/complete/latest/env.csh
setenv DOCKBASE "/nfs/home/tbalius/zzz.github/DOCK"
# setenv AMBERHOME /nfs/soft/amber/amber14
set path = ( /nfs/home/tbalius/zzz.programs/msms $path )

# list of PDBs
# 4NVA apo with waters, 4NVE ligand
#set list = `cat /nfs/work/users/tbalius/VDR/Enrichment/pdblist_rat `
set list = '4NVE 4NVA'         ### CHANGE THIS

#set mountdir = "/mnt/nfs/work/tbalius/Water_Project/run_DOCK3.7"
set mountdir = `pwd`

# loop over all PDBs 
foreach pdbname ( $list )

echo " ${pdbname} "

set workdir = ${mountdir}/MDrundir/prep/${pdbname}

# check if workdir exists
if ( -s $workdir ) then
   echo "$workdir exits"
   continue
endif

# if it exists do this:     rm -rf ${workdir}
  mkdir -p ${workdir}
  cd ${workdir}
  
  ln -s /nfs/home/tbalius/zzz.programs/msms/atmtypenumbers .
  #python ~/zzz.scripts/be_blasti.py --pdbcode $pdbname nocarbohydrate renumber | tee -a pdbinfo_using_biopython.log
  #>> to use e.g. buffer like MES without modifying the beblasti script, rename lig code in pdb file and use next line
  #>> to use pdbfile replace --pdbcode with --pdbfile (filename incl .pdb)
  #python /nfs/home/tbalius/zzz.scripts/be_blasti.py --pdbfile $pdbfile nocarbohydrate original_numbers | tee -a pdbinfo_using_biopython.log
  python /nfs/home/tbalius/zzz.scripts/be_blasti.py --pdbcode $pdbname nocarbohydrate original_numbers | tee -a pdbinfo_using_biopython.log
  
# extracts all waters into separate pdb file
  grep HOH  $workdir/${pdbname}_A.pdb > $workdir/water.pdb

  if !(-s rec.pdb) then
      echo "rec.pdb is not found"
  endif

# nomenclature clean-up.
  mv rec.pdb temp.pdb
  grep -v TER temp.pdb | grep -v END  > rec.pdb
  rm temp.pdb

  if (-s lig.pdb) then
     sed -e "s/HETATM/ATOM  /g" lig.pdb > xtal-lig.pdb
  else if (-s pep.pdb) then ## if no ligand and peptide
     sed -e "s/HETATM/ATOM  /g" pep.pdb > xtal-lig.pdb
  else
     echo "Warning: No ligand or peptid."
  endif

end # system

This script downloads PDB files from the web and breaks them up into ligand and receptor.

Next we align the receptors, ligands and waters into the same frame (002md.alignwithchimera_forTleap.csh):

#!/bin/csh 
## this script was written by trent balius in the Rizzo Group, 2011
## modified in the Shoichet Group, 2013-2015
# TEB, MF comments 2017

# This shell script will do the following:
# (1) align the ligand file onto hydrated apo reference
# (2) then use superposed ligand to id nearby binding site waters in apo structure
# (3) save nearby waters only to pdb

set mountdir = `pwd`

set workdir = $mountdir/MDrundir/prep/002md_align

rm -rf $workdir
mkdir -p $workdir
cd $workdir

# get the right files.      ### CHANGE THIS
set pdb1 = "4NVA"
set pdb2 = "4NVE"

set apo_ref = "../$pdb1/rec.pdb"
set rec_lig = "../$pdb2/rec.pdb"
set lig = "../$pdb2/xtal-lig.pdb" 
set wat = "../$pdb1/water.pdb" 

set chimerapath = "/nfs/soft/chimera/current/bin/chimera"

#write instruction file for chimera based alignment
cat << EOF > chimera.com
# template #0
open $apo_ref 
# rec #1
open $rec_lig
# xtal-lig
open $lig
# waters
open $wat

# move original to gist. it is harder to move the gist grids. 
mmaker #0 #1 
matrixcopy #1 #2
sel #3:HOH & #2 z<8
del #3 & ~sel
write format pdb  0 apo_ref.pdb
write format pdb  1 rec_lig_aligned.pdb
write format pdb  2 lig_aligned.pdb
write format mol2  2 lig_aligned.mol2
write format pdb  3 nearby_waters_aligned.pdb
EOF

${chimerapath} --nogui chimera.com > & chimera.com.out


Next we prepare the ligand-less receptor for molecular dynamics simulation with the following script ( 003md.tleap_reduce_HEME.csh ):

#!/bin/csh
# This script uses first reduce and then tleap to prepare a receptor for amber.
# The outputs are: parameter topology file (prm7) and a coordinate file (rst7) 
#
## it includes the option to include Xtal waters (currently commented out)
#
# Written by Trent balius
# TEB/ MF comments Feb2017

setenv AMBERHOME /nfs/soft/amber/amber14
setenv DOCKBASE "/nfs/home/tbalius/zzz.github/DOCK"
# CUDA for GPU
#setenv LD_LIBRARY_PATH ""
#setenv LD_LIBRARY_PATH "/usr/local/cuda-6.0/lib64/:$LD_LIBRARY_PATH"

set mountdir = `pwd`
set filedir = $mountdir/MDrundir/prep/002md_align
set workdir = $mountdir/MDrundir/prep/003md_tleap

# check if workdir exists
if ( -s $workdir ) then
   echo "$workdir exits"
   continue
endif
       
# if it exists do this:     rm -rf ${workdir}
mkdir -p ${workdir}
cd ${workdir}

#rm -f leap.log  tleap.rec.out tleap.rec.in

## uncomment the next line to include Xtal waters into simulation
#cp $filedir/nearby_waters_aligned.pdb wat.pdb
cp $filedir/apo_ref.pdb rec.pdb

# get HEME parameters
curl docking.org/~tbalius/code/waterpaper2017/parms/frcmod_teb_mf_mod.hemall > frcmod_teb_mf_mod.hemall
curl docking.org/~tbalius/code/waterpaper2017/parms/heme_all_plus1.in > heme_all_plus1.in

### 1st: run leap just to renumber residues.

# produces tleap input file -- renumbers rec
### draw bond between HIS and HEME with 5 coordinated iron.
##bond REC.175.NE2 REC.293.FE
cat << EOF >! tleap.rec.1.in 
set default PBradii mbondi2
# load the protein force field
source leaprc.ff14SB
loadamberparams frcmod_teb_mf_mod.hemall
loadamberprep heme_all_plus1.in
REC = loadpdb rec.pdb
saveamberparm REC rec.1.leap.prm7 rec.1.leap.rst7
quit
EOF

# runs tleap and converts parameter and coordinate restart file back into pdb file
$AMBERHOME/bin/tleap -s -f tleap.rec.1.in > ! tleap.rec.1.out
$AMBERHOME/bin/ambpdb -p rec.1.leap.prm7 < rec.1.leap.rst7 >! rec.1.leap.pdb 

# nomenclature clean-up
$DOCKBASE/proteins/Reduce/reduce -HIS -FLIPs rec.1.leap.pdb >! rec.nowat.reduce.pdb
sed -i 's/HETATM/ATOM  /g' rec.nowat.reduce.pdb 
# grep -v means "everything but"; last grep statement removes inconsistently named HEM hydrogens (for leap to be added back in)
grep "^ATOM  " rec.nowat.reduce.pdb | sed -e 's/   new//g' | sed 's/   flip//g' | sed 's/   std//g' | grep -v "OXT" | grep -v " 0......HEM" >! rec.nowat.reduce_clean.pdb


curl docking.org/~tbalius/code/waterpaper2017/scripts/replace_his_with_hie_hid_hip.py > replace_his_with_hie_hid_hip.py
curl docking.org/~tbalius/code/waterpaper2017/scripts/replace_cys_to_cyx.py > replace_cys_to_cyx.py
curl docking.org/~tbalius/code/waterpaper2017/scripts/add.ters.py > add.ters.py

# python scripts do these three things: 1) checks his to give protonation specific names; 2) checks for disulphide bonds; 3) checks for missing residues and adds TER flag
python replace_his_with_hie_hid_hip.py rec.nowat.reduce_clean.pdb rec.nowat.1his.pdb
python replace_cys_to_cyx.py rec.nowat.1his.pdb rec.nowat.2cys.pdb
python add.ters.py rec.nowat.2cys.pdb rec.nowat.3ter.pdb

# add TER to file rec.nowat.H_mod2.pdb before HEM
grep -v "HEM" rec.nowat.3ter.pdb >! rec.nowat.4ter.pdb
echo "TER" >> rec.nowat.4ter.pdb
grep "HEM" rec.nowat.3ter.pdb >> rec.nowat.4ter.pdb

# fix wrong his protonation from automated pipeline 
grep -v "HE2 HIE   172" rec.nowat.4ter.pdb | sed -e "s/HIE   172/HID   172/g" > rec.nowat.final.pdb

# if no histidine fix is required uncomment this
#mv rec.nowat.3ter.pdb rec.nowat.final.pdb
# or rec.nowat.4ter.pdb rec.nowat.final.pdb if TER was added before the heme


### 2nd -- re-run tleap to create amber-ready input files (solvated parameter and coordinate protein and instruction file)

# produces tleap input file in 3 steps, first script based instruction file (loading FF and rec file), then pipes disulphide info into middle and final instructions (write just rec, solvated in TIP3P water box of 10A about rec boundaries, write rec in water-box parameters and coordinates) for complete tleap input file
cat << EOF >! tleap.rec2.in 
set default PBradii mbondi2
# load the protein force field
source leaprc.ff14SB
loadamberparams frcmod_teb_mf_mod.hemall
loadamberprep heme_all_plus1.in
REC = loadpdb rec.nowat.final.pdb
EOF

if (-e rec.nowat.2cys.pdb.for.leap ) then 
  cat rec.nowat.2cys.pdb.for.leap >> tleap.rec2.in
endif

cat << EOF >> tleap.rec2.in
# draw bond between HIS and HEME
# 5 coordinated iron.
bond REC.172.NE2 REC.290.FE
## include the following two lines to include Xtal water in MD run
##  XTALWAT = loadpdb wat.pdb
##  RECWAT  = combine {REC XTALWAT}
## also need to change REC to RECWAT in the next 3 lines
saveamberparm REC rec.leap.prm7 rec.leap.rst7
solvateBox REC TIP3PBOX 10.0
saveamberparm REC rec.watbox.leap.prm7 rec.watbox.leap.rst7
quit
EOF

$AMBERHOME/bin/tleap -s -f tleap.rec2.in > ! tleap.rec2.out

# for ease of visualization in pymol 
$AMBERHOME/bin/ambpdb -p rec.leap.prm7 < rec.leap.rst7 > rec.leap.pdb

echo "Look at rec.leap.pdb in pymol. May have to delete last column in pdb file for pymol."
# may have to remove element column so pymol does not get confused

# inspect tleap.rec.out and the leap.log file
# visually inspect (VMD) rec.10wat.leap.prm7, rec.10wat.leap.rst7 and rec.watbox.leap.prm7, rec.watbox.leap.rst7

Here is a script if you have a protein without any cofactors [Tutorial on running Molecular Dynamics for GIST grid generation with scripts 2].

Run MD

Write the following script to a file (named 004md.1.pmemd_cuda_wrapper.csh).

# TEB/ MF comments Feb2017
#
# This script writes a submission script to run amber MD on a GPU cluster.

set pwd = `pwd`
set workdir = ${pwd}/MDrundir/MDrun
set filedir = ${pwd}/MDrundir/prep/003md_tleap/

if !(-e ${workdir}) then 
  mkdir -p ${workdir}
else
  echo "warning: ${workdir} exists . . ."
  echo "kill with control c."
  pause(10)
endif

cd $workdir

# restrain all protein residues        ### CHANGE THIS according to Amber renumbering
set restraint_mask = ":1-290"
set nameprefix     = "rec.watbox.leap"

# writing submission script
cat << EOF >! qsub.amber.csh
#\$ -S /bin/csh
#\$ -cwd
#\$ -q gpu.q
#\$ -o stdout
#\$ -e stderr

# export CUDA_VISIBLE_DEVICES="0,1,2,3" 
# setenv CUDA_VISIBLE_DEVICES "0,1,2,3"
setenv AMBERHOME /nfs/soft/amber/amber14/ 
set amberexe = "/nfs/ge/bin/on-one-gpu - \$AMBERHOME/bin/pmemd.cuda"

## make a local directory on the server to run calculations
## those will be copied over to /nfs at the end of the script
set SCRATCH_DIR = /scratch
if ! (-d \$SCRATCH_DIR ) then
    SCRATCH_DIR=/tmp
endif
set username = `whoami`

## queue assigns job_id
set TASK_DIR = "\$SCRATCH_DIR/\${username}/\$JOB_ID"
echo \$TASK_DIR

mkdir -p \${TASK_DIR}
cd \${TASK_DIR}
pwd

cp ${filedir}/${nameprefix}.* .

## parameters to add if writing NetCDF -- for production run not visualization (VMD)
# ioutfm=1, # write out NetCDF trajectory
# ntxo = 2, # write out NetCDF restart
# ntrx = 2, # read in NetCDF restart
## we didn't because VMD does not read it in (NetCDF is small in size and has more significant figures).


## equilibration run 01mi-02mi and 01md-08md (see details in comments below)
## production run starts at 09md


## 01mi and 02mi minimize all Hs and waters, restraint in kcal/mol/A^2
cat << EOF1 > ! 01mi.in
01mi.in: equil minimization with very strong Cartesian restraints
&cntrl
   imin=1, maxcyc=3000, ncyc = 1500, 
   ntpr=100, 
   ntr=1, 
   restraint_wt=100.0, 
   restraintmask= '${restraint_mask} & !@H'
/
EOF1

cat << EOF1 > ! 02mi.in
02mi.in: equil minimization with strong Cartesian restraints
&cntrl
   imin=1, maxcyc=3000, ncyc = 1500, 
   ntpr=100, 
   ntr=1, 
   restraint_wt=5.0, 
   restraintmask= '${restraint_mask} & !@H'
/
EOF1

## 01md-06md   -- constant volume simulations, gradually heating up by 50K within each of the 6 steps (6x 20ps)
## 07md                -- const pressure, allow box dimensions to adjust
## 08md        -- const vol, same conditions as production (5ns more equilibration time)

cat << EOF1 > ! 01md.in
01md.in: equilibration, constant volume, temp ramp from 0 to 50K, runs for 20ps (10000 steps); 1step=2femtoseconds.
&cntrl
   imin=0, irest=0, ntx=1, nstlim = 10000,
   ntt=3, tempi=0.0, temp0=50.0, gamma_ln=2.0, ig = 1234,
   ntp=0, taup=2.0, 
   ntb=1, ntc=2, ntf=2, 
   ntwx=500, ntpr=500, dt = 0.002, 
   ntr = 1, iwrap=1, 
   restraintmask = '${restraint_mask} & !@H', restraint_wt = 5.0,
/
EOF1

cat << EOF1 > ! 02md.in
02md.in: equilibration, constant volume,temp ramp from 50 to 100K, runs for 20ps (10000 steps) 
&cntrl
   imin=0, irest=1, ntx=5, nstlim = 10000,
   ntt=3, tempi=50.0, temp0=100.0, gamma_ln=2.0, ig = 2345,
   ntp=0, taup=2.0, 
   ntb=1, ntc=2, ntf=2, 
   ntwx=500, ntpr=500, dt = 0.002,
   ntr = 1, iwrap=1,
   restraintmask = '${restraint_mask} & !@H', restraint_wt = 5.0,
/
EOF1

cat << EOF1 > ! 03md.in
03md.in: equilibration, constant volume, temp ramp from 100 to 150K, runs for 20ps (10000 steps)
&cntrl
   imin=0, irest=1, ntx=5, nstlim = 10000,
   ntt=3, tempi=100.0, temp0=150.0, gamma_ln=2.0, ig = 3456,
   ntp=0, taup=2.0, 
   ntb=1, ntc=2, ntf=2, 
   ntwx=500, ntpr=500, dt = 0.002,
   ntr = 1, iwrap=1,
   restraintmask = '${restraint_mask} & !@H', restraint_wt = 5.0,
/
EOF1

cat << EOF1 > ! 04md.in
04md.in: equilibration, constant volume, temp ramp from 150 to 200K, runs for 20ps (10000 steps)
&cntrl
   imin=0, irest=1, ntx=5, nstlim = 10000,
   ntt=3, tempi=150.0, temp0=200.0, gamma_ln=2.0, ig = 4567,
   ntp=0, taup=2.0, 
   ntb=1, ntc=2, ntf=2, 
   ntwx=500, ntpr=500, dt = 0.002,
   ntr = 1, iwrap=1,
   restraintmask = '${restraint_mask} & !@H', restraint_wt = 5.0,
/
EOF1

cat << EOF1 > ! 05md.in
05md.in: equilibration, constant volume, temp ramp from 200 to 250K, runs for 20ps (10000 steps)
&cntrl
   imin=0, irest=1, ntx=5, nstlim = 10000,
   ntt=3, tempi=200.0, temp0=250.0, gamma_ln=2.0, ig = 5678,
   ntp=0, taup=2.0, 
   ntb=1, ntc=2, ntf=2, 
   ntwx=500, ntpr=500, dt = 0.002,
   ntr = 1, iwrap=1,
   restraintmask = '${restraint_mask} & !@H', restraint_wt = 5.0,
/
EOF1

cat << EOF1 > ! 06md.in
06md.in: equilibration, constant volume, temp ramp from 250 to RT, runs for 20ps (10000 steps) 
&cntrl
   imin=0, irest=1, ntx=5, nstlim = 10000,
   ntt=3, tempi=250.0, temp0=298.15, gamma_ln=2.0, ig = 6789, 
   ntp=0, taup=2.0, 
   ntb=1, ntc=2, ntf=2, 
   ntwx=500, ntpr=500, dt = 0.002,
   ntr = 1, iwrap=1,
   restraintmask = '${restraint_mask} & !@H', restraint_wt = 5.0,
/
EOF1

## constant pressure simulation allows box to adjust

#cat << EOF1 > ! 07md.in
#07md.in: equilibration, constant pressure, constant temp at 298.15 run for 5ns (2500000 steps).
#&cntrl
#   ioutfm=1, # write out NetCDF trajectory
#   ntwr=1000000000,
#   imin=0, irest=1, ntx=5, nstlim = 2500000,
#   ntt=3, temp0=298.15, gamma_ln=2.0, ig = 12345,
#   ntp=1, taup=2.0,
#   ntb=2, ntc=2, ntf=2,
#   ntwx=500, ntpr=500, dt = 0.002,
#   ntr = 1, iwrap=1,
#   restraintmask = '${restraint_mask} & !@H', restraint_wt = 5.0,
#/
#EOF1

## box dimensions adjust quickly (40ps), crashes if too long (try less then, say 10ps)

cat << EOF1 > ! 07.1md.in
07md.in: equilibration, constant pressure, constant temp at 298.15 run for 40ps (20000 steps).
&cntrl
   ioutfm=1, # write out NetCDF trajectory
   ntwr=1000000000,
   imin=0, irest=1, ntx=5, nstlim = 20000,
   ntt=3, temp0=298.15, gamma_ln=2.0, ig = 12345,
   ntp=1, taup=2.0,
   ntb=2, ntc=2, ntf=2,
   ntwx=500, ntpr=500, dt = 0.002,
   ntr = 1, iwrap=1,
   restraintmask = '${restraint_mask} & !@H', restraint_wt = 5.0,
/
EOF1

## same as 07.1md but now starting with adjusted box as new restraint for protein coordinates
## this avoids pulling of protein into old position and hence crashing

cat << EOF1 > ! 07.2md.in
07md.in: equilibration, constant pressure, constant temp at 298.15 run for 5ns (2500000 steps).
&cntrl
   ioutfm=1, # write out NetCDF trajectory
   ntwr=1000000000,
   imin=0, irest=1, ntx=5, nstlim = 2500000,
   ntt=3, temp0=298.15, gamma_ln=2.0, ig = 13245, 
   ntp=1, taup=2.0, 
   ntb=2, ntc=2, ntf=2, 
   ntwx=500, ntpr=500, dt = 0.002,
   ntr = 1, iwrap=1,
   restraintmask = '${restraint_mask} & !@H', restraint_wt = 5.0,
/
EOF1

## last equilibration run, is identical to production run - const volume (is 10% faster than const pressure)
## added ioutfm=1 here to write out NetCDF trajectory
## if ntwr=1000000000 > nstlim = 2500000, means only write one restart file at the end

cat << EOF1 > ! 08md.in
08md.in: equilibration, constant volume, constant temp at 298.15 run for 5ns (2500000 steps).
&cntrl
   ioutfm=1, 
   ntwr=1000000000,
   imin=0, irest=1, ntx=5, nstlim = 2500000,
   ntt=3, temp0=298.15, gamma_ln=2.0, ig = 23456, 
   ntp=0, taup=2.0, 
   ntb=1, ntc=2, ntf=2, 
   ntwx=500, ntpr=500, dt = 0.002,
   ntr = 1, iwrap=1,
   restraintmask = '${restraint_mask} & !@H', restraint_wt = 5.0,
/
EOF1

## start of PRODUCTION RUN

cat << EOF1 > ! 09md.in
09md.in: production. constant volume, constant temp at 298.15 run for 5ns (2500000 steps).
&cntrl
   ioutfm=1, 
   ntwr=1000000000,
   imin=0, irest=1, ntx=5, nstlim = 2500000,
   ntt=3, temp0=298.15, gamma_ln=2.0, ig = 34567, 
   ntp=0, taup=2.0, 
   ntb=1, ntc=2, ntf=2, 
   ntwx=500, ntpr=500, dt = 0.002,
   ntr = 1, iwrap=1,
   restraintmask = '${restraint_mask} & !@H', restraint_wt = 5.0,
/
EOF1

## generates identical input files for each run, only random seed (ig flag) changes

cat 09md.in | sed 's/09md/10md/g' | sed 's/34567/10101/g'   > 10md.in
cat 09md.in | sed 's/09md/11md/g' | sed 's/34567/111/g'     > 11md.in
cat 09md.in | sed 's/09md/12md/g' | sed 's/34567/2121212/g' > 12md.in
cat 09md.in | sed 's/09md/13md/g' | sed 's/34567/131313/g'  > 13md.in
cat 09md.in | sed 's/09md/14md/g' | sed 's/34567/4443/g'    > 14md.in
cat 09md.in | sed 's/09md/15md/g' | sed 's/34567/151515/g'  > 15md.in #35ns
cat 09md.in | sed 's/09md/16md/g' | sed 's/34567/161616/g'  > 16md.in
cat 09md.in | sed 's/09md/17md/g' | sed 's/34567/171717/g'  > 17md.in
cat 09md.in | sed 's/09md/18md/g' | sed 's/34567/181818/g'  > 18md.in #50ns


## now that everything is set up -- let's run amber.

set pwd = `pwd`

  \$amberexe -O -i 01mi.in -o 01mi.out -p ${nameprefix}.prm7 \
  -c ${nameprefix}.rst7 -ref ${nameprefix}.rst7 \
  -x 01mi.mdcrd -inf 01mi.info -r 01mi.rst7

  \$amberexe -O -i 02mi.in -o 02mi.out -p ${nameprefix}.prm7 \
  -c 01mi.rst7 -ref 01mi.rst7 \
  -x 02mi.mdcrd -inf 02mi.info -r 02mi.rst7

  # CONSIDER running a step of NPT here first to at low temp to ajust the box size. 
  # then swiching to fixed volume to raise the tempature. 
  # or write python script to calc and import box size

  \$amberexe -O -i 01md.in -o 01md.out -p ${nameprefix}.prm7 \
  -c 02mi.rst7 -ref 02mi.rst7 -x 01md.mdcrd -inf 01md.info -r 01md.rst7 

  \$amberexe -O -i 02md.in -o 02md.out -p ${nameprefix}.prm7 \
  -c 01md.rst7 -ref 01md.rst7 -x 02md.mdcrd -inf 02md.info -r 02md.rst7 

  \$amberexe -O -i 03md.in -o 03md.out -p ${nameprefix}.prm7 \
  -c 02md.rst7 -ref 01md.rst7 -x 03md.mdcrd -inf 03md.info -r 03md.rst7 

  \$amberexe -O -i 04md.in -o 04md.out -p ${nameprefix}.prm7 \
  -c 03md.rst7 -ref 01md.rst7 -x 04md.mdcrd -inf 04md.info -r 04md.rst7

  \$amberexe -O -i 05md.in -o 05md.out -p ${nameprefix}.prm7 \
  -c 04md.rst7 -ref 01md.rst7 -x 05md.mdcrd -inf 05md.info -r 05md.rst7

  \$amberexe -O -i 06md.in -o 06md.out -p ${nameprefix}.prm7 \
  -c 05md.rst7 -ref 01md.rst7 -x 06md.mdcrd -inf 06md.info -r 06md.rst7

  \$amberexe -O -i 07.1md.in -o 07.1md.out -p ${nameprefix}.prm7 \
  -c 06md.rst7 -ref 01md.rst7 -x 07.1md.mdcrd -inf 07.1md.info -r 07.1md.rst7

  \$amberexe -O -i 07.2md.in -o 07.2md.out -p ${nameprefix}.prm7 \
  -c 07.1md.rst7 -ref 07.1md.rst7 -x 07.2md.mdcrd -inf 07.2md.info -r 07.2md.rst7

  \$amberexe -O -i 08md.in -o 08md.out -p ${nameprefix}.prm7 \
  -c 07.2md.rst7 -ref 07.2md.rst7 -x 08md.mdcrd -inf 08md.info -r 08md.rst7

## start PRODUCTION (10x 5ns = 50ns); takes about 24hrs on single GPU
  
  \$amberexe -O -i 09md.in -o 09md.out -p ${nameprefix}.prm7 \
  -c 08md.rst7 -ref 07.2md.rst7 -x 09md.mdcrd -inf 09md.info -r 09md.rst7
  
  \$amberexe -O -i 10md.in -o 10md.out -p ${nameprefix}.prm7 \
  -c 09md.rst7 -ref 07.2md.rst7 -x 10md.mdcrd -inf 10md.info -r 10md.rst7
  
  \$amberexe -O -i 11md.in -o 11md.out -p ${nameprefix}.prm7 \
  -c 10md.rst7 -ref 07.2md.rst7 -x 11md.mdcrd -inf 11md.info -r 11md.rst7
  
  \$amberexe -O -i 12md.in -o 12md.out -p ${nameprefix}.prm7 \
  -c 11md.rst7 -ref 07.2md.rst7 -x 12md.mdcrd -inf 12md.info -r 12md.rst7

  \$amberexe -O -i 13md.in -o 13md.out -p ${nameprefix}.prm7 \
  -c 12md.rst7 -ref 07.2md.rst7 -x 13md.mdcrd -inf 13md.info -r 13md.rst7

  \$amberexe -O -i 14md.in -o 14md.out -p ${nameprefix}.prm7 \
  -c 13md.rst7 -ref 07.2md.rst7 -x 14md.mdcrd -inf 14md.info -r 14md.rst7

  \$amberexe -O -i 15md.in -o 15md.out -p ${nameprefix}.prm7 \
  -c 14md.rst7 -ref 07.2md.rst7 -x 15md.mdcrd -inf 15md.info -r 15md.rst7

  \$amberexe -O -i 16md.in -o 16md.out -p ${nameprefix}.prm7 \
  -c 15md.rst7 -ref 07.2md.rst7 -x 16md.mdcrd -inf 16md.info -r 16md.rst7

  \$amberexe -O -i 17md.in -o 17md.out -p ${nameprefix}.prm7 \
  -c 16md.rst7 -ref 07.2md.rst7 -x 17md.mdcrd -inf 17md.info -r 17md.rst7

  \$amberexe -O -i 18md.in -o 18md.out -p ${nameprefix}.prm7 \
  -c 17md.rst7 -ref 07.2md.rst7 -x 18md.mdcrd -inf 18md.info -r 18md.rst7


## move scratch directory from cluster onto nfs
mv \$TASK_DIR ${workdir} 

EOF

qsub qsub.amber.csh 

echo "To view whether job was submitted and running use \n
 qstat \n
Then look at submitted queue e.g. gpu.q@n-1-141.cluster.ucsf.bks and log into node by typing \n
 qlogin -q int.q@n-1-141 \n
 cd /scratch/yourname/jobid \n
 ls -ltr \n
look for latest .info file (e.g. 17md.info) and cat to screen \n
 cat 17md.info \n
Use 005md.checkMDrun.py to see if run went to plan."


Next lets check properties from the MD simulations (005md.checkMDrun.csh).

# This script makes 3 lists: 
#       1) for the initial equilibration run (until RT is reached)
#       2) for the final equilibration run
#       3) for the production run
# It then runs a python script that generates plots for analysis of the quality of the MD run in a separate directory.
#
# by TEB/ MF March 2017


  set jid = $1         #job id as specified by queue e.g. 5609039; this is also the directory name
  set mountdir  = `pwd`
  set workdir   = $mountdir/MDrundir/MDrun/${jid}_plots

  if ($jid == "") then
     echo "Give JobID as argument. \n e.g.    csh 005md.checkMDrun.csh 5609039 \n"
     exit
  endif

  if (-e ${workdir}) then
     echo "$workdir exists"
     exit
  endif

  mkdir -p ${workdir}  
  cd $workdir

  ln -s ../${jid} 
 
  ls ${jid}/01mi.out ${jid}/02mi.out ${jid}/01md.out ${jid}/02md.out ${jid}/03md.out ${jid}/04md.out ${jid}/05md.out ${jid}/06md.out > equil1.txt
  ls ${jid}/07.1md.out ${jid}/07.2md.out ${jid}/08md.out > equil2.txt
  ls ${jid}/09md.out ${jid}/10md.out ${jid}/11md.out ${jid}/12md.out ${jid}/13md.out ${jid}/14md.out ${jid}/15md.out ${jid}/16md.out ${jid}/17md.out ${jid}/18md.out > production.txt

# runs python plotter <in> <out>
  python ${mountdir}/for005md.py equil1.txt equil1.png
  python ${mountdir}/for005md.py equil2.txt equil2.png
  python ${mountdir}/for005md.py production.txt production.png

echo "\n   Now open png's: \n   gthumb MDrundir/MDrun/${jid}_plots/*png"

here is the python script that make the plots. (for005md.py)

import sys
import copy
import math
import matplotlib
import scipy
import numpy
import pylab

def read_MD_outfile(filename,totE, kE, pE, time, temp, pres):

    fileh = open(filename,'r')


    result_flag = False    
    count  = 0
    for line in fileh:
        line = line.strip('\n')
        splitline = line.split()
        if "4.  RESULTS" in line:
            result_flag = True
        elif "A V E R A G E S   O V E R" in line:
            result_flag = False 

        if (result_flag):
           if "NSTEP" in line:
              if (len(splitline)<11):
                  continue
              t_time = float(splitline[5])/1000.0 # convert from ps to ns
              t_temp = float(splitline[8])
              t_pres = float(splitline[11])
              time.append(t_time)
              temp.append(t_temp)
              pres.append(t_pres)
           if "Etot" in line: 
              if (len(splitline)<8):
                  continue
              t_totE = float(splitline[2])
              t_kE   = float(splitline[5])
              t_pE   = float(splitline[8])
              totE.append(t_totE)
              kE.append(t_kE)
              pE.append(t_pE)
    fileh.close()
    return totE, kE, pE, time, temp, pres

def main():

    if len(sys.argv) != 3:
      print "error:  this program takes 2 inputs:"
      print "   (1) filename that contains a list of md output files. If it doesn't exist do sth like this: "
      print "        ls 5609039/*.out > tmpout.txt"
      print "   (2) filename for png plot"
      print "        This should be done automatically as part of 005md.checkMDrun.csh"
      exit()

    filelist        = sys.argv[1]
    filenamepng     = sys.argv[2]

    # read in file with a list of mdout files.
    print "filelist containing MD.out files: " + filelist
    print "Plot will be saved as: " + filenamepng
    filenamelist = []
    fileh = open(filelist,'r')

    for line in fileh:
        tfile = line.strip("\n")
        splitline = tfile.split(".")
        if (splitline[-1] != "out"):
            print "Error. %s is not a .out file" % tfile
            exit()
        filenamelist.append(tfile)
    fileh.close()
    totE = []
    kE   = []
    pE   = []
    time = []
    temp = []
    pres = []

    for filename in filenamelist:
        print "reading info from file: " + filename
        totE, kE, pE, time, temp, pres = read_MD_outfile(filename,totE, kE, pE, time, temp, pres)

    # Plot with 5 panels; tabs [x_left,y_left,x_up,y_up].
    subpanel = [ [0.2,0.1,0.3,0.2], [0.6,0.1,0.3,0.2], [0.2,0.4,0.3,0.2], [0.6,0.4,0.3,0.2], [0.2,0.7,0.3,0.2], [0.6,0.7,0.3,0.2] ]
    descname = ["totE", "kE", "pE", "temp", "pres"]
    fig = pylab.figure(figsize=(8,8))
    for i,desc in enumerate([totE, kE, pE, temp, pres]):
       #print len(desc), len(totE), len(time)

       axis = fig.add_axes(subpanel[i])
       #lim_min = min(math.floor(Ymin),math.floor(Xmin))
       # lim_max = max(math.ceil(Ymax), math.ceil(Xmax))
       im = axis.plot(time,desc,'k-') #,[0,100],[0,100],'--')
       axis.set_xlabel("time (ns)")
       axis.set_ylabel(descname[i])
       #axis.set_title('file='+xyfilename)
       #axis.set_ylim(lim_min, lim_max)
       #axis.set_xlim(lim_min, lim_max)
    #fig.savefig('md_analysis_fig.png',dpi=600) 
    fig.savefig(filenamepng,dpi=600) 

main()


Run GIST post processing

Write a script called 006gist.cpptraj_mk_ref.csh so that we have a reference frame to align to.

## This script writes out the last frame of the 10md trajectory as a reference pdb to which we can align.
## Ref is also a useful visual reference when visualizing gist.

## TEB / MF comments -- March 2017


set mountdir = `pwd`
set workdir  = $mountdir/gist/006ref
rm -rf $workdir
mkdir -p $workdir
cd $workdir

set jobId = "5609039"

ln -s ${mountdir}/MDrundir/MDrun/${jobId} .

#parm rec.wat.leap.prm7 
#rec_w_h means with hydrogens added with reduces.
cat << EOF >! makeref.in 
parm $jobId/rec.watbox.leap.prm7 
trajin $jobId/10md.rst7 1 1 
strip :WAT
trajout ref.pdb pdb
go
EOF

/nfs/soft/amber/amber14/bin/cpptraj -i makeref.in > ! makeref.log &


Next we write a script called 007gist.alignwithchimera_gist_conf.csh to align stuff to the simulation frame.

#!/bin/csh 
## this script was written by trent balius in the Rizzo Group, 2011
## modified in the Shoichet Group, 2013-2015

# TEB, MF comments -- March 2017
#
# This shell script will do the following:
# (1) aligns the ligand file and nearby waters onto MD frame of reference created in previous script [006gist.cpptraj_mk_ref.csh]
# (2) then writes out the aligned ligand and waters which will be used to calculate center of mass which centers the GIST box.
#  Aligned structures are then also useful for visualizing gist.

set mountdir = `pwd`
set workdir  = $mountdir/gist/007align_to_md
rm -rf $workdir
mkdir -p $workdir
cd $workdir

ln -s ${mountdir}/MDrundir/prep/002md_align . 


set ref = "../006ref/ref.pdb"                     # snapshot from simulation
set rec = "002md_align/apo_ref.pdb"               # the receptor given to tleap
set lig = "002md_align/lig_aligned.pdb"           # ligand in the same fram as rec 
set wat = "002md_align/nearby_waters_aligned.pdb" # waters aligned 

set chimerapath = "/nfs/soft/chimera/current/bin/chimera"

#write instruction file for chimera based alignment
cat << EOF > chimera.com
# template #0
open $ref 
# rec #1
open $rec
# xtal-lig
open $lig
# waters
open $wat

# move original to gist. it is harder to move the gist grids. 
mmaker #0 #1 
matrixcopy #1 #2
matrixcopy #1 #3
write format pdb  0 ref.pdb
write format pdb  1 rec_aligned.pdb
write format pdb  2 lig_aligned.pdb
write format mol2 2 lig_aligned.mol2
write format pdb  3 waters_aligned.pdb
EOF
 
${chimerapath} --nogui chimera.com > & chimera.com.out

Write 008a.gist.cpptraj.csh as follows:

# TEB/ MF comments -- March2017

## This script runs GIST. It
# 1) calculates the c.o.m. of the aligned ligand to use those coords as gist box center
# 2) reads in all frames (1-5000) of each trajectory
# 3) makes a GIST box of 40x40x40 voxels with a gridspacing of 0.50A aka a box with the dimensions of 20A in xyz directions.
# 4) submits 1 job to queue and runs ccptraj (with input script we created)

set mountdir = `pwd`
set workdir  = $mountdir/gist/008a.full_gist
rm -rf $workdir
mkdir -p $workdir
cd $workdir

set jobId = "5609039"

ln -s ${mountdir}/MDrundir/MDrun/${jobId} .
ln -s ${mountdir}/gist/007align_to_md/lig_aligned.mol2 .

#copy scrips from web
curl http://docking.org/~tbalius/code/for_dock_3.7/mol2.py > mol2.py
curl http://docking.org/~tbalius/code/for_dock_3.7/mol2_center_of_mass.py > mol2_center_of_mass.py

python ./mol2_center_of_mass.py lig_aligned.mol2 centermol.txt

set center = `cat centermol.txt`


#parm rec.wat.leap.prm7 
#rec_w_h means with hydrogens added with reduces.
## reads in trajectories from 1 to 5000 (10k is picked since > 5000).
cat << EOF >! gist.in
#parm rec.watbox.leap.prm7 
parm ${jobId}/rec.watbox.leap.prm7
trajin ${jobId}/09md.mdcrd 1 10000
trajin ${jobId}/10md.mdcrd 1 10000
trajin ${jobId}/11md.mdcrd 1 10000  
trajin ${jobId}/12md.mdcrd 1 10000 
trajin ${jobId}/13md.mdcrd 1 10000 
trajin ${jobId}/14md.mdcrd 1 10000 
trajin ${jobId}/15md.mdcrd 1 10000 
trajin ${jobId}/16md.mdcrd 1 10000 
trajin ${jobId}/17md.mdcrd 1 10000
trajin ${jobId}/18md.mdcrd 1 10000
gist doorder gridcntr ${center} griddim 40 40 40 gridspacn 0.50 
go
EOF
#gist doorder doeij gridcntr 35.759163 33.268703 31.520596 griddim 40 40 40 gridspacn 0.50 out gist.out


#ln -s ../*.mdcrd .
#ln -s ../rec_h.wat.leap.prm7 .

cat << EOF > qsub_fullgist.csh
#\$ -S /bin/csh
#\$ -cwd
#\$ -q all.q
#\$ -o stdout
#\$ -e stderr

cd $workdir

/nfs/soft/amber/amber14/bin/cpptraj -i gist.in > ! gist.log 

EOF

qsub qsub_fullgist.csh

#/nfs/soft/amber/amber14/bin/cpptraj -i gist.in > ! gist.log &

write 008b.gist.cpptraj_subtraj.csh as follows:

# TEB/ MF comments -- March2017

## This script does the same as 008a but for each individual subtrajectory.
## Run this after you started the 008a script since that takes much longer. Then submit 008b.
## When these runs finish - look at the results and if sth is wrong qdel the full gist run if necessary.

# 1) calculates the c.o.m. of the aligned ligand to use those coords as gist box center
# 2) for each individual subtrajectory -- reads in all frames and saves output to separate subdirectory within "008b.subtraj_gist/$subtraj"
# 3) makes a GIST box of 40x40x40 with a gridspacing of 0.50A aka 20A in xyz
# 4) submits all individual subtraj jobs to the queue and runs ccptraj (with input script we created)

set mountdir = `pwd`

foreach subtraj ( \
  09md \
  10md \
  11md \
  12md \
  13md \
  14md \
  15md \
  16md \
  17md \
  18md \
)

set workdir  = $mountdir/gist/008b.subtraj_gist/$subtraj

rm -rf $workdir
mkdir -p $workdir
cd $workdir

set jobId = "5609039"

ln -s ${mountdir}/MDrundir/MDrun/${jobId} . # for subtraj we could make links to two files insted of the directory
ln -s ${mountdir}/gist/007align_to_md/lig_aligned.mol2 .

#copy scrips from web
curl http://docking.org/~tbalius/code/for_dock_3.7/mol2.py > mol2.py
curl http://docking.org/~tbalius/code/for_dock_3.7/mol2_center_of_mass.py > mol2_center_of_mass.py

python ./mol2_center_of_mass.py lig_aligned.mol2 centermol.txt

set center = `cat centermol.txt`


#parm rec.wat.leap.prm7 
#rec_w_h means with hydrogens added with reduces.
cat << EOF >! gist.in
#parm rec.watbox.leap.prm7 
parm ${jobId}/rec.watbox.leap.prm7
trajin ${jobId}/$subtraj.mdcrd 1 10000
gist doorder gridcntr ${center} griddim 40 40 40 gridspacn 0.50 
go
EOF
#gist doorder doeij gridcntr 35.759163 33.268703 31.520596 griddim 40 40 40 gridspacn 0.50 out gist.out


#ln -s ../*.mdcrd .
#ln -s ../rec_h.wat.leap.prm7 .

cat << EOF > qsub_${subtraj}_gist.csh
#\$ -S /bin/csh
#\$ -cwd
#\$ -q all.q
#\$ -o stdout
#\$ -e stderr

cd $workdir

/nfs/soft/amber/amber14/bin/cpptraj -i gist.in > ! gist.log 

EOF

qsub qsub_${subtraj}_gist.csh

end
  • Steps to analyse GIST maps using Chimera (can also alternatively do this in PyMOL by opening corresponding .dx file and selecting mesh at 5sigma level; this also shows the box; then open ligand, protein and water)
  • chimera
File > Open > /008b.subtraj_gist/09md/gist-g0.dx
Tools > VolumeData > VolumeViewer
mesh
Level: e.g. 5 (means 5x bulk solvent)
File > Open > /007align_to_md/lig_aligned.mol2 
File > Open > /007align_to_md/rec_aligned.pdb
File > Open > /007align_to_md/ref.pdb
File > Open > /007align_to_md/waters_aligned.pdb
Actions > Ribbon > hide
Tools > command line 
display #2:233
display #3:230

You can download the latest version of the GIST tools:

wget https://github.com/tbalius/GIST_DX_tools/archive/master.zip
unzip master.zip
rm master.zip

Write 010a.gist.copy_and_combine.csh as follows. This script combines the raw GIST grids.

#!/bin/csh 

## TEB/ MF comments -- March 2017

#set mountdir = "/mnt/nfs/work/users/tbalius/Water_Project/run_DOCK3.7"
set mountdir = `pwd`

set workdir  = $mountdir/gist/010a.full_gist_combine
set filedir  = $mountdir/gist/008a.full_gist
set scriptdir = $mountdir/GIST_DX_tools-master/src

  rm -rf  ${workdir}
  mkdir ${workdir}
  cd ${workdir}

cp $filedir/gist-dTSorient-dens.dx .
cp $filedir/gist-dTStrans-dens.dx .
cp $filedir/gist-Esw-dens.dx .
cp $filedir/gist-Eww-dens.dx .

# this is the density of the water.
cp $filedir/gist-gO.dx .


cat <<EOF > qsub_full.csh
#!/bin/csh 
#\$ -cwd
#\$ -j yes
#\$ -o stderr
#\$ -q all.q

  cd ${workdir}
 # make a combination of the grids
 # We think that we should be substracting
 # Remove comment from top of line

 # tip3p and amber 14

  set bulkE = "-9.533" # kcal/mol/water
  set numberdensity = "0.0334" # waters/(angstroms^3)
  set A14const1 = \`echo "scale=4; -1.0 * \${bulkE} * \${numberdensity}" | bc\`
  # -0.3184 should be positive -*- = +

  python ${scriptdir}/dx-combine_grids.py gist-Eww-dens.dx       1.0 gist-gO.dx               \${A14const1} 0.0 gist-dEww-dens_ref
  python ${scriptdir}/dx-combine_grids.py gist-Esw-dens.dx       1.0 gist-dEww-dens_ref.dx    1.0           0.0 gist-EswPlusEww_ref
  python ${scriptdir}/dx-combine_grids.py gist-dTSorient-dens.dx 1.0 gist-dTStrans-dens.dx    1.0           0.0 gist-TSsw
  python ${scriptdir}/dx-combine_grids.py gist-EswPlusEww_ref.dx 1.0 gist-TSsw.dx            -1.0           0.0 gist-Gtot1_ref

  python ${scriptdir}/dx-combine_grids.py gist-Esw-dens.dx        1.0 gist-dEww-dens_ref.dx   2.0           0.0 gist-EswPlus2Eww_ref    #<<THIS GUY
  python ${scriptdir}/dx-combine_grids.py gist-EswPlus2Eww_ref.dx 1.0 gist-TSsw.dx           -1.0           0.0 gist-Gtot2_ref

  # apply density cutoff.
  #python ${scriptdir}/dx-density-threshold.py gist-EswPlusEww_ref2.dx gist-gO.dx 5.0 gist-EswPlusEww_ref2_threshold5.0
  
  # norm grids.
  #python ${scriptdir}/dx-divide_grids.py gist-Eww-dens.dx gist-gO.dx 0.0329 gist-Eww-norm
  #python ${scriptdir}/dx-combine_grids.py gist-Eww-norm.dx 1.0 gist-gO.dx 0.0 "-9.533" gist-Eww-norm-ref

EOF

qsub qsub_full.csh 
#csh qsub_full.csh 

This script (010b.gist.copy_and_combine_subtraj.csh) combines the GIST grids from the sub-trajectories.

#!/bin/csh 

# TEB / MF comments -- March2017

## specific to amber14
## go through multipliers ...
## scripts run within seconds on cluster


#set mountdir = "/mnt/nfs/work/users/tbalius/Water_Project/run_DOCK3.7"
set mountdir = `pwd`

foreach subtraj ( \
 09md \
 10md \
 11md \
 12md \
 13md \
 14md \
 15md \
 16md \
 17md \
 18md \
)

set workdir  = $mountdir/gist/010b.subtraj_gist_combine/$subtraj
set filedir  = $mountdir/gist/008b.subtraj_gist/$subtraj
set scriptdir = $mountdir/GIST_DX_tools-master/src

rm -rf $workdir
mkdir -p $workdir
cd $workdir

ln -s $filedir/gist-dTSorient-dens.dx .
ln -s $filedir/gist-dTStrans-dens.dx .
ln -s $filedir/gist-Esw-dens.dx .
ln -s $filedir/gist-Eww-dens.dx .

# this is the density of the water.
ln -s $filedir/gist-gO.dx .

cat <<EOF >! qsub_${subtraj}.csh
#!/bin/csh 
#\$ -cwd
#\$ -j yes
#\$ -o stderr
#\$ -q all.q

  cd ${workdir}
 # make a combination of the grids
 # We think that we should be substracting
 # Remove comment from top of line

 # tip3p and amber 14

  set bulkE = "-9.533" # kcal/mol/water
  set numberdensity = "0.0334" # waters/(angstroms^3)
  set A14const1 = \`echo "scale=4; -1.0 * \${bulkE} * \${numberdensity}" | bc\`
  # -0.3184 should be positive -*- = +

  python ${scriptdir}/dx-combine_grids.py gist-Eww-dens.dx       1.0 gist-gO.dx               \${A14const1} 0.0 gist-dEww-dens_ref
  python ${scriptdir}/dx-combine_grids.py gist-Esw-dens.dx       1.0 gist-dEww-dens_ref.dx    1.0           0.0 gist-EswPlusEww_ref
  python ${scriptdir}/dx-combine_grids.py gist-dTSorient-dens.dx 1.0 gist-dTStrans-dens.dx    1.0           0.0 gist-TSsw
  python ${scriptdir}/dx-combine_grids.py gist-EswPlusEww_ref.dx 1.0 gist-TSsw.dx            -1.0           0.0 gist-Gtot1_ref

  python ${scriptdir}/dx-combine_grids.py gist-Esw-dens.dx        1.0 gist-dEww-dens_ref.dx   2.0           0.0 gist-EswPlus2Eww_ref   #<<THIS GUY
  python ${scriptdir}/dx-combine_grids.py gist-EswPlus2Eww_ref.dx 1.0 gist-TSsw.dx           -1.0           0.0 gist-Gtot2_ref

  # apply density cutoff.
  #python ${scriptdir}/dx-density-threshold.py gist-EswPlusEww_ref2.dx gist-gO.dx 5.0 gist-EswPlusEww_ref2_threshold5.0

  # norm grids.
  #python ${scriptdir}/dx-divide_grids.py gist-Eww-dens.dx gist-gO.dx 0.0329 gist-Eww-norm
  #python ${scriptdir}/dx-combine_grids.py gist-Eww-norm.dx 1.0 gist-gO.dx 0.0 "-9.533" gist-Eww-norm-ref

EOF

qsub qsub_${subtraj}.csh 
#csh qsub_${subtraj}.csh 



end # subtraj

echo " gist-EswPlus2Eww_ref is the best GIST grid retrospectively for CcP and used for prospective docking in (Balius et al.). "
echo " gist-Gtot1_ref is the second best on CcP. "

The following script (010ii.gist_subsite_energetics.csh) calculates, the most occupied waters and energetics of those sites.

#!/bin/csh 

# This script does the following:
# 1) From the MD water freq densities (gist-gO.dx) it generates pdb files containing cluster centers at varying levels of bulk density, here 5, 10, 20, 50 as thresholds.
# 2) Using gist-EswPlus2Eww_ref.dx we then calculate the energies associated with each cluster center (aka MD water), where energy assigned is normalized by the voxel.
# 3) All pdbs can be visualized within PyMOL -- look at the _mdl.pdb files frames using the play button 


## TEB/ MF -- March 2017


setenv DOCKBASE /nfs/home/tbalius/zzz.github/DOCK
source /nfs/soft/python/envs/complete/latest/env.csh


#set mountdir = "/mnt/nfs/work/users/tbalius/Water_Project/run_DOCK3.7"
set mountdir = `pwd`

set workdir  = $mountdir/gist/010ii.gist_subsite_energetics
set filedir  = $mountdir/gist/010a.full_gist_combine/
set scriptdir = $mountdir/GIST_DX_tools-master/src

  rm -rf  ${workdir}
  mkdir ${workdir}
  cd ${workdir}

  cp $filedir/gist-EswPlus2Eww_ref.dx .
  cp $filedir/gist-gO.dx .

# input for py script is e.g.  'gist-gO.dx 1 5 2 05xbulkdens'
# gist-gO.dx   contains the MD densities for waters
# 1            (or -1) is multiplier for water vs anti-water
# 5            waters above 5x bulk density
# 2            times 0.5A grid points are considered connected (aka 1A)
# 05xbulkdens  sensible identifier for pdb file that is generated

  python $scriptdir/dx-gist_make_centers_of_intensity.py gist-gO.dx 1 5 2 05xbulkdens > log
  $DOCKBASE/proteins/pdbtosph/bin/pdbtosph 05xbulkdens_clusters.pdb 05xbulkdens_clusters.sph >> log
  sed -i 's/ 0\.700   / 1.400   /g' 05xbulkdens_clusters.sph
  python $scriptdir/clusterEnergies_from_dx-sph.py gist-EswPlus2Eww_ref.dx 05xbulkdens_clusters.sph > ! 05xbulkdens_clusters_energies.log

  python $scriptdir/dx-gist_make_centers_of_intensity.py gist-gO.dx 1 10 2 10xbulkdens >> log
  $DOCKBASE/proteins/pdbtosph/bin/pdbtosph 10xbulkdens_clusters.pdb 10xbulkdens_clusters.sph >> log
  sed -i 's/ 0\.700   / 1.400   /g' 10xbulkdens_clusters.sph
  python $scriptdir/clusterEnergies_from_dx-sph.py gist-EswPlus2Eww_ref.dx 10xbulkdens_clusters.sph > ! 10xbulkdens_clusters_energies.log

  python $scriptdir/dx-gist_make_centers_of_intensity.py gist-gO.dx 1 20 2 20xbulkdens > log
  $DOCKBASE/proteins/pdbtosph/bin/pdbtosph 20xbulkdens_clusters.pdb 20xbulkdens_clusters.sph >> log
  sed -i 's/ 0\.700   / 1.400   /g' 20xbulkdens_clusters.sph
  python $scriptdir/clusterEnergies_from_dx-sph.py gist-EswPlus2Eww_ref.dx 20xbulkdens_clusters.sph > ! 20xbulkdens_clusters_energies.log

  python $scriptdir/dx-gist_make_centers_of_intensity.py gist-gO.dx 1 50 2 50xbulkdens >> log
  $DOCKBASE/proteins/pdbtosph/bin/pdbtosph 50xbulkdens_clusters.pdb 50xbulkdens_clusters.sph >> log
  sed -i 's/ 0\.700   / 1.400   /g' 50xbulkdens_clusters.sph
  python $scriptdir/clusterEnergies_from_dx-sph.py gist-EswPlus2Eww_ref.dx 50xbulkdens_clusters.sph > ! 50xbulkdens_clusters_energies.log


# Waters below -1 kcal/mol/A^3   and    Anti-waters above 1kcal/mol/A^3 -- using gist-EswPlus2Eww_ref.dx
  python $scriptdir/dx-gist_make_centers_of_intensity.py gist-EswPlus2Eww_ref.dx  1 1.0 2 energy_1p0antiwaters >> log
  $DOCKBASE/proteins/pdbtosph/bin/pdbtosph energy_1p0antiwaters_clusters.pdb energy_1p0antiwaters_clusters.sph
  sed -i 's/ 0\.700   / 1.400   /g' energy_1p0antiwaters_clusters.sph
  python $scriptdir/clusterEnergies_from_dx-sph.py gist-EswPlus2Eww_ref.dx energy_1p0antiwaters_clusters.sph > ! energy_1p0antiwaters.log

  python $scriptdir/dx-gist_make_centers_of_intensity.py gist-EswPlus2Eww_ref.dx -1 1.0 2 energy_-1p0waters >> log
  $DOCKBASE/proteins/pdbtosph/bin/pdbtosph energy_-1p0waters_clusters.pdb energy_-1p0waters_clusters.sph >> log
  sed -i 's/ 0\.700   / 1.400   /g' energy_-1p0waters_clusters.sph
  python $scriptdir/clusterEnergies_from_dx-sph.py gist-EswPlus2Eww_ref.dx energy_-1p0waters_clusters.sph > ! energy_-1p0waters.log
  #cp out0new_gist.dx out0new_gist-EswPlus2Eww_ref_energy_waters.dx

# Waters below -0.5 kcal/mol/A^3   and    Anti-waters above 0.5 kcal/mol/A^3 -- using gist-EswPlus2Eww_ref.dx
  python $scriptdir/dx-gist_make_centers_of_intensity.py gist-EswPlus2Eww_ref.dx  1 0.5 2 energy_0p5antiwaters >> log
  $DOCKBASE/proteins/pdbtosph/bin/pdbtosph energy_0p5antiwaters_clusters.pdb energy_0p5antiwaters_clusters.sph
  sed -i 's/ 0\.700   / 1.400   /g' energy_0p5antiwaters_clusters.sph
  python $scriptdir/clusterEnergies_from_dx-sph.py gist-EswPlus2Eww_ref.dx energy_0p5antiwaters_clusters.sph > ! energy_0p5antiwaters.log

  python $scriptdir/dx-gist_make_centers_of_intensity.py gist-EswPlus2Eww_ref.dx -1 0.5 2 energy_-0p5waters >> log
  $DOCKBASE/proteins/pdbtosph/bin/pdbtosph energy_-0p5waters_clusters.pdb energy_-0p5waters_clusters.sph >> log
  sed -i 's/ 0\.700   / 1.400   /g' energy_-0p5waters_clusters.sph
  python $scriptdir/clusterEnergies_from_dx-sph.py gist-EswPlus2Eww_ref.dx energy_-0p5waters_clusters.sph > ! energy_-0p5waters.log
  #cp out0new_gist.dx out0new_gist-EswPlus2Eww_ref_energy_waters.dx


# merging individual pdbs into one:
# makes single pdb file with "MODEL" entries for PyMOl to thumb through out of the 4 pdbs that contain cluster centers at varying levels of X-times bulk density.
  echo "MODEL 1" > bulkdens_frames_mdl.pdb
  cat 05xbulkdens_clusters.pdb >> bulkdens_frames_mdl.pdb
  echo "ENDMDL \nMODEL 2" >> bulkdens_frames_mdl.pdb
  cat 10xbulkdens_clusters.pdb >> bulkdens_frames_mdl.pdb
  echo "ENDMDL \nMODEL 3" >> bulkdens_frames_mdl.pdb
  cat 20xbulkdens_clusters.pdb >> bulkdens_frames_mdl.pdb
  echo "ENDMDL \nMODEL 4" >> bulkdens_frames_mdl.pdb
  cat 50xbulkdens_clusters.pdb >> bulkdens_frames_mdl.pdb

# makes single pdb file with "MODEL" entries for PyMOl to thumb through out of the 4 pdbs that contain all points composing the clusters at varying levels of X-times bulk density.
  echo "MODEL 1" > bulkdens_points_mdl.pdb
  cat 05xbulkdens_points.pdb >> bulkdens_points_mdl.pdb
  echo "ENDMDL \nMODEL 2" >> bulkdens_points_mdl.pdb
  cat 10xbulkdens_points.pdb >> bulkdens_points_mdl.pdb
  echo "ENDMDL \nMODEL 3" >> bulkdens_points_mdl.pdb
  cat 20xbulkdens_points.pdb >> bulkdens_points_mdl.pdb
  echo "ENDMDL \nMODEL 4" >> bulkdens_points_mdl.pdb
  cat 50xbulkdens_points.pdb >> bulkdens_points_mdl.pdb

# makes single pdb file with "MODEL" entries for PyMOl to thumb through out of the 4 pdbs that contain Energies for each cluster center at varying levels of X-times bulk density.  
echo "MODEL 1" > clustEnergy_frames_mdl.pdb
  cat clustEnergy05xbulkdens_clusters.pdb >> clustEnergy_frames_mdl.pdb
  echo "ENDMDL \nMODEL 2" >> clustEnergy_frames_mdl.pdb
  cat clustEnergy10xbulkdens_clusters.pdb >> clustEnergy_frames_mdl.pdb
  echo "ENDMDL \nMODEL 3" >> clustEnergy_frames_mdl.pdb
  cat clustEnergy20xbulkdens_clusters.pdb >> clustEnergy_frames_mdl.pdb
  echo "ENDMDL \nMODEL 4" >> clustEnergy_frames_mdl.pdb
  cat clustEnergy50xbulkdens_clusters.pdb >> clustEnergy_frames_mdl.pdb

# consider making energy pdbs with MODEL entries for PyMOL. 

echo "Open pdbs in PyMOL \n pymol gist/010ii.gist_subsite_energetics/*pdb gist/007align_to_md/*pdb"

#   $DOCKBASE/proteins/pdbtosph/bin/pdbtosph one_center_of_energies.pdb one_center_of_energies.sph
#  sed -i 's/0.700    7/1.400    7/g' one_center_of_energies.sph
#  python ~/zzz.scripts/dx-gist_score_gist_sph.py loopC.EswPusEww.dx  one_center_of_energies.sph > ! one_center_of_energies.log
#  #cp out0new_gist.dx out0new_gist_loopC_one_center_of_energies.dx