Rescoring with DOCK 3.7

Latest revision as of 15:45, 26 September 2018

We often want to get the score for a molecule without doing any docking.

DOCK 3.7 can now do this internally. In DOCK 3.6 this was done by an external program, scoreopt, which is no longer used.

needed files

To rescore you need 3 files:

   poses.mol2.gz
   amsol.txt.gz
   vdw.txt.gz
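
Before launching a rescoring run it can help to confirm that all three files are present and readable. The following is a minimal Python sketch, not part of DOCK; the function name `missing_rescoring_inputs` is hypothetical, and the only facts taken from this page are the three file names.

```python
import gzip
from pathlib import Path

# The three inputs named above.
REQUIRED = ("poses.mol2.gz", "amsol.txt.gz", "vdw.txt.gz")


def missing_rescoring_inputs(workdir="."):
    """Return the required files that are absent or not readable as gzip."""
    problems = []
    for name in REQUIRED:
        path = Path(workdir) / name
        if not path.is_file():
            problems.append(name)
            continue
        try:
            # Reading one byte is enough to trigger the gzip header check.
            with gzip.open(path, "rb") as handle:
                handle.read(1)
        except (OSError, EOFError):
            problems.append(name)
    return problems
```

An empty list means the directory looks ready; anything else names the files to regenerate.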

how to generate the needed files

Currently, the format of the mol2 file is very rigid. It must be in the same format as mol2s produced by DOCK 3.7. The script convert_anyMol2_to_dockMol2.py should convert mol2 files into the right format.

Here is a tarball with all the scripts you will need for rescoring (this will likely be provided in a future release of the code):

rescoring.tar.gz (http://docking.org/~tbalius/code/for_dock_3.7/rescoring/rescoring.tar.gz)

Here is what is in the tarball:

 drwxr-xr-x tbalius/bks       0 2018-09-26 08:28 rescoring/
 -rw-r--r-- tbalius/bks     429 2018-09-26 08:28 rescoring/1.run.rescore_prep.csh
 -rw-r--r-- tbalius/bks     575 2018-09-26 08:28 rescoring/mol2toDOCK37type.py
 -rw-r--r-- tbalius/bks    1757 2018-09-26 08:28 rescoring/2.rescore_get_parms_rerun_mod.csh
 -rw-r--r-- tbalius/bks    1725 2018-09-26 08:24 rescoring/convert_anyMol2_to_dockMol2.py
 -rw-r--r-- tbalius/bks   32030 2018-09-26 08:21 rescoring/mol2.py
 -rw-r--r-- tbalius/bks    3074 2018-09-26 08:19 rescoring/separate_mol2_more10000.py


The following script prepares a mol2 file produced by DOCK for rescoring. It takes the mol2 file as its only argument.

  1.run.rescore_prep.csh

Here is the script:

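# Usage: csh 1.run.rescore_prep.csh ligands.mol2
#
#rm poses.mol2.gz vdw.txt.gz amsol.txt.gz
#
#zcat test.mol2.gz >! poses.mol2

# Input mol2 file (must be uncompressed and in DOCK 3.7 format).
set ligs_mol2 = $1

# Unfinished check for gzipped input, left disabled:
#if $ligs_mol2:e == 'gz' then
#   echo $ligs_mol2 $ligs_mol2:r $ligs_mol2:e 

# Work on a copy named poses.mol2.
cp $ligs_mol2 poses.mol2

# Generate vdw.txt and amsol.txt. Use "noamsol" to skip the amsol run.
#csh 2.rescore_get_parms_rerun_mod.csh poses.mol2 noamsol
csh 2.rescore_get_parms_rerun_mod.csh poses.mol2 amsol

# Compress the three files that DOCK reads.
gzip -f poses.mol2
gzip -f vdw.txt
gzip -f amsol.txt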
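
The script above copies its input unchanged, so a gzipped mol2 file would have to be decompressed by hand first (the `gz` check in the script is commented out). As a rough sketch of how that staging step could handle both cases, assuming nothing about DOCK beyond the `poses.mol2` file name, one might write the following in Python. The function name `stage_poses` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def stage_poses(ligs_mol2, dest="poses.mol2"):
    """Copy a mol2 file to dest, decompressing it first if it ends in .gz."""
    src = Path(ligs_mol2)
    if src.suffix == ".gz":
        # Stream the decompressed bytes so large pose files are not held in memory.
        with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
            shutil.copyfileobj(fin, fout)
    else:
        shutil.copyfile(src, dest)
    return Path(dest)
```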

Here is a script that will generate the amsol and vdw files from a mol2 file:

  2.rescore_get_parms_rerun_mod.csh

Here is the script:


set mol2file = $1 
set ifamsol  = $2

set list = `awk '/  Name:/{print $3}' $mol2file`
rm vdw.txt amsol.txt
touch vdw.txt amsol.txt

# (1) break up the mol2 file.
# 
  python /nfs/home/tbalius/zzz.scripts/separate_mol2_more10000.py $mol2file mol 
# foreach molecule
  foreach mol2 (`ls mol*.mol2`)
    set name = $mol2:r
    echo $mol2
    rm -r $name 
    mkdir $name
    cd $name
    cp ../$mol2 .

# (2) map vdw parms onto the atom types
    python /nfs/home/tbalius/zzz.scripts/mol2toDOCK37type.py $mol2 vdw.txt
    #ls -lt | head

# (3) run amsol
    if ($ifamsol == 'amsol') then 
       csh /nfs/home/tbalius/zzz.github/DOCK/ligand/amsol/calc_solvation.csh $mol2
       awk 'BEGIN{count=0}{if(count>0){printf"%s %s %s %s\n", $2, $4, $5, $3}; count=count+1}' output.solv >! output.solv2
    else if ($ifamsol == 'noamsol') then
       echo "amsol is not calculated."
    else 
       echo "ERROR: the second argument must be amsol or noamsol."
       exit 1
    endif  
    cd ../
    echo "########$name########" >> vdw.txt
    cat $name/vdw.txt >> vdw.txt 

    #paste $name/vdw.txt $name/output.solv2 | awk '{printf"%2s %3s %-6s %5s %5s %5s %5s\n", $1, $2, $3, $5, $6, $7, $8}' >> amsol.txt
    if ($ifamsol == 'amsol') then
       echo "########$name########" >> amsol.txt
       paste $name/vdw.txt $name/output.solv2 | awk '{printf"%2s %3s %5s %5s %5s %5s\n", $1, $2, $5, $6, $7, $8}' >> amsol.txt
    else
       cat vdw.txt | awk '{if(NF==1){print $0} else if(NF==4){printf ("%2d %3s %5.2f %5.2f %5.2f %5.2f\n", $1, $2, 0.0,0.0,0.0,0.0)}}' >! amsol.txt 
    endif 
#
  end

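The script generates the amsol file by rerunning amsol on the docked poses. Pass noamsol as the second argument to skip the amsol run; in that case every amsol value is written as 0.00.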
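
In step (3) the script skips the first line of output.solv and reorders four of its columns with awk before writing output.solv2. The Python sketch below mirrors only that awk one-liner; it makes no claim about what each column means, and the function name `reorder_solv_lines` is hypothetical.

```python
def reorder_solv_lines(lines):
    """Mirror the awk step: drop line 1, then emit fields 2, 4, 5, 3 of each line."""
    reordered = []
    for index, line in enumerate(lines):
        if index == 0:
            # The awk counter starts at 0, so the first line is never printed.
            continue
        fields = line.split()
        # awk treats missing fields as empty strings; pad to get the same behaviour.
        fields += [""] * (5 - len(fields))
        reordered.append(f"{fields[1]} {fields[3]} {fields[4]} {fields[2]}")
    return reordered
```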

Alternatively, you can download the amsol file for the protomer of interest from ZINC15. For example:

 curl http://files.docking.org/protomers/08/06/14/455080614.solv > output.solv2

Then process it for DOCK:

 echo "########$name########" >> amsol.txt
 paste vdw.txt output.solv2 | awk '{printf"%2s %3s %5s %5s %5s %5s\n", $1, $2, $5, $6, $7, $8}' >> amsol.txt   
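
The paste and awk pipeline above joins vdw.txt and output.solv2 line by line and keeps columns 1, 2, 5, 6, 7 and 8 of the joined line. Here is a hedged Python equivalent of that one step. It assumes, as the NF==4 test in the rerun script suggests, that each vdw.txt atom line has four fields, so that columns 5 to 8 come from the solvation file. The function name `merge_vdw_solv` is hypothetical.

```python
from itertools import zip_longest


def merge_vdw_solv(vdw_lines, solv_lines):
    """Join the two files line by line and format columns 1, 2, 5, 6, 7, 8."""
    merged = []
    # paste keeps going until the longer file ends, hence zip_longest.
    for vdw, solv in zip_longest(vdw_lines, solv_lines, fillvalue=""):
        fields = (vdw.rstrip("\n") + "\t" + solv.rstrip("\n")).split()
        # Missing awk fields print as empty strings; pad to eight columns.
        fields += [""] * (8 - len(fields))
        picked = (fields[0], fields[1], fields[4], fields[5], fields[6], fields[7])
        merged.append("%2s %3s %5s %5s %5s %5s" % picked)
    return merged
```

If a vdw.txt line does not have exactly four fields, the column positions shift and the merged values will be wrong, which is the same failure mode as the awk version.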


It is also possible to get the amsol parameters from the db2 files:

  /mnt/nfs/work/tbalius/Water_Project_newgrid_mod_heme_charge/0008.rescore_get_parms_from_db_mod.csh

This is a bit messy and slow.

Here is the script:

set mol2file = $1 ## dock3.7 output file
#set ZINCID = $1
#set db2file = $2
set dbpath = $2

#echo $ZINCID
#echo $db2file

set list = `awk '/  Name:/{print $3}' $mol2file`
rm vdw.txt amsol.txt
touch vdw.txt amsol.txt

foreach ZINCID ($list)

  echo $ZINCID
  # get the number of atoms 
  awk 'BEGIN{flag=0}{if (flag == 1){print "atomnum="$1;flag=0} if ($1 == "'$ZINCID'"){flag = 1}}'  $mol2file # print the number of atoms # line after zinc id
  set atomnum = `awk 'BEGIN{flag=0}{if (flag == 1){print $1;flag=0} if ($1 == "'$ZINCID'"){flag = 1}}'  $mol2file` # print the number of atoms # line after zinc id

  set db2file = `grep -a20 $ZINCID  $mol2file | grep "Ligand Source File:" | awk '{print $5}' | sort | uniq `
  echo $db2file
  echo $dbpath/$db2file
  #zcat $db2file | awk 'BEGIN{count=0} /M    /{flag="False"};{if($2 =="'$ZINCID'" && $4 == "'$atomnum'" && flag=="False"){flag="True"; print "atomnum="$4 "::" $0; count=count+1};if (($1 == "A") && flag=="True"){print count":"$0}}' 
  #exit
  zcat $dbpath/$db2file | awk 'BEGIN{count=0} /M    /{flag="False"};{if($2 =="'$ZINCID'" && $4 == "'$atomnum'" && flag=="False"){flag="True"; count=count+1};if (($1 == "A") && flag=="True"){print count":"$0}}' > ! $ZINCID.parms.txt
  #zcat $db2file | awk 'BEGIN{count=0} /M    /{flag="False"};{if($2 =="'$ZINCID'" && flag=="False"){flag="True"; count=count+1; print "found '$ZINCID'"};if(($1 == "A") && (flag=="True") ){print count":"$0}}' 
   # this will only return the first ZINC ID encountered.

  echo "## $ZINCID parms" >> vdw.txt
  echo "## $ZINCID parms" >> amsol.txt

  # make vdw file
  grep "^1:" $ZINCID.parms.txt | sed 's/1://g' | awk '{printf "%2d %3s %-5s %2d\n", $2, $3, $4, $5}' >> vdw.txt
  #awk '{printf "%2d %3s %-5s %2d\n", $2, $3, $4, $5}' $ZINCID.parms.txt >> vdw.txt
  # amsol file
  grep "^1:" $ZINCID.parms.txt | sed 's/1://g' | awk '{printf "%2d %3s   %6.3f     %6.3f     %6.3f    %6.3f\n", $2, $3, $8, $9, $10, $11}' >> amsol.txt
  #awk '{printf "%2d %3s   %6.3f     %6.3f     %6.3f    %6.3f\n", $2, $3, $8, $9, $10, $11}' $ZINCID.parms.txt >> amsol.txt
end #ZINCID
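
Both parameter scripts begin by collecting molecule names from the mol2 file with awk '/  Name:/{print $3}'. As an illustration of that lookup only, here is a small Python sketch. It assumes each pose header contains a line whose second whitespace-separated field is `Name:` and whose third is the ID, which is what the awk expression implies; the function name `pose_names` is hypothetical.

```python
def pose_names(lines):
    """Return the third field of every line that contains '  Name:'."""
    names = []
    for line in lines:
        # Same match as the awk pattern: two spaces followed by "Name:".
        if "  Name:" in line:
            fields = line.split()
            if len(fields) > 2:
                names.append(fields[2])
    return names
```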

INDOCK Parameters

Here are the parameters in the INDOCK file:

DOCK 3.7 parameter
#####################################################
### NOTE: split_database_index is reserved to specify a list of files
search_type                   2
mol2file                      poses.mol2.gz
ligsolfile                    amsol.txt.gz
ligvdwfile                    vdw.txt.gz
#####################################################
# NOTE: split_database_index is reserved to specify a list of files
ligand_atom_file               split_database_index

Note that the split_database_index file is not used; it is just a placeholder.
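
To double-check an INDOCK file before a rescoring run, the four rescoring settings shown above can be compared against the file. The sketch below is not DOCK's own parser: it simply treats each non-comment line as a keyword followed by a value, which is an assumption about the file layout. The names `parse_indock` and `rescoring_mismatches` are hypothetical.

```python
# The rescoring settings listed on this page.
EXPECTED = {
    "search_type": "2",
    "mol2file": "poses.mol2.gz",
    "ligsolfile": "amsol.txt.gz",
    "ligvdwfile": "vdw.txt.gz",
}


def parse_indock(text):
    """Collect 'keyword value' pairs, skipping blank lines and # comments."""
    params = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 1)
        if len(parts) == 2:
            params[parts[0]] = parts[1].strip()
    return params


def rescoring_mismatches(params):
    """Map each keyword that differs to an (expected, found) pair."""
    return {
        key: (wanted, params.get(key))
        for key, wanted in EXPECTED.items()
        if params.get(key) != wanted
    }
```

An empty result from `rescoring_mismatches(parse_indock(text))` means the four settings match the values shown on this page.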