Mol2db2 format

From DISI

The mol2db2 format describes files created by the mol2db2 program as input for the DOCK 3.7 molecular docking program.

mol2db2 format was designed by Ryan Coleman as part of his postdoctoral research in the Shoichet Lab.

It was first introduced with DOCK 3.7 and is not compatible with previous versions of the DOCK series, such as DOCK 3.6.


File format description

Mol2db2 Format 2
File Format

T type information (implicitly assumed)
M molecule (4 lines req'd, after that they are optional, 24 lines max)
A atoms
B bond
X xyz
R rigid xyz for matching (can actually be any xyzs)
C conformation
S sets
D clusters
E end of molecule
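A minimal Python sketch of how the record codes above can be used to split a db2 file into molecules. It assumes every molecule is closed by its E line and that T lines sit outside molecules (this sketch simply skips them). The names RECORD_TYPES and split_molecules are hypothetical and not part of mol2db2 or DOCK.

```python
from typing import Iterable, Iterator, List

# Record codes from the legend above; each db2 line starts with one of them.
RECORD_TYPES = {
    "T": "type information",
    "M": "molecule",
    "A": "atom",
    "B": "bond",
    "X": "xyz coordinate",
    "R": "rigid xyz for matching",
    "C": "conformation",
    "S": "set",
    "D": "cluster",
    "E": "end of molecule",
}


def split_molecules(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the lines of each molecule, ending with its E record.

    Assumption of this sketch: T lines are not part of any molecule and
    are skipped; blank lines are ignored.
    """
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        code = line[0]
        if code not in RECORD_TYPES:
            raise ValueError(f"unknown record type {code!r} in line {line!r}")
        if code == "T":
            continue
        current.append(line)
        if code == "E":
            yield current
            current = []
    if current:
        raise ValueError("file ended before an E record closed the last molecule")
```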
T ## namexxxx (implicitly assumed to be the standard 7)
M zincname protname #atoms #bonds #xyz #confs #sets #rigid #Mlines #clusters
M charge polar_solv apolar_solv total_solv surface_area
M smiles
M longname
[M arbitrary information preserved for writing out]
A stuff about each atom, 1 per line 
B stuff about each bond, 1 per line
X coordnum atomnum confnum x y z 
R rigidnum color x y z
C confnum coordstart coordend
S setnum #lines #confs_total broken hydrogens omega_energy
S setnum linenum #confs confs [until full column]
D clusternum setstart setend matchstart matchend #additionalmatching
D matchnum color x y z
E 
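As an illustration, the four required M lines of a molecule can be read into named fields with a whitespace split. This sketch assumes the ZINC and protonation names contain no spaces, and it keeps the SMILES and long name whole by taking everything after "M ". The function name parse_header is hypothetical.

```python
from typing import Dict, List, Union

# Numeric fields of the first two M lines, in the order listed above.
M1_FIELDS = ["atoms", "bonds", "xyz", "confs", "sets", "rigid", "mlines", "clusters"]
M2_FIELDS = ["charge", "polar_solv", "apolar_solv", "total_solv", "surface_area"]


def parse_header(m_lines: List[str]) -> Dict[str, Union[str, int, float]]:
    """Read the four required M lines of one molecule into a dict."""
    if len(m_lines) < 4 or any(not line.startswith("M") for line in m_lines):
        raise ValueError("expected at least four M records")
    first = m_lines[0].split()
    second = m_lines[1].split()
    if len(first) != 3 + len(M1_FIELDS) or len(second) != 1 + len(M2_FIELDS):
        raise ValueError("unexpected number of fields in M lines")
    header: Dict[str, Union[str, int, float]] = {
        "zincname": first[1],
        "protname": first[2],
    }
    header.update(zip(M1_FIELDS, map(int, first[3:])))
    header.update(zip(M2_FIELDS, map(float, second[1:])))
    # SMILES and long name start at column 2, right after "M ".
    header["smiles"] = m_lines[2][2:].strip()
    header["longname"] = m_lines[3][2:].strip()
    return header
```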
With the above descriptions, here is a description of the columns that are used. Format statements for Python and Fortran are given further below. If speed or file size becomes an issue, this format might be replaced with a binary file format.
Notes: in the current scheme there are 17 child groups per group line, 9 child confs per group line, 9 child confs per conf line, and 8 confs per set line. Groups and confs with no children are written out.
On the atom line, DT is the DOCK type and CO is the color.
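The note that a set line holds 8 confs can be illustrated with a sketch that writes the S records for one set, using the two S format statements listed further down this page. It assumes that #lines in the first S line counts the continuation lines, that line numbers start at 1, and that a final partial line stops after its last conformation. set_lines is a hypothetical helper, not part of mol2db2.

```python
from typing import List

CONFS_PER_LINE = 8  # "8 confs/set per line" from the notes above


def set_lines(setnum: int, confs: List[int], broken: int,
              hydrogens: int, omega_energy: float) -> List[str]:
    """Write the header S line plus continuation S lines for one set (sketch)."""
    chunks = [confs[i:i + CONFS_PER_LINE]
              for i in range(0, len(confs), CONFS_PER_LINE)]
    # Header: setnum #lines #confs_total broken hydrogens omega_energy
    out = ["S %6d %6d %3d %1d %1d %+11.3f\n"
           % (setnum, len(chunks), len(confs), broken, hydrogens, omega_energy)]
    for lineno, chunk in enumerate(chunks, start=1):
        # Continuation: setnum linenum #confs, then one %6d per conformation.
        fmt = "S %6d %6d %1d" + " %6d" * len(chunk) + "\n"
        out.append(fmt % (setnum, lineno, len(chunk), *chunk))
    return out
```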
          1         2         3         4         5         6         7
01234567890123456789012345678901234567890123456789012345678901234567890123456789
T ## typename
M ZINCCODEXXXXXXXX PROTCODEX ATO BON XYZXXX CONFSX SETSXX RIGIDX MLINES NUMCLU
M +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
M SMILESXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
M LONGNAMEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
[M ARBITRARY_INFORMATION_PRESERVEDXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]
A NUM NAME TYPEX DT CO +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
B NUM ATO ATO TY
X COORDNUMX ATO CONFNU +XCO.ORDX +YCO.ORDX +ZCO.ORDX
R NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
C CONFNO COORDSTAR COORDENDX
S SETIDX #LINES #CO C H +ENERGY.XXX
S SETIDX LINENO # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS
D CLUSID STASET ENDSET MST MEN ADD
D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
E
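Because every field sits in fixed columns, a line can also be read by slicing rather than splitting. The sketch below takes its column ranges (0-based, end-exclusive) for the A line from the ruler above; ATOM_COLUMNS and parse_atom are hypothetical names, and the other record types would need their own tables.

```python
from typing import Dict, Union

# (start, end, converter) for each A-line field, read off the ruler above.
ATOM_COLUMNS = {
    "num": (2, 5, int),
    "name": (6, 10, str),
    "type": (11, 16, str),
    "dock_type": (17, 19, int),
    "color": (20, 22, int),
    "charge": (23, 32, float),
    "polar_solv": (33, 43, float),
    "apolar_solv": (44, 54, float),
    "total_solv": (55, 65, float),
    "surface_area": (66, 75, float),
}


def parse_atom(line: str) -> Dict[str, Union[int, float, str]]:
    """Slice one A line into named, converted fields."""
    if not line.startswith("A "):
        raise ValueError("not an A record")
    return {key: conv(line[start:end].strip())
            for key, (start, end, conv) in ATOM_COLUMNS.items()}
```

For example, a line written with the Python A format statement given further below parses back to the values that were written.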
The following type lines are assumed by DOCK unless overridden:
T  1 positive
T  2 negative
T  3 acceptor
T  4 donor
T  5 ester_o
T  6 amide_o
T  7 neutral
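A sketch of building the type table: start from the seven defaults and apply any T lines found in the file. This page does not say whether a T line replaces only the matching entry or the whole table; this sketch assumes the per-entry behavior, and DEFAULT_TYPES and type_table are hypothetical names.

```python
from typing import Dict, Iterable

# The seven type lines DOCK assumes unless they are overridden.
DEFAULT_TYPES: Dict[int, str] = {
    1: "positive", 2: "negative", 3: "acceptor", 4: "donor",
    5: "ester_o", 6: "amide_o", 7: "neutral",
}


def type_table(t_lines: Iterable[str]) -> Dict[int, str]:
    """Copy the defaults, then let each T line set its own entry (assumed)."""
    table = dict(DEFAULT_TYPES)
    for line in t_lines:
        code, number, name = line.split()
        if code != "T":
            raise ValueError("expected a T record")
        table[int(number)] = name
    return table
```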
The following are the Python format statements for each line:
T %2d %8s\n
M %16s %9s %3d %3d %6d %6d %6d %6d %6d %6d\n
M %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
M %77s\n
M %77s\n
M %77s\n
A %3d %-4s %-5s %2d %2d %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
B %3d %3d %3d %-2s\n
X %9d %3d %6d %+9.4f %+9.4f %+9.4f\n
R %3d %2d %+9.4f %+9.4f %+9.4f\n
C %6d %9d %9d\n
S %6d %6d %3d %1d %1d %+11.3f\n
S %6d %6d %1d %6d %6d %6d %6d %6d %6d %6d %6d\n 
D %6d %6d %6d %3d %3d %3d\n
D %3d %2d %+9.4f %+9.4f %+9.4f\n
E\n
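The Python format strings above work directly with the % operator. This hypothetical helper writes the X lines for one conformation and the C line that points at them. It assumes coordinate numbers are assigned consecutively and that coordend is inclusive, and it returns the two parts separately because all X lines precede the C lines in a molecule.

```python
from typing import List, Sequence, Tuple

X_FMT = "X %9d %3d %6d %+9.4f %+9.4f %+9.4f\n"
C_FMT = "C %6d %9d %9d\n"


def write_conformation(
    confnum: int,
    first_coord: int,
    atoms: Sequence[Tuple[int, float, float, float]],
) -> Tuple[List[str], str]:
    """Return (X lines, C line) for one conformation.

    atoms holds (atomnum, x, y, z). Coordinate numbers run consecutively
    from first_coord; the C line records the inclusive range (assumed).
    """
    x_lines: List[str] = []
    coordnum = first_coord
    for atomnum, x, y, z in atoms:
        x_lines.append(X_FMT % (coordnum, atomnum, confnum, x, y, z))
        coordnum += 1
    c_line = C_FMT % (confnum, first_coord, coordnum - 1)
    return x_lines, c_line
```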
The following are the Fortran 77 format statements:
!T ## namexxxx (implicitly assumed to be the standard 7)
1000 format(2x,i2,1x,a8)
!M zincname protname #atoms #bonds #xyz #confs #sets #rigid #mlines #clusters
2000 format(2x,a16,1x,a9,1x,i3,1x,i3,1x,i6,1x,i6,1x,i6,x,i6,x,i6,x,i6,x,i6)
!M charge polar_solv apolar_solv total_solv surface_area
2100 format(2x,f9.4,1x,f10.3,1x,f10.3,1x,f10.3,1x,f9.3)
!M smiles or longname
2200 format(2x,a77)
!A stuff about each atom, 1 per line
3000 format(2x,i3,1x,a4,1x,a5,1x,i2,1x,i2,1x,f9.4,1x,f10.3,1x,
    &       f10.3,1x,f10.3,1x,f9.3)
!B stuff about each bond, 1 per line
4000 format(2x,i3,1x,i3,1x,i3,1x,a2)
!X coordnum atomnum confnum x y z
5000 format(2x,i9,1x,i3,1x,i6,x,f9.4,1x,f9.4,1x,f9.4)
!R rigidnum color x y z
6000 format(2x,i3,x,i2,x,f9.4,1x,f9.4,1x,f9.4)
!C confnum #startcoord #endcoord
7000 format(2x,i6,1x,i9,1x,i9)
!S setnum #lines #confs_total broken hydrogens omega_energy
8000 format(2x,i6,1x,i6,1x,i3,1x,i1,1x,i1,1x,f11.3)
!S setnum linenum #confs confs [until full column]
8100 format(2x,i6,1x,i6,1x,i1,1x,i6,1x,i6,1x,i6,1x,i6,
    &       1x,i6,1x,i6,1x,i6,1x,i6)
!D CLUSID STARTSETX ENDSETXXX ADD MST MEN
9000 format(2x,i6,x,i6,x,i6,x,i3,x,i3,x,i3)
!D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
!re-use 6000
!E
!E does not get a format line
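The Fortran statements describe the same column layout as the Python ones. As a cross-check, the sketch below converts a simple Fortran format string into column slices and reads a record with it. It handles only the nX, iW, aW and fW.D descriptors used on this page. It trims trailing blanks from character fields, which is a choice of this sketch and not Fortran behavior. It also assumes the leading 2x skips the record letter and the space after it. fortran_fields and read_record are hypothetical names.

```python
import re
from typing import List, Tuple, Union

Value = Union[int, float, str]


def fortran_fields(spec: str) -> List[Tuple[int, int, str]]:
    """Turn e.g. '(2x,i3,1x,a4,f9.4)' into (start, end, kind) column slices."""
    fields: List[Tuple[int, int, str]] = []
    col = 0
    # Blanks are insignificant in a format spec (continuation lines add some).
    body = spec.replace(" ", "").strip("()")
    for item in body.split(","):
        skip = re.fullmatch(r"(\d*)x", item)
        if skip:
            col += int(skip.group(1) or 1)
            continue
        edit = re.fullmatch(r"([iaf])(\d+)(?:\.\d+)?", item)
        if not edit:
            raise ValueError(f"unsupported edit descriptor {item!r}")
        width = int(edit.group(2))
        fields.append((col, col + width, edit.group(1)))
        col += width
    return fields


def read_record(line: str, spec: str) -> List[Value]:
    """Read one db2 line using the column layout of a Fortran format."""
    values: List[Value] = []
    for start, end, kind in fortran_fields(spec):
        text = line[start:end]
        if kind == "i":
            values.append(int(text))
        elif kind == "f":
            values.append(float(text))
        else:
            values.append(text.rstrip())
    return values
```

For example, reading an X line with format 5000 returns its coordnum, atomnum, confnum and the three coordinates.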
The following are the Fortran 95 format statements:
!T ## namexxxx (implicitly assumed to be the standard 7)
      character (len=*), parameter :: DB2NAME = '(2x,i2,x,a8)' !1000
!M zincname protname #atoms #bonds #xyz #confs #sets #rigid #maxmlines #clusters
      character (len=*), parameter :: DB2M1 =
     &    '(2x,a16,x,a9,x,i3,x,i3,x,i6,x,i6,x,i6,x,i6,x,i6,x,i6)' !2000
!M charge polar_solv apolar_solv total_solv surface_area
      character (len=*), parameter :: DB2M2 =
     &    '(2x,f9.4,x,f10.3,x,f10.3,x,f10.3,x,f9.3)' !2100
!M smiles/longname/arbitrary
      character (len=*), parameter :: DB2M3 = '(2x,a78)' !2200
!A stuff about each atom, 1 per line
      character (len=*), parameter :: DB2ATOM =
     &    '(2x,i3,x,a4,x,a5,x,i2,x,i2,x,f9.4,x,f10.3,x,
     &    f10.3,x,f10.3,x,f9.3)' !3000
!B stuff about each bond, 1 per line
      character (len=*), parameter :: DB2BOND =
     &    '(2x,i3,x,i3,x,i3,x,a2)' !4000
!X coordnumx atomnum confnum x y z
      character (len=*), parameter :: DB2COORD =
     &    '(2x,i9,x,i3,x,i6,x,f9.4,x,f9.4,x,f9.4)' !5000
!R rigidnum color x y z
      character (len=*), parameter :: DB2RIGID =
     &    '(2x,i6,x,i2,x,f9.4,x,f9.4,x,f9.4)' !6000
!C confnum coordstart coordend
      character (len=*), parameter :: DB2CONF = '(2x,i6,x,i9,x,i9)' !7000
!S setnum #lines #confs_total broken hydrogens omega_energy 
      character (len=*), parameter :: DB2SET1 =
     &    '(2x,i6,x,i6,x,i3,x,i1,x,i1,x,f11.3)' !8000
!S setnum linenum #confs confs [until full column]
      character (len=*), parameter :: DB2SET2 =
     &    '(2x,i6,x,i6,x,i1,x,i6,x,i6,x,i6,x,i6,
     &    1x,i6,x,i6,x,i6,x,i6)' !8100
!D CLUSID STASET ENDSET ADD(itional matching spheres count) MST(art) MEN(d)
      character (len=*), parameter :: DB2CLUSTER =
     &    '(2x,i6,x,i6,x,i6,x,i3,x,i3,x,i3)' !9000
!D NUM CO x y z
!reuse DB2RIGID
!E
!E does not get a format line



This work is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/ This page is adapted from "DOCK3.7 Documentation" by Ryan G. Coleman. Based on a work at https://sites.google.com/site/dock37wiki/.