Mol2db2 Format 2

From DISI
Revision as of 15:44, 23 October 2014 by TBalius (talk | contribs)

This page is a wishlist of features that a new version of the flexibase file format should support. mol2db2 format features that have been implemented so far are marked [x].

New Features

implemented

  • Real Atom Types and Bond Information [x]
  • Way to determine which mix-and-match conformations have clashes (and avoid trying them) [x]
  • A place to store an internal energy for each possible conformation [x]
  • Terminal hydrogen rotations?? [x]
  • support for clusters of conformations [x]
  • arbitrary information to be written into output mol2 file (5th and above M lines) [x]

wished

  • Per-conformation per-atom partial charge & solvation information to support internal energies
  • Aliphatic ring movements?
  • group tagging (needed for covalent docking) and basic set of covalent groups
  • specified rigid component override (and better rules for finding non-ring rigid components)
  • per molecule pKa
  • valence for each atom

Nomenclature Definitions

  • Conf - one set of atoms that moves together with a single position per atom.
  • Set - a group of conformations that completely defines one position for each atom in a ligand.
  • Cluster - Not yet implemented in DOCK3.7
  • Cloud - Not yet implemented in DOCK3.7
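
As an illustration of how a Set resolves to one position per atom, here is a minimal Python sketch. The class and function names (Conf, ConfSet, assemble) are hypothetical and are not part of mol2db2 or DOCK. It assumes atoms are numbered from 1 and that no atom appears in two confs of the same set.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Xyz = Tuple[float, float, float]


@dataclass
class Conf:
    """One group of atoms that moves together: atom number -> single position."""
    number: int
    positions: Dict[int, Xyz]


@dataclass
class ConfSet:
    """A list of conf numbers that together should place every atom."""
    number: int
    conf_numbers: List[int]


def assemble(conf_set: ConfSet, confs: Dict[int, Conf], atom_count: int) -> Dict[int, Xyz]:
    """Merge the confs of one set into a full atom -> xyz mapping."""
    placed: Dict[int, Xyz] = {}
    for conf_number in conf_set.conf_numbers:
        for atom, xyz in confs[conf_number].positions.items():
            # Assumption: an atom placed twice within one set is an error.
            if atom in placed:
                raise ValueError(f"atom {atom} placed twice in set {conf_set.number}")
            placed[atom] = xyz
    # Assumption: atoms are numbered 1..atom_count.
    missing = [a for a in range(1, atom_count + 1) if a not in placed]
    if missing:
        raise ValueError(f"set {conf_set.number} leaves atoms {missing} unplaced")
    return placed
```

A set that passes both checks gives exactly one position for every atom in the ligand, which is the property the Set definition above describes.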

File Format

current plan for the file format

  • T type information (implicitly assumed)
  • M molecule (4 lines req'd, after that they are optional, 24 lines max)
  • A atoms
  • B bond
  • X xyz
  • R rigid xyz for matching (can actually be any xyzs)
  • C conformation
  • S sets
  • D clusters
  • E end of molecule
T ## namexxxx (implicitly assumed to be the standard 7)
M zincname protname #atoms #bonds #xyz #confs #sets #rigid #Mlines #clusters
M charge polar_solv apolar_solv total_solv surface_area
M smiles
M longname
[M arbitrary information preserved for writing out]
A stuff about each atom, 1 per line 
B stuff about each bond, 1 per line
X coordnum atomnum confnum x y z 
R rigidnum color x y z
C confnum coordstart coordend
S setnum #lines #confs_total broken hydrogens omega_energy
S setnum linenum #confs confs [until full column]
D clusternum setstart setend matchstart matchend #additionalmatching
D matchnum color x y z
E 
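
Since every line starts with a one-letter record type and each molecule ends with an E line, a reader can first group raw lines by record type before decoding any columns. The following Python sketch shows one way to do that. The function name split_molecules is hypothetical, and silently dropping lines after the last E is an assumption made here, not documented behavior.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def split_molecules(lines: Iterable[str]) -> Iterator[Dict[str, List[str]]]:
    """Yield one dict per molecule, mapping record letter -> raw lines in file order."""
    records: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line[0] == "E":
            # E closes the current molecule.
            yield dict(records)
            records = defaultdict(list)
        else:
            records[line[0]].append(line)
    # Assumption: lines after the last E belong to a truncated molecule and are dropped.
```

Because lines are kept in file order within each record type, the two kinds of S line (and of D line) can still be told apart afterwards by their position and column layout.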

The column layout for each of the line types described above is shown below, followed by format statements for Python and Fortran. If speed or size becomes an issue, this may be replaced with a binary file format.

notes: 17 children groups/group per line in current scheme. 9 children confs/group per line. 9 children confs/conf per line. 8 confs/set per line. groups/confs with no children are written out.

on the atom line, dt is dock type and co is color.

          1         2         3         4         5         6         7
01234567890123456789012345678901234567890123456789012345678901234567890123456789
T ## typename
M ZINCCODEXXXXXXXX PROTCODEX ATO BON XYZXXX CONFSX SETSXX RIGIDX MLINES NUMCLU
M +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
M SMILESXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
M LONGNAMEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
[M ARBITRARY_INFORMATION_PRESERVEDXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]
A NUM NAME TYPEX DT CO +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
B NUM ATO ATO TY
X COORDNUMX ATO CONFNU +XCO.ORDX +YCO.ORDX +ZCO.ORDX
R NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
C CONFNO COORDSTAR COORDENDX
S SETIDX #LINES #CO C H +ENERGY.XXX
S SETIDX LINENO # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS
D CLUSID STASET ENDSET MST MEN ADD
D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
E
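
The column template above can be turned directly into slice offsets. As a sketch, the following Python reads an X line by fixed columns. The name parse_x_line and the field table are hypothetical, and the offsets are counted from the template (0-based, end-exclusive).

```python
# (field name, start column, end column, converter), read off the X template line.
X_FIELDS = (
    ("coordnum", 2, 11, int),
    ("atomnum", 12, 15, int),
    ("confnum", 16, 22, int),
    ("x", 23, 32, float),
    ("y", 33, 42, float),
    ("z", 43, 52, float),
)


def parse_x_line(line: str) -> dict:
    """Decode one X (xyz) line using fixed column positions."""
    if not line.startswith("X "):
        raise ValueError("not an X line: %r" % line)
    # int() and float() both ignore the padding spaces inside each column.
    return {name: cast(line[start:end]) for name, start, end, cast in X_FIELDS}


# Build an example line with the same widths as the template, then read it back.
example = "X %9d %3d %6d %+9.4f %+9.4f %+9.4f" % (12, 3, 1, 1.25, -2.5, 0.0)
coords = parse_x_line(example)
```

Slicing by column rather than splitting on whitespace keeps the reader correct even when a value fills its whole column and touches its neighbor's separator.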

the following type lines are assumed by dock unless overridden:

T  1 positive
T  2 negative
T  3 acceptor
T  4 donor
T  5 ester_o
T  6 amide_o
T  7 neutral
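
A reader could hold these defaults in a table and apply any T lines found in the file on top of them. The Python sketch below does this. The names DEFAULT_TYPES and color_table are hypothetical, and it assumes a T line overrides only the entry with the same number rather than replacing the whole table.

```python
# The seven default types listed above.
DEFAULT_TYPES = {
    1: "positive",
    2: "negative",
    3: "acceptor",
    4: "donor",
    5: "ester_o",
    6: "amide_o",
    7: "neutral",
}


def color_table(t_lines):
    """Return type number -> name, starting from the defaults and applying T lines."""
    table = dict(DEFAULT_TYPES)
    for line in t_lines:
        number = int(line[2:4])    # two-column type number
        name = line[5:].strip()    # type name, padding removed
        table[number] = name       # assumption: per-entry override
    return table
```

With no T lines the function returns the defaults unchanged, which matches the "implicitly assumed" behavior described for the T record.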

the following are the format statements for python for each line

T %2d %8s\n
M %16s %9s %3d %3d %6d %6d %6d %6d %6d %6d\n
M %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
M %77s\n
M %77s\n
M %77s\n
A %3d %-4s %-5s %2d %2d %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
B %3d %3d %3d %-2s\n
X %9d %3d %6d %+9.4f %+9.4f %+9.4f\n
R %3d %2d %+9.4f %+9.4f %+9.4f\n
C %6d %9d %9d\n
S %6d %6d %3d %1d %1d %+11.3f\n
S %6d %6d %1d %6d %6d %6d %6d %6d %6d %6d %6d\n
D %6d %6d %6d %3d %3d %3d\n
D %3d %2d %+9.4f %+9.4f %+9.4f\n
E\n
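
As an example of using these format statements, the following Python sketch writes the S lines for one set, with at most 8 confs per continuation line. The function name set_lines is hypothetical. It assumes that line numbers start at 1, that #lines counts only the continuation lines, and that a short final line is not padded.

```python
def set_lines(setnum, confs, broken, hydrogens, energy, per_line=8):
    """Return the S header line plus continuation lines for one set."""
    chunks = [confs[i:i + per_line] for i in range(0, len(confs), per_line)]
    # Header: setnum #lines #confs_total broken hydrogens omega_energy
    out = ["S %6d %6d %3d %1d %1d %+11.3f\n"
           % (setnum, len(chunks), len(confs), broken, hydrogens, energy)]
    # Continuation: setnum linenum #confs, then the conf numbers on this line.
    for lineno, chunk in enumerate(chunks, start=1):
        head = "S %6d %6d %1d" % (setnum, lineno, len(chunk))
        out.append(head + "".join(" %6d" % c for c in chunk) + "\n")
    return out


# A set of 10 confs needs one header line and two continuation lines.
lines = set_lines(1, list(range(1, 11)), 0, 0, 12.5)
```

The continuation format is built per line instead of using the fixed eight-column statement, so the last line of a set can carry fewer than 8 confs.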

The following are the fortran77 format statements

!T ## namexxxx (implicitly assumed to be the standard 7)
1000 format(2x,i2,1x,a8)
!M zincname protname #atoms #bonds #xyz #groups #confs #sets #rigid #mlines #clusters
2000 format(2x,a16,1x,a9,1x,i3,1x,i3,1x,i6,1x,i6,1x,i6,x,i6,x,i6,x,i6,x,i6)
!M charge polar_solv apolar_solv total_solv surface_area
2100 format(2x,f9.4,1x,f10.3,1x,f10.3,1x,f10.3,1x,f9.3)
!M smiles or longname
2200 format(2x,a77)
!A stuff about each atom, 1 per line
3000 format(2x,i3,1x,a4,1x,a5,1x,i2,1x,i2,1x,f9.4,1x,f10.3,1x,
    &       f10.3,1x,f10.3,1x,f9.3)
!B stuff about each bond, 1 per line
4000 format(2x,i3,1x,i3,1x,i3,1x,a2)
!X atomnum confnum x y z
5000 format(2x,i9,1x,i3,1x,i6,x,f9.4,1x,f9.4,1x,f9.4)
!R rigidnum color x y z
6000 format(2x,i3,x,i2,x,f9.4,1x,f9.4,1x,f9.4)
!C confnum #startcoord #endcoord
7000 format(2x,i6,1x,i9,1x,i9)
!S setnum #lines #confs_total broken hydrogens omega_energy
8000 format(2x,i6,1x,i6,1x,i3,1x,i1,1x,i1,1x,f11.3)
!S setnum linenum #confs confs [until full column]
8100 format(2x,i6,1x,i6,1x,i1,1x,i6,1x,i6,1x,i6,1x,i6,
    &       1x,i6,1x,i6,1x,i6,1x,i6)
!D CLUSID STARTSETX ENDSETXXX ADD MST MEN
9000 format(2x,i6,x,i6,x,i6,x,i3,x,i3,x,i3)
!D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
!re-use 6000
!E
!E does not get a format line

The following are Fortran95 format statements:

!T ## namexxxx (implicitly assumed to be the standard 7)
      character (len=*), parameter :: DB2NAME = '(2x,i2,x,a8)' !1000
!M zincname protname #atoms #bonds #xyz #confs #sets #rigid #maxmlines #clusters
      character (len=*), parameter :: DB2M1 =
     &    '(2x,a16,x,a9,x,i3,x,i3,x,i6,x,i6,x,i6,x,i6,x,i6,x,i6)' !2000
!M charge polar_solv apolar_solv total_solv surface_area
      character (len=*), parameter :: DB2M2 =
     &    '(2x,f9.4,x,f10.3,x,f10.3,x,f10.3,x,f9.3)' !2100
!M smiles/longname/arbitrary
      character (len=*), parameter :: DB2M3 = '(2x,a78)' !2200
!A stuff about each atom, 1 per line
      character (len=*), parameter :: DB2ATOM =
     &    '(2x,i3,x,a4,x,a5,x,i2,x,i2,x,f9.4,x,f10.3,x,
     &    f10.3,x,f10.3,x,f9.3)' !3000
!B stuff about each bond, 1 per line
      character (len=*), parameter :: DB2BOND =
     &    '(2x,i3,x,i3,x,i3,x,a2)' !4000
!X coordnumx atomnum confnum x y z
      character (len=*), parameter :: DB2COORD =
     &    '(2x,i9,x,i3,x,i6,x,f9.4,x,f9.4,x,f9.4)' !5000
!R rigidnum color x y z
      character (len=*), parameter :: DB2RIGID =
     &    '(2x,i6,x,i2,x,f9.4,x,f9.4,x,f9.4)' !6000
!C confnum coordstart coordend
      character (len=*), parameter :: DB2CONF = '(2x,i6,x,i9,x,i9)' !7000
!S setnum #lines #confs_total broken hydrogens omega_energy 
      character (len=*), parameter :: DB2SET1 =
     &    '(2x,i6,x,i6,x,i3,x,i1,x,i1,x,f11.3)' !8000
!S setnum linenum #confs confs [until full column]
      character (len=*), parameter :: DB2SET2 =
     &    '(2x,i6,x,i6,x,i1,x,i6,x,i6,x,i6,x,i6,
     &    1x,i6,x,i6,x,i6,x,i6)' !8100
!D CLUSID STASET ENDSET ADD(itional matching spheres count) MST(art) MEN(d)
      character (len=*), parameter :: DB2CLUSTER =
     &    '(2x,i6,x,i6,x,i6,x,i3,x,i3,x,i3)' !9000
!D NUM CO x y z
!reuse DB2RIGID
!E
!E does not get a format line
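
Because the same record is described three times (Python, Fortran77, Fortran95), a small consistency check can help keep the statements in step. The Python sketch below sums the column widths of a flat Fortran format string so it can be compared with the length of a line produced by the matching Python format. The helper name fortran_width is hypothetical. It handles only the x, i, a and f edit descriptors used on this page, and it treats a bare x as one space, which is how the formats above use it.

```python
import re


def fortran_width(fmt: str) -> int:
    """Total record width of a flat format made of x, iN, aN and fN.M descriptors."""
    total = 0
    for item in fmt.strip().strip("()").split(","):
        item = item.strip().lower()
        skip = re.fullmatch(r"(\d*)x", item)
        if skip:
            total += int(skip.group(1) or 1)  # bare "x" counted as one space
            continue
        field = re.fullmatch(r"[iaf](\d+)(?:\.\d+)?", item)
        if not field:
            raise ValueError("unsupported edit descriptor: %r" % item)
        total += int(field.group(1))
    return total


# Bond line: the Fortran format and the Python format should give the same width.
bond_fortran = fortran_width("(2x,i3,x,i3,x,i3,x,a2)")
bond_python = len("B %3d %3d %3d %-2s" % (1, 1, 2, "1"))
```

The leading 2x in each Fortran format covers the record letter and the space after it, so the totals are directly comparable. Run over the statements on this page, such a check would also show where they differ, for example the rigid number width (i3 in the Fortran77 statement 6000, i6 in DB2RIGID).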