Mol2db2 Format 2: Difference between revisions

No edit summary
m (deleting ridiculously long set ids)
Line 25: Line 25:


  T ## namexxxx (implicitly assumed to be the standard 7)
  T ## namexxxx (implicitly assumed to be the standard 7)
  M zincname protname #atoms #bonds #xyz #confs #sets #rigid #Mlines
  M zincname protname #atoms #bonds #xyz #confs #sets #rigid #Mlines #clusters
  M charge polar_solv apolar_solv total_solv surface_area
  M charge polar_solv apolar_solv total_solv surface_area
  M smiles
  M smiles
Line 54: Line 54:
  01234567890123456789012345678901234567890123456789012345678901234567890123456789
  01234567890123456789012345678901234567890123456789012345678901234567890123456789
  T ## typename
  T ## typename
  M ZINCCODEXXXXXXXX PROTCODEX ATO BON XYZXXX CONFSX SETSXXXXX RIGIDX MLINES
  M ZINCCODEXXXXXXXX PROTCODEX ATO BON XYZXXX CONFSX SETSXX RIGIDX MLINES NUMCLU
  M +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
  M +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
  M SMILESXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  M SMILESXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Line 64: Line 64:
  R NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
  R NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
  C CONFNO COORDSTAR COORDENDX
  C CONFNO COORDSTAR COORDENDX
  S SETIDXXXX #LINES #CO C H +ENERGY.XXX
  S SETIDX #LINES #CO C H +ENERGY.XXX
  S SETIDXXXX LINENO # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS
  S SETIDX LINENO # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS
  D CLUSTERID STARTSETX ENDSETXXX ADD
  D CLUSID STASET ENDSET ADD
  D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
  D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
  E
  E
Line 81: Line 81:
the following are the format statements for python for each line
the following are the format statements for python for each line
  T %2d %8s\n
  T %2d %8s\n
  M %16s %9s %3d %3d %6d %6d %9d %6d &6d\n
  M %16s %9s %3d %3d %6d %6d %6d %6d &6d %6d\n
  M %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
  M %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
  M %77s\n
  M %77s\n
Line 91: Line 91:
  R %3d %2d %+9.4f %+9.4f %+9.4f\n
  R %3d %2d %+9.4f %+9.4f %+9.4f\n
  C %6d %9d %9d\n
  C %6d %9d %9d\n
  S %9d %6d %3d %1d %1d %+11.3f\n
  S %6d %6d %3d %1d %1d %+11.3f\n
  S %9d %6d %1d %6d %6d %6d %6d %6d %6d %6d %6d\n  
  S %6d %6d %1d %6d %6d %6d %6d %6d %6d %6d %6d\n  
  D %9d %9d %9d %3d\n
  D %6d %6d %6d %3d\n
  D %3d %2d %+9.4f %+9.4f %+9.4f\n
  D %3d %2d %+9.4f %+9.4f %+9.4f\n
  E\n
  E\n
Line 102: Line 102:
  1000 format(2x,i2,1x,a8)
  1000 format(2x,i2,1x,a8)
  !M zincname protname #atoms #bonds #xyz #groups #confs #sets #rigid
  !M zincname protname #atoms #bonds #xyz #groups #confs #sets #rigid
  2000 format(2x,a16,1x,a9,1x,i3,1x,i3,1x,i6,1x,i6,1x,i9,x,i6)
  2000 format(2x,a16,1x,a9,1x,i3,1x,i3,1x,i6,1x,i6,1x,i6,x,i6,x,i6)
  !M charge polar_solv apolar_solv total_solv surface_area
  !M charge polar_solv apolar_solv total_solv surface_area
  2100 format(2x,f9.4,1x,f10.3,1x,f10.3,1x,f10.3,1x,f9.3)
  2100 format(2x,f9.4,1x,f10.3,1x,f10.3,1x,f10.3,1x,f9.3)
Line 119: Line 119:
  6000 format(2x,i6,1x,i9,1x,i9)
  6000 format(2x,i6,1x,i9,1x,i9)
  !S setnum #lines #confs_total broken hydrogens omega_energy
  !S setnum #lines #confs_total broken hydrogens omega_energy
  7000 format(2x,i9,1x,i6,1x,i3,1x,i1,1x,i1,1x,f11.3)
  7000 format(2x,i6,1x,i6,1x,i3,1x,i1,1x,i1,1x,f11.3)
  !S setnum linenum #confs confs [until full column]
  !S setnum linenum #confs confs [until full column]
  7100 format(2x,i9,1x,i6,1x,i1,1x,i6,1x,i6,1x,i6,1x,i6,
  7100 format(2x,i6,1x,i6,1x,i1,1x,i6,1x,i6,1x,i6,1x,i6,
     &      1x,i6,1x,i6,1x,i6,1x,i6)
     &      1x,i6,1x,i6,1x,i6,1x,i6)
  !D CLUSTERID STARTSETX ENDSETXXX ADD
  !D CLUSID STARTSETX ENDSETXXX ADD
  8000 format(2x,i9,x,i9,x,i9,x,i3)
  8000 format(2x,i6,x,i9,x,i9,x,i3)
  !D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
  !D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
  !re-use 5500
  !re-use 5500

Revision as of 22:49, 23 May 2011

This page is a wishlist of features that a new version of the flexibase file format should support. Features already implemented in the mol2db2 format are marked [x].

  • Real Atom Types and Bond Information [x]
  • Way to determine which mix-and-match conformations have clashes (and avoid trying them) [x]
  • A place to store an internal energy for each possible conformation [x]
  • Terminal hydrogen rotations?? [x]
  • Aliphatic ring movements?
  • support for clusters of conformations
  • group tagging (needed for covalent docking) and basic set of covalent groups
  • specified rigid component override (and better rules for finding non-ring rigid components)
  • per molecule pKa
  • arbitrary information to be written into output mol2 file (5th and above M lines) [x]

The following is the current plan for the file format:

  • T type information (implicitly assumed)
  • M molecule (4 lines required; further lines are optional, 24 lines max)
  • A atoms
  • B bond
  • X xyz
  • R rigid xyz for matching (can actually be any xyzs)
  • C conformation
  • S sets
  • D clusters
  • E end of molecule
T ## namexxxx (implicitly assumed to be the standard 7)
M zincname protname #atoms #bonds #xyz #confs #sets #rigid #Mlines #clusters
M charge polar_solv apolar_solv total_solv surface_area
M smiles
M longname
[M arbitrary information preserved for writing out]
A stuff about each atom, 1 per line 
B stuff about each bond, 1 per line
X coordnum atomnum confnum x y z 
R rigidnum color x y z
C confnum coordstart coordend
S setnum #lines #confs_total broken hydrogens omega_energy
S setnum linenum #confs confs [until full column]
D clusternum setstart setend #additionalmatching
D matchnum color x y z
E 
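
Since every line starts with a one-letter tag and a molecule ends at the E line, a reader can group lines by tag before interpreting any columns. The following is a minimal Python sketch of that idea; the function name read_records and the sample lines are hypothetical and only illustrate the grouping, not the parser that mol2db2 or DOCK actually uses.

```python
from collections import defaultdict


def read_records(lines):
    """Yield one dict per molecule, mapping a tag letter to its line bodies.

    Assumption: the tag is the first character and the body starts at
    column 2, as in the line descriptions above.
    """
    record = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        tag = line[0]
        if tag == "E":
            # E closes the current molecule
            yield dict(record)
            record = defaultdict(list)
        else:
            record[tag].append(line[2:])


# Placeholder lines, not real db2 content
sample = [
    "M first header line",
    "A atom line",
    "A atom line 2",
    "E",
    "M second molecule",
    "E",
]
records = list(read_records(sample))
```

Each yielded dict keeps the lines of one tag in file order, so the two kinds of M, S and D lines can still be told apart by position.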

Given the line types above, the column layout of each line is shown below, followed by format statements for Python and Fortran. If speed or size becomes an issue, this might be replaced with a binary file format.

Notes: in the current scheme each line holds up to 17 child groups per group, 9 child confs per group, 9 child confs per conf, and 8 confs per set. Groups and confs with no children are still written out.

On the atom line, DT is the dock type and CO is the color.

          1         2         3         4         5         6         7
01234567890123456789012345678901234567890123456789012345678901234567890123456789
T ## typename
M ZINCCODEXXXXXXXX PROTCODEX ATO BON XYZXXX CONFSX SETSXX RIGIDX MLINES NUMCLU
M +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
M SMILESXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
M LONGNAMEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
[M ARBITRARY_INFORMATION_PRESERVEDXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]
A NUM NAME TYPEX DT CO +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
B NUM ATO ATO TY
X COORDNUMX ATO CONFNU +XCO.ORDX +YCO.ORDX +ZCO.ORDX
R NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
C CONFNO COORDSTAR COORDENDX
S SETIDX #LINES #CO C H +ENERGY.XXX
S SETIDX LINENO # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS
D CLUSID STASET ENDSET ADD
D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
E
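
The column ruler above means each field can be read by position rather than by splitting on whitespace. As an illustration, here is a small Python sketch that slices the first M line into named fields. The table M1_FIELDS and the helper parse_fixed are hypothetical names, the widths are copied from the layout above, and the sample values are made up.

```python
# (name, width, converter) for the first M line; fields are separated
# by one blank column and start at column 2, after the "M " tag.
M1_FIELDS = [
    ("zincname", 16, str),
    ("protname", 9, str),
    ("atoms", 3, int),
    ("bonds", 3, int),
    ("xyz", 6, int),
    ("confs", 6, int),
    ("sets", 6, int),
    ("rigid", 6, int),
    ("mlines", 6, int),
    ("clusters", 6, int),
]


def parse_fixed(line, fields, start=2):
    """Slice a fixed-width line into a dict using (name, width, conv) triples."""
    out = {}
    pos = start
    for name, width, conv in fields:
        out[name] = conv(line[pos:pos + width].strip())
        pos += width + 1  # skip the single separator column
    return out


# Build a sample line with made-up values, then read it back
line = "M %16s %9s %3d %3d %6d %6d %6d %6d %6d %6d" % (
    "ZINC00000001", "none", 12, 12, 300, 25, 40, 5, 5, 1)
header = parse_fixed(line, M1_FIELDS)
```

Slicing by column keeps working even if a name contains spaces, which whitespace splitting would not. This sketch assumes the #clusters field is present; a file written before that field was added would need the last entry dropped.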

The following type lines are assumed by DOCK unless overridden:

T  1 positive
T  2 negative
T  3 acceptor
T  4 donor
T  5 ester_o
T  6 amide_o
T  7 neutral

The following are the Python format statements for each line:

T %2d %8s\n
M %16s %9s %3d %3d %6d %6d %6d %6d %6d %6d\n
M %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
M %77s\n
M %77s\n
M %77s\n
A %3d %-4s %-5s %2d %2d %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
B %3d %3d %3d %-2s\n
X %9d %3d %6d %+9.4f %+9.4f %+9.4f\n
R %3d %2d %+9.4f %+9.4f %+9.4f\n
C %6d %9d %9d\n
S %6d %6d %3d %1d %1d %+11.3f\n
S %6d %6d %1d %6d %6d %6d %6d %6d %6d %6d %6d\n 
D %6d %6d %6d %3d\n
D %3d %2d %+9.4f %+9.4f %+9.4f\n
E\n
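
As an example of using these format statements, the sketch below writes one set as a header S line followed by continuation S lines holding at most 8 conformation numbers each. The function format_set is hypothetical. It assumes that #lines counts only the continuation lines, that linenum starts at 1, and that a short final line is left unpadded; none of these is confirmed by this page.

```python
S_HEADER = "S %6d %6d %3d %1d %1d %+11.3f\n"
CONFS_PER_LINE = 8  # 8 confs/set per line, as in the layout above


def format_set(setnum, confs, broken, hydrogens, energy):
    """Return the S lines for one set as a list of strings."""
    chunks = [confs[i:i + CONFS_PER_LINE]
              for i in range(0, len(confs), CONFS_PER_LINE)]
    # Assumption: #lines is the number of continuation lines
    lines = [S_HEADER % (setnum, len(chunks), len(confs),
                         broken, hydrogens, energy)]
    for lineno, chunk in enumerate(chunks, start=1):
        fields = " ".join("%6d" % conf for conf in chunk)
        lines.append("S %6d %6d %1d %s\n"
                     % (setnum, lineno, len(chunk), fields))
    return lines


# A set of 11 conformations needs two continuation lines (8 + 3)
out = format_set(1, list(range(1, 12)), 0, 0, -12.5)
```

A full continuation line built this way is 73 characters wide, which matches the S line in the column layout.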

The following are the Fortran format statements:

!T ## namexxxx (implicitly assumed to be the standard 7)
1000 format(2x,i2,1x,a8)
!M zincname protname #atoms #bonds #xyz #confs #sets #rigid #Mlines #clusters
2000 format(2x,a16,1x,a9,1x,i3,1x,i3,1x,i6,1x,i6,1x,i6,1x,i6,1x,i6,1x,i6)
!M charge polar_solv apolar_solv total_solv surface_area
2100 format(2x,f9.4,1x,f10.3,1x,f10.3,1x,f10.3,1x,f9.3)
!M smiles or longname
2200 format(2x,a77)
!A stuff about each atom, 1 per line
3000 format(2x,i3,1x,a4,1x,a5,1x,i2,1x,i2,1x,f9.4,1x,f10.3,1x,
    &       f10.3,1x,f10.3,1x,f9.3)
!B stuff about each bond, 1 per line
4000 format(2x,i3,1x,i3,1x,i3,1x,a2)
!X coordnum atomnum confnum x y z
5000 format(2x,i9,1x,i3,1x,i6,1x,f9.4,1x,f9.4,1x,f9.4)
!R rigidnum color x y z
5500 format(2x,i3,1x,i2,1x,f9.4,1x,f9.4,1x,f9.4)
!C confnum #startcoord #endcoord
6000 format(2x,i6,1x,i9,1x,i9)
!S setnum #lines #confs_total broken hydrogens omega_energy
7000 format(2x,i6,1x,i6,1x,i3,1x,i1,1x,i1,1x,f11.3)
!S setnum linenum #confs confs [until full column]
7100 format(2x,i6,1x,i6,1x,i1,1x,i6,1x,i6,1x,i6,1x,i6,
    &       1x,i6,1x,i6,1x,i6,1x,i6)
!D CLUSID STASET ENDSET ADD
8000 format(2x,i6,1x,i6,1x,i6,1x,i3)
!D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
!re-use 5500
!E
!E does not get a format line
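
The Fortran statements read by column position, and the same positions can be derived in Python from the descriptor list. The sketch below converts a simple format list into column spans and uses them to read an X line. The helpers fortran_slices and read_with_format are hypothetical. They handle only the a, i, f and x descriptors used on this page, and they treat a bare x as 1x.

```python
import re


def fortran_slices(spec):
    """Turn a simple Fortran format list into (start, end, kind) spans."""
    spans = []
    pos = 0
    for item in spec.replace(" ", "").split(","):
        skip = re.fullmatch(r"(\d*)x", item)
        if skip:
            # nx skips n columns; a bare x is taken to mean 1x
            pos += int(skip.group(1) or 1)
            continue
        field = re.fullmatch(r"([aif])(\d+)(?:\.\d+)?", item)
        if field is None:
            raise ValueError("unsupported edit descriptor: %r" % item)
        width = int(field.group(2))
        spans.append((pos, pos + width, field.group(1)))
        pos += width
    return spans


def read_with_format(line, spec):
    """Read one line by column, converting each field by descriptor kind."""
    convert = {"a": str.strip, "i": int, "f": float}
    return [convert[kind](line[a:b]) for a, b, kind in fortran_slices(spec)]


# Descriptor list of statement 5000 (X line), with made-up values
X_SPEC = "2x,i9,1x,i3,1x,i6,x,f9.4,1x,f9.4,1x,f9.4"
line = "X %9d %3d %6d %+9.4f %+9.4f %+9.4f" % (1, 2, 3, 1.5, -2.25, 0.0)
values = read_with_format(line, X_SPEC)
```

Deriving the spans from the descriptor list makes it easy to check that a Python format string and a Fortran statement describe the same columns.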