Mol2db2 Format 2
Revision as of 21:22, 3 September 2010
This page is a wishlist of features that a new version of the flexibase file format should support.
- Real Atom Types and Bond Information
- Way to determine which mix-and-match conformations have clashes (and avoid trying them)
- A place to store an internal energy for each possible conformation
- Terminal hydrogen rotations??
- Aliphatic ring movements?
- Support for clusters of conformations
- Group tagging (needed for covalent docking) and a basic set of covalent groups
- Specified rigid component override (and better rules for finding non-ring rigid components)
- Per-molecule pKa
The following represents the current plan for the file format:
- T type information (implicitly assumed)
- M molecule (only 4 lines ever)
- A atoms
- B bond
- X xyz
- G group
- D group-conf mapping
- C conformation
- S sets
- E end of molecule
T ## namexxxx (implicitly assumed to be the standard 7)
M zincname protname #atoms #bonds #xyz #groups #confs #sets
M charge polar_solv apolar_solv total_solv surface_area
M smiles
M longname
A stuff about each atom, 1 per line
B stuff about each bond, 1 per line
X coordnum atomnum confnum x y z
G groupnum #lines #children_total
G groupnum linenum #children childgroup [until column is full]
D groupnum #lines #confs_total
D groupnum linenum #confs confnum [until column is full]
C confnum #lines #children_total coordstart coordend
C confnum linenum #children childconf [until column is full]
S setnum #lines #confs_total [INPUT|MIX] broken hydrogens omega_energy
S setnum linenum #confs confs [until full column]
E
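As a rough illustration of how the record letters above could drive a reader, the sketch below groups lines by their first character and closes a molecule at each E line. The function name `split_molecules` and the stub input lines are hypothetical; nothing here is taken from an actual DOCK or mol2db2 reader.

```python
from collections import defaultdict


def split_molecules(lines):
    """Yield one dict per molecule, mapping record letter -> list of lines.

    Assumption: every molecule is terminated by a line starting with 'E'.
    Lines after the last 'E' are silently dropped in this sketch.
    """
    records = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line[0] == "E":
            yield dict(records)
            records = defaultdict(list)
        else:
            records[line[0]].append(line)


# Stub lines only; real records follow the column layout of the format.
example = ["M first\n", "A 1\n", "A 2\n", "E\n", "M second\n", "E\n"]
for number, molecule in enumerate(split_molecules(example), start=1):
    print(number, sorted(molecule))
```

Keeping the raw lines grouped by letter leaves the column-level parsing to a later step, so the splitter does not need to know any field widths.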
Given the line descriptions above, the columns used by each line type are laid out below, followed by format statements for Python and Fortran. If speed or size becomes an issue, this might be replaced with a binary file format.
Notes: in the current scheme each line holds at most 17 child groups per group, 9 child confs per group, 9 child confs per conf, and 8 confs per set. Groups and confs with no children are written out.
On the atom line, dt is the dock type and co is the color.
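The per-line limits in the notes imply that a long child list is split across several continuation lines. A minimal sketch for C records follows, using the C format strings from this page. It assumes that line numbers start at 1 and that a partially filled last line is simply left short; neither detail is stated here, and the helper names are hypothetical.

```python
def chunk_children(children, per_line):
    """Split a list of child numbers into runs of at most per_line items."""
    return [children[i:i + per_line] for i in range(0, len(children), per_line)]


def conf_lines(confnum, children, coordstart, coordend, per_line=9):
    """Build the header line and continuation lines for one conformation.

    Assumptions (not from the spec): linenum starts at 1, and a short
    final line is not padded out to nine fields.
    """
    chunks = chunk_children(children, per_line)
    lines = ["C %6d %4d %6d %9d %9d\n"
             % (confnum, len(chunks), len(children), coordstart, coordend)]
    for linenum, chunk in enumerate(chunks, start=1):
        fields = " ".join("%6d" % child for child in chunk)
        lines.append("C %6d %4d %1d %s\n" % (confnum, linenum, len(chunk), fields))
    return lines


# Eleven hypothetical child confs need two continuation lines (9 + 2).
for line in conf_lines(1, list(range(2, 13)), 1, 20):
    print(line, end="")
```

A conformation with no children still produces its header line here, with a line count of 0, which is one way to read "written out" in the notes.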
          1         2         3         4         5         6         7
01234567890123456789012345678901234567890123456789012345678901234567890123456789
T ## typename
M ZINCCODEX PROTCODEX ATO BON XYZXXX GRO CONFSX SETSXXXXX
M +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
M SMILESXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
M LONGNAMEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
A NUM NAME TYPEX DT CO +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
B NUM ATO ATO TY
X COORDNUMX ATO CONFNU +XCO.ORDX +YCO.ORDX +ZCO.ORDX
G GRO #LI #CH
G GRO LIN #C CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN
D GRO #LIN #CONFS
D GRO LINE # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS
C CONFNO #LIN #CONFS COORDSTAR COORDENDX
C CONFNO LINE # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS
S SETIDXXXX #LINES #CO I C H +ENERGY.XXX
S SETIDXXXX LINENO # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS
E
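Because every field sits at a fixed column, a reader can slice each line by position instead of splitting on whitespace. The sketch below does this for the atom (A) line, with 0-based column offsets read off the layout above. The table `ATOM_FIELDS`, the function `parse_atom_line`, and the sample values are hypothetical.

```python
# (field name, start column, end column (exclusive), converter)
ATOM_FIELDS = [
    ("num", 2, 5, int),
    ("name", 6, 10, str.strip),
    ("type", 11, 16, str.strip),
    ("dock_type", 17, 19, int),    # the "dt" column
    ("color", 20, 22, int),        # the "co" column
    ("charge", 23, 32, float),
    ("polar_solv", 33, 43, float),
    ("apolar_solv", 44, 54, float),
    ("total_solv", 55, 65, float),
    ("surface_area", 66, 75, float),
]


def parse_atom_line(line):
    """Slice one A line into a dict using fixed column positions."""
    if not line.startswith("A "):
        raise ValueError("not an atom line: %r" % line)
    return {name: convert(line[start:end])
            for name, start, end, convert in ATOM_FIELDS}


# Build a sample line with the Python A format from this page (made-up values).
sample = ("A %3d %-4s %-5s %2d %2d %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n"
          % (1, "C1", "C.3", 5, 7, -0.1234, 1.5, -2.25, -0.75, 12.345))
print(parse_atom_line(sample))
```

Slicing by column keeps working when a name or type field contains embedded blanks, which whitespace splitting would not handle.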
The following type lines are assumed by DOCK unless overridden:
T 1 positive
T 2 negative
T 3 acceptor
T 4 donor
T 5 ester_o
T 6 amide_o
T 7 neutral
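One way to model "assumed unless overridden" is to start from the seven defaults and let any T line in the file replace the entry with the same number. That replacement rule is an assumption of this sketch, as are the names `DEFAULT_TYPES` and `read_types`.

```python
DEFAULT_TYPES = {
    1: "positive",
    2: "negative",
    3: "acceptor",
    4: "donor",
    5: "ester_o",
    6: "amide_o",
    7: "neutral",
}


def read_types(lines):
    """Return the type table: defaults, updated by any T lines present.

    Assumption: a T line with an existing number replaces that entry.
    """
    types = dict(DEFAULT_TYPES)
    for line in lines:
        if line.startswith("T "):
            fields = line.split()
            types[int(fields[1])] = fields[2]
    return types


# A file with no T lines yields the defaults unchanged.
print(read_types(["M some molecule line", "E"]))
```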
The Python format statements for each line type are:
T %2d %8s\n
M %9s %9s %3d %3d %6d %3d %6d %9d\n
M %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
M %77s\n
M %77s\n
A %3d %-4s %-5s %2d %2d %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
B %3d %3d %3d %-2s\n
X %9d %3d %6d %+9.4f %+9.4f %+9.4f\n
G %3d %3d %3d\n
G %3d %3d %2d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d\n
D %3d %4d %6d\n
D %3d %4d %1d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n
C %6d %4d %6d %9d %9d\n
C %6d %4d %1d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n
S %9d %6d %3d %1d %1d %1d %+11.3f\n
S %9d %6d %1d %6d %6d %6d %6d %6d %6d %6d %6d\n
E\n
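A quick sanity check on format strings like these is to fill each one with placeholder values and measure the resulting line, which should never exceed the 80 columns of the ruler. The sketch below does that for a few of the formats. The helper `formatted_width` is hypothetical, and it only understands the `%d`, `%f`, and `%s` specifiers with explicit widths used on this page.

```python
import re

# A few of the formats from this page; the G child line is built with a
# repeat so the count of 17 child fields is explicit.
FORMATS = {
    "X": "X %9d %3d %6d %+9.4f %+9.4f %+9.4f\n",
    "G child": "G %3d %3d %2d" + " %3d" * 17 + "\n",
    "C child": "C %6d %4d %1d" + " %6d" * 9 + "\n",
    "S header": "S %9d %6d %3d %1d %1d %1d %+11.3f\n",
    "S child": "S %9d %6d %1d" + " %6d" * 8 + "\n",
}


def formatted_width(fmt):
    """Width of a formatted line (newline excluded) when filled with zeros."""
    specs = re.findall(r"%[-+]?\d+(?:\.\d+)?[dfs]", fmt)
    args = tuple("" if spec.endswith("s") else 0 for spec in specs)
    return len((fmt % args).rstrip("\n"))


for name, fmt in FORMATS.items():
    width = formatted_width(fmt)
    print("%-8s %2d columns%s" % (name, width, "" if width <= 80 else "  TOO WIDE"))
```

The check only covers minimum widths: a value with more digits than its field allows would still widen the line, so a writer may want to validate values separately.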
The Fortran format statements are:
!T ## namexxxx (implicitly assumed to be the standard 7)
1000 format(2x,i2,1x,a8)
!M zincname protname #atoms #bonds #xyz #groups #confs #sets
2000 format(2x,a9,1x,a9,1x,i3,1x,i3,1x,i6,1x,i3,1x,i6,1x,i9)
!M charge polar_solv apolar_solv total_solv surface_area
2100 format(2x,f9.4,1x,f10.3,1x,f10.3,1x,f10.3,1x,f9.3)
!M smiles or longname
2200 format(2x,a77)
!A stuff about each atom, 1 per line
3000 format(2x,i3,1x,a4,1x,a5,1x,i2,1x,i2,1x,f9.4,1x,f10.3,1x,
     & f10.3,1x,f10.3,1x,f9.3)
!B stuff about each bond, 1 per line
4000 format(2x,i3,1x,i3,1x,i3,1x,a2)
!X coordnum atomnum confnum x y z
5000 format(2x,i9,1x,i3,1x,i6,1x,f9.4,1x,f9.4,1x,f9.4)
!G groupnum #lines #children_total
6000 format(2x,i3,1x,i3,1x,i3)
!G groupnum linenum #children childgroup [until column is full]
6100 format(2x,i3,1x,i3,1x,i2,1x,i3,1x,i3,1x,i3,1x,i3,1x,i3,
     & 1x,i3,1x,i3,1x,i3,1x,i3,1x,i3,1x,i3,
     & 1x,i3,1x,i3,1x,i3,1x,i3,1x,i3,1x,i3)
!D groupnum #lines #confs_total
7000 format(2x,i3,1x,i4,1x,i6)
!D groupnum linenum #confs confnum [until column is full]
7100 format(2x,i3,1x,i4,1x,i1,1x,i6,1x,i6,1x,i6,1x,i6,1x,i6,
     & 1x,i6,1x,i6,1x,i6,1x,i6)
!C confnum #lines #children_total coordstart coordend
8000 format(2x,i6,1x,i4,1x,i6,1x,i9,1x,i9)
!C confnum linenum #children childconf [until column is full]
8100 format(2x,i6,1x,i4,1x,i1,1x,i6,1x,i6,1x,i6,1x,i6,1x,i6,
     & 1x,i6,1x,i6,1x,i6,1x,i6)
!S setnum #lines #confs_total [INPUT|MIX] broken hydrogens omega_energy
9000 format(2x,i9,1x,i6,1x,i3,1x,i1,1x,i1,1x,i1,1x,f11.3)
!S setnum linenum #confs confs [until full column]
9100 format(2x,i9,1x,i6,1x,i1,1x,i6,1x,i6,1x,i6,1x,i6,
     & 1x,i6,1x,i6,1x,i6,1x,i6)
!E
!E does not get a format line
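To cross-check that the Fortran descriptors and the Python formats describe the same columns, a descriptor list can be turned into column slices and applied to a line written with the matching Python format. The sketch below handles only the plain `Nx`, `iN`, `aN`, and `fN.M` descriptors seen on this page (no repeat counts or other edit descriptors). The function names and the sample values are hypothetical.

```python
import re

CONVERT = {"i": int, "f": float, "a": str.strip}


def fortran_slices(descriptors):
    """Turn e.g. '2x,i9,1x,i6' into [(kind, start, end), ...], 0-based."""
    position = 0
    fields = []
    for token in descriptors.replace(" ", "").split(","):
        skip = re.fullmatch(r"(\d+)x", token)
        if skip:
            position += int(skip.group(1))
            continue
        field = re.fullmatch(r"([iaf])(\d+)(?:\.\d+)?", token)
        if field is None:
            raise ValueError("unsupported edit descriptor: %r" % token)
        width = int(field.group(2))
        fields.append((field.group(1), position, position + width))
        position += width
    return fields


def read_record(descriptors, line):
    """Read one fixed-column line according to a Fortran descriptor list."""
    return [CONVERT[kind](line[start:end])
            for kind, start, end in fortran_slices(descriptors)]


# S header: written with the Python format, read back with the 9000 descriptors.
S_HEADER = "2x,i9,1x,i6,1x,i3,1x,i1,1x,i1,1x,i1,1x,f11.3"
line = "S %9d %6d %3d %1d %1d %1d %+11.3f\n" % (1, 2, 11, 0, 0, 1, -12.5)
print(read_record(S_HEADER, line))
```

Unlike a Fortran formatted read, this sketch raises an error on blank or truncated numeric fields instead of treating them as zero.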