Mol2db2 Format 2

Revision as of 19:01, 20 April 2010

This page is a wishlist of features that a new version of the flexibase file format should support.

Real atom types and bond information
A way to determine which mix-and-match conformations have clashes (and avoid trying them)
A place to store an internal energy for each possible conformation
Terminal hydrogen rotations?
Aliphatic ring movements?
Support for clusters of conformations
Group tagging (needed for covalent docking) and a basic set of covalent groups
A specified rigid component override (and better rules for finding non-ring rigid components)
Per-molecule pKa

The following represents the current plan for the file format. Each line starts with a one-letter code for its type:

T type information (implicitly assumed)
M molecule (only 2 lines ever)
A atoms
B bond
X xyz
G group
D group-conf mapping
C conformation
S sets
E end of molecule

T ## namexxxx (implicitly assumed to be the standard 7)
M zincname protname #atoms #bonds #xyz #groups #confs #sets 
M charge polar_solv apolar_solv total_solv surface_area
A stuff about each atom, 1 per line 
B stuff about each bond, 1 per line
X atomnum confnum x y z 
G groupnum #lines #children_total
G groupnum linenum #children childgroup [until column is full]
D groupnum #lines #confs_total  
D groupnum linenum #confs confnum [until column is full]
C confnum #lines #children_total
C confnum linenum #children childconf [until column is full]
S setnum #lines #confs_total [INPUT|MIX] broken omega_energy
S setnum linenum #confs confs [until full column]
E

Given the line descriptions above, here are the columns that each line type uses. Python format statements are listed further down; Fortran ones will appear at some point. If speed or size becomes an issue, this text format might be replaced with a binary file format.

Notes: in the current scheme, one line holds at most 17 child groups per group (G lines), 9 confs per group (D lines), 9 child confs per conf (C lines), and 8 confs per set (S lines). Groups and confs with no children are still written out.
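
The per-line limits in the notes above imply a simple chunking step when writing G, D, C, and S records. The following is a minimal Python sketch, not a reference implementation. It assumes that #lines counts only the lines that follow the header line, and that a record with no children has zero of them; the page does not spell either point out. The names `CAPACITY`, `chunk_children`, and `line_count` are hypothetical.

```python
# Maximum number of ids per line, taken from the notes above.
CAPACITY = {"G": 17, "D": 9, "C": 9, "S": 8}


def chunk_children(line_type, children):
    """Split a list of child ids into per-line chunks for one record."""
    size = CAPACITY[line_type]
    return [children[i:i + size] for i in range(0, len(children), size)]


def line_count(line_type, n_children):
    """Lines needed after the header (assumed meaning of #lines)."""
    size = CAPACITY[line_type]
    return (n_children + size - 1) // size  # ceiling division


chunks = chunk_children("G", list(range(1, 21)))  # 20 child groups
print(len(chunks), [len(c) for c in chunks])      # prints: 2 [17, 3]
```

With 20 child groups, the first G line is full at 17 ids and the second carries the remaining 3.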

On the atom (A) line, DT is the dock type and CO is the color.

          1         2         3         4         5         6         7
01234567890123456789012345678901234567890123456789012345678901234567890123456789
T ## typename
M ZINCCODEX PROTCODEX ATO BON XYZXXX GRO CONFSX SETSXXXXX
M +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
A NUM NAME TYPEX DT CO +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
B NUM ATO ATO TY
X ATO CONFNU +XCO.ORDX +YCO.ORDX +ZCO.ORDX
G GRO #LI #CH
G GRO LIN #C CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN CGN
D GRO #LIN #CONFS
D GRO LINE # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS  
C CONFNO #LIN #CONFS
C CONFNO LINE # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS
S SETIDXXXX #LINES #CO I C +ENERGY.XXX
S SETIDXXXX LINENO # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS
E
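
Because every field sits in fixed columns, a reader can slice fields by position instead of splitting on whitespace. The sketch below does this for the X line, with offsets read off the zero-based ruler above. `parse_x_line` is a hypothetical helper, not part of any existing tool, and the sample values are made up.

```python
def parse_x_line(line):
    """Parse one X line into (atomnum, confnum, x, y, z) by column position."""
    if not line.startswith("X "):
        raise ValueError("not an X line")
    atomnum = int(line[2:5])    # ATO, columns 2-4
    confnum = int(line[6:12])   # CONFNU, columns 6-11
    x = float(line[13:22])      # +XCO.ORDX, columns 13-21
    y = float(line[23:32])      # +YCO.ORDX, columns 23-31
    z = float(line[33:42])      # +ZCO.ORDX, columns 33-41
    return atomnum, confnum, x, y, z


# Sample line padded to the field widths shown in the layout (made-up values).
sample = "X %3d %6d %+9.4f %+9.4f %+9.4f" % (1, 1, 1.5, -2.25, 0.0)
print(parse_x_line(sample))  # prints: (1, 1, 1.5, -2.25, 0.0)
```

Slicing by column keeps working if two adjacent fields ever fill their full width and leave no visible gap, which whitespace splitting would not handle.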

The following type lines are assumed by DOCK unless overridden:

T  1 positive
T  2 negative
T  3 acceptor
T  4 donor
T  5 ester_o
T  6 amide_o
T  7 neutral
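
One way a reader might apply these defaults is to start from the standard seven and let any T lines found in a file replace individual entries. This is a sketch only: `DEFAULT_TYPES` and `resolve_types` are hypothetical names, and it assumes that type names contain no spaces, so a T line can be split on whitespace.

```python
# The seven standard types listed above.
DEFAULT_TYPES = {
    1: "positive",
    2: "negative",
    3: "acceptor",
    4: "donor",
    5: "ester_o",
    6: "amide_o",
    7: "neutral",
}


def resolve_types(t_lines):
    """Return the type table after applying any T lines from a file."""
    types = dict(DEFAULT_TYPES)  # copy, so the defaults stay untouched
    for line in t_lines:
        fields = line.split()    # ["T", number, name]
        if len(fields) != 3 or fields[0] != "T":
            raise ValueError("not a T line: %r" % line)
        types[int(fields[1])] = fields[2]
    return types


print(resolve_types([])[7])               # prints: neutral
print(resolve_types(["T  7 apolar"])[7])  # prints: apolar
```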

The following are the Python format statements for each line type:

T %2d %8s\n
M %9s %9s %3d %3d %6d %3d %6d %9d\n
M %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
A %3d %-4s %-5s %2d %2d %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
B %3d %3d %3d %-2s\n
X %3d %6d %+9.4f %+9.4f %+9.4f\n
G %3d %3d %3d\n
G %3d %3d %2d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d %3d\n
D %3d %4d %6d\n
D %3d %4d %1d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n
C %6d %4d %6d\n
C %6d %4d %1d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n
S %6d %6d %3d %1d %1d %+11.3f\n
S %6d %6d %1d %6d %6d %6d %6d %6d %6d %6d %6d\n 
E\n
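
The statements above can be used directly with Python's `%` operator. The sketch below formats one M line, one X line, and one G child line. The G statement above always has 17 child fields, and the page does not say how a partly filled line is written. This sketch assumes only the children present are listed. `format_group_line` is a hypothetical helper and all values are placeholders.

```python
# Format statements copied from the list above.
M1_FMT = "M %9s %9s %3d %3d %6d %3d %6d %9d\n"
X_FMT = "X %3d %6d %+9.4f %+9.4f %+9.4f\n"


def format_group_line(groupnum, linenum, children):
    """Format one G child line holding up to 17 child groups.

    Assumption: a partly filled line lists only the children present.
    """
    if len(children) > 17:
        raise ValueError("at most 17 child groups fit on one G line")
    fmt = "G %3d %3d %2d" + " %3d" * len(children) + "\n"
    return fmt % ((groupnum, linenum, len(children)) + tuple(children))


# Placeholder values, not taken from any real molecule.
m_line = M1_FMT % ("ZINC00001", "PROT00001", 20, 21, 400, 5, 30, 100)
x_line = X_FMT % (1, 1, 1.5, -2.25, 0.0)
g_line = format_group_line(1, 1, list(range(2, 19)))  # 17 children

# Line widths without the trailing newline.
print(len(m_line) - 1, len(x_line) - 1, len(g_line) - 1)  # prints: 57 42 80
```

A full G line comes to exactly 80 characters, which is consistent with the limit of 17 child groups per line.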
