Mol2db2 Format 2
Latest revision as of 15:44, 23 October 2014
This page is a wishlist of features that a new version of the flexibase file format should support. Features already implemented in the mol2db2 format are marked [x].
New Features
implemented
- Real Atom Types and Bond Information [x]
- Way to determine which mix-and-match conformations have clashes (and avoid trying them) [x]
- A place to store an internal energy for each possible conformation [x]
- Terminal hydrogen rotations?? [x]
- support for clusters of conformations [x]
- arbitrary information to be written into output mol2 file (5th and above M lines) [x]
wished
- Per-conformation per-atom partial charge & solvation information to support internal energies
- Aliphatic ring movements?
- group tagging (needed for covalent docking) and basic set of covalent groups
- specified rigid component override (and better rules for finding non-ring rigid components)
- per molecule pKa
- valence for each atom
Nomenclature Definitions
- Conf - one set of atoms that moves together with a single position per atom.
- Set - a group of conformations that completely defines one position for each atom in a ligand.
- Cluster - Not yet implemented in DOCK3.7
- Cloud - Not yet implemented in DOCK3.7
File Format
current plan for the file format
- T type information (implicitly assumed)
- M molecule (4 lines req'd, after that they are optional, 24 lines max)
- A atoms
- B bond
- X xyz
- R rigid xyz for matching (can actually be any xyzs)
- C conformation
- S sets
- D clusters
- E end of molecule
T ## namexxxx (implicitly assumed to be the standard 7)
M zincname protname #atoms #bonds #xyz #confs #sets #rigid #Mlines #clusters
M charge polar_solv apolar_solv total_solv surface_area
M smiles
M longname
[M arbitrary information preserved for writing out]
A stuff about each atom, 1 per line
B stuff about each bond, 1 per line
X coordnum atomnum confnum x y z
R rigidnum color x y z
C confnum coordstart coordend
S setnum #lines #confs_total broken hydrogens omega_energy
S setnum linenum #confs confs [until full column]
D clusternum setstart setend matchstart matchend #additionalmatching
D matchnum color x y z
E
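As a rough illustration of how the record letters above could drive a reader, the sketch below groups lines by their first character and closes a molecule on each E line. The function name `split_molecules` and the dict-of-lists layout are illustrative assumptions, not part of DOCK or mol2db2.

```python
from collections import defaultdict


def split_molecules(lines):
    """Group db2 lines by record letter; an E line closes the current molecule."""
    molecules = []
    current = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        letter = line[0]
        if letter == "E":
            molecules.append(dict(current))
            current = defaultdict(list)
        else:
            # Keep everything after the record letter and its separating space.
            current[letter].append(line[2:])
    return molecules


example = [
    "M first molecule line",
    "A atom line",
    "E",
    "M second molecule line",
    "E",
]
mols = split_molecules(example)
print(len(mols), sorted(mols[0]))
```

Keeping the raw text per record letter defers column parsing to a later step, so the grouping logic stays independent of the field widths.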
Given the line descriptions above, the column layout for each line type is shown below, followed by format statements for Python and Fortran. If speed or file size becomes an issue, this text format might be replaced with a binary file format.
Notes:
- 17 children groups/group per line in current scheme.
- 9 children confs/group per line.
- 9 children confs/conf per line.
- 8 confs/set per line.
- groups/confs with no children are written out.
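The limit of 8 confs per set line means a set with more conformations spans several S lines. The sketch below shows one way to produce them. It assumes that continuation lines are numbered from 1, that the #lines field counts only continuation lines, and that the last line lists just the remaining confs; the function name `set_lines` is hypothetical.

```python
def set_lines(setnum, confs, broken, hydrogens, energy, per_line=8):
    """Return the S header line plus continuation lines for one set."""
    chunks = [confs[i:i + per_line] for i in range(0, len(confs), per_line)]
    out = ["S %6d %6d %3d %1d %1d %+11.3f"
           % (setnum, len(chunks), len(confs), broken, hydrogens, energy)]
    for lineno, chunk in enumerate(chunks, start=1):  # assumed 1-based numbering
        fields = " ".join("%6d" % conf for conf in chunk)
        out.append("S %6d %6d %1d %s" % (setnum, lineno, len(chunk), fields))
    return out


lines = set_lines(1, list(range(1, 11)), 0, 0, -12.5)
for text in lines:
    print(text)
```

With ten confs this gives one header line, one full line of eight confs, and one line with the remaining two.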
On the atom line, DT is the DOCK type and CO is the color.
          1         2         3         4         5         6         7
01234567890123456789012345678901234567890123456789012345678901234567890123456789
T ## typename
M ZINCCODEXXXXXXXX PROTCODEX ATO BON XYZXXX CONFSX SETSXX RIGIDX MLINES NUMCLU
M +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
M SMILESXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
M LONGNAMEXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
[M ARBITRARY_INFORMATION_PRESERVEDXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX]
A NUM NAME TYPEX DT CO +CHA.RGEX +POLAR.SOL +APOLA.SOL +TOTAL.SOL SURFA.REA
B NUM ATO ATO TY
X COORDNUMX ATO CONFNU +XCO.ORDX +YCO.ORDX +ZCO.ORDX
R NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
C CONFNO COORDSTAR COORDENDX
S SETIDX #LINES #CO C H +ENERGY.XXX
S SETIDX LINENO # CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS CCONFS
D CLUSID STASET ENDSET MST MEN ADD
D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
E
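Because every field sits in fixed columns, a line can be read by slicing rather than by splitting on whitespace. The sketch below does this for the A line, taking field widths from the column layout above. The field names and the function `parse_atom_line` are illustrative assumptions.

```python
# (name, width, converter) for each field of the A line, in column order.
# Widths follow the column layout; the names are illustrative.
ATOM_FIELDS = [
    ("num", 3, int),
    ("name", 4, str),
    ("type", 5, str),
    ("dt", 2, int),
    ("co", 2, int),
    ("charge", 9, float),
    ("polar_solv", 10, float),
    ("apolar_solv", 10, float),
    ("total_solv", 10, float),
    ("surface_area", 9, float),
]


def parse_atom_line(line):
    """Slice one A line into a dict using fixed column widths."""
    if not line.startswith("A "):
        raise ValueError("not an atom line")
    record = {}
    pos = 2  # skip the record letter and the space after it
    for name, width, convert in ATOM_FIELDS:
        record[name] = convert(line[pos:pos + width].strip())
        pos += width + 1  # fields are separated by a single space
    return record


sample = "A %3d %-4s %-5s %2d %2d %+9.4f %+10.3f %+10.3f %+10.3f %9.3f" % (
    1, "C1", "C.3", 5, 7, -0.1234, 0.12, -0.34, -0.22, 12.345)
atom = parse_atom_line(sample)
print(atom["name"], atom["type"], atom["charge"])
```

Slicing by column still works when a value fills its whole field and touches its neighbor's separator, which whitespace splitting handles less predictably for text fields.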
The following type lines are assumed by DOCK unless overridden:
T 1 positive
T 2 negative
T 3 acceptor
T 4 donor
T 5 ester_o
T 6 amide_o
T 7 neutral
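A reader could hold these defaults in a table and let any T lines in the file replace entries. The sketch below assumes that an override replaces the entry with the same type number; the page does not spell out the override rule, and `read_types` is a hypothetical name.

```python
# The seven default types listed above.
DEFAULT_TYPES = {
    1: "positive",
    2: "negative",
    3: "acceptor",
    4: "donor",
    5: "ester_o",
    6: "amide_o",
    7: "neutral",
}


def read_types(lines):
    """Start from the defaults and apply any T lines found in `lines`."""
    types = dict(DEFAULT_TYPES)
    for line in lines:
        if line.startswith("T "):
            number, name = line[2:].split(None, 1)
            types[int(number)] = name.strip()  # assumed: same number replaces
    return types


print(read_types([])[7])
print(read_types(["T  7 apolar"])[7])
```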
The following are the Python format strings for each line type:
T %2d %8s\n
M %16s %9s %3d %3d %6d %6d %6d %6d %6d %6d\n
M %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
M %77s\n
M %77s\n
M %77s\n
A %3d %-4s %-5s %2d %2d %+9.4f %+10.3f %+10.3f %+10.3f %9.3f\n
B %3d %3d %3d %-2s\n
X %9d %3d %6d %+9.4f %+9.4f %+9.4f\n
R %3d %2d %+9.4f %+9.4f %+9.4f\n
C %6d %9d %9d\n
S %6d %6d %3d %1d %1d %+11.3f\n
S %6d %6d %1d %6d %6d %6d %6d %6d %6d %6d %6d\n
D %6d %6d %6d %3d %3d %3d\n
D %3d %2d %+9.4f %+9.4f %+9.4f\n
E\n
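Python's `%` operator pads fields to a minimum width but does not truncate, so a value that is too wide silently shifts every later column. The sketch below writes an X line with the format string above and rejects any line whose length differs from the expected 52 characters. The width check and the name `coord_line` are additions for illustration, not something the format itself requires.

```python
X_FORMAT = "X %9d %3d %6d %+9.4f %+9.4f %+9.4f\n"
# 2 (letter + space) + 9 + 3 + 6 + 9 + 9 + 9 field columns + 5 separators.
X_WIDTH = 2 + 9 + 3 + 6 + 9 + 9 + 9 + 5


def coord_line(coordnum, atomnum, confnum, x, y, z):
    """Format one X line and check that no field overflowed its width."""
    line = X_FORMAT % (coordnum, atomnum, confnum, x, y, z)
    if len(line) != X_WIDTH + 1:  # +1 for the trailing newline
        raise ValueError("field overflow in X line: %r" % line)
    return line


line = coord_line(1, 1, 1, 1.5, -2.25, 0.0)
print(line, end="")
```

A coordinate such as 123456.0 needs twelve characters in `%+9.4f`, so `coord_line` raises instead of writing a misaligned line.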
The following are the Fortran 77 format statements:
!T ## namexxxx (implicitly assumed to be the standard 7)
1000 format(2x,i2,1x,a8)
!M zincname protname #atoms #bonds #xyz #groups #confs #sets #rigid #mlines #clusters
2000 format(2x,a16,1x,a9,1x,i3,1x,i3,1x,i6,1x,i6,1x,i6,x,i6,x,i6,x,i6,x,i6)
!M charge polar_solv apolar_solv total_solv surface_area
2100 format(2x,f9.4,1x,f10.3,1x,f10.3,1x,f10.3,1x,f9.3)
!M smiles or longname
2200 format(2x,a77)
!A stuff about each atom, 1 per line
3000 format(2x,i3,1x,a4,1x,a5,1x,i2,1x,i2,1x,f9.4,1x,f10.3,1x,
& f10.3,1x,f10.3,1x,f9.3)
!B stuff about each bond, 1 per line
4000 format(2x,i3,1x,i3,1x,i3,1x,a2)
!X atomnum confnum x y z
5000 format(2x,i9,1x,i3,1x,i6,x,f9.4,1x,f9.4,1x,f9.4)
!R rigidnum color x y z
6000 format(2x,i3,x,i2,x,f9.4,1x,f9.4,1x,f9.4)
!C confnum #startcoord #endcoord
7000 format(2x,i6,1x,i9,1x,i9)
!S setnum #lines #confs_total broken hydrogens omega_energy
8000 format(2x,i6,1x,i6,1x,i3,1x,i1,1x,i1,1x,f11.3)
!S setnum linenum #confs confs [until full column]
8100 format(2x,i6,1x,i6,1x,i1,1x,i6,1x,i6,1x,i6,1x,i6,
& 1x,i6,1x,i6,1x,i6,1x,i6)
!D CLUSID STARTSETX ENDSETXXX ADD MST MEN
9000 format(2x,i6,x,i6,x,i6,x,i3,x,i3,x,i3)
!D NUM CO +XCO.ORDX +YCO.ORDX +ZCO.ORDX
!re-use 6000
!E
!E does not get a format line
The following are the Fortran 95 format statements:
!T ## namexxxx (implicitly assumed to be the standard 7)
character (len=*), parameter :: DB2NAME = '(2x,i2,x,a8)' !1000
!M zincname protname #atoms #bonds #xyz #confs #sets #rigid #maxmlines #clusters
character (len=*), parameter :: DB2M1 =
& '(2x,a16,x,a9,x,i3,x,i3,x,i6,x,i6,x,i6,x,i6,x,i6,x,i6)' !2000
!M charge polar_solv apolar_solv total_solv surface_area
character (len=*), parameter :: DB2M2 =
& '(2x,f9.4,x,f10.3,x,f10.3,x,f10.3,x,f9.3)' !2100
!M smiles/longname/arbitrary
character (len=*), parameter :: DB2M3 = '(2x,a78)' !2200
!A stuff about each atom, 1 per line
character (len=*), parameter :: DB2ATOM =
& '(2x,i3,x,a4,x,a5,x,i2,x,i2,x,f9.4,x,f10.3,x,
& f10.3,x,f10.3,x,f9.3)' !3000
!B stuff about each bond, 1 per line
character (len=*), parameter :: DB2BOND =
& '(2x,i3,x,i3,x,i3,x,a2)' !4000
!X coordnumx atomnum confnum x y z
character (len=*), parameter :: DB2COORD =
& '(2x,i9,x,i3,x,i6,x,f9.4,x,f9.4,x,f9.4)' !5000
!R rigidnum color x y z
character (len=*), parameter :: DB2RIGID =
& '(2x,i6,x,i2,x,f9.4,x,f9.4,x,f9.4)' !6000
!C confnum coordstart coordend
character (len=*), parameter :: DB2CONF = '(2x,i6,x,i9,x,i9)' !7000
!S setnum #lines #confs_total broken hydrogens omega_energy
character (len=*), parameter :: DB2SET1 =
& '(2x,i6,x,i6,x,i3,x,i1,x,i1,x,f11.3)' !8000
!S setnum linenum #confs confs [until full column]
character (len=*), parameter :: DB2SET2 =
& '(2x,i6,x,i6,x,i1,x,i6,x,i6,x,i6,x,i6,
& 1x,i6,x,i6,x,i6,x,i6)' !8100
!D CLUSID STASET ENDSET ADD(itional matching spheres count) MST(art) MEN(d)
character (len=*), parameter :: DB2CLUSTER =
& '(2x,i6,x,i6,x,i6,x,i3,x,i3,x,i3)' !9000
!D NUM CO x y z
!reuse DB2RIGID
!E
!E does not get a format line
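The Fortran edit descriptors map directly onto column ranges, so a format string can be turned into slices for reading the same line elsewhere. The sketch below handles only the descriptors used on this page (`x`, `i`, `a`, `f`) and is not a general Fortran format parser; `field_slices` is a hypothetical helper.

```python
import re

# Matches "x" / "2x" (skip columns) or "i6" / "a16" / "f9.4" (a data field).
TOKEN = re.compile(r"(\d*)x|([iaf])(\d+)(?:\.\d+)?", re.IGNORECASE)


def field_slices(fmt):
    """Return (kind, start, end) column ranges for a simple Fortran format."""
    body = fmt.strip().strip("()")
    pos = 0
    slices = []
    for item in body.split(","):
        match = TOKEN.fullmatch(item.strip())
        if match is None:
            raise ValueError("unsupported edit descriptor: %r" % item)
        if match.group(2) is None:
            # An x descriptor: skip columns (a bare "x" counts as one).
            pos += int(match.group(1) or 1)
        else:
            width = int(match.group(3))
            slices.append((match.group(2).lower(), pos, pos + width))
            pos += width
    return slices


conf_slices = field_slices("(2x,i6,x,i9,x,i9)")  # the DB2CONF string above
conf_line = "C %6d %9d %9d" % (3, 10, 25)
values = [int(conf_line[start:end]) for _, start, end in conf_slices]
print(conf_slices, values)
```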