DISI wiki (wiki.docking.org) - All About DB2 Files - last revised 2023-07-11 by Btingle
<hr />
<div>The terms "rotamer", "conformation", and "set" are used interchangeably in this text, except where "set" refers to the usual meaning of the word, i.e. a collection of objects.<br />
<br />
== The brass tacks ==<br />
<br />
DB2 files encode and compress 3D molecule rotamers, sometimes referred to as conformations.<br />
DB2 files do not store the dihedral angles of rotamers; instead they store the per-atom coordinates that result from those angles.<br />
Atoms of different rotamers whose coordinates overlap are merged in db2 files (how this is done is detailed later), allowing DOCK to avoid computing per-atom energies twice.<br />
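The merging idea can be sketched in a few lines of Python. This is a hypothetical helper, not DOCK code, and it uses a simple per-axis threshold test rather than a true Euclidean distance check:

```python
# Sketch of the coordinate-merging idea (not actual DOCK/db2 code).
# Each rotamer is a list of (atom_index, (x, y, z)) tuples; a coordinate of
# an atom that falls within `threshold` of an already-stored coordinate for
# that same atom is reused instead of stored again, so per-atom energies
# are computed only once.

def merge_rotamer_coords(rotamers, threshold=0.001):
    unique = []  # list of (atom_index, (x, y, z)) -- the "X" records
    for rotamer in rotamers:
        for atom, xyz in rotamer:
            for stored_atom, stored_xyz in unique:
                if stored_atom == atom and all(
                    abs(a - b) < threshold for a, b in zip(xyz, stored_xyz)
                ):
                    break  # overlapping coordinate: reuse the stored one
            else:
                unique.append((atom, xyz))
    return unique

rot_a = [(1, (0.0, 0.0, 0.0)), (2, (1.0, 0.0, 0.0))]
rot_b = [(1, (0.0, 0.0, 0.0)), (2, (0.0, 1.0, 0.0))]  # atom 1 overlaps
merged = merge_rotamer_coords([rot_a, rot_b])
# atom 1 is stored once, atom 2 twice -> 3 unique coordinates
```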
<br />
== The nitty gritty ==<br />
<br />
DB2 is a text format, with each record printed on its own line (not exceeding 80 characters per line, a holdover from old Fortran I/O limits).<br />
Multiple DB2 entries can be stored in a db2 file; there is no limit on the number of entries that can fit in a single db2 file.<br />
Thus concatenating any number of db2 files together still yields a valid db2 file.<br />
<br />
There are seven (or eight) species of lines in a db2 entry; they are listed below in the order they appear.<br />
<br />
"M" lines serve two functions. The first is to define information about a db2 molecule entry: molecule name, SMILES, properties, etc. The second is to mark the boundary between distinct db2 entries.<br />
<br />
"A" and "B" lines define the atoms and bonds of the molecule. These are practically identical to the atom/bond lines in .mol2 files; not much else to say here.<br />
<br />
"X" lines define the set of all coordinates for the entry; these are referenced later on. Coordinates are guaranteed to be "distinct", in the sense that no atom coordinate lies within a threshold distance (typically 0.001 Å) of any other coordinate for that same atom in the set of all coordinates.<br />
<br />
"R" lines define the coordinates of the rigid component. This is the base component of the molecule that does not move between rotamers, usually a benzene ring or the like. The choice of rigid component is arbitrary; in fact, a separate db2 entry is usually created for each possible rigid component (typically each ring system) in a molecule.<br />
<br />
"C" lines are "confs", not to be confused with conformations. We will get back to these.<br />
<br />
"S" lines, or "set" lines, define the rotamers/conformations. Each set entry may span multiple "S" lines to fit line-width requirements, and each set entry refers to a single rotamer/conformation. We will get back to these as well.<br />
<br />
"D" lines are "clusters", a feature of older db2 files. Cluster lines were meant to group together similar rotamers, but the idea was scrapped and newer db2 files no longer include them; "D" may now stand for "deprecated".<br />
<br />
=== What is the deal with sets and confs? ===<br />
<br />
First off, the naming is super confusing. Confs don't describe conformations (rotamers); rather, they describe a subset of a conformation. "Set" is an annoyingly general term for something so specific. These are the terms we are stuck with, though! If I could turn back time, I would probably rename set->conformation and conf->subconformation.<br />
<br />
Sets are collections of confs, and confs are collections of atom xyz positions. Each set describes one conformation (rotamer) of the molecule.<br />
A single conf may belong to one or more sets, so confs define exactly where sets overlap.<br />
It is common to see confs consisting of a single atom; this can describe a situation where two rotamers happen to overlap at one particular atom.<br />
A conf may also belong to just one set; this describes a situation where none of the atoms in the conf overlap with other sets, and is quite common.<br />
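The set/conf relationship can be illustrated with plain Python data structures. This sketch uses the toy molecule from the example later on this page (indices chosen to match it; z coordinates dropped for brevity): conf 1 is the rigid O=C pair, confs 2 and 3 are the two hydrogen positions, and expanding a set yields the full coordinate list for one rotamer.

```python
# Illustration of sets vs. confs (toy data, not a real db2 reader).
xyz = {  # xyz index -> (atom index, coordinate)
    1: (1, (-1.0, 0.0)),  # O
    2: (2, (0.0, 0.0)),   # C
    3: (3, (0.0, 1.0)),   # H, "up" position
    4: (3, (1.0, 0.0)),   # H, "right" position
}
confs = {1: [1, 2], 2: [3], 3: [4]}  # conf index -> xyz indices
sets = {1: [1, 2], 2: [1, 3]}        # set index -> conf indices

def expand_set(set_index):
    """Return the (atom, coordinate) pairs for one rotamer."""
    return [xyz[i] for c in sets[set_index] for i in confs[c]]

# Conf 1 is shared by both sets, so the O and C coordinates are stored once.
rot1 = expand_set(1)  # O, C, H-up
rot2 = expand_set(2)  # O, C, H-right
```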
<br />
== The nittier grittier ==<br />
<br />
Here is the specification of each type of line described above (as of 06/13/2023). Parts of the specification whose importance I don't understand are marked with a ?<br />
<nowiki><br />
M [name] [protname?] [# atoms] [# bonds] [# xyz] [# confs] [# sets] [# rigid] 5(?) [# clusters]<br />
M [total charge] [total polar solv] [total apolar solv] [total solv] [total surface?]<br />
M [smiles]<br />
M [long name]<br />
<br />
A [atom index] [atom name] [atom type] [dock type #] [dock color #] [solv charge] [polar solv] [apolar solv] [solv] [surface?]<br />
<br />
B [bond index] [atom start] [atom end] [bond type]<br />
<br />
X [xyz index] [atom index] [conf index] [x] [y] [z]<br />
<br />
R [rigid index] [dock color #] [x] [y] [z]<br />
<br />
C [conf index] [xyz index start] [xyz index end]<br />
<br />
S [set index] [number of S lines to follow] [number of confs] [brokenSet?] [outHydro?] [energy1] [energy2]<br />
S [set index] [set line index] [number of confs to follow] [conf 1] ... [conf N] </nowiki><br />
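A minimal reader for these records might simply dispatch on the leading record letter. The sketch below is hypothetical (it ignores the multi-line M and S structure and any fixed-width Fortran formatting) and just collects the space-separated fields of each record type:

```python
# Minimal sketch of a db2 record reader (hypothetical; real db2 files use
# fixed-width Fortran formatting and multi-line M and S records, which this
# sketch does not handle). Dispatches on the leading record letter.
def read_db2_records(lines):
    entry = {"A": [], "B": [], "X": [], "R": [], "C": [], "S": []}
    for line in lines:
        kind = line[:1]
        if kind in entry:
            entry[kind].append(line.split()[1:])
        # "M" lines (name/properties, entry boundaries) are skipped here
    return entry

sample = [
    "M fake",
    "A 1 O",
    "A 2 C",
    "B 1 1 2 2",
    "X 1 1 1 -1.000 +0.000 +0.000",
    "C 1 1 2",
]
records = read_db2_records(sample)
# records["A"] -> [['1', 'O'], ['2', 'C']]
```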
<br />
== Example ==<br />
<br />
Learning by example is best, so let's imagine the following (fake) rotamer set for a very simple molecule:<br />
<nowiki><br />
+-------+<br />
| H |<br />
| | |<br />
|O==C |<br />
+-------+<br />
| |<br />
| |<br />
|O==C--H|<br />
+-------+</nowiki><br />
<br />
Let us say the C atom lies at (0, 0), with z coordinates dropped for brevity. A somewhat abbreviated db2 for the set would look like this:<br />
<nowiki><br />
M fake<br />
A 1 O<br />
A 2 C<br />
A 3 H<br />
B 1 1 2 2<br />
B 2 2 3 1<br />
X 1 1 1 -1.000 +0.000<br />
X 2 2 1 +0.000 +0.000<br />
X 3 3 2 +0.000 +1.000<br />
X 4 3 3 +1.000 +0.000<br />
R 1 7 -1.000 +0.000<br />
R 2 7 +0.000 +0.000<br />
C 1 1 2<br />
C 2 3 3<br />
C 3 4 4<br />
S 1 1 2 0 0 +0.000 +0.000<br />
S 1 2 2 1 2<br />
S 2 1 2 0 0 +0.000 +0.000<br />
S 2 2 2 1 3</nowiki></div>
<hr />
<div>The terms "rotomer", "conformation" and "set" may be used interchangeably in this text- unless "set" refers to the usual meaning of the word i.e a collection of objects.<br />
<br />
== The brass tacks ==<br />
<br />
DB2 files encode & compress 3D molecule rotomers, sometimes referred to as conformations.<br />
DB2 files do not store the dihedral angles of rotomers, but instead the per-atom coordinates resulting from those angles.<br />
Atoms of different rotomers whose coordinates overlap are merged together in db2 files (how this is done will be detailed more later), thus allowing DOCK to avoid double-calculating per-atom energies.<br />
<br />
== The nitty gritty ==<br />
<br />
DB2 is a text format, with each row of information being printed on a new line (not to exceed 80 characters per line, due to old fortran nonsense).<br />
Mutiple DB2 entries can be stored in a db2 file- there is no limit on the number of entries that can be fit in a single db2 file.<br />
Thus concatenating any number of db2 files together still results in a valid db2 file.<br />
<br />
There are seven(/eight) species of lines in a db2 entry- they will be listed in the order they appear.<br />
<br />
"M" lines serve a few functions. First is to define information about a db2 molecule entry- molecule name, smiles, properties, etc. The second is to create a boundary between distinct db2 entries<br />
<br />
"A" and "B" lines define atoms and bonds for a molecule. These are practically identical to atom/bond lines in .mol2 files- not much else to say here<br />
<br />
"X" lines define the set of all coordinates for the entry- these will be referenced later on. coordinates are guaranteed to be "distinct", in the sense that no coordinate is within a threshold distance (typically 0.001A) of any other coordinate in the set of all coordinates<br />
<br />
"R" lines define the coordinates of the rigid component. this is the basis component of the molecule that does not move between rotomers, usually a benzene ring or the like. the choice of which structure should be the rigid component is arbitrary, and in fact usually a separate db2 entry is created for each possible rigid component (usually each ring system) in a molecule<br />
<br />
"C" lines are "confs", not to be confused with conformations. We will get back to these.<br />
<br />
"S" lines or "set" lines define the rotomers/conformations. Each set entry may span multiple "S" lines to fit line-width requirements, and each set entry refers to a single rotomer/conformation. We will get back to these as well.<br />
<br />
"D" lines are "clusters", and are a feature of older db2 files. The cluster lines were meant to group together similar rotomers, though the idea was scrapped and newer db2 files no longer have them, thus "D" may now stand for "deprecated"<br />
<br />
=== What is the deal with sets and confs? ===<br />
<br />
First off- the naming is super confusing. Confs don't describe conformations(rotomers), rather they describe a subset of a conformation. "Set" is an annoyingly general term for something that is so specific. These are the terms we are stuck with though! If I could turn back time, I would probably rename set->conformation and conf->subconformation<br />
<br />
Sets are collections of confs, and confs are collections of atom xyz positions. Each set describes a conformation(rotomer) of the molecule.<br />
A single conf may be a part of one or more sets, thus confs can be used to define exactly where sets overlap.<br />
It is common to see confs comprised of a single atom- this may describe a situation where two rotomers happen to overlap at a particular atom.<br />
A conf may also be a part of just one set- this describes a situation where none of the atoms in the conf overlap with other sets, and is quite common.<br />
<br />
== The nittier grittier ==<br />
<br />
Here is the specification of each type of line described before (as of 06/13/2023). Parts of the specification I don't understand the importance of are followed with a ?<br />
<nowiki><br />
M [name] [protname?] [# atoms] [# bonds] [# xyz] [# confs] [# sets] [# rigid] 5(?) [# clusters]<br />
M [total charge] [total polar solv] [total apolar solv] [total solv] [total surface?]<br />
M [smiles]<br />
M [long name]<br />
<br />
A [atom index] [atom name] [atom type] [dock type #] [dock color #] [solv charge] [polar solv] [apolar solv] [solv] [surface?]<br />
<br />
B [bond index] [atom start] [atom end] [bond type]<br />
<br />
X [xyz index] [atom index] [conf index] [x] [y] [z]<br />
<br />
R [rigid index] [dock color #] [x] [y] [z]<br />
<br />
C [conf index] [xyz index start] [xyz index end]<br />
<br />
S [set index] [number of S lines to follow] [number of confs] [brokenSet?] [outHydro?] [energy1] [energy2]<br />
S [set index] [set line index] [number of confs to follow] [conf 1] ... [conf N] </nowiki><br />
<br />
== Example ==<br />
<br />
Learning by example is best, so let's imagine the following (fake) rotomer set for a very simple molecule<br />
<nowiki><br />
+-------+<br />
| H |<br />
| | |<br />
|O==C |<br />
+-------+<br />
| |<br />
| |<br />
|O==C--H|<br />
+-------+</nowiki><br />
<br />
Let us say the C atom lies at (0, 0). A somewhat abbreviated db2 for the set would look like this:<br />
<nowiki><br />
M fake<br />
A 1 O<br />
A 2 C<br />
A 3 H<br />
B 1 1 2 2<br />
B 2 2 3 1<br />
X 1 1 1 -1.000 +0.000<br />
X 2 2 1 +0.000 +0.000<br />
X 3 3 2 +0.000 +1.000<br />
X 4 3 3 +1.000 +0.000<br />
R 1 7 -1.000 +0.000<br />
R 2 7 +0.000 +0.000<br />
C 1 1 2<br />
C 2 3 3<br />
C 3 4 4<br />
S 1 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 2<br />
S 2 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 3</nowiki></div>Btinglehttp://wiki.docking.org/index.php?title=All_About_DB2_Files&diff=15446All About DB2 Files2023-06-14T00:32:13Z<p>Btingle: /* The nittier grittier */</p>
<hr />
<div>The terms "rotomer", "conformation" and "set" may be used interchangeably in this text- unless "set" refers to the usual meaning of the word i.e a collection of objects.<br />
<br />
== The brass tacks ==<br />
<br />
DB2 files encode & compress 3D molecule rotomers, sometimes referred to as conformations.<br />
DB2 files do not store the dihedral angles of rotomers, but instead the per-atom coordinates resulting from those angles.<br />
Atoms of different rotomers whose coordinates overlap are merged together in db2 files (how this is done will be detailed more later), thus allowing DOCK to avoid double-calculating per-atom energies.<br />
<br />
== The nitty gritty ==<br />
<br />
DB2 is a text format, with each row of information being printed on a new line (not to exceed 80 characters per line, due to old fortran nonsense).<br />
Mutiple DB2 entries can be stored in a db2 file- there is no limit on the number of entries that can be fit in a single db2 file.<br />
Thus concatenating any number of db2 files together still results in a valid db2 file.<br />
<br />
There are seven(/eight) species of lines in a db2 entry- they will be listed in the order they appear.<br />
<br />
"M" lines serve a few functions. First is to define information about a db2 molecule entry- molecule name, smiles, properties, etc. The second is to create a boundary between distinct db2 entries<br />
<br />
"A" and "B" lines define atoms and bonds for a molecule. These are practically identical to atom/bond lines in .mol2 files- not much else to say here<br />
<br />
"X" lines define the set of all coordinates for the entry- these will be referenced later on. coordinates are guaranteed to be "distinct", in the sense that no coordinate is within a threshold distance (typically 0.001A) of any other coordinate in the set of all coordinates<br />
<br />
"R" lines define the coordinates of the rigid component. this is the basis component of the molecule that does not move between rotomers, usually a benzene ring or the like. the choice of which structure should be the rigid component is arbitrary, and in fact usually a separate db2 entry is created for each possible rigid component (usually each ring system) in a molecule<br />
<br />
"C" lines are "confs", not to be confused with conformations. We will get back to these.<br />
<br />
"S" lines or "set" lines define the rotomers/conformations. Each set entry may span multiple "S" lines to fit line-width requirements, and each set entry refers to a single rotomer/conformation. We will get back to these as well.<br />
<br />
"D" lines are "clusters", and are a feature of older db2 files. The cluster lines were meant to group together similar rotomers, though the idea was scrapped and newer db2 files no longer have them, thus "D" may now stand for "deprecated"<br />
<br />
== What is the deal with sets and confs? ==<br />
<br />
First off- the naming is super confusing. Confs don't describe conformations(rotomers), rather they describe a subset of a conformation. "Set" is an annoyingly general term for something that is so specific. These are the terms we are stuck with though! If I could turn back time, I would probably rename set->conformation and conf->subconformation<br />
<br />
Sets are collections of confs, and confs are collections of atom xyz positions. Each set describes a conformation(rotomer) of the molecule.<br />
A single conf may be a part of one or more sets, thus confs can be used to define exactly where sets overlap.<br />
It is common to see confs comprised of a single atom- this may describe a situation where two rotomers happen to overlap at a particular atom.<br />
A conf may also be a part of just one set- this describes a situation where none of the atoms in the conf overlap with other sets, and is quite common.<br />
<br />
== The nittier grittier ==<br />
<br />
Here is the specification of each type of line described before (as of 06/13/2023). Parts of the specification I don't understand the importance of are followed with a ?<br />
<nowiki><br />
M [name] [protname?] [# atoms] [# bonds] [# xyz] [# confs] [# sets] [# rigid] 5(?) [# clusters]<br />
M [total charge] [total polar solv] [total apolar solv] [total solv] [total surface?]<br />
M [smiles]<br />
M [long name]<br />
<br />
A [atom index] [atom name] [atom type] [dock type #] [dock color #] [solv charge] [polar solv] [apolar solv] [solv] [surface?]<br />
<br />
B [bond index] [atom start] [atom end] [bond type]<br />
<br />
X [xyz index] [atom index] [conf index] [x] [y] [z]<br />
<br />
R [rigid index] [dock color #] [x] [y] [z]<br />
<br />
C [conf index] [xyz index start] [xyz index end]<br />
<br />
S [set index] [number of S lines to follow] [number of confs] [brokenSet?] [outHydro?] [energy1] [energy2]<br />
S [set index] [set line index] [number of confs to follow] [conf 1] ... [conf N] </nowiki><br />
<br />
== Example ==<br />
<br />
Learning by example is best, so let's imagine the following (fake) rotomer set for a very simple molecule<br />
<nowiki><br />
+-------+<br />
| H |<br />
| | |<br />
|O==C |<br />
+-------+<br />
| |<br />
| |<br />
|O==C--H|<br />
+-------+</nowiki><br />
<br />
Let us say the C atom lies at (0, 0). A somewhat abbreviated db2 for the set would look like this:<br />
<nowiki><br />
M fake<br />
A 1 O<br />
A 2 C<br />
A 3 H<br />
B 1 1 2 2<br />
B 2 2 3 1<br />
X 1 1 1 -1.000 +0.000<br />
X 2 2 1 +0.000 +0.000<br />
X 3 3 2 +0.000 +1.000<br />
X 4 3 3 +1.000 +0.000<br />
R 1 7 -1.000 +0.000<br />
R 2 7 +0.000 +0.000<br />
C 1 1 2<br />
C 2 3 3<br />
C 3 4 4<br />
S 1 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 2<br />
S 2 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 3</nowiki></div>Btinglehttp://wiki.docking.org/index.php?title=All_About_DB2_Files&diff=15445All About DB2 Files2023-06-14T00:31:32Z<p>Btingle: /* What is the deal with sets and confs? */</p>
<hr />
<div>The terms "rotomer", "conformation" and "set" may be used interchangeably in this text- unless "set" refers to the usual meaning of the word i.e a collection of objects.<br />
<br />
== The brass tacks ==<br />
<br />
DB2 files encode & compress 3D molecule rotomers, sometimes referred to as conformations.<br />
DB2 files do not store the dihedral angles of rotomers, but instead the per-atom coordinates resulting from those angles.<br />
Atoms of different rotomers whose coordinates overlap are merged together in db2 files (how this is done will be detailed more later), thus allowing DOCK to avoid double-calculating per-atom energies.<br />
<br />
== The nitty gritty ==<br />
<br />
DB2 is a text format, with each row of information being printed on a new line (not to exceed 80 characters per line, due to old fortran nonsense).<br />
Mutiple DB2 entries can be stored in a db2 file- there is no limit on the number of entries that can be fit in a single db2 file.<br />
Thus concatenating any number of db2 files together still results in a valid db2 file.<br />
<br />
There are seven(/eight) species of lines in a db2 entry- they will be listed in the order they appear.<br />
<br />
"M" lines serve a few functions. First is to define information about a db2 molecule entry- molecule name, smiles, properties, etc. The second is to create a boundary between distinct db2 entries<br />
<br />
"A" and "B" lines define atoms and bonds for a molecule. These are practically identical to atom/bond lines in .mol2 files- not much else to say here<br />
<br />
"X" lines define the set of all coordinates for the entry- these will be referenced later on. coordinates are guaranteed to be "distinct", in the sense that no coordinate is within a threshold distance (typically 0.001A) of any other coordinate in the set of all coordinates<br />
<br />
"R" lines define the coordinates of the rigid component. this is the basis component of the molecule that does not move between rotomers, usually a benzene ring or the like. the choice of which structure should be the rigid component is arbitrary, and in fact usually a separate db2 entry is created for each possible rigid component (usually each ring system) in a molecule<br />
<br />
"C" lines are "confs", not to be confused with conformations. We will get back to these.<br />
<br />
"S" lines or "set" lines define the rotomers/conformations. Each set entry may span multiple "S" lines to fit line-width requirements, and each set entry refers to a single rotomer/conformation. We will get back to these as well.<br />
<br />
"D" lines are "clusters", and are a feature of older db2 files. The cluster lines were meant to group together similar rotomers, though the idea was scrapped and newer db2 files no longer have them, thus "D" may now stand for "deprecated"<br />
<br />
== What is the deal with sets and confs? ==<br />
<br />
First off- the naming is super confusing. Confs don't describe conformations(rotomers), rather they describe a subset of a conformation. "Set" is an annoyingly general term for something that is so specific. These are the terms we are stuck with though! If I could turn back time, I would probably rename set->conformation and conf->subconformation<br />
<br />
Sets are collections of confs, and confs are collections of atom xyz positions. Each set describes a conformation(rotomer) of the molecule.<br />
A single conf may be a part of one or more sets, thus confs can be used to define exactly where sets overlap.<br />
It is common to see confs comprised of a single atom- this may describe a situation where two rotomers happen to overlap at a particular atom.<br />
A conf may also be a part of just one set- this describes a situation where none of the atoms in the conf overlap with other sets, and is quite common.<br />
<br />
== The nittier grittier ==<br />
<br />
Here is the specification of each type of line described before. Parts of the specification I don't understand the importance of are followed with a ?<br />
<nowiki><br />
M [name] [protname?] [# atoms] [# bonds] [# xyz] [# confs] [# sets] [# rigid] 5(?) [# clusters]<br />
M [total charge] [total polar solv] [total apolar solv] [total solv] [total surface?]<br />
M [smiles]<br />
M [long name]<br />
<br />
A [atom index] [atom name] [atom type] [dock type #] [dock color #] [solv charge] [polar solv] [apolar solv] [solv] [surface?]<br />
<br />
B [bond index] [atom start] [atom end] [bond type]<br />
<br />
X [xyz index] [atom index] [conf index] [x] [y] [z]<br />
<br />
R [rigid index] [dock color #] [x] [y] [z]<br />
<br />
C [conf index] [xyz index start] [xyz index end]<br />
<br />
S [set index] [number of S lines to follow] [number of confs] [brokenSet?] [outHydro?] [energy1] [energy2]<br />
S [set index] [set line index] [number of confs to follow] [conf 1] ... [conf N] </nowiki><br />
<br />
== Example ==<br />
<br />
Learning by example is best, so let's imagine the following (fake) rotomer set for a very simple molecule<br />
<nowiki><br />
+-------+<br />
| H |<br />
| | |<br />
|O==C |<br />
+-------+<br />
| |<br />
| |<br />
|O==C--H|<br />
+-------+</nowiki><br />
<br />
Let us say the C atom lies at (0, 0). A somewhat abbreviated db2 for the set would look like this:<br />
<nowiki><br />
M fake<br />
A 1 O<br />
A 2 C<br />
A 3 H<br />
B 1 1 2 2<br />
B 2 2 3 1<br />
X 1 1 1 -1.000 +0.000<br />
X 2 2 1 +0.000 +0.000<br />
X 3 3 2 +0.000 +1.000<br />
X 4 3 3 +1.000 +0.000<br />
R 1 7 -1.000 +0.000<br />
R 2 7 +0.000 +0.000<br />
C 1 1 2<br />
C 2 3 3<br />
C 3 4 4<br />
S 1 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 2<br />
S 2 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 3</nowiki></div>Btinglehttp://wiki.docking.org/index.php?title=All_About_DB2_Files&diff=15444All About DB2 Files2023-06-14T00:28:32Z<p>Btingle: /* The nitty gritty */</p>
<hr />
<div>The terms "rotomer", "conformation" and "set" may be used interchangeably in this text- unless "set" refers to the usual meaning of the word i.e a collection of objects.<br />
<br />
== The brass tacks ==<br />
<br />
DB2 files encode & compress 3D molecule rotomers, sometimes referred to as conformations.<br />
DB2 files do not store the dihedral angles of rotomers, but instead the per-atom coordinates resulting from those angles.<br />
Atoms of different rotomers whose coordinates overlap are merged together in db2 files (how this is done will be detailed more later), thus allowing DOCK to avoid double-calculating per-atom energies.<br />
<br />
== The nitty gritty ==<br />
<br />
DB2 is a text format, with each row of information being printed on a new line (not to exceed 80 characters per line, due to old fortran nonsense).<br />
Mutiple DB2 entries can be stored in a db2 file- there is no limit on the number of entries that can be fit in a single db2 file.<br />
Thus concatenating any number of db2 files together still results in a valid db2 file.<br />
<br />
There are seven(/eight) species of lines in a db2 entry- they will be listed in the order they appear.<br />
<br />
"M" lines serve a few functions. First is to define information about a db2 molecule entry- molecule name, smiles, properties, etc. The second is to create a boundary between distinct db2 entries<br />
<br />
"A" and "B" lines define atoms and bonds for a molecule. These are practically identical to atom/bond lines in .mol2 files- not much else to say here<br />
<br />
"X" lines define the set of all coordinates for the entry- these will be referenced later on. coordinates are guaranteed to be "distinct", in the sense that no coordinate is within a threshold distance (typically 0.001A) of any other coordinate in the set of all coordinates<br />
<br />
"R" lines define the coordinates of the rigid component. this is the basis component of the molecule that does not move between rotomers, usually a benzene ring or the like. the choice of which structure should be the rigid component is arbitrary, and in fact usually a separate db2 entry is created for each possible rigid component (usually each ring system) in a molecule<br />
<br />
"C" lines are "confs", not to be confused with conformations. We will get back to these.<br />
<br />
"S" lines or "set" lines define the rotomers/conformations. Each set entry may span multiple "S" lines to fit line-width requirements, and each set entry refers to a single rotomer/conformation. We will get back to these as well.<br />
<br />
"D" lines are "clusters", and are a feature of older db2 files. The cluster lines were meant to group together similar rotomers, though the idea was scrapped and newer db2 files no longer have them, thus "D" may now stand for "deprecated"<br />
<br />
== What is the deal with sets and confs? ==<br />
<br />
First off- the naming is super confusing. Confs don't describe conformations(rotomers), rather they describe a subset of a conformation. "Set" is an annoyingly general term for something that is so specific. These are the terms we are stuck with though!<br />
<br />
Sets are collections of confs, and confs are collections of atom xyz positions. Each set describes a conformation(rotomer) of the molecule.<br />
A single conf may be a part of one or more sets, thus confs can be used to define exactly where sets overlap.<br />
It is common to see confs comprised of a single atom- this may describe a situation where two rotomers happen to overlap at a particular atom.<br />
A conf may also be a part of just one set- this describes a situation where none of the atoms in the conf overlap with other sets, and is quite common.<br />
<br />
== The nittier grittier ==<br />
<br />
Here is the specification of each type of line described before. Parts of the specification I don't understand the importance of are followed with a ?<br />
<nowiki><br />
M [name] [protname?] [# atoms] [# bonds] [# xyz] [# confs] [# sets] [# rigid] 5(?) [# clusters]<br />
M [total charge] [total polar solv] [total apolar solv] [total solv] [total surface?]<br />
M [smiles]<br />
M [long name]<br />
<br />
A [atom index] [atom name] [atom type] [dock type #] [dock color #] [solv charge] [polar solv] [apolar solv] [solv] [surface?]<br />
<br />
B [bond index] [atom start] [atom end] [bond type]<br />
<br />
X [xyz index] [atom index] [conf index] [x] [y] [z]<br />
<br />
R [rigid index] [dock color #] [x] [y] [z]<br />
<br />
C [conf index] [xyz index start] [xyz index end]<br />
<br />
S [set index] [number of S lines to follow] [number of confs] [brokenSet?] [outHydro?] [energy1] [energy2]<br />
S [set index] [set line index] [number of confs to follow] [conf 1] ... [conf N] </nowiki><br />
<br />
== Example ==<br />
<br />
Learning by example is best, so let's imagine the following (fake) rotomer set for a very simple molecule<br />
<nowiki><br />
+-------+<br />
| H |<br />
| | |<br />
|O==C |<br />
+-------+<br />
| |<br />
| |<br />
|O==C--H|<br />
+-------+</nowiki><br />
<br />
Let us say the C atom lies at (0, 0). A somewhat abbreviated db2 for the set would look like this:<br />
<nowiki><br />
M fake<br />
A 1 O<br />
A 2 C<br />
A 3 H<br />
B 1 1 2 2<br />
B 2 2 3 1<br />
X 1 1 1 -1.000 +0.000<br />
X 2 2 1 +0.000 +0.000<br />
X 3 3 2 +0.000 +1.000<br />
X 4 3 3 +1.000 +0.000<br />
R 1 7 -1.000 +0.000<br />
R 2 7 +0.000 +0.000<br />
C 1 1 2<br />
C 2 3 3<br />
C 3 4 4<br />
S 1 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 2<br />
S 2 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 3</nowiki></div>Btinglehttp://wiki.docking.org/index.php?title=All_About_DB2_Files&diff=15443All About DB2 Files2023-06-14T00:26:19Z<p>Btingle: /* What is the deal with sets and confs? */</p>
<hr />
<div>The terms "rotomer", "conformation" and "set" may be used interchangeably in this text- unless "set" refers to the usual meaning of the word i.e a collection of objects.<br />
<br />
== The brass tacks ==<br />
<br />
DB2 files encode & compress 3D molecule rotomers, sometimes referred to as conformations.<br />
DB2 files do not store the dihedral angles of rotomers, but instead the per-atom coordinates resulting from those angles.<br />
Atoms of different rotomers whose coordinates overlap are merged together in db2 files (how this is done will be detailed more later), thus allowing DOCK to avoid double-calculating per-atom energies.<br />
<br />
== The nitty gritty ==<br />
<br />
DB2 is a text format, with each row of information being printed on a new line (not to exceed 80 characters per line, due to old fortran nonsense).<br />
Mutiple DB2 entries can be stored in a db2 file- there is no limit on the number of entries that can be fit in a single db2 file.<br />
Thus concatenating any number of db2 files together still results in a valid db2 file.<br />
<br />
There are seven(/eight) species of lines in a db2 entry- they will be listed in the order they appear.<br />
<br />
"M" lines serve a few functions. First is to define information about a db2 molecule entry- molecule name, smiles, properties, etc. The second is to create a boundary between distinct db2 entries<br />
<br />
"A" and "B" lines define atoms and bonds for a molecule. These are practically identical to atom/bond lines in .mol2 files- not much else to say here<br />
<br />
"X" lines define the set of all coordinates for the entry- these will be referenced later on. coordinates are guaranteed to be "distinct", in the sense that no coordinate is within a threshold distance (typically 0.001A) of any other coordinate in the set<br />
<br />
"R" lines define the coordinates of the rigid component. this is the basis component of the molecule that does not move between rotomers, usually a benzene ring or the like. the choice of which structure should be the rigid component is arbitrary, and in fact usually a separate db2 entry is created for each possible rigid component (usually each ring system) in a molecule<br />
<br />
"C" lines are "confs", not to be confused with conformations. We will get back to these.<br />
<br />
"S" lines or "set" lines define the rotomers/conformations. Each set entry may span multiple "S" lines to fit line-width requirements, and each set entry refers to a single rotomer/conformation. We will get back to these as well.<br />
<br />
"D" lines are "clusters", and are a feature of older db2 files. The cluster lines were meant to group together similar rotomers, though the idea was scrapped and newer db2 files no longer have them, thus "D" may now stand for "deprecated"<br />
<br />
== What is the deal with sets and confs? ==<br />
<br />
First off, the naming is super confusing. Confs don't describe conformations (rotomers); rather, they describe a subset of a conformation. "Set" is an annoyingly general term for something so specific. These are the terms we are stuck with, though!<br />
<br />
Sets are collections of confs, and confs are collections of atom xyz positions. Each set describes a conformation(rotomer) of the molecule.<br />
A single conf may be a part of one or more sets, thus confs can be used to define exactly where sets overlap.<br />
It is common to see confs consisting of a single atom; this may describe a situation where two rotomers happen to overlap at that one atom.<br />
A conf may also be part of just one set; this describes a situation where none of the atoms in the conf overlap with other sets, and is quite common.<br />
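The set/conf relationship can be made concrete with a toy data model (the dict layout is mine, not part of the format): sets map to lists of conf ids, confs map to lists of xyz ids, and a conf shared by two sets marks exactly where those rotomers overlap:<br />

```python
# Toy illustration of sets vs. confs (ids are arbitrary, not from a real file).
confs = {
    1: [1, 2],  # conf 1 covers xyz records 1-2 (the shared rigid part)
    2: [3],     # conf 2 covers xyz record 3
    3: [4],     # conf 3 covers xyz record 4
}
sets = {
    1: [1, 2],  # rotomer 1 = confs 1 and 2
    2: [1, 3],  # rotomer 2 = confs 1 and 3
}

# Confs appearing in both sets show where the two rotomers overlap:
shared = set(sets[1]) & set(sets[2])

# Expanding a set yields the xyz indices of that rotomer's full geometry:
rotomer_1_xyz = [x for c in sets[1] for x in confs[c]]
```

Here conf 1 is shared, so the two rotomers overlap exactly on xyz records 1 and 2, while confs 2 and 3 each belong to a single set.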
<br />
== The nittier grittier ==<br />
<br />
Here is the specification of each line type described above. Fields whose importance I don't understand are marked with a "?".<br />
<nowiki><br />
M [name] [protname?] [# atoms] [# bonds] [# xyz] [# confs] [# sets] [# rigid] 5(?) [# clusters]<br />
M [total charge] [total polar solv] [total apolar solv] [total solv] [total surface?]<br />
M [smiles]<br />
M [long name]<br />
<br />
A [atom index] [atom name] [atom type] [dock type #] [dock color #] [solv charge] [polar solv] [apolar solv] [solv] [surface?]<br />
<br />
B [bond index] [atom start] [atom end] [bond type]<br />
<br />
X [xyz index] [atom index] [conf index] [x] [y] [z]<br />
<br />
R [rigid index] [dock color #] [x] [y] [z]<br />
<br />
C [conf index] [xyz index start] [xyz index end]<br />
<br />
S [set index] [number of S lines to follow] [number of confs] [brokenSet?] [outHydro?] [energy1] [energy2]<br />
S [set index] [set line index] [number of confs to follow] [conf 1] ... [conf N] </nowiki><br />
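Since every record begins with its one-letter species tag, a line-oriented reader can simply dispatch on the first field. A minimal, hypothetical Python sketch (not the DOCK reader; payload fields are kept as strings):<br />

```python
def parse_db2_lines(lines):
    """Group raw db2 text lines by record species ("M", "A", "B", ...)."""
    records = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        species, payload = fields[0], fields[1:]
        records.setdefault(species, []).append(payload)
    return records


# A fragment of a fake entry, following the field tables above:
entry = [
    "M fake",
    "A 1 O",
    "A 2 C",
    "B 1 1 2 2",
    "X 1 1 1 -1.000 +0.000 +0.000",
]
records = parse_db2_lines(entry)
```

Interpreting the payloads (e.g. turning X coordinates into floats, or stitching multi-line S entries together) would follow the field tables above.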
<br />
== Example ==<br />
<br />
Learning by example is best, so let's imagine the following (fake) rotomer set for a very simple molecule:<br />
<nowiki><br />
+-------+<br />
| H |<br />
| | |<br />
|O==C |<br />
+-------+<br />
| |<br />
| |<br />
|O==C--H|<br />
+-------+</nowiki><br />
<br />
Let us say the C atom lies at (0, 0); for brevity the example uses 2D coordinates and drops some trailing fields. A somewhat abbreviated db2 for the set would look like this:<br />
<nowiki><br />
M fake<br />
A 1 O<br />
A 2 C<br />
A 3 H<br />
B 1 1 2 2<br />
B 2 2 3 1<br />
X 1 1 1 -1.000 +0.000<br />
X 2 2 1 +0.000 +0.000<br />
X 3 3 2 +0.000 +1.000<br />
X 4 3 3 +1.000 +0.000<br />
R 1 7 -1.000 +0.000<br />
R 2 7 +0.000 +0.000<br />
C 1 1 2<br />
C 2 3 3<br />
C 3 4 4<br />
S 1 1 2 0 0 +0.000 +0.000<br />
S 1 2 2 1 2<br />
S 2 1 2 0 0 +0.000 +0.000<br />
S 2 2 2 1 3</nowiki></div>
<hr />
<div>== The brass tacks ==<br />
A conf may also be a part of just one set- this describes a situation where none of the atoms in the conf overlap with other sets, and is quite common<br />
<br />
== The nittier grittier ==<br />
<br />
Here is the specification of each type of line described before. Parts of the specification I don't understand the importance of are followed with a ?<br />
<nowiki><br />
M [name] [protname?] [# atoms] [# bonds] [# xyz] [# confs] [# sets] [# rigid] 5(?) [# clusters]<br />
M [total charge] [total polar solv] [total apolar solv] [total solv] [total surface?]<br />
M [smiles]<br />
M [long name]<br />
<br />
A [atom index] [atom name] [atom type] [dock type #] [dock color #] [solv charge] [polar solv] [apolar solv] [solv] [surface?]<br />
<br />
B [bond index] [atom start] [atom end] [bond type]<br />
<br />
X [xyz index] [atom index] [conf index] [x] [y] [z]<br />
<br />
R [rigid index] [dock color #] [x] [y] [z]<br />
<br />
C [conf index] [xyz index start] [xyz index end]<br />
<br />
S [set index] [number of S lines to follow] [number of confs] [brokenSet?] [outHydro?] [energy1] [energy2]<br />
S [set index] [set line index] [number of confs to follow] [conf 1] ... [conf N] </nowiki><br />
<br />
== Example ==<br />
<br />
Learning by example is best, so let's imagine the following (fake) rotomer set for a very simple molecule<br />
<nowiki><br />
+-------+<br />
| H |<br />
| | |<br />
|O==C |<br />
+-------+<br />
| |<br />
| |<br />
|O==C--H|<br />
+-------+</nowiki><br />
<br />
Let us say the C atom lies at (0, 0). A somewhat abbreviated db2 for the set would look like this:<br />
<nowiki><br />
M fake<br />
A 1 O<br />
A 2 C<br />
A 3 H<br />
B 1 1 2 2<br />
B 2 2 3 1<br />
X 1 1 1 -1.000 +0.000<br />
X 2 2 1 +0.000 +0.000<br />
X 3 3 2 +0.000 +1.000<br />
X 4 3 3 +1.000 +0.000<br />
R 1 7 -1.000 +0.000<br />
R 2 7 +0.000 +0.000<br />
C 1 1 2<br />
C 2 3 3<br />
C 3 4 4<br />
S 1 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 2<br />
S 2 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 3</nowiki></div>Btinglehttp://wiki.docking.org/index.php?title=All_About_DB2_Files&diff=15436All About DB2 Files2023-06-14T00:13:45Z<p>Btingle: /* The nitty gritty */</p>
<hr />
<div>== The brass tacks ==<br />
<br />
DB2 files encode & compress 3D molecule rotomers, sometimes referred to as conformations.<br />
DB2 files do not store the dihedral angles of rotomers, but instead the per-atom coordinates resulting from those angles<br />
Atoms of different rotomers whose coordinates overlap are merged together in db2 files (how this is done will be detailed more later), thus allowing DOCK to avoid double-calculating per-atom energies<br />
<br />
== The nitty gritty ==<br />
<br />
DB2 is a text format, with each row of information being printed on a new line (not to exceed 80 characters per line, due to old fortran nonsense)<br />
Mutiple DB2 entries can be stored in a db2 file- there is no limit on the number of entries that can be fit in a single db2 file<br />
Thus concatenating any number of db2 files together still results in a valid db2 file.<br />
<br />
There are seven(/eight) species of lines in a db2 entry- they will be listed in the order they appear.<br />
<br />
"M" lines serve a few functions. First is to define information about a db2 molecule entry- molecule name, smiles, properties, etc. The second is to create a boundary between distinct db2 entries<br />
<br />
"A" and "B" lines define atoms and bonds for a molecule. These are practically identical to atom/bond lines in .mol2 files- not much else to say here<br />
<br />
"X" lines define the set of all coordinates for the entry- these will be referenced later on. coordinates are guaranteed to be "distinct", in the sense that no coordinate is within a threshold distance (typically 0.001A) of any other coordinate in the set<br />
<br />
"R" lines define the coordinates of the rigid component. this is the basis component of the molecule that does not move between rotomers, usually a benzene ring or the like. the choice of which structure should be the rigid component is arbitrary, and in fact usually a separate db2 entry is created for each possible rigid component (usually each ring system) in a molecule<br />
<br />
"C" lines are "confs", not to be confused with conformations. We will get back to these.<br />
<br />
"S" lines or "set" lines define the rotomers/conformations. Each set entry may span multiple "S" lines to fit line-width requirements, and each set entry refers to a single rotomer/conformation. We will get back to these as well.<br />
<br />
"D" lines are "clusters", and are a feature of older db2 files. The cluster lines were meant to group together similar rotomers, though the idea was scrapped and newer db2 files no longer have them, thus "D" may now stand for "deprecated"<br />
<br />
== What is the deal with sets and confs? ==<br />
<br />
First off- the naming is super confusing. Confs don't describe conformations(rotomers), rather they describe a subset of a conformation.<br />
Sets are collections of confs, and confs are collections of atom xyz positions. Each set describes a conformation(rotomer) of the molecule.<br />
A single conf may be a part of one or more sets, thus confs can be used to define exactly where sets overlap<br />
It is common to see confs comprised of a single atom- this may describe a situation where two rotomers happen to overlap at a particular atom.<br />
A conf may also be a part of just one set- this describes a situation where none of the atoms in the conf overlap with other sets, and is quite common<br />
<br />
== The nittier grittier ==<br />
<br />
Here is the specification of each type of line described before. Parts of the specification I don't understand the importance of are followed with a ?<br />
<nowiki><br />
M [name] [protname?] [# atoms] [# bonds] [# xyz] [# confs] [# sets] [# rigid] 5(?) [# clusters]<br />
M [total charge] [total polar solv] [total apolar solv] [total solv] [total surface?]<br />
M [smiles]<br />
M [long name]<br />
<br />
A [atom index] [atom name] [atom type] [dock type #] [dock color #] [solv charge] [polar solv] [apolar solv] [solv] [surface?]<br />
<br />
B [bond index] [atom start] [atom end] [bond type]<br />
<br />
X [xyz index] [atom index] [conf index] [x] [y] [z]<br />
<br />
R [rigid index] [dock color #] [x] [y] [z]<br />
<br />
C [conf index] [xyz index start] [xyz index end]<br />
<br />
S [set index] [number of S lines to follow] [number of confs] [brokenSet?] [outHydro?] [energy1] [energy2]<br />
S [set index] [set line index] [number of confs to follow] [conf 1] ... [conf N] </nowiki><br />
<br />
== Example ==<br />
<br />
Learning by example is best, so let's imagine the following (fake) rotomer set for a very simple molecule<br />
<nowiki><br />
+-------+<br />
| H |<br />
| | |<br />
|O==C |<br />
+-------+<br />
| |<br />
| |<br />
|O==C--H|<br />
+-------+</nowiki><br />
<br />
Let us say the C atom lies at (0, 0). A somewhat abbreviated db2 for the set would look like this:<br />
<nowiki><br />
M fake<br />
A 1 O<br />
A 2 C<br />
A 3 H<br />
B 1 1 2 2<br />
B 2 2 3 1<br />
X 1 1 1 -1.000 +0.000<br />
X 2 2 1 +0.000 +0.000<br />
X 3 3 2 +0.000 +1.000<br />
X 4 3 3 +1.000 +0.000<br />
R 1 7 -1.000 +0.000<br />
R 2 7 +0.000 +0.000<br />
C 1 1 2<br />
C 2 3 3<br />
C 3 4 4<br />
S 1 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 2<br />
S 2 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 3</nowiki></div>Btinglehttp://wiki.docking.org/index.php?title=All_About_DB2_Files&diff=15435All About DB2 Files2023-06-14T00:13:23Z<p>Btingle: Created page with "== The brass tacks == DB2 files encode & compress 3D molecule rotomers, sometimes referred to as conformations. DB2 files do not store the dihedral angles of rotomers, but instead the per-atom coordinates resulting from those angles Atoms of different rotomers whose coordinates overlap are merged together in db2 files (how this is done will be detailed more later), thus allowing DOCK to avoid double-calculating per-atom energies == The nitty gritty == DB2 is a text fo..."</p>
<hr />
<div>== The brass tacks ==<br />
<br />
DB2 files encode & compress 3D molecule rotomers, sometimes referred to as conformations.<br />
DB2 files do not store the dihedral angles of rotomers, but instead the per-atom coordinates resulting from those angles<br />
Atoms of different rotomers whose coordinates overlap are merged together in db2 files (how this is done will be detailed more later), thus allowing DOCK to avoid double-calculating per-atom energies<br />
<br />
== The nitty gritty ==<br />
<br />
DB2 is a text format, with each row of information being printed on a new line (not to exceed 80 characters per line, due to old fortran nonsense)<br />
Mutiple DB2 entries can be stored in a db2 file- there is no limit on the number of entries that can be fit in a single db2 file<br />
Thus concatenating any number of db2 files together still results in a valid db2 file.<br />
<br />
There are seven(/eight) species of lines in a db2 entry- they will be listed in the order they appear.<br />
<br />
"M" lines serve a few functions. First is to define information about a db2 molecule entry- molecule name, smiles, properties, etc. The second is to create a boundary between distinct db2 entries<br />
"A" and "B" lines define atoms and bonds for a molecule. These are practically identical to atom/bond lines in .mol2 files- not much else to say here<br />
"X" lines define the set of all coordinates for the entry- these will be referenced later on. coordinates are guaranteed to be "distinct", in the sense that no coordinate is within a threshold distance (typically 0.001A) of any other coordinate in the set<br />
"R" lines define the coordinates of the rigid component. this is the basis component of the molecule that does not move between rotomers, usually a benzene ring or the like. the choice of which structure should be the rigid component is arbitrary, and in fact usually a separate db2 entry is created for each possible rigid component (usually each ring system) in a molecule<br />
"C" lines are "confs", not to be confused with conformations. We will get back to these.<br />
"S" lines or "set" lines define the rotomers/conformations. Each set entry may span multiple "S" lines to fit line-width requirements, and each set entry refers to a single rotomer/conformation. We will get back to these as well.<br />
"D" lines are "clusters", and are a feature of older db2 files. The cluster lines were meant to group together similar rotomers, though the idea was scrapped and newer db2 files no longer have them, thus "D" may now stand for "deprecated"<br />
<br />
== What is the deal with sets and confs? ==<br />
<br />
First off- the naming is super confusing. Confs don't describe conformations(rotomers), rather they describe a subset of a conformation.<br />
Sets are collections of confs, and confs are collections of atom xyz positions. Each set describes a conformation(rotomer) of the molecule.<br />
A single conf may be a part of one or more sets, thus confs can be used to define exactly where sets overlap<br />
It is common to see confs comprised of a single atom- this may describe a situation where two rotomers happen to overlap at a particular atom.<br />
A conf may also be a part of just one set- this describes a situation where none of the atoms in the conf overlap with other sets, and is quite common<br />
<br />
== The nittier grittier ==<br />
<br />
Here is the specification of each type of line described before. Parts of the specification I don't understand the importance of are followed with a ?<br />
<nowiki><br />
M [name] [protname?] [# atoms] [# bonds] [# xyz] [# confs] [# sets] [# rigid] 5(?) [# clusters]<br />
M [total charge] [total polar solv] [total apolar solv] [total solv] [total surface?]<br />
M [smiles]<br />
M [long name]<br />
<br />
A [atom index] [atom name] [atom type] [dock type #] [dock color #] [solv charge] [polar solv] [apolar solv] [solv] [surface?]<br />
<br />
B [bond index] [atom start] [atom end] [bond type]<br />
<br />
X [xyz index] [atom index] [conf index] [x] [y] [z]<br />
<br />
R [rigid index] [dock color #] [x] [y] [z]<br />
<br />
C [conf index] [xyz index start] [xyz index end]<br />
<br />
S [set index] [number of S lines to follow] [number of confs] [brokenSet?] [outHydro?] [energy1] [energy2]<br />
S [set index] [set line index] [number of confs to follow] [conf 1] ... [conf N] </nowiki><br />
<br />
== Example ==<br />
<br />
Learning by example is best, so let's imagine the following (fake) rotomer set for a very simple molecule<br />
<nowiki><br />
+-------+<br />
| H |<br />
| | |<br />
|O==C |<br />
+-------+<br />
| |<br />
| |<br />
|O==C--H|<br />
+-------+</nowiki><br />
<br />
Let us say the C atom lies at (0, 0). A somewhat abbreviated db2 for the set would look like this:<br />
<nowiki><br />
M fake<br />
A 1 O<br />
A 2 C<br />
A 3 H<br />
B 1 1 2 2<br />
B 2 2 3 1<br />
X 1 1 1 -1.000 +0.000<br />
X 2 2 1 +0.000 +0.000<br />
X 3 3 2 +0.000 +1.000<br />
X 4 3 3 +1.000 +0.000<br />
R 1 7 -1.000 +0.000<br />
R 2 7 +0.000 +0.000<br />
C 1 1 2<br />
C 2 3 3<br />
C 3 4 4<br />
S 1 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 2<br />
S 2 1 2 0 0 +0.000 +0.000<br />
S 1 2 1 3</nowiki></div>Btinglehttp://wiki.docking.org/index.php?title=Smallworld_and_Arthor_Databases&diff=15434Smallworld and Arthor Databases2023-06-13T19:27:40Z<p>Btingle: /* Smallworld slides/presentations */</p>
<hr />
<div>== Introduction ==<br />
This page details the databases on Smallworld and Arthor.<br />
<br />
{{TOCright}}<br />
<br />
== Smallworld slides/presentations ==<br />
<br />
https://www.slideshare.net/NextMoveSoftware/smallworld-efficient-maximum-common-substructure-searching-of-large-databases<br />
<br />
https://www.nextmovesoftware.com/products/SmallWorld.pdf - very good; a more detailed overview of how the technology works<br />
<br />
https://www.nextmovesoftware.com/talks/Sayle_InterestingApplicationsOfChemicalGraphEditDistance_ACS_202303.pdf - miscellaneous slides on chemical graph edit distance; less essential<br />
<br />
== Smallworld Databases ==<br />
Tables of the smallworld instances running on '''abacus'''.<br />
<br />
=== sw.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description (always the latest available version; molecule count shown in the name when available)<br />
|-<br />
| BB-All-2020Q2-26.7M.anon || All building blocks in ZINC.<br />
|-<br />
| BB-Now-2020Q2-1.6M.anon || All in-stock building blocks.<br />
|-<br />
| ChemSpace-BB-Stock-Mar2022 || Building blocks sold by ChemSpace from stock.<br />
|-<br />
| ChemSpace-SC-Stock-Mar2022 || Screening compounds sold by ChemSpace.<br />
|-<br />
| In-Stock-2020Q2-13.8M.anon || All ZINC in stock compounds.<br />
|-<br />
| Informer-Set-22Q3-4M || The ZINC "informer set" (compounds that can be purchased quickly, within a week or so). <br />
|-<br />
| Mcule-BB-22Q1-2.1M || Building blocks sold by Mcule<br />
|-<br />
| Mcule-22Q1-8.7M || Mcule screening compounds.<br />
|-<br />
| Mcule-Full-22Q1-56M || Mcule, all compounds<br />
|-<br />
| Mcule-V-22Q1-47M || Mcule, make-on-demand.<br />
|-<br />
| MculeUltimate_20Q2_126M || Mcule "Ultimate" library.<br />
|-<br />
| REAL-Database-22Q1-4.5B || Enamine REAL database. (publicly available)<br />
|-<br />
| Wait-OK-2020Q2-1.2B.anon || ZINC compounds, in stock combined with make-on-demand.<br />
|-<br />
| WuXi-20Q4-2.2B || WuXi compounds, almost all make-on-demand.<br />
|-<br />
| ZINC-All-2020Q2-1.46B.anon || All compounds in ZINC20, including annotated only (not purchasable) compounds.<br />
|-<br />
| ZINC-Interesting-2020Q2-300K.anon || Drugs, bioactive, natural products, biogenic, or otherwise annotated for activity. <br />
|-<br />
|}<br />
<br />
=== swp.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description (version, number of molecules)<br />
|-<br />
| el2_22Q1_290K || Ellman library 2 - isoquinuclidines<br />
|-<br />
| REAL-Space-22Q1-21B || Enamine private library - password protected.<br />
|-<br />
| y1-22Q3-57M || Damien Young library #1, substituted piperidines.<br />
|-<br />
| zinc22-All || ZINC-22<br />
|-<br />
|}<br />
<br />
=== swbb.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| ChemSpace-BB-Stock-Mar2022 || Building blocks for sale from ChemSpace<br />
|-<br />
| BB-All-2020Q2-26.7M.anon || ZINC20 Building blocks for sale<br />
|-<br />
| BB-Now-2020Q2-1.6M.anon || ZINC20 Building blocks for rapid delivery (in-stock). <br />
|-<br />
| Mcule-BB-22Q1-2.1M || Mcule building blocks<br />
|-<br />
|}<br />
<br />
=== swcc.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| el2_22Q1_290K || Ellman library 2 - isoquinuclidines<br />
|-<br />
| Piperazine-22Q3-57M || Damien Young library 1.<br />
|-<br />
|}<br />
<br />
== Arthor Databases ==<br />
Tables of the arthor instances.<br />
=== arthor.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| BB-50-22Q1 || ZINC20 Building blocks, best price/delivery-speed combination<br />
|-<br />
| BB-40-22Q1 || ZINC20 building blocks, second tier in stock. <br />
|-<br />
| BB-30-22Q1 || ZINC20 building blocks, third tier in stock<br />
|-<br />
| BB-20-22Q1 || ZINC20 make-on-demand building blocks. Likely >> $500. Likely 6 weeks or more. <br />
|-<br />
| BB-10-22Q1 || ZINC20 make-on-demand or expensive building blocks. Likely >> $1000. Likely 6 weeks or more.<br />
|-<br />
| BB-ForSale-22Q1 || ZINC20 building blocks (50+40+30+20+10)<br />
|-<br />
| BB-InStock-22Q1 || ZINC20 building blocks (50+40+30)<br />
|-<br />
| ChemSpace-BB-Stock-Mar2022-712K || ChemSpace building blocks<br />
|-<br />
| ChemSpace-SC-Stock-Mar2022-346K || ChemSpace screening compounds<br />
|-<br />
| HMDBMetabolites-20Q1-585 || HMDB Metabolites<br />
|-<br />
| In-Stock-19Q4-13.8M || ZINC20 in stock <br />
|-<br />
| Informer-Set-22Q3-4M || ZINC20 informer set (fast delivery, modest prices)<br />
|-<br />
| Mcule-22Q1-8.7M || Mcule Screening compounds<br />
|-<br />
| Mcule-V-22Q1-51M || Mcule make on demand<br />
|-<br />
| Mcule-BB-22Q1-2.1M || Mcule building blocks<br />
|-<br />
| Mcule-Full-22Q1-60M || Mcule SC+BB<br />
|-<br />
| Mcule_ultimate_20Q2-126M || Mcule Ultimate<br />
|-<br />
| REAL-Database-22Q1-00 || Enamine REAL one part<br />
|-<br />
| REAL-Database-22Q1-01 || Enamine REAL another part<br />
|-<br />
| TCNMP-20Q1-37K || Traditional Chinese Medicine database<br />
|-<br />
| Wait-OK-19Q4-1.1B || ZINC20 in stock and make on demand<br />
|-<br />
| World-Drugs-20Q1-3K || ZINC20 drugs<br />
|-<br />
| WuXi-19Q4-339M || WuXi make-on-demand<br />
|-<br />
| ZINC-All-19Q4-1.4B || All ZINC20<br />
|-<br />
| ZINC-Interesting-2019Q4-307K || ZINC20 annotated, bioactive <br />
|-<br />
| ZINC-On-Demand-19Q4-311M || ZINC20 make on demand<br />
|-<br />
| ZINC20-ForSale-22Q1 || ZINC20 for sale <br />
|-<br />
|}<br />
<br />
=== arthorp.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| el2_22Q1 || Ellman library 2 - isoquinuclidines<br />
|-<br />
| REAL-Space-22Q1-00 || Enamine REAL Space, part 00<br />
|-<br />
| REAL-Space-22Q1-01 || Enamine REAL Space, part 01<br />
|-<br />
| REAL-Space-22Q1-02 || Enamine REAL Space, part 02<br />
|-<br />
| REAL-Space-22Q1-03 || Enamine REAL Space, part 03<br />
|-<br />
| REAL-Space-22Q1-04 || Enamine REAL Space, part 04<br />
|-<br />
| REAL-Space-22Q1-05 || Enamine REAL Space, part 05<br />
|-<br />
| REAL-Space-22Q1-06 || Enamine REAL Space, part 06<br />
|-<br />
| REAL-Space-22Q1-07 || Enamine REAL Space, part 07<br />
|-<br />
| zinc22-22Q1(H01~H25) || ZINC-22<br />
|-<br />
|}<br />
=== arthorbb.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| BB-10-22Q1 || ZINC20 make-on-demand or expensive building blocks<br />
|-<br />
| BB-20-22Q1 || ZINC20 make-on-demand building blocks<br />
|-<br />
| BB-30-22Q1 || ZINC20 building blocks, third tier in stock<br />
|-<br />
| BB-40-22Q1 || ZINC20 building blocks, second tier in stock<br />
|-<br />
| BB-50-22Q1 || ZINC20 building blocks, best price/delivery-speed combination<br />
|-<br />
| BB-ForSale-22Q1 || ZINC20 building blocks (50+40+30+20+10)<br />
|-<br />
| BB-InStock-22Q1 || ZINC20 building blocks (50+40+30)<br />
|-<br />
| ChemSpace-BB-Stock-Mar2022-712K || ChemSpace building blocks<br />
|-<br />
| Mcule-BB-22Q1-2.1M || Mcule building blocks<br />
|-<br />
|}<br />
=== arthorcc.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| el2_22Q1 || Ellman library 2 - isoquinuclidines<br />
|-<br />
|}</div>Btinglehttp://wiki.docking.org/index.php?title=Smallworld_and_Arthor_Databases&diff=15433Smallworld and Arthor Databases2023-06-13T19:24:22Z<p>Btingle: /* Smallworld Databases */</p>
<hr />
<div>== Introduction ==<br />
This page details the databases on Smallworld and Arthor.<br />
<br />
{{TOCright}}<br />
<br />
== Smallworld slides/presentations ==<br />
<br />
https://www.slideshare.net/NextMoveSoftware/smallworld-efficient-maximum-common-substructure-searching-of-large-databases<br />
<br />
https://www.nextmovesoftware.com/products/SmallWorld.pdf<br />
<br />
https://www.nextmovesoftware.com/talks/Sayle_InterestingApplicationsOfChemicalGraphEditDistance_ACS_202303.pdf<br />
<br />
<br />
== Smallworld Databases ==<br />
Tables of the five smallworld instances running on '''abacus'''.<br />
<br />
=== sw.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description (Always latest version available, shows number when available)<br />
|-<br />
| BB-All-2020Q2-26.7M.anon || All building blocks in ZINC.<br />
|-<br />
| BB-Now-2020Q2-1.6M.anon || All in-stock building blocks.<br />
|-<br />
| ChemSpace-BB-Stock-Mar2022 || Building blocks sold by ChemSpace from stock.<br />
|-<br />
| ChemSpace-SC-Stock-Mar2022 || Screening compounds sold by ChemSpace.<br />
|-<br />
| In-Stock-2020Q2-13.8M.anon || All ZINC in stock compounds.<br />
|-<br />
| Informer-Set-22Q3-4M || The ZINC "informer set" (compounds that can be purchased quickly, within a week or so). <br />
|-<br />
| Mcule-BB-22Q1-2.1M || Building blocks sold by Mcule<br />
|-<br />
| Mcule-22Q1-8.7M || Mcule screening compounds.<br />
|-<br />
| Mcule-Full-22Q1-56M || Mcule, all compounds<br />
|-<br />
| Mcule-V-22Q1-47M || Mcule, make-on-demand.<br />
|-<br />
| MculeUltimate_20Q2_126M || Mcule "Ultimate" library.<br />
|-<br />
| REAL-Database-22Q1-4.5B || Enamine REAL database. (publicly available)<br />
|-<br />
| Wait-OK-2020Q2-1.2B.anon || ZINC compounds, in stock combined with make-on-demand.<br />
|-<br />
| WuXi-20Q4-2.2B || WuXi compounds, almost all make-on-demand.<br />
|-<br />
| ZINC-All-2020Q2-1.46B.anon || All compounds in ZINC20, including annotated only (not purchasable) compounds.<br />
|-<br />
| ZINC-Interesting-2020Q2-300K.anon || Drugs, bioactive, natural products, biogenic, or otherwise annotated for activity. <br />
|-<br />
|}<br />
<br />
=== swp.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description (version, number of molecules)<br />
|-<br />
| el2_22Q1_290K || Ellman library 2 - isoquinuclidines<br />
|-<br />
| REAL-Space-22Q1-21B || Enamine private library - password protected.<br />
|-<br />
| y1-22Q3-57M || Damien Young library #1, substituted piperidines.<br />
|-<br />
| zinc22-All || ZINC-22<br />
|-<br />
|}<br />
<br />
=== swbb.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| ChemSpace-BB-Stock-Mar2022 || Building blocks for sale from ChemSpace<br />
|-<br />
| BB-All-2020Q2-26.7M.anon || ZINC20 Building blocks for sale<br />
|-<br />
| BB-Now-2020Q2-1.6M.anon || ZINC20 Building blocks for rapid delivery (in-stock). <br />
|-<br />
| Mcule-BB-22Q1-2.1M || Mcule building blocks<br />
|-<br />
|}<br />
<br />
=== swcc.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| el2_22Q1_290K || Ellman library 2. isoquinuclidines<br />
|-<br />
| Piperazine-22Q3-57M || Damien Young library 1.<br />
|-<br />
|}<br />
<br />
== Arthor Databases ==<br />
Tables of the arthor instances.<br />
=== arthor.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| BB-50-22Q1 || ZINC20 Building blocks, best price/delivery-speed combination<br />
|-<br />
| BB-40-22Q1 || ZINC20 building blocks, second tier in stock. <br />
|-<br />
| BB-30-22Q1 || ZINC20 building blocks, third tier in stock<br />
|-<br />
| BB-20-22Q1 || ZINC20 make-on-demand building blocks. Likely >> $500 . Likely 6 weeks or more. <br />
|-<br />
| BB-10-22Q1 || ZINC20 make-on-demand or expensive building blocks. Like >> $1000. Likely 6 weeks or more.<br />
|-<br />
| BB-ForSale-22Q1 || ZINC20 building blocks (50+40+30+20+10)<br />
|-<br />
| BB-InStock-22Q1 || ZINC20 building blocks (50+40+30)<br />
|-<br />
| ChemSpace-BB-Stock-Mar2022-712K || ChemSpace building blocks<br />
|-<br />
| ChemSpace-SC-Stock-Mar2022-346K || ChemSpace screening compounds<br />
|-<br />
| HMDBMetabolites-20Q1-585 || HMDB Metabolites<br />
|-<br />
| In-Stock-19Q4-13.8M || ZINC20 in stock <br />
|-<br />
| Informer-Set-22Q3-4M || ZINC20 informer set (fast delivery, modest prices)<br />
|-<br />
| Mcule-22Q1-8.7M || Mcule Screening compounds<br />
|-<br />
| Mcule-V-22Q1-51M || Mcule make on demand<br />
|-<br />
| Mcule-BB-22Q1-2.1M || some description <br />
|-<br />
| Mcule-Full-22Q1-60M || Mcule SC+BB<br />
|-<br />
| Mcule_ultimate_20Q2-126M || Mcule Ultimate<br />
|-<br />
| REAL-Database-22Q1-00 || Enamine REAL one part<br />
|-<br />
| REAL-Database-22Q1-01 || Enamine REAL another part<br />
|-<br />
| TCNMP-20Q1-37K || Traditional Chinese Medicine database<br />
|-<br />
| Wait-OK-19Q4-1.1B || ZINC20 in stock and make on demand<br />
|-<br />
| World-Drugs-20Q1-3K || ZINC20 drugs<br />
|-<br />
| WuXi-19Q4-339M || WuXi make-on-demand<br />
|-<br />
| ZINC-All-19Q4-1.4B || All ZINC20<br />
|-<br />
| ZINC-Interesting-2019Q4-307K || ZINC20 annotated, bioactive <br />
|-<br />
| ZINC-On-Demand-19Q4-311M || ZINC20 make on demand<br />
|-<br />
| ZINC20-ForSale-22Q1 || ZINC20 for sale <br />
|-<br />
|}<br />
<br />
=== arthorp.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| el2_22Q1 || some description<br />
|-<br />
| REAL-Space-22Q1-00 || <br />
|-<br />
| REAL-Space-22Q1-01 || some description<br />
|-<br />
| REAL-Space-22Q1-02 || <br />
|-<br />
| REAL-Space-22Q1-03 || some description<br />
|-<br />
| REAL-Space-22Q1-04 || <br />
|-<br />
| REAL-Space-22Q1-05 || some description <br />
|-<br />
| REAL-Space-22Q1-06 || <br />
|-<br />
| REAL-Space-22Q1-07 || some description<br />
|-<br />
| zinc22-22Q1(H01~H25) || <br />
|-<br />
|}<br />
=== arthorbb.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| BB-10-22Q1 || some description<br />
|-<br />
| BB-20-22Q1 || <br />
|-<br />
| BB-30-22Q1 || some description<br />
|-<br />
| BB-40-22Q1 || <br />
|-<br />
| BB-50-22Q1 || some description<br />
|-<br />
| BB-ForSale-22Q1 || <br />
|-<br />
| BB-InStock-22Q1 || some description <br />
|-<br />
| ChemSpace-BB-Stock-Mar2022-712K || <br />
|-<br />
| Mcule-BB-22Q1-2.1M || some description <br />
|-<br />
|}<br />
=== arthorcc.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| el2_22Q1 || some description<br />
|-<br />
|}</div>Btinglehttp://wiki.docking.org/index.php?title=Smallworld_and_Arthor_Databases&diff=15432Smallworld and Arthor Databases2023-06-13T19:23:11Z<p>Btingle: </p>
<hr />
<div>== Introduction ==<br />
This page details the databases on Smallworld and Arthor.<br />
<br />
{{TOCright}}<br />
<br />
== Smallworld Databases ==<br />
Tables of the five smallworld instances running on '''abacus'''.<br />
<br />
Smallworld slides/presentations:<br />
<br />
https://www.slideshare.net/NextMoveSoftware/smallworld-efficient-maximum-common-substructure-searching-of-large-databases<br />
<br />
https://www.nextmovesoftware.com/products/SmallWorld.pdf<br />
<br />
https://www.nextmovesoftware.com/talks/Sayle_InterestingApplicationsOfChemicalGraphEditDistance_ACS_202303.pdf<br />
<br />
=== sw.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description (Always latest version available, shows number when available)<br />
|-<br />
| BB-All-2020Q2-26.7M.anon || All building blocks in ZINC.<br />
|-<br />
| BB-Now-2020Q2-1.6M.anon || All in-stock building blocks.<br />
|-<br />
| ChemSpace-BB-Stock-Mar2022 || Building blocks sold by ChemSpace from stock.<br />
|-<br />
| ChemSpace-SC-Stock-Mar2022 || Screening compounds sold by ChemSpace.<br />
|-<br />
| In-Stock-2020Q2-13.8M.anon || All ZINC in stock compounds.<br />
|-<br />
| Informer-Set-22Q3-4M || The ZINC "informer set" (compounds that can be purchased quickly, within a week or so). <br />
|-<br />
| Mcule-BB-22Q1-2.1M || Building blocks sold by Mcule<br />
|-<br />
| Mcule-22Q1-8.7M || Mcule screening compounds.<br />
|-<br />
| Mcule-Full-22Q1-56M || Mcule, all compounds<br />
|-<br />
| Mcule-V-22Q1-47M || Mcule, make-on-demand.<br />
|-<br />
| MculeUltimate_20Q2_126M || Mcule "Ultimate" library.<br />
|-<br />
| REAL-Database-22Q1-4.5B || Enamine REAL database. (publicly available)<br />
|-<br />
| Wait-OK-2020Q2-1.2B.anon || ZINC compounds, in stock combined with make-on-demand.<br />
|-<br />
| WuXi-20Q4-2.2B || WuXi compounds, almost all make-on-demand.<br />
|-<br />
| ZINC-All-2020Q2-1.46B.anon || All compounds in ZINC20, including annotated only (not purchasable) compounds.<br />
|-<br />
| ZINC-Interesting-2020Q2-300K.anon || Drugs, bioactive, natural products, biogenic, or otherwise annotated for activity. <br />
|-<br />
|}<br />
<br />
=== swp.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description (version, number of molecules)<br />
|-<br />
| el2_22Q1_290K || Ellman library 2 - isoquinuclidines<br />
|-<br />
| REAL-Space-22Q1-21B || Enamine private library - password protected.<br />
|-<br />
| y1-22Q3-57M || Damien Young library #1, substituted piperidines.<br />
|-<br />
| zinc22-All || ZINC-22<br />
|-<br />
|}<br />
<br />
=== swbb.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| ChemSpace-BB-Stock-Mar2022 || Building blocks for sale from ChemSpace<br />
|-<br />
| BB-All-2020Q2-26.7M.anon || ZINC20 Building blocks for sale<br />
|-<br />
| BB-Now-2020Q2-1.6M.anon || ZINC20 Building blocks for rapid delivery (in-stock). <br />
|-<br />
| Mcule-BB-22Q1-2.1M || Mcule building blocks<br />
|-<br />
|}<br />
<br />
=== swcc.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| el2_22Q1_290K || Ellman library 2. isoquinuclidines<br />
|-<br />
| Piperazine-22Q3-57M || Damien Young library 1.<br />
|-<br />
|}<br />
<br />
== Arthor Databases ==<br />
Tables of the arthor instances.<br />
=== arthor.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| BB-50-22Q1 || ZINC20 Building blocks, best price/delivery-speed combination<br />
|-<br />
| BB-40-22Q1 || ZINC20 building blocks, second tier in stock. <br />
|-<br />
| BB-30-22Q1 || ZINC20 building blocks, third tier in stock<br />
|-<br />
| BB-20-22Q1 || ZINC20 make-on-demand building blocks. Likely >> $500. Likely 6 weeks or more. <br />
|-<br />
| BB-10-22Q1 || ZINC20 make-on-demand or expensive building blocks. Likely >> $1000. Likely 6 weeks or more.<br />
|-<br />
| BB-ForSale-22Q1 || ZINC20 building blocks (50+40+30+20+10)<br />
|-<br />
| BB-InStock-22Q1 || ZINC20 building blocks (50+40+30)<br />
|-<br />
| ChemSpace-BB-Stock-Mar2022-712K || ChemSpace building blocks<br />
|-<br />
| ChemSpace-SC-Stock-Mar2022-346K || ChemSpace screening compounds<br />
|-<br />
| HMDBMetabolites-20Q1-585 || HMDB Metabolites<br />
|-<br />
| In-Stock-19Q4-13.8M || ZINC20 in stock <br />
|-<br />
| Informer-Set-22Q3-4M || ZINC20 informer set (fast delivery, modest prices)<br />
|-<br />
| Mcule-22Q1-8.7M || Mcule Screening compounds<br />
|-<br />
| Mcule-V-22Q1-51M || Mcule make on demand<br />
|-<br />
| Mcule-BB-22Q1-2.1M || Mcule building blocks <br />
|-<br />
| Mcule-Full-22Q1-60M || Mcule SC+BB<br />
|-<br />
| Mcule_ultimate_20Q2-126M || Mcule Ultimate<br />
|-<br />
| REAL-Database-22Q1-00 || Enamine REAL database, first part<br />
|-<br />
| REAL-Database-22Q1-01 || Enamine REAL database, second part<br />
|-<br />
| TCNMP-20Q1-37K || Traditional Chinese Medicine database<br />
|-<br />
| Wait-OK-19Q4-1.1B || ZINC20 in stock and make on demand<br />
|-<br />
| World-Drugs-20Q1-3K || ZINC20 drugs<br />
|-<br />
| WuXi-19Q4-339M || WuXi make-on-demand<br />
|-<br />
| ZINC-All-19Q4-1.4B || All ZINC20<br />
|-<br />
| ZINC-Interesting-2019Q4-307K || ZINC20 annotated, bioactive <br />
|-<br />
| ZINC-On-Demand-19Q4-311M || ZINC20 make on demand<br />
|-<br />
| ZINC20-ForSale-22Q1 || ZINC20 for sale <br />
|-<br />
|}<br />
<br />
=== arthorp.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| el2_22Q1 || some description<br />
|-<br />
| REAL-Space-22Q1-00 || <br />
|-<br />
| REAL-Space-22Q1-01 || some description<br />
|-<br />
| REAL-Space-22Q1-02 || <br />
|-<br />
| REAL-Space-22Q1-03 || some description<br />
|-<br />
| REAL-Space-22Q1-04 || <br />
|-<br />
| REAL-Space-22Q1-05 || some description <br />
|-<br />
| REAL-Space-22Q1-06 || <br />
|-<br />
| REAL-Space-22Q1-07 || some description<br />
|-<br />
| zinc22-22Q1(H01~H25) || <br />
|-<br />
|}<br />
=== arthorbb.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| BB-10-22Q1 || some description<br />
|-<br />
| BB-20-22Q1 || <br />
|-<br />
| BB-30-22Q1 || some description<br />
|-<br />
| BB-40-22Q1 || <br />
|-<br />
| BB-50-22Q1 || some description<br />
|-<br />
| BB-ForSale-22Q1 || <br />
|-<br />
| BB-InStock-22Q1 || some description <br />
|-<br />
| ChemSpace-BB-Stock-Mar2022-712K || <br />
|-<br />
| Mcule-BB-22Q1-2.1M || some description <br />
|-<br />
|}<br />
=== arthorcc.docking.org ===<br />
{| class="wikitable"<br />
|-<br />
! Database !! Description<br />
|-<br />
| el2_22Q1 || some description<br />
|-<br />
|}</div>Btinglehttp://wiki.docking.org/index.php?title=Installing_The_3D_Pipeline_ZINC22&diff=15415Installing The 3D Pipeline ZINC222023-05-31T22:21:41Z<p>Btingle: </p>
<hr />
<div>Installation of the 3D pipeline is somewhat tricky- the environment is very particular about which versions of which software are used. It is thus easiest to copy the exact software packages we use from our servers. This should work provided they are installed on a 64-bit Linux architecture, though depending on the distribution certain shared libraries may be missing.<br />
<br />
== Setting up the installation root ==<br />
<br />
First, find a suitable directory to serve as the root of the installation- this must be a directory that is visible from all nodes in the cluster.<br />
<br />
Within this directory, create two sub-directories:<br />
<nowiki><br />
$ROOT_DIR/<br />
soft<br />
licenses</nowiki><br />
<br />
Copy or link your openeye and chemaxon licenses to the licenses directory- name them ".oe-license.txt" and ".jchem-license.cxl", respectively.<br />
<br />
Next, clone the submission scripts from github to this directory.<br />
<nowiki>git clone https://github.com/docking-org/zinc22-3d-submit</nowiki><br />
<br />
I like to rename this repository directory to just "submit", leaving the installation looking like this:<br />
<nowiki><br />
$ROOT_DIR/<br />
soft<br />
licenses/<br />
.oe-license.txt<br />
.jchem-license.cxl<br />
submit</nowiki><br />
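The setup steps above can be sketched as a short shell session. Note this is only a sketch: ROOT_DIR, the empty stand-in license files, and the local "submit" directory are placeholders- substitute your real installation root, real licenses, and the actual git clone.<br />

```shell
# Sketch of the directory setup above. ROOT_DIR and the license files
# here are placeholders- substitute your real paths and licenses.
ROOT_DIR=$PWD/pipeline_root
mkdir -p "$ROOT_DIR/soft" "$ROOT_DIR/licenses"
# copy or link your real licenses; empty stand-ins are used here
touch "$ROOT_DIR/licenses/.oe-license.txt"
touch "$ROOT_DIR/licenses/.jchem-license.cxl"
# clone the submission scripts and rename the directory to "submit":
#   git clone https://github.com/docking-org/zinc22-3d-submit "$ROOT_DIR/submit"
mkdir -p "$ROOT_DIR/submit"   # stand-in so the sketch runs without network access
ls -A "$ROOT_DIR"
```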
<br />
Finally, create the "env.sh" and "env.csh" scripts in the ROOT_DIR as follows:<br />
<br />
<nowiki><br />
#!/bin/bash<br />
# env.sh<br />
base=<<ROOT_DIR>><br />
export BINDIR=$base/submit<br />
export SHRTCACHE=<<TEMPDIR 1>><br />
export LONGCACHE=<<TEMPDIR 2>><br />
export SOFT_HOME=$base/soft<br />
export LICENSE_HOME=$base/licenses<br />
export PATH=$PATH:$base/submit</nowiki><br />
<br />
<nowiki><br />
#!/usr/bin/csh<br />
# env.csh<br />
set base=<<ROOT_DIR>><br />
setenv SHRTCACHE <<TEMPDIR 1>><br />
setenv LONGCACHE <<TEMPDIR 2>><br />
setenv BINDIR $base/submit<br />
setenv SOFT_HOME $base/soft<br />
setenv LICENSE_HOME $base/licenses<br />
setenv PATH $PATH\:$base/submit</nowiki><br />
<br />
<<ROOT_DIR>> should be the installation directory you chose.<br />
<br />
<<TEMPDIR 1>> and <<TEMPDIR 2>> should be temporary directories available to all nodes on the cluster- often in distributed computing environments there are special directories set aside for this purpose, e.g. /scratch<br />
<br />
<<TEMPDIR 1>> will be used for short-term storage of small job files- it is thus appropriate to set this to a faster-access, lower-capacity location like /dev/shm. However, using /dev/shm can introduce problems, so it is safest to use the same value as LONGCACHE.<br />
<br />
<<TEMPDIR 2>> will be used for long-term storage of software files- thus it is more appropriate to set this to a slower-access higher-capacity location, like /tmp or /scratch.</div>Btinglehttp://wiki.docking.org/index.php?title=TLDR&diff=15406TLDR2023-05-18T18:57:10Z<p>Btingle: /* Making changes to modules & scripts */</p>
<hr />
<div>TLDR (tldr.docking.org) is a web-based interface to molecular docking and related tools.<br />
The system currently consists of a dozen apps. We plan to support dozens, perhaps a hundred all told.<br />
Here we list the apps you can currently use, with usage notes. <br />
<br />
This is also called "Add TLDR Module"<br />
<br />
{{TOCright}}<br />
<br />
== Current available modules ==<br />
=== [[TLDR:arthorbatch|Arthorbatch]] ===<br />
<br />
=== [[TLDR:bioisostere|Bioisostere]] ===<br />
<br />
=== [[TLDR:bootstrap1|Bootstrap1]] ===<br />
<br />
=== [[TLDR:bootstrap2|Bootstrap2]] ===<br />
<br />
=== [[TLDR:dude-z|DUDE-Z]] ===<br />
<br />
=== [[TLDR:extrema|Extrema]] ===<br />
<br />
=== [[TLDR:newbuild3d|Newbuild3d]] ===<br />
<br />
=== [[TLDR:strain|Strain]] ===<br />
<br />
=== [[TLDR:swbatch|Swbatch]] ===<br />
<br />
== In Development ==<br />
=== Blaster === <br />
The purpose of the blaster app is to prepare a receptor for docking, including some basic analysis.<br />
<br />
This app requires:<br />
* A structure for your receptor protein provided as a pdb file<br />
* A binding site provided in one of the following ways: <br />
1) Supply ligand in binding site<br />
2) Provide binding site residues<br />
3) Use a program to identify all potential binding sites. Choose which of the binding sites to test or test them all.<br />
<br />
The app returns:<br />
* dockfiles, which may be used for large library docking<br />
* workfiles, which may be used for grid and sphere optimization.<br />
<br />
The output of this app can be used by:<br />
* asdf<br />
* sef<br />
* sdfafd<br />
<br />
Status: works. (equivalent of blastermaster.py)<br />
<br />
<br />
<br />
=== Covalent ===<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Works with special cases only. Nearly ready to use. If interested, ask jji for assistance. <br />
<br />
=== Cluster === <br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Works. <br />
<br />
<br />
<br />
=== Libanalysis === <br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
=== Reaction ===<br />
<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Works at a basic level, with minor caveats.<br />
* need to handle mwt and logP cutoff parametrically.<br />
* needs work to handle millions of molecules<br />
* needs work to connect to reagents, reactions and schemes<br />
<br />
=== Report2d ===<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
=== Shape ===<br />
Purpose: <br />
<br />
Search for similar shape molecules for ligands<br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
=== ZINCbatch ===<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
= Technical info = <br />
<br />
== Ben's info section ==<br />
<br />
python environment is @ /nfs/soft/www/home/apps/tools18/envs/tldr-prod/bin/python3.9<br />
<br />
tldr distribution packages are @ /nfs/soft/www/home/apps/tools18/dist<br />
<br />
=== How to open source code & make a change ===<br />
<br />
1. unpack most recent distribution in dist folder to your preferred directory<br />
<br />
2. make changes to source code. this is left as an exercise for the reader<br />
<br />
3. once done, re-pack source code into tarball. if you want to change the version name, change the name of the source code's root directory & dist package- for organizational purposes<br />
<br />
4. copy the package back to the tldr dist directory- not strictly necessary, but again, for organizational purposes<br />
<br />
5. with the tldr python environment, pip install the package you just created<br />
<br />
* /nfs/soft/www/home/apps/tools18/envs/tldr-prod/bin/python3.9 -m pip install /nfs/soft/www/home/apps/tools18/dist/$YOUR_PACKAGE<br />
<br />
* this will uninstall the previous version- I haven't tried doing this while the server is live, but it should probably(?) be fine<br />
<br />
6. restart the tldr webserver on gimel2<br />
<br />
* supervisorctl restart tldr<br />
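Steps 1-6 can be condensed into one shell sketch. The package name "tldr-1.2.3" is made up for illustration- substitute the distribution you actually unpacked- and the commands that would touch the live server (pip install, supervisorctl) are left commented out.<br />

```shell
# Condensed sketch of steps 1-6 above. "tldr-1.2.3" is a made-up
# package name- substitute the distribution you actually unpacked.
DIST=/nfs/soft/www/home/apps/tools18/dist
PKG=tldr-1.2.3
ENV_PY=/nfs/soft/www/home/apps/tools18/envs/tldr-prod/bin/python3.9

# 1. unpack the most recent distribution into a scratch directory
mkdir -p work/$PKG
# tar xzf $DIST/$PKG.tar.gz -C work

# 2-3. make your changes, then re-pack the source into a tarball
tar czf work/$PKG.tar.gz -C work $PKG

# 4-5. copy the package back and pip install it with the tldr environment
# cp work/$PKG.tar.gz $DIST/
# $ENV_PY -m pip install $DIST/$PKG.tar.gz

# 6. restart the tldr webserver on gimel2
# supervisorctl restart tldr
```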
<br />
=== Making changes to modules & scripts ===<br />
<br />
modules are located @ /nfs/ex7/tldr-modules<br />
<br />
input files & parameters are defined by parameters.json within the module directory<br />
<br />
modules are fairly straightforward bash scripts that operate in a designated working directory created by tldr when a job is submitted<br />
<br />
job files are stored in /nfs/ex7/blaster/jobs/[0-9]<br />
<br />
the specific 0-9 subdirectory a job lands in is determined by the last digit of the tldr-provided job_id- these directories are striped across multiple disks<br />
<br />
within the job's working directory, "results" is a special directory whose contents are shown to the user as the job's results<br />
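The striping rule above amounts to taking job_id modulo 10. A quick illustration- the job id and job name here are invented, not real jobs:<br />

```shell
# Illustration of the striping rule: the job directory lives under
# /nfs/ex7/blaster/jobs/<last digit of job_id>. job_id here is made up.
job_id=48213
stripe=$((job_id % 10))          # last digit of the job id
job_dir=/nfs/ex7/blaster/jobs/$stripe/myjob_$job_id
echo "$job_dir"
```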
<br />
=== TLDR database ===<br />
<br />
psql -h mem2 -p 5432 -d blaster -U blasteruser<br />
<br />
new modules can be added by inserting rows into the job_types table<br />
a module can be set private/public by modifying job_type_status_fk in this table<br />
<br />
== starting the server in single-threaded mode == <br />
source /mnt/nfs/work/chinzo/Projects/BlasterX_supritha/venv/bin/activate<br />
python code/DOCKBlaster/autoapp.py<br />
<br />
== How to add new module ==<br />
See [[Add Tools18 module]]<br />
== Supported field types == <br />
For now, the model accepts "text_box", "check_box", "drop_down", "radio_button", and so on<br />
<br />
If "type" is "text_box", it can contain text or a number with a min and max range. If there is a min and max range, it must be specified as "value_type": "number", "value_range": {"min_value": 0.1,"max_value": 0.99}, as in parameters.json for cluster.<br />
If "type" is "text_box" and "value_type" is "text", then it is a normal text box with no range or validations.<br />
Every input mentioned under the key "inputs" has a field called "file_name", which is the name under which the input file uploaded/filled by the user gets stored in the file system in the /nfs/ex7/blaster/jobs/JobID%10/Jobname_jobID folder. <br />
<br />
Every job type has a "job_output" field, which currently stores an empty results.txt file that can be modified to perform another action later. For now, the uploaded inputs and the output file name specified by the user are stored in the file system under the path mentioned above.<br />
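Putting the field types together, a parameters.json for a module might look like the fragment below. This is hypothetical- the field names, file names, and ranges are invented for illustration, not taken from a real module.<br />

```shell
# Hypothetical parameters.json fragment illustrating the field types
# above; the file names and ranges are invented, not from a real module.
cat > parameters.json <<'EOF'
{
  "inputs": [
    {"type": "text_box", "value_type": "number",
     "value_range": {"min_value": 0.1, "max_value": 0.99},
     "file_name": "cutoff.txt"},
    {"type": "text_box", "value_type": "text",
     "file_name": "job_name.txt"},
    {"type": "check_box", "file_name": "options.txt"}
  ],
  "job_output": "results.txt"
}
EOF
# sanity-check that the fragment parses as JSON
python3 -m json.tool parameters.json > /dev/null
```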
<br />
[[Category: TLDR]]<br />
[[Category: Tools18]]</div>Btinglehttp://wiki.docking.org/index.php?title=TLDR&diff=15405TLDR2023-05-18T18:56:46Z<p>Btingle: /* Making changes to modules & scripts */</p>
<hr />
<div>TLDR (tldr.docking.org) is a web-based interface to molecular docking and related tools.<br />
The system currently consists of a dozen apps. We plan to support dozens, perhaps a hundred all told.<br />
Here we list the apps you can currently use, with usage notes. <br />
<br />
This is also called "Add TLDR Module"<br />
<br />
{{TOCright}}<br />
<br />
== Current available modules ==<br />
=== [[TLDR:arthorbatch|Arthorbatch]] ===<br />
<br />
=== [[TLDR:bioisostere|Bioisostere]] ===<br />
<br />
=== [[TLDR:bootstrap1|Bootstrap1]] ===<br />
<br />
=== [[TLDR:bootstrap2|Bootstrap2]] ===<br />
<br />
=== [[TLDR:dude-z|DUDE-Z]] ===<br />
<br />
=== [[TLDR:extrema|Extrema]] ===<br />
<br />
=== [[TLDR:newbuild3d|Newbuild3d]] ===<br />
<br />
=== [[TLDR:strain|Strain]] ===<br />
<br />
=== [[TLDR:swbatch|Swbatch]] ===<br />
<br />
== In Development ==<br />
=== Blaster === <br />
The purpose of the blaster app is to prepare a receptor for docking, including some basic analysis.<br />
<br />
This app requires:<br />
* A structure for your receptor protein provided as a pdb file<br />
* A binding site provided in one of the following ways: <br />
1) Supply ligand in binding site<br />
2) Provide binding site residues<br />
3) Use a program to identify all potential binding sites. Choose which of the binding sites to test or test them all.<br />
<br />
The app returns:<br />
* dockfiles, which may be used for large library docking<br />
* workfiles, which may be used for grid and sphere optimization.<br />
<br />
The output of this app can be used by:<br />
* asdf<br />
* sef<br />
* sdfafd<br />
<br />
Status: works. (equivalent of blastermaster.py)<br />
<br />
<br />
<br />
=== Covalent ===<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Works with special cases only. Nearly ready to use. If interested, ask jji for assistance. <br />
<br />
=== Cluster === <br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Works. <br />
<br />
<br />
<br />
=== Libanalysis === <br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
=== Reaction ===<br />
<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Works at a basic level, with minor caveats.<br />
* need to handle mwt and logP cutoff parametrically.<br />
* needs work to handle millions of molecules<br />
* needs work to connect to reagents, reactions and schemes<br />
<br />
=== Report2d ===<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
=== Shape ===<br />
Purpose: <br />
<br />
Search for similar shape molecules for ligands<br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
=== ZINCbatch ===<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
= Technical info = <br />
<br />
== Ben's info section ==<br />
<br />
python environment is @ /nfs/soft/www/home/apps/tools18/envs/tldr-prod/bin/python3.9<br />
<br />
tldr distribution packages are @ /nfs/soft/www/home/apps/tools18/dist<br />
<br />
=== How to open source code & make a change ===<br />
<br />
1. unpack most recent distribution in dist folder to your preferred directory<br />
<br />
2. make changes to source code. this is left as an exercise for the reader<br />
<br />
3. once done, re-pack source code into tarball. if you want to change the version name, change the name of the source code's root directory & dist package- for organizational purposes<br />
<br />
4. copy the package back to the tldr dist directory- not strictly necessary, but again, for organizational purposes<br />
<br />
5. with the tldr python environment, pip install the package you just created<br />
<br />
* /nfs/soft/www/home/apps/tools18/envs/tldr-prod/bin/python3.9 -m pip install /nfs/soft/www/home/apps/tools18/dist/$YOUR_PACKAGE<br />
<br />
* this will uninstall the previous version- I haven't tried doing this while the server is live, but it should probably(?) be fine<br />
<br />
6. restart the tldr webserver on gimel2<br />
<br />
* supervisorctl restart tldr<br />
<br />
=== Making changes to modules & scripts ===<br />
<br />
modules are located @ /nfs/ex7/tldr-modules<br />
<br />
modules are fairly straightforward bash scripts that operate in a designated working directory created by tldr when a job is submitted<br />
<br />
job files are stored in /nfs/ex7/blaster/jobs/[0-9]<br />
<br />
the specific 0-9 directory the job directory is located in is based on the last digit of the tldr-provided job_id- these directories are striped across multiple disks<br />
<br />
within the job's working directory "results" is a special directory that will be shown to the user containing results<br />
<br />
=== TLDR database ===<br />
<br />
psql -h mem2 -p 5432 -d blaster -U blasteruser<br />
<br />
can add new modules by adding to the job_types table<br />
set module private/public by modifying job_type_status_fk in this table<br />
<br />
== starting the server in single-threaded mode == <br />
source /mnt/nfs/work/chinzo/Projects/BlasterX_supritha/venv/bin/activate<br />
python code/DOCKBlaster/autoapp.py<br />
<br />
== How to add new module ==<br />
See [[Add Tools18 module]]<br />
== Supported field types == <br />
For now, the model accepts "text_box", "check_box", "drop_down" , "radio_button", and so on<br />
<br />
If "type" is "text_box", it can contain a text or number with a min and a max range. If there is a min and a max range, then they have to be mentioned as "value_type": "number", "value_range": {"min_value": 0.1,"max_value": 0.99} as in parameters.json for cluster.<br />
If "type" is "text_box" and "value_type" is "text", then it is a normal text box with no range or validations.<br />
6. Every input mentioned under the key "inputs" has a field called "file_name", which the name by which the input file uploaded/filled by the user gets stored in the file system at /nfs/ex7/blaster/jobs/JobID%10/Jobname_jobID folder. <br />
<br />
7. Every job type has a "job_output" field, which currently stores an empty results.txt file which can be modified to do another action later. For now, the inputs uploaded, and the output file name specified by the user gets stored in the file system under the path that I mentioned in point 6.<br />
<br />
[[Category: TLDR]]<br />
[[Category: Tools18]]</div>Btinglehttp://wiki.docking.org/index.php?title=TLDR&diff=15404TLDR2023-05-18T18:55:45Z<p>Btingle: /* How to open source code & make a change */</p>
<hr />
<div>TLDR (tldr.docking.org) is a web-based interface to molecular docking and related tools.<br />
The system currently consists of a dozen apps. We plan to support dozens, perhaps a hundred all told.<br />
Here we list the apps you can currently use, with usage notes. <br />
<br />
This page is also known as "Add TLDR Module"<br />
<br />
{{TOCright}}<br />
<br />
== Current available modules ==<br />
=== [[TLDR:arthorbatch|Arthorbatch]] ===<br />
<br />
=== [[TLDR:bioisostere|Bioisostere]] ===<br />
<br />
=== [[TLDR:bootstrap1|Bootstrap1]] ===<br />
<br />
=== [[TLDR:bootstrap2|Bootstrap2]] ===<br />
<br />
=== [[TLDR:dude-z|DUDE-Z]] ===<br />
<br />
=== [[TLDR:extrema|Extrema]] ===<br />
<br />
=== [[TLDR:newbuild3d|Newbuild3d]] ===<br />
<br />
=== [[TLDR:strain|Strain]] ===<br />
<br />
=== [[TLDR:swbatch|Swbatch]] ===<br />
<br />
== In Development ==<br />
=== Blaster === <br />
The purpose of the blaster app is to prepare a receptor for docking, including some basic analysis.<br />
<br />
This app requires:<br />
* A structure for your receptor protein provided as a pdb file<br />
* A binding site provided in one of the following ways: <br />
1) Supply ligand in binding site<br />
2) Provide binding site residues<br />
3) Use a program to identify all potential binding sites. Choose which of the binding sites to test or test them all.<br />
<br />
The app returns:<br />
* dockfiles, which may be used for large library docking<br />
* workfiles, which may be used for grid and sphere optimization.<br />
<br />
The output of this app can be used by:<br />
* asdf<br />
* sef<br />
* sdfafd<br />
<br />
Status: works. (equivalent of blastermaster.py)<br />
<br />
<br />
<br />
=== Covalent ===<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Works with special cases only. Nearly ready to use. If interested, ask jji for assistance. <br />
<br />
=== Cluster === <br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Works. <br />
<br />
<br />
<br />
=== Libanalysis === <br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
=== Reaction ===<br />
<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Works at a basic level, with minor caveats.<br />
* need to handle mwt and logP cutoff parametrically.<br />
* needs work to handle millions of molecules<br />
* needs work to connect to reagents, reactions and schemes<br />
<br />
=== Report2d ===<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
=== Shape ===<br />
Purpose: <br />
<br />
Search for molecules whose shapes are similar to given ligands<br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
=== ZINCbatch ===<br />
Purpose: <br />
<br />
This app requires:<br />
* asdf<br />
<br />
This app returns:<br />
* asdf<br />
<br />
The output of this app can be used by:<br />
* adsfasdf<br />
<br />
Status: Not working yet<br />
<br />
= Technical info = <br />
<br />
== Ben's info section ==<br />
<br />
python environment is @ /nfs/soft/www/home/apps/tools18/envs/tldr-prod/bin/python3.9<br />
<br />
tldr distribution packages are @ /nfs/soft/www/home/apps/tools18/dist<br />
<br />
=== How to open source code & make a change ===<br />
<br />
1. Unpack the most recent distribution from the dist folder to your preferred directory.<br />
<br />
2. Make your changes to the source code. This is left as an exercise for the reader.<br />
<br />
3. Once done, re-pack the source code into a tarball. If you want to change the version name, rename the source code's root directory & the dist package- for organizational purposes.<br />
<br />
4. Copy the package back to the tldr dist directory- not strictly necessary, but again, for organizational purposes.<br />
<br />
5. With the tldr python environment, pip install the package you just created:<br />
<br />
* /nfs/soft/www/home/apps/tools18/envs/tldr-prod/bin/python3.9 -m pip install /nfs/soft/www/home/apps/tools18/dist/$YOUR_PACKAGE<br />
<br />
* This will uninstall the previous version- I haven't tried doing this while the server is live, but it should probably(?) be fine.<br />
<br />
6. Restart the tldr webserver on gimel2:<br />
<br />
* supervisorctl restart tldr<br />
<br />
=== Making changes to modules & scripts ===<br />
<br />
modules are located @ /nfs/ex7/tldr-modules<br />
<br />
Modules are fairly straightforward bash scripts that operate in a designated working directory created by tldr when a job is submitted.<br />
<br />
Job files are stored in /nfs/ex7/blaster/jobs/[0-9]<br />
<br />
The specific 0-9 directory a job lands in is the last digit of the tldr-provided job_id; these directories are striped across multiple disks.<br />
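<br />
As a sketch, the directory selection described above can be expressed as follows (the helper name is illustrative, not part of tldr):<br />
<br />
```shell
# Illustrative sketch (not actual tldr code): the job directory is chosen
# by the last digit of the job id, i.e. job_id % 10.
job_dir_for() {
    local job_id="$1"
    echo "/nfs/ex7/blaster/jobs/$((job_id % 10))"
}

job_dir_for 48213   # prints /nfs/ex7/blaster/jobs/3
```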
<br />
=== TLDR database ===<br />
<br />
psql -h mem2 -p 5432 -d blaster -U blasteruser<br />
<br />
New modules can be added by inserting into the job_types table.<br />
A module can be set private or public by modifying job_type_status_fk in the same table.<br />
<br />
== Starting the server in single-threaded mode == <br />
source /mnt/nfs/work/chinzo/Projects/BlasterX_supritha/venv/bin/activate<br />
python code/DOCKBlaster/autoapp.py<br />
<br />
== How to add new module ==<br />
See [[Add Tools18 module]]<br />
== Supported field types == <br />
For now, the model accepts "text_box", "check_box", "drop_down", "radio_button", and so on.<br />
<br />
If "type" is "text_box", it can contain text or a number with a min and max range. If there is a range, it must be given as "value_type": "number", "value_range": {"min_value": 0.1,"max_value": 0.99}, as in parameters.json for cluster.<br />
If "type" is "text_box" and "value_type" is "text", then it is a normal text box with no range or validations.<br />
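<br />
Putting the fragments above together, a numeric text_box entry in parameters.json might look like this (a sketch assembled from the snippets quoted above, not copied from an actual module):<br />
<br />
```json
{
    "type": "text_box",
    "value_type": "number",
    "value_range": {"min_value": 0.1, "max_value": 0.99}
}
```
<br />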
Every input mentioned under the key "inputs" has a field called "file_name", which is the name under which the input file uploaded or filled in by the user is stored in the file system, in the /nfs/ex7/blaster/jobs/JobID%10/Jobname_jobID folder.<br />
<br />
Every job type has a "job_output" field, which currently stores an empty results.txt file that can be modified to do another action later. For now, the uploaded inputs and the output file name specified by the user are stored in the file system under the path mentioned above.<br />
<br />
[[Category: TLDR]]<br />
[[Category: Tools18]]</div>Btinglehttp://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8&diff=15390SUBDOCK DOCK3.82023-05-11T21:38:54Z<p>Btingle: /* What's New? */</p>
<hr />
<div>Important note: although DOCK 3.8 is in the title of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some DOCK 3.8 features will not be taken advantage of.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located at the root of the repository.<br />
<br />
subdock.bash can be called directly from any location- it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
Compared to older scripts, SUBDOCK is easier to use, has more features, and is much more flexible!<br />
<br />
==== December 2022 ====<br />
<br />
* All job platforms (e.g. SLURM, SGE) are supported by the same script<br />
<br />
* GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
* Subdock can now be run on both individual db2.gz files & db2.tgz packages. A batch_size can be set for either type, allowing for more flexibility.<br />
<br />
* Arguments can be provided environmentally, e.g. "export KEY=VALUE", or on the command line, e.g. "--key=value"<br />
<br />
* Subdock now prints out a superscript to copy-paste on success, convenient for re-submission.<br />
<br />
* Fully restartable on all jobs platforms! See below section for an explanation on what this means, why it matters, and instructions on usage.<br />
<br />
* INDOCK version header is automatically corrected, as are any file paths referenced by INDOCK.<br />
<br />
==== May 2023 ====<br />
<br />
* You can provide http(s) URLs to dockable files as your input in lieu of file paths!<br />
<br />
* Charity Engine is now supported as a jobs platform! More instructions for using this further down. (https://www.charityengine.com/)<br />
<br />
* Subdock will automatically detect if your jobs failed- no extra script is needed to check whether your jobs have actually finished<br />
<br />
== Supported Platforms ==<br />
<br />
There are four platforms currently supported:<br />
<br />
# SLURM<br />
# SGE (Sun Grid Engine)<br />
#* '''note for BKS lab: the SGE queue on gimel does not have python3, your jobs will not work!'''<br />
# GNU Parallel (for local runs- ideal for testing)<br />
# Charity Engine<br />
<br />
One of these platforms must be selected- SLURM is the default. The platform is chosen with one of the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true<br />
--use-charity=true</nowiki><br />
arguments, respectively.<br />
<br />
==== Using Charity Engine ====<br />
<br />
To use charity engine, you must have access to an executable of the charity engine CLI, as well as GNU parallel.<br />
<br />
Additionally, you must provide your charity authentication details in the form of the CHARITY_AUTHKEY or --charity-authkey variable.<br />
<br />
WIP, more specific instructions to come.<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments; db2.tgz is the default.<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
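<br />
The dispatch math above can be sketched as follows (an illustration of the formula; SUBDOCK computes this internally):<br />
<br />
```shell
# Sketch of the job-count formula: ceil(N / BATCH_SIZE), using integer math.
njobs() {
    local n="$1" batch="$2"
    echo $(( (n + batch - 1) / batch ))
}

njobs 10 3    # prints 4  (10 input files, 3 per job)
njobs 9 3     # prints 3
```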
<br />
== Restartability ==<br />
<br />
'''ONLY APPLICABLE FOR DOCK 3.8+!'''<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want, even as little as a few minutes per job! This flexibility lets docking jobs efficiently fill the gaps between longer-running jobs on the same ecosystem, so they will be treated preferentially by whichever system is in charge of scheduling.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds prior to the 30 minute hard limit. <br />
The GNU Parallel and SLURM platforms provide a hard-coded 30-second notice, whereas this notice period must be defined manually for SGE jobs.<br />
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs! If you saved the superscript SUBDOCK spits out on successful submission, you can simply call that. <br />
<br />
You'll know there is no more work to be done when SUBDOCK prints "all N jobs complete!". SUBDOCK will also tell you what proportion of jobs has not yet completed on each submission.<br />
<br />
Output files are appended with a suffix indicating how many times the docking task has been resubmitted, e.g OUTDOCK.0 for the first attempt, OUTDOCK.1 for the second, etc.<br />
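<br />
A sketch of how those suffixes accumulate (the helper is illustrative only- it is not SUBDOCK's own code, just a demonstration of the numbering):<br />
<br />
```shell
# Illustrative: find the suffix the next attempt would get, given that
# OUTDOCK.0, OUTDOCK.1, ... may already exist in the output directory.
next_outdock() {
    local dir="$1" i=0
    while [ -e "$dir/OUTDOCK.$i" ]; do
        i=$((i + 1))
    done
    echo "OUTDOCK.$i"
}
```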
<br />
Be careful not to overlap your submissions- there are no guardrails in place to prevent this from happening if you are not careful.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable and an installed scheduling system (SGE/SLURM/Parallel), but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z- we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get db2 database subset sample via ZINC-22. Example provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org. For wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Multiple download types are supported; for example, if you are on Wynton you can download Wynton file paths, removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' This can be found via our download server if you have a license, otherwise lab members can directly pull https://github.com/docking-org/dock3.git. On BKS cluster, some curated executables have been prepared with labels @ /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found here as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
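<br />
That strict string comparison can be sketched as follows (illustrative, not SUBDOCK's actual code):<br />
<br />
```shell
# Illustrative: only the exact lowercase string "true" enables a platform flag.
is_enabled() {
    [ "$1" = "true" ] && echo yes || echo no
}

is_enabled true    # prints yes
is_enabled True    # prints no
is_enabled 1       # prints no
```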
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command line arguments instead of environment export, if desired. These can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. After it executes, subdock will print out a convenient "superscript" to copy & paste for any future re-submissions.<br />
<br />
== Mixing DOCK 3.7 and DOCK 3.8 - known problems ==<br />
<br />
'''Headline: Though SUBDOCK is compatible with DOCK 3.7, and will allow docking of ligands built for 3.8 in 3.7, it is NOT RECOMMENDED to do this without using a specially prepared 3.7 executable!'''<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
Or worse, like this:<br />
<nowiki> Warning. tempconf = 0<br />
1597 -> 0 -> 0</nowiki><br />
<br />
The latter error messages have the potential to cause some serious damage, as they are emitted very frequently & may consume excessive disk space. SUBDOCK will check for these messages periodically during DOCK's runtime & kill the process if they are found.<br />
<br />
If you are on 3.8 and are encountering these messages still, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]]. This version voids the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
If you are still using 3.7, it is possible to prepare a version that behaves identically except that the dangerous "tempconf" message is removed.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>Btinglehttp://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8&diff=15387SUBDOCK DOCK3.82023-05-11T17:54:02Z<p>Btingle: /* Supported Platforms */</p>
<hr />
<div>Important note- although DOCK 3.8 is in the header of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some features of DOCK 3.8 will not be taken advantage of.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located at the root of the repository, alongside rundock.bash.<br />
<br />
subdock.bash can be called directly from any location; it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
The word of the day is ''flexibility''!<br />
<br />
For those of you who have used a subdock utility before, here's what is new in this release:<br />
<br />
==== December 2022 ====<br />
<br />
* All job platforms (e.g. SLURM, SGE) are supported by the same script<br />
<br />
* GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
* Subdock can now be run on both individual db2.gz files & db2.tgz packages. A batch size can be set for each type, allowing for more flexibility.<br />
<br />
* Arguments can be provided via the environment, e.g. "export KEY=VALUE", or on the command line, e.g. "--key=value"<br />
<br />
* Subdock now prints out a superscript to copy-paste on success, convenient for re-submission.<br />
<br />
* Fully restartable on all job platforms! See the section below for an explanation of what this means, why it matters, and instructions on usage.<br />
<br />
==== May 2023 ====<br />
<br />
* You can provide http(s) URLs to dockable files as your input in lieu of file paths!<br />
<br />
* Charity Engine is now supported as a jobs platform! More instructions for using this further down. (https://www.charityengine.com/)<br />
<br />
== Supported Platforms ==<br />
<br />
There are four platforms currently supported:<br />
<br />
# SLURM<br />
# SGE (Sun Grid Engine)<br />
#* '''note for BKS lab: the SGE queue on gimel does not have python3; your jobs will not work!'''<br />
# GNU Parallel (for local runs- ideal for testing)<br />
# Charity Engine<br />
<br />
One of these platforms must be selected; SLURM is the default. A platform is chosen with one of the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true<br />
--use-charity=true</nowiki><br />
arguments, respectively.<br />
<br />
==== Using Charity Engine ====<br />
<br />
To use Charity Engine, you must have access to the Charity Engine CLI executable, as well as GNU Parallel.<br />
<br />
Additionally, you must provide your Charity Engine authentication details via the CHARITY_AUTHKEY environment variable or the --charity-authkey argument.<br />
<br />
WIP, more specific instructions to come.<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments; db2.tgz is the default.<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
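The dispatch arithmetic can be sketched in shell; njobs is a hypothetical helper for illustration, not part of SUBDOCK:<br />

```shell
# ceil(N / BATCH_SIZE) with integer arithmetic: add (BATCH_SIZE - 1)
# to the numerator before dividing.
njobs() {
  local n=$1 batch=$2
  echo $(( (n + batch - 1) / batch ))
}

njobs 250 100   # 250 input files at batch size 100 -> 3 jobs (two full, one of 50)
```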
<br />
== Restartability ==<br />
<br />
'''ONLY APPLICABLE FOR DOCK 3.8+!'''<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want, even as little as a few minutes per job! This flexibility lets docking jobs efficiently fill the gaps between longer-running jobs on the same system, so the scheduler will often treat them preferentially.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds prior to the 30-minute hard limit. <br />
GNU Parallel and SLURM provide a hard-coded 30-second notice, whereas this notice period must be manually defined for SGE jobs.<br />
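To keep the soft and hard limits consistent, the two SGE parameters can be derived from a single duration. This helper is a sketch for illustration, not part of SUBDOCK:<br />

```shell
# Build matching s_rt/h_rt flags from one hard limit given in seconds,
# placing the soft (warning) limit 30 seconds earlier -- mirroring the
# hard-coded notice used on the SLURM and GNU Parallel platforms.
sge_time_args() {
  local hard=$1
  local soft=$(( hard - 30 ))
  printf -- '-l s_rt=%02d:%02d:%02d -l h_rt=%02d:%02d:%02d\n' \
    $(( soft / 3600 )) $(( soft % 3600 / 60 )) $(( soft % 60 )) \
    $(( hard / 3600 )) $(( hard % 3600 / 60 )) $(( hard % 60 ))
}

sge_time_args 1800   # -> -l s_rt=00:29:30 -l h_rt=00:30:00
```

So `--use-sge-args="$(sge_time_args 1800)"` would reproduce the 30-minute example above.<br />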
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs! If you saved the superscript SUBDOCK spits out on successful submission, you can simply call that. <br />
<br />
You'll know there is no more work to be done when SUBDOCK prints "all N jobs complete!". SUBDOCK will also tell you what proportion of jobs have not yet completed on each submission.<br />
<br />
Output files are given a numeric suffix indicating how many times the docking task has been submitted, e.g. OUTDOCK.0 for the first attempt, OUTDOCK.1 for the second, etc.<br />
<br />
Be careful not to overlap your submissions; there are no guardrails in place to prevent this from happening.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable and an installed scheduling system (SGE/SLURM/Parallel), but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z; we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get db2 database subset sample via ZINC-22. Example provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org. For wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Multiple download types are supported; for example, if you are on Wynton you can download Wynton file paths, removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' One can be found via our download server if you have a license; otherwise, lab members can pull https://github.com/docking-org/dock3.git directly. On the BKS cluster, some curated executables have been prepared with labels at /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found there as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
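A minimal sketch of that strict comparison (assuming SUBDOCK tests against the literal string, as the line above implies):<br />

```shell
# Only the exact lowercase string "true" enables a platform;
# "True", "TRUE", "1", "yes", etc. are all treated as false.
is_true() {
  [ "$1" = "true" ]
}

is_true "true" && echo "platform enabled"
is_true "True" || echo "not enabled: capitalization matters"
is_true "1"    || echo "not enabled: 1 is not true"
```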
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command-line arguments instead of environment exports, if desired; the two styles can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. After subdock finishes submitting, it will print out a convenient "superscript" to copy & paste for any future re-submissions.<br />
<br />
== Mixing DOCK 3.7 and DOCK 3.8 - known problems ==<br />
<br />
'''Headline: Though SUBDOCK is compatible with DOCK 3.7, and will allow docking of ligands built for 3.8 in 3.7, it is NOT RECOMMENDED to do this without using a specially prepared 3.7 executable!'''<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
Or worse, like this:<br />
<nowiki> Warning. tempconf = 0<br />
1597 -> 0 -> 0</nowiki><br />
<br />
The latter error messages have the potential to cause some serious damage, as they are emitted very frequently & may consume excessive disk space. SUBDOCK will check for these messages periodically during DOCK's runtime & kill the process if they are found.<br />
<br />
If you are on 3.8 and are still encountering these messages, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]]. This version disables the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
If you are using 3.7 still, it is possible to prepare a version that keeps everything the same, except without the dangerous "tempconf" message.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>
<hr />
<div>Important note- although DOCK 3.8 is in the header of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some features of DOCK 3.8 will not be taken advantage of.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located @ subdock.bash relative to the repository root.<br />
<br />
subdock.bash can be called directly from any location- it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
The word of the day is ''flexibility''!<br />
<br />
For those of you that have used a subdock utility before, here's what is new in this release:<br />
<br />
==== December 2022 ====<br />
<br />
* All jobs platforms (e.g slurm, sge) are supported on the same script<br />
<br />
* GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
* Subdock can now be run on both db2.gz individual files & db2.tgz packages. A batch_size can be set for both types, allowing for more flexibility.<br />
<br />
* Arguments can be provided environmentally, e.g "export KEY=VALUE" or on the command line e.g "--key=value"<br />
<br />
* Subdock now prints out a superscript to copy-paste on success, convenient for re-submission.<br />
<br />
* Fully restartable on all jobs platforms! See below section for an explanation on what this means, why it matters, and instructions on usage.<br />
<br />
==== May 2023 ====<br />
<br />
* You can provide http(s) URLs to dockable files as your input in lieu of file paths!<br />
<br />
* Charity engine is now supported as a jobs platform! More instructions for using this further down. (https://www.charityengine.com/)<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
# SLURM<br />
# SGE (Sun Grid Engine)<br />
#* '''note for BKS lab: the SGE queue on gimel does not have python3, your jobs will not work!'''<br />
# GNU Parallel (for local runs- ideal for testing)<br />
# Charity Engine<br />
<br />
One of these platforms must be specified- SLURM is the default. These platforms can be set by the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true<br />
--use-charity=true</nowiki><br />
Arguments, respectively<br />
<br />
==== Using Charity Engine ====<br />
<br />
To use charity engine, you must have access to an executable of the charity engine CLI, as well as GNU parallel.<br />
<br />
Additionally, you must provide your charity authentication details in the form of the CHARITY_AUTHKEY or --charity-authkey variable.<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments. db2.tgz is the default<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
<br />
== Restartability ==<br />
<br />
'''ONLY APPLICABLE FOR DOCK 3.8+!'''<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want them to be, even as little as a few minutes per job! This flexibility lets docking jobs efficiently fill in the gaps between longer-running jobs on the same ecosystem, thus they will be preferentially treated by whichever system is in charge of scheduling.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds prior to the 30 minute hard limit. <br />
GNU and SLURM platforms will provide a hard-coded 30 seconds notice, whereas this notice period must be manually defined for SGE jobs.<br />
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs. If you saved the superscript that SUBDOCK prints on successful submission, you can simply run that instead.<br />
<br />
You'll know there is no more work to be done when SUBDOCK prints "all N jobs complete!". On each submission, SUBDOCK also reports what proportion of jobs have not yet completed.<br />
<br />
Output files are appended with a suffix indicating how many times the docking task has been resubmitted, e.g. OUTDOCK.0 for the first attempt, OUTDOCK.1 for the second, and so on.<br />
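For example, you can find which attempt a given batch is on by looking for the first missing suffix (a sketch; the exact directory layout under EXPORT_DEST is an assumption, adjust to your setup):<br />

```shell
# next restart index: the first n for which OUTDOCK.n does not exist yet
next_attempt() {
  local dir=$1 n=0
  while [ -e "$dir/OUTDOCK.$n" ]; do n=$(( n + 1 )); done
  echo "$n"
}
```

With OUTDOCK.0 and OUTDOCK.1 present in a batch directory, next_attempt prints 2.<br />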
<br />
Be careful not to overlap your submissions; there are no guardrails to stop two submissions from running over the same output at once.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable and an installed scheduling system (SGE/SLURM/Parallel), but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z; we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get a sample db2 database subset from ZINC-22. For example:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org. For wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Multiple download types are supported; for example, if you are on Wynton you can download Wynton file paths, removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' This can be found via our download server if you have a license; otherwise, lab members can pull https://github.com/docking-org/dock3.git directly. On the BKS cluster, some curated executables have been prepared with labels at /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found there as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select exactly one; mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
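The strict comparison can be demonstrated directly (a sketch of the behavior described above; is_true is illustrative, not a SUBDOCK function):<br />

```shell
# only the exact lowercase string "true" enables a platform flag
is_true() { [ "$1" = "true" ] && echo enabled || echo disabled; }

is_true true    # enabled
is_true TRUE    # disabled - the comparison is exact, not case-insensitive
is_true 1       # disabled
```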
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command-line arguments instead of environment exports, if desired. The two styles can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. On successful submission, SUBDOCK prints out a convenient "superscript" to copy and paste for any future re-submissions.<br />
<br />
== Mixing DOCK 3.7 and DOCK 3.8 - known problems ==<br />
<br />
'''Headline: Though SUBDOCK is compatible with DOCK 3.7 and will allow ligands built for 3.8 to be docked with 3.7, doing so is NOT RECOMMENDED without a specially prepared 3.7 executable!'''<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
Or worse, like this:<br />
<nowiki> Warning. tempconf = 0<br />
1597 -> 0 -> 0</nowiki><br />
<br />
The latter error messages have the potential to cause serious damage, as they are emitted very frequently and may consume excessive disk space. SUBDOCK checks for these messages periodically during DOCK's runtime and kills the process if they are found.<br />
<br />
If you are on 3.8 and still encountering these messages, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]]. This version disables the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
If you are still using 3.7, it is possible to prepare a version that keeps everything the same except for removing the dangerous "tempconf" message.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>Btinglehttp://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8&diff=15383SUBDOCK DOCK3.82023-05-11T17:47:33Z<p>Btingle: /* What's New? */</p>
<hr />
<div>Important note: although DOCK 3.8 is in the header of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some DOCK 3.8 features will not be taken advantage of.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located at the root of the repository.<br />
<br />
subdock.bash can be called directly from any location- it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
The word of the day is ''flexibility''!<br />
<br />
For those of you that have used a subdock utility before, here's what is new in this release:<br />
<br />
1. All job platforms (e.g. SLURM, SGE) are supported by the same script.<br />
<br />
2. GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
3. Subdock can now be run on both db2.gz individual files & db2.tgz packages. A batch size can be set for both types, allowing for more flexibility.<br />
<br />
4. Arguments can be provided environmentally, e.g. "export KEY=VALUE", or on the command line, e.g. "--key=value".<br />
<br />
5. Subdock now prints out a superscript to copy-paste on success, convenient for re-submission.<br />
<br />
6. Fully restartable on all jobs platforms! See the section below for an explanation of what this means, why it matters, and instructions on usage.<br />
<br />
Update 05/11/2023:<br />
<br />
7. You can provide http(s) URLs to dockable files as your input in lieu of file paths!<br />
<br />
8. Charity Engine is now supported as a jobs platform! More instructions for using this further down. (https://www.charityengine.com/)<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
# SLURM<br />
# SGE (Sun Grid Engine)<br />
#* '''note for BKS lab: the SGE queue on gimel does not have python3, your jobs will not work!'''<br />
# GNU Parallel (for local runs- ideal for testing)<br />
<br />
One of these platforms must be specified; SLURM is the default. The platform is selected with the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true</nowiki><br />
arguments, respectively.<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments; db2.tgz is the default.<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
<br />
== Restartability ==<br />
<br />
'''ONLY APPLICABLE FOR DOCK 3.8+!'''<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want, even as little as a few minutes per job! This flexibility lets docking jobs efficiently fill the gaps between longer-running jobs on the same cluster, so they tend to be treated preferentially by whichever system is in charge of scheduling.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds before the 30-minute hard limit.<br />
The GNU Parallel and SLURM platforms provide a hard-coded 30-second notice, whereas for SGE jobs this notice period must be defined manually.<br />
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs. If you saved the superscript that SUBDOCK prints on successful submission, you can simply run that instead.<br />
<br />
You'll know there is no more work to be done when SUBDOCK prints "all N jobs complete!". On each submission, SUBDOCK also reports what proportion of jobs have not yet completed.<br />
<br />
Output files are appended with a suffix indicating how many times the docking task has been resubmitted, e.g. OUTDOCK.0 for the first attempt, OUTDOCK.1 for the second, and so on.<br />
<br />
Be careful not to overlap your submissions; there are no guardrails to stop two submissions from running over the same output at once.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable and an installed scheduling system (SGE/SLURM/Parallel), but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z; we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get a sample db2 database subset from ZINC-22. For example:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org. For wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Multiple download types are supported; for example, if you are on Wynton you can download Wynton file paths, removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' This can be found via our download server if you have a license; otherwise, lab members can pull https://github.com/docking-org/dock3.git directly. On the BKS cluster, some curated executables have been prepared with labels at /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found there as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command-line arguments instead of environment exports, if desired. The two styles can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. On successful submission, SUBDOCK prints out a convenient "superscript" to copy and paste for any future re-submissions.<br />
<br />
== Mixing DOCK 3.7 and DOCK 3.8 - known problems ==<br />
<br />
'''Headline: Though SUBDOCK is compatible with DOCK 3.7 and will allow ligands built for 3.8 to be docked with 3.7, doing so is NOT RECOMMENDED without a specially prepared 3.7 executable!'''<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
Or worse, like this:<br />
<nowiki> Warning. tempconf = 0<br />
1597 -> 0 -> 0</nowiki><br />
<br />
The latter error messages have the potential to cause serious damage, as they are emitted very frequently and may consume excessive disk space. SUBDOCK checks for these messages periodically during DOCK's runtime and kills the process if they are found.<br />
<br />
If you are on 3.8 and still encountering these messages, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]]. This version disables the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
If you are still using 3.7, it is possible to prepare a version that keeps everything the same except for removing the dangerous "tempconf" message.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>
<hr />
<div>Important note- although DOCK 3.8 is in the header of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some features of DOCK 3.8 will not be taken advantage of.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located @ subdock.bash relative to the repository root.<br />
<br />
subdock.bash can be called directly from any location- it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
The word of the day is ''flexibility''!<br />
<br />
For those of you that have used a subdock utility before, here's what is new in this release:<br />
<br />
1. All jobs platforms (e.g slurm, sge) are supported on the same script<br />
<br />
2. GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
3. Subdock can now be run on both db2.gz individual files & db2.tgz packages. A batch_size can be set for both types, allowing for more flexibility.<br />
<br />
4. Arguments can be provided environmentally, e.g "export KEY=VALUE" or on the command line e.g "--key=value"<br />
<br />
5. Subdock now prints out a superscript to copy-paste on success, convenient for re-submission.<br />
<br />
6. Fully restartable on all jobs platforms! See below section for an explanation on what this means, why it matters, and instructions on usage.<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
# SLURM<br />
# SGE (Sun Grid Engine)<br />
#* note for BKS lab: the SGE queue on gimel has out-of-date software, your jobs may not work!<br />
# GNU Parallel (for local runs- ideal for testing)<br />
<br />
One of these platforms must be specified- SLURM is the default. These platforms can be set by the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true</nowiki><br />
Arguments, respectively<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments. db2.tgz is the default<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
<br />
== Restartability ==<br />
<br />
'''ONLY APPLICABLE FOR DOCK 3.8+!'''<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want them to be, even as little as a few minutes per job! This flexibility lets docking jobs efficiently fill in the gaps between longer-running jobs on the same ecosystem, thus they will be preferentially treated by whichever system is in charge of scheduling.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds prior to the 30 minute hard limit. <br />
GNU and SLURM platforms will provide a hard-coded 30 seconds notice, whereas this notice period must be manually defined for SGE jobs.<br />
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs! If you saved the superscript SUBDOCK spits out on successful submission, you can simply call that. <br />
<br />
You'll know there is no more work to be done if SUBDOCK prints "all N jobs complete!", SUBDOCK will also tell you what proportion of jobs have not yet completed on each submission.<br />
<br />
Output files are appended with a suffix indicating how many times the docking task has been resubmitted, e.g OUTDOCK.0 for the first attempt, OUTDOCK.1 for the second, etc.<br />
<br />
Be careful not to overlap your submissions- there are no guardrails in place to prevent this from happening if you are not careful.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable and an installed scheduling system (SGE/SLURM/Parallel), but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z- we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get db2 database subset sample via ZINC-22. Example provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org- for wget-able files, choose the DOCK37 (*.db2.tgz) format, with URL download type. Multiple download types are supported, for example if you are on Wynton you can download Wynton file paths- removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' This can be found via our download server if you have a license, otherwise lab members can directly pull https://github.com/docking-org/dock3.git. On BKS cluster, some curated executables have been prepared with labels @ /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found here as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command line arguments instead of environment export, if desired. These can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. After executing subdock, it will print out a convenient "superscript" to copy & paste, for any future re-submissions.<br />
<br />
== Mixing DOCK 3.7 and DOCK 3.8 - known problems ==<br />
<br />
'''Headline: Though SUBDOCK is compatible with DOCK 3.7, and will allow docking of ligands built for 3.8 in 3.7, it is NOT RECOMMENDED to do this without using a specially prepared 3.7 executable!'''<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
Or worse, like this:<br />
<nowiki> Warning. tempconf = 0<br />
1597 -> 0 -> 0</nowiki><br />
<br />
The latter error messages have the potential to cause some serious damage, as they are emitted very frequently & may consume excessive disk space. SUBDOCK will check for these messages periodically during DOCK's runtime & kill the process if they are found.<br />
<br />
If you are on 3.8 and are encountering these messages still, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]]. This version voids the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
If you are using 3.7 still, it is possible to prepare a version that keeps everything the same, except without the dangerous "tempconf" message.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>Btinglehttp://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8&diff=15380SUBDOCK DOCK3.82023-05-11T17:35:07Z<p>Btingle: /* Supported Platforms */</p>
<hr />
<div>Important note- although DOCK 3.8 is in the header of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some features of DOCK 3.8 will not be taken advantage of.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located @ subdock.bash relative to the repository root.<br />
<br />
subdock.bash can be called directly from any location- it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
The word of the day is ''flexibility''!<br />
<br />
For those of you that have used a subdock utility before, here's what is new in this release:<br />
<br />
1. All jobs platforms (e.g slurm, sge) are supported on the same script<br />
<br />
2. GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
3. Subdock can now be run on both db2.gz individual files & db2.tgz packages. A batch_size can be set for both types, allowing for more flexibility.<br />
<br />
4. Arguments can be provided environmentally, e.g "export KEY=VALUE" or on the command line e.g "--key=value"<br />
<br />
5. Subdock now prints out a superscript to copy-paste on success, convenient for re-submission.<br />
<br />
6. Fully restartable on all jobs platforms! See below section for an explanation on what this means, why it matters, and instructions on usage.<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
# SLURM<br />
# SGE (Sun Grid Engine)<br />
#* note: the SGE queue on gimel has out-of-date software, your jobs may not work!<br />
# GNU Parallel (for local runs- ideal for testing)<br />
<br />
One of these platforms must be specified- SLURM is the default. These platforms can be set by the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true</nowiki><br />
Arguments, respectively<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments; db2.tgz is the default.<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
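The dispatch math can be sketched in a few lines of bash (this is an illustration only, not SUBDOCK's actual code; the file count and batch size here are made up):<br />

```shell
# Ceiling division in bash: 250 input files in batches of 100
# dispatch ceil(250 / 100) = 3 jobs.
seq 1 250 > sdi.in                  # stand-in for a real list of db2.tgz paths
N=$(wc -l < sdi.in)                 # total number of input files
BATCH_SIZE=100                      # e.g. --use-db2-batch-size=100
NJOBS=$(( (N + BATCH_SIZE - 1) / BATCH_SIZE ))
echo "jobs to dispatch: $NJOBS"
```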
<br />
== Restartability ==<br />
<br />
'''ONLY APPLICABLE FOR DOCK 3.8+!'''<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs may run *without* losing progress. Time limits can be as large or as small as we want, even as little as a few minutes per job! This flexibility lets short docking jobs fill the gaps between longer-running jobs on the same cluster, so the scheduler will treat them preferentially.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds before the 30-minute hard limit. <br />
The GNU Parallel and SLURM platforms provide a hard-coded 30-second notice, whereas this notice period must be defined manually for SGE jobs.<br />
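Note that the three platforms express the same limit in different units- SLURM and SGE take HH:MM:SS, while GNU Parallel's --timeout takes seconds. A quick sketch of the conversion, using the 30-minute example above:<br />

```shell
# Convert the HH:MM:SS limit (00:30:00) to the seconds GNU Parallel expects.
H=00; M=30; S=00
TIMEOUT=$(( 10#$H * 3600 + 10#$M * 60 + 10#$S ))   # 10# forces base-10 despite leading zeros
echo "$TIMEOUT"
```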
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs! If you saved the superscript SUBDOCK spits out on successful submission, you can simply call that. <br />
<br />
You'll know there is no more work to be done when SUBDOCK prints "all N jobs complete!". On each submission, SUBDOCK will also tell you what proportion of jobs have not yet completed.<br />
<br />
Output files carry a suffix indicating how many times the docking task has been resubmitted, e.g. OUTDOCK.0 for the first attempt, OUTDOCK.1 for the second, etc.<br />
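Because the suffix is a plain integer, a numeric sort reconstructs the attempt order- a lexicographic sort would not (OUTDOCK.10 would sort before OUTDOCK.2). A sketch, using made-up filenames:<br />

```shell
# Numeric sort on the field after the dot orders restart attempts correctly.
mkdir -p demo && cd demo
touch OUTDOCK.0 OUTDOCK.1 OUTDOCK.2 OUTDOCK.10
ls OUTDOCK.* | sort -t. -k2,2n      # OUTDOCK.0, OUTDOCK.1, OUTDOCK.2, OUTDOCK.10
```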
<br />
Be careful not to overlap your submissions- there are no guardrails in place to prevent this.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable and an installed scheduling system (SGE/SLURM/Parallel), but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z- we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get db2 database subset sample via ZINC-22. Example provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org; for wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Multiple download types are supported- for example, if you are on Wynton you can download Wynton file paths, removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' One can be found via our download server if you have a license; lab members can also pull https://github.com/docking-org/dock3.git directly. On the BKS cluster, some curated, labeled executables have been prepared @ /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found there as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
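The strict check can be mimicked in a couple of lines- this is a sketch of the behavior described above, not SUBDOCK's actual code:<br />

```shell
# Only the exact lowercase string "true" counts; anything else is false.
is_true() { [ "$1" = "true" ]; }
is_true "true" && echo "enabled"
is_true "True" || echo "disabled"   # capitalization matters
is_true "1"    || echo "disabled"
```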
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command-line arguments instead of environment exports, if desired; the two styles can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. On success, subdock will print out a convenient "superscript" to copy & paste for any future re-submissions.<br />
<br />
== Mixing DOCK 3.7 and DOCK 3.8 - known problems ==<br />
<br />
'''Headline: Though SUBDOCK is compatible with DOCK 3.7, and will allow docking of ligands built for 3.8 in 3.7, it is NOT RECOMMENDED to do this without using a specially prepared 3.7 executable!'''<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
Or worse, like this:<br />
<nowiki> Warning. tempconf = 0<br />
1597 -> 0 -> 0</nowiki><br />
<br />
The latter error messages have the potential to cause some serious damage, as they are emitted very frequently & may consume excessive disk space. SUBDOCK will check for these messages periodically during DOCK's runtime & kill the process if they are found.<br />
<br />
If you are on 3.8 and are encountering these messages still, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]]. This version voids the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
If you are still using 3.7, it is possible to prepare a version that keeps everything the same except for removing the dangerous "tempconf" message.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>
<hr />
<div>Important note- although DOCK 3.8 is in the header of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some features of DOCK 3.8 will not be taken advantage of.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located @ subdock.bash relative to the repository root.<br />
<br />
subdock.bash can be called directly from any location- it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
The word of the day is ''flexibility''!<br />
<br />
For those of you that have used a subdock utility before, here's what is new in this release:<br />
<br />
1. All jobs platforms (e.g slurm, sge) are supported on the same script<br />
<br />
2. GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
3. Subdock can now be run on both db2.gz individual files & db2.tgz packages. A batch_size can be set for both types, allowing for more flexibility.<br />
<br />
4. Arguments can be provided environmentally, e.g "export KEY=VALUE" or on the command line e.g "--key=value"<br />
<br />
5. Subdock now prints out a superscript to copy-paste on success, convenient for re-submission.<br />
<br />
6. Fully restartable on all jobs platforms! See below section for an explanation on what this means, why it matters, and instructions on usage.<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
# SLURM<br />
# SGE (Sun Grid Engine)<br />
## note: the SGE queue on gimel has out-of-date software, your jobs may not work!<br />
# GNU Parallel (for local runs- ideal for testing)<br />
<br />
One of these platforms must be specified- SLURM is the default. These platforms can be set by the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true</nowiki><br />
Arguments, respectively<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments. db2.tgz is the default<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
<br />
== Restartability ==<br />
<br />
'''ONLY APPLICABLE FOR DOCK 3.8+!'''<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want them to be, even as little as a few minutes per job! This flexibility lets docking jobs efficiently fill in the gaps between longer-running jobs on the same ecosystem, thus they will be preferentially treated by whichever system is in charge of scheduling.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds prior to the 30 minute hard limit. <br />
GNU and SLURM platforms will provide a hard-coded 30 seconds notice, whereas this notice period must be manually defined for SGE jobs.<br />
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs! If you saved the superscript SUBDOCK spits out on successful submission, you can simply call that. <br />
<br />
You'll know there is no more work to be done if SUBDOCK prints "all N jobs complete!", SUBDOCK will also tell you what proportion of jobs have not yet completed on each submission.<br />
<br />
Output files are appended with a suffix indicating how many times the docking task has been resubmitted, e.g OUTDOCK.0 for the first attempt, OUTDOCK.1 for the second, etc.<br />
<br />
Be careful not to overlap your submissions- there are no guardrails in place to prevent this from happening if you are not careful.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable and an installed scheduling system (SGE/SLURM/Parallel), but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z- we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get db2 database subset sample via ZINC-22. Example provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org- for wget-able files, choose the DOCK37 (*.db2.tgz) format, with URL download type. Multiple download types are supported, for example if you are on Wynton you can download Wynton file paths- removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' This can be found via our download server if you have a license, otherwise lab members can directly pull https://github.com/docking-org/dock3.git. On BKS cluster, some curated executables have been prepared with labels @ /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found here as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command line arguments instead of environment export, if desired. These can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. After executing subdock, it will print out a convenient "superscript" to copy & paste, for any future re-submissions.<br />
<br />
== Mixing DOCK 3.7 and DOCK 3.8 - known problems ==<br />
<br />
'''Headline: Though SUBDOCK is compatible with DOCK 3.7, and will allow docking of ligands built for 3.8 in 3.7, it is NOT RECOMMENDED to do this without using a specially prepared 3.7 executable!'''<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
Or worse, like this:<br />
<nowiki> Warning. tempconf = 0<br />
1597 -> 0 -> 0</nowiki><br />
<br />
The latter error messages have the potential to cause some serious damage, as they are emitted very frequently & may consume excessive disk space. SUBDOCK will check for these messages periodically during DOCK's runtime & kill the process if they are found.<br />
<br />
If you are on 3.8 and are encountering these messages still, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]]. This version voids the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
If you are using 3.7 still, it is possible to prepare a version that keeps everything the same, except without the dangerous "tempconf" message.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>Btinglehttp://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8&diff=15360SUBDOCK DOCK3.82023-04-25T21:21:49Z<p>Btingle: /* Mixing DOCK 3.7 and DOCK 3.8 - known problems */</p>
<hr />
<div>Important note- although DOCK 3.8 is in the header of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some features of DOCK 3.8 will not be taken advantage of.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located @ subdock.bash relative to the repository root.<br />
<br />
subdock.bash can be called directly from any location- it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
The word of the day is ''flexibility''!<br />
<br />
For those of you that have used a subdock utility before, here's what is new in this release:<br />
<br />
1. All jobs platforms (e.g slurm, sge) are supported on the same script<br />
<br />
2. GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
3. Subdock can now be run on both db2.gz individual files & db2.tgz packages. A batch_size can be set for both types, allowing for more flexibility.<br />
<br />
4. Arguments can be provided environmentally, e.g "export KEY=VALUE" or on the command line e.g "--key=value"<br />
<br />
5. Subdock now prints out a superscript to copy-paste on success, convenient for re-submission.<br />
<br />
6. Fully restartable on all jobs platforms! See below section for an explanation on what this means, why it matters, and instructions on usage.<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
1. SLURM<br />
<br />
2. SGE (Sun Grid Engine)<br />
<br />
3. GNU Parallel (for local runs- ideal for testing)<br />
<br />
One of these platforms must be specified- SLURM is the default. These platforms can be set by the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true</nowiki><br />
Arguments, respectively<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments. db2.tgz is the default<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
<br />
== Restartability ==<br />
<br />
'''ONLY APPLICABLE FOR DOCK 3.8+!'''<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want them to be, even as little as a few minutes per job! This flexibility lets docking jobs efficiently fill in the gaps between longer-running jobs on the same ecosystem, thus they will be preferentially treated by whichever system is in charge of scheduling.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds prior to the 30 minute hard limit. <br />
GNU and SLURM platforms will provide a hard-coded 30 seconds notice, whereas this notice period must be manually defined for SGE jobs.<br />
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs! If you saved the superscript SUBDOCK spits out on successful submission, you can simply call that. <br />
<br />
You'll know there is no more work to be done if SUBDOCK prints "all N jobs complete!", SUBDOCK will also tell you what proportion of jobs have not yet completed on each submission.<br />
<br />
Output files are appended with a suffix indicating how many times the docking task has been resubmitted, e.g OUTDOCK.0 for the first attempt, OUTDOCK.1 for the second, etc.<br />
<br />
Be careful not to overlap your submissions- there are no guardrails in place to prevent this from happening if you are not careful.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable and an installed scheduling system (SGE/SLURM/Parallel), but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z- we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get db2 database subset sample via ZINC-22. Example provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org- for wget-able files, choose the DOCK37 (*.db2.tgz) format, with URL download type. Multiple download types are supported, for example if you are on Wynton you can download Wynton file paths- removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' This can be found via our download server if you have a license, otherwise lab members can directly pull https://github.com/docking-org/dock3.git. On BKS cluster, some curated executables have been prepared with labels @ /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found here as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command line arguments instead of environment export, if desired. These can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. After subdock runs, it will print out a convenient "superscript" to copy & paste for any future re-submissions.<br />
<br />
== Mixing DOCK 3.7 and DOCK 3.8 - known problems ==<br />
<br />
'''Headline: Though SUBDOCK is compatible with DOCK 3.7, and will allow docking of ligands built for 3.8 in 3.7, it is NOT RECOMMENDED to do this without using a specially prepared 3.7 executable!'''<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
Or worse, like this:<br />
<nowiki> Warning. tempconf = 0<br />
1597 -> 0 -> 0</nowiki><br />
<br />
The latter error messages can cause serious damage, as they are emitted very frequently and may consume excessive disk space. SUBDOCK checks for these messages periodically during DOCK's runtime and kills the process if they are found.<br />
<br />
If you are on 3.8 and still encounter these messages, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]]. This version disables the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
If you are still using 3.7, it is possible to prepare a version that keeps everything the same, except that the dangerous "tempconf" message is removed.<br />
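To see whether an existing run was affected, you can scan its output tree for the messages shown above. A minimal sketch; the sample OUTDOCK file below is fabricated for illustration, and in practice you would point grep at your real EXPORT_DEST directory:<br />

```shell
# Fabricated sample standing in for a real output tree (illustrative).
mkdir -p output/demo
printf ' Warning. tempconf = 0\n 1597 -> 0 -> 0\n' > output/demo/OUTDOCK.0

# Count affected OUTDOCK files; a nonzero count suggests switching to
# the dock38_nogist executable.
affected=$(grep -rl 'tempconf = 0' output | wc -l)
echo "affected files: $affected"
```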
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>Btinglehttp://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8&diff=15357SUBDOCK DOCK3.82023-04-25T20:49:36Z<p>Btingle: /* Restartability */</p>
<hr />
<div>Important note- although DOCK 3.8 is in the title of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some DOCK 3.8 features will not be taken advantage of.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located @ subdock.bash relative to the repository root.<br />
<br />
subdock.bash can be called directly from any location- it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
The word of the day is ''flexibility''!<br />
<br />
For those of you that have used a subdock utility before, here's what is new in this release:<br />
<br />
1. All jobs platforms (e.g slurm, sge) are supported on the same script<br />
<br />
2. GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
3. Subdock can now be run on both individual db2.gz files & db2.tgz packages. A batch size can be set for either type, allowing for more flexibility.<br />
<br />
4. Arguments can be provided via the environment, e.g. "export KEY=VALUE", or on the command line, e.g. "--key=value".<br />
<br />
5. Subdock now prints out a "superscript" to copy-paste on success, convenient for re-submission.<br />
<br />
6. Fully restartable on all job platforms! See the Restartability section below for an explanation of what this means, why it matters, and instructions on usage.<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
1. SLURM<br />
<br />
2. SGE (Sun Grid Engine)<br />
<br />
3. GNU Parallel (for local runs- ideal for testing)<br />
<br />
One of these platforms must be specified; SLURM is the default. The platform is selected with one of the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true</nowiki><br />
arguments, respectively.<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments; db2.tgz is the default.<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
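<br />
As a minimal sketch of this formula (the variable names below are illustrative, not SUBDOCK's own), the ceiling division can be written with shell integer arithmetic:<br />

```shell
# ceil(N / BATCH_SIZE) via integer arithmetic; names are illustrative.
N=250          # e.g. 250 db2.tgz files listed in the input source
BATCH_SIZE=100 # e.g. --use-db2-tgz-batch-size=100
JOBS=$(( (N + BATCH_SIZE - 1) / BATCH_SIZE ))
echo "$JOBS"   # prints 3: two full batches of 100 plus one batch of 50
```

So 250 input files at a batch size of 100 dispatch 3 jobs; the last job receives the remaining 50 files.<br />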
<br />
== Restartability ==<br />
<br />
'''ONLY APPLICABLE FOR DOCK 3.8+!'''<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want them to be, even as little as a few minutes per job! This flexibility lets docking jobs efficiently fill the gaps between longer-running jobs on the same cluster, so they tend to be treated preferentially by whichever system is in charge of scheduling.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g.:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU Parallel this is accomplished with "--timeout" (a value in seconds), e.g.:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g.:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds prior to the 30-minute hard limit. <br />
The GNU Parallel and SLURM platforms provide a hard-coded 30-second notice, whereas this notice period must be defined manually for SGE jobs.<br />
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs! If you saved the superscript SUBDOCK spits out on successful submission, you can simply call that. <br />
<br />
You'll know there is no more work to be done when SUBDOCK prints "all N jobs complete!". SUBDOCK will also tell you what proportion of jobs have not yet completed on each submission.<br />
<br />
Output files are given a numeric suffix indicating how many times the docking task has been resubmitted, e.g. OUTDOCK.0 for the first attempt, OUTDOCK.1 for the second, etc.<br />
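<br />
For example, assuming (purely for illustration) that each job writes its attempts into its own directory under $EXPORT_DEST, the highest-numbered OUTDOCK attempt per directory can be listed like this:<br />

```shell
# Sketch: print the latest OUTDOCK.<n> in each job output directory.
# The $EXPORT_DEST/<job>/OUTDOCK.<n> layout is an assumption for illustration.
EXPORT_DEST=${EXPORT_DEST:-$PWD/output}   # same variable exported for the run
for d in "$EXPORT_DEST"/*/; do
    # sort numerically on the suffix after the dot, keep the last entry
    latest=$(ls "$d" 2>/dev/null | grep '^OUTDOCK\.[0-9]*$' | sort -t. -k2,2n | tail -n 1)
    if [ -n "$latest" ]; then echo "$d$latest"; fi
done
```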
<br />
Be careful not to overlap your submissions; there are no guardrails in place to prevent this from happening.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable and an installed scheduling system (SGE/SLURM/Parallel), but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z; we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get a sample db2 database subset from ZINC-22. An example is provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org. For wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Multiple download types are supported; for example, if you are on Wynton you can download Wynton file paths, removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' One can be obtained via our download server if you have a license; lab members can instead pull directly from https://github.com/docking-org/dock3.git. On the BKS cluster, some curated executables have been prepared with labels at /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found here as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
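<br />
A minimal sketch of that strict comparison (is_true is an illustrative helper, not part of SUBDOCK):<br />

```shell
# Only the literal, lowercase string "true" enables a platform; anything
# else is treated as false.
is_true() { [ "$1" = "true" ] && echo true || echo false; }
is_true true    # prints true
is_true True    # prints false (comparison is case-sensitive)
is_true yes     # prints false
```

So, for example, "export USE_SLURM=True" would be interpreted as false.<br />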
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command line arguments instead of environment export, if desired. These can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. After subdock executes, it will print out a convenient "superscript" to copy & paste for any future re-submissions.<br />
<br />
== Error Messages in my OUTDOCK! ==<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
If these messages bother you, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]].<br />
<br />
This version disables the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
== Note on Backwards Compatibility With DOCK 3.7 ==<br />
<br />
Previously it was said that SUBDOCK is compatible with DOCK 3.7; this is true, but with a caveat.<br />
<br />
DB2 files generated for DOCK 3.8 will work in 3.7 via SUBDOCK; however, they are known to produce spurious error messages. These can mostly be ignored, but may add some unwanted noise to your OUTDOCK file.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>Btinglehttp://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8&diff=15356SUBDOCK DOCK3.82023-04-25T19:30:10Z<p>Btingle: /* Full Example - All Steps */</p>
<hr />
<div>Important note- although DOCK 3.8 is in the header of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some features of DOCK 3.8 will not be taken advantage of.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located @ subdock.bash relative to the repository root.<br />
<br />
subdock.bash can be called directly from any location- it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
The word of the day is ''flexibility''!<br />
<br />
For those of you that have used a subdock utility before, here's what is new in this release:<br />
<br />
1. All jobs platforms (e.g slurm, sge) are supported on the same script<br />
<br />
2. GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
3. Subdock can now be run on both db2.gz individual files & db2.tgz packages. A batch_size can be set for both types, allowing for more flexibility.<br />
<br />
4. Arguments can be provided environmentally, e.g "export KEY=VALUE" or on the command line e.g "--key=value"<br />
<br />
5. Subdock now prints out a superscript to copy-paste on success, convenient for re-submission.<br />
<br />
6. Fully restartable on all jobs platforms! See below section for an explanation on what this means, why it matters, and instructions on usage.<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
1. SLURM<br />
<br />
2. SGE (Sun Grid Engine)<br />
<br />
3. GNU Parallel (for local runs- ideal for testing)<br />
<br />
One of these platforms must be specified- SLURM is the default. These platforms can be set by the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true</nowiki><br />
Arguments, respectively<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments. db2.tgz is the default<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
<br />
== Restartability ==<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want them to be, even as little as a few minutes per job! This flexibility lets docking jobs efficiently fill in the gaps between longer-running jobs on the same ecosystem, thus they will be preferentially treated by whichever system is in charge of scheduling.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds prior to the 30 minute hard limit. <br />
GNU and SLURM platforms will provide a hard-coded 30 seconds notice, whereas this notice period must be manually defined for SGE jobs.<br />
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs! If you saved the superscript SUBDOCK spits out on successful submission, you can simply call that. <br />
<br />
You'll know there is no more work to be done if SUBDOCK prints "all N jobs complete!", SUBDOCK will also tell you what proportion of jobs have not yet completed on each submission.<br />
<br />
Output files are appended with a suffix indicating how many times the docking task has been resubmitted, e.g OUTDOCK.0 for the first attempt, OUTDOCK.1 for the second, etc.<br />
<br />
Be careful not to overlap your submissions- there are no guardrails in place to prevent this from happening if you are not careful.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable and an installed scheduling system (SGE/SLURM/Parallel), but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z- we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get db2 database subset sample via ZINC-22. Example provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org- for wget-able files, choose the DOCK37 (*.db2.tgz) format, with URL download type. Multiple download types are supported, for example if you are on Wynton you can download Wynton file paths- removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' This can be found via our download server if you have a license, otherwise lab members can directly pull https://github.com/docking-org/dock3.git. On BKS cluster, some curated executables have been prepared with labels @ /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found here as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command line arguments instead of environment export, if desired. These can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. After executing subdock, it will print out a convenient "superscript" to copy & paste, for any future re-submissions.<br />
<br />
== Error Messages in my OUTDOCK! ==<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
If these messages bother you use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]]<br />
<br />
This version voids the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
== Note on Backwards Compatibility With DOCK 3.7 ==<br />
<br />
Previously it was said that SUBDOCK is compatible with DOCK 3.7- this is true, but with a caveat.<br />
<br />
DB2 files generated for DOCK 3.8 will work in 3.7 via SUBDOCK, however they are known to produce spurious error messages. These can be ignored, for the most part, but may add some unwanted noise to your OUTDOCK file.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>Btinglehttp://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8&diff=15348SUBDOCK DOCK3.82023-04-21T23:59:55Z<p>Btingle: </p>
<hr />
<div>Important note- although DOCK 3.8 is in the header of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some features of DOCK 3.8 will not be taken advantage of.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located @ subdock.bash relative to the repository root.<br />
<br />
subdock.bash can be called directly from any location- it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
The word of the day is ''flexibility''!<br />
<br />
For those of you that have used a subdock utility before, here's what is new in this release:<br />
<br />
1. All jobs platforms (e.g slurm, sge) are supported on the same script<br />
<br />
2. GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
3. Subdock can now be run on both db2.gz individual files & db2.tgz packages. A batch_size can be set for both types, allowing for more flexibility.<br />
<br />
4. Arguments can be provided environmentally, e.g "export KEY=VALUE" or on the command line e.g "--key=value"<br />
<br />
5. Subdock now prints out a superscript to copy-paste on success, convenient for re-submission.<br />
<br />
6. Fully restartable on all jobs platforms! See below section for an explanation on what this means, why it matters, and instructions on usage.<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
1. SLURM<br />
<br />
2. SGE (Sun Grid Engine)<br />
<br />
3. GNU Parallel (for local runs- ideal for testing)<br />
<br />
One of these platforms must be specified- SLURM is the default. These platforms can be set by the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true</nowiki><br />
Arguments, respectively<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments. db2.tgz is the default<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
<br />
== Restartability ==<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want them to be, even as little as a few minutes per job! This flexibility lets docking jobs efficiently fill in the gaps between longer-running jobs on the same ecosystem, thus they will be preferentially treated by whichever system is in charge of scheduling.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds prior to the 30 minute hard limit. <br />
GNU and SLURM platforms will provide a hard-coded 30 seconds notice, whereas this notice period must be manually defined for SGE jobs.<br />
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs! If you saved the superscript SUBDOCK spits out on successful submission, you can simply call that. <br />
<br />
You'll know there is no more work to be done if SUBDOCK prints "all N jobs complete!", SUBDOCK will also tell you what proportion of jobs have not yet completed on each submission.<br />
<br />
Output files are appended with a suffix indicating how many times the docking task has been resubmitted, e.g OUTDOCK.0 for the first attempt, OUTDOCK.1 for the second, etc.<br />
<br />
Be careful not to overlap your submissions- there are no guardrails in place to prevent this from happening if you are not careful.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable, but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z- we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get db2 database subset sample via ZINC-22. Example provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org- for wget-able files, choose the DOCK37 (*.db2.tgz) format, with URL download type. Multiple download types are supported, for example if you are on Wynton you can download Wynton file paths- removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' This can be found via our download server if you have a license, otherwise lab members can directly pull https://github.com/docking-org/dock3.git. On BKS cluster, some curated executables have been prepared with labels @ /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found here as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command line arguments instead of environment export, if desired. These can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. After it executes, subdock will print out a convenient "superscript" you can copy & paste for any future re-submissions.<br />
<br />
== Error Messages in my OUTDOCK! ==<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
If these messages bother you, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]].<br />
<br />
This version disables the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
== Note on Backwards Compatibility With DOCK 3.7 ==<br />
<br />
As noted previously, SUBDOCK is compatible with DOCK 3.7, but with a caveat.<br />
<br />
DB2 files generated for DOCK 3.8 will work in 3.7 via SUBDOCK; however, they are known to produce spurious error messages. These can mostly be ignored, but may add some unwanted noise to your OUTDOCK file.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>Btinglehttp://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8&diff=15345SUBDOCK DOCK3.82023-04-21T20:34:27Z<p>Btingle: /* What's New? */</p>
<hr />
<div>Important note: although DOCK 3.8 is in the title of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some DOCK 3.8 features will go unused.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located at the root of the repository.<br />
<br />
subdock.bash can be called directly from any location; it is not sensitive to the current working directory.<br />
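The same-directory requirement exists because subdock.bash locates rundock.bash relative to its own path rather than the caller's working directory. A minimal sketch of this common pattern (not SUBDOCK's exact code):<br />

```shell
#!/bin/bash
# Resolve the directory this script lives in, regardless of the
# caller's current working directory or how the script was invoked.
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"

# A sibling script (e.g. rundock.bash) can then be found reliably:
RUNDOCK="$SCRIPT_DIR/rundock.bash"
echo "$RUNDOCK"
```

Because the path is derived from the script's own location, copying subdock.bash elsewhere without rundock.bash breaks this lookup, hence the warning above.<br />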
<br />
== What's New? ==<br />
<br />
The word of the day is ''flexibility''!<br />
<br />
For those of you who have used a subdock utility before, here's what is new in this release:<br />
<br />
1. All job platforms (e.g. SLURM, SGE) are supported by the same script<br />
<br />
2. GNU Parallel is now supported as a job platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
3. Subdock can now be run on both individual db2.gz files & db2.tgz packages. A batch size can be set for either type, allowing for more flexibility.<br />
<br />
4. Arguments can be provided via the environment, e.g. "export KEY=VALUE", or on the command line, e.g. "--key=value"<br />
<br />
5. Subdock now prints out a superscript to copy and paste on success, convenient for re-submission.<br />
<br />
6. Fully restartable on all job platforms! See the Restartability section below for an explanation of what this means, why it matters, and instructions on usage.<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
1. SLURM<br />
<br />
2. SGE (Sun Grid Engine)<br />
<br />
3. GNU Parallel (for local runs; ideal for testing)<br />
<br />
One of these platforms must be specified; SLURM is the default. These platforms can be set by the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true</nowiki><br />
arguments, respectively.<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments; db2.tgz is the default.<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
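The job-count arithmetic can be checked with plain shell integer math, using the usual add-and-divide ceiling trick (the numbers here are hypothetical):<br />

```shell
N=250           # total number of input files (hypothetical)
BATCH_SIZE=100  # e.g. the value of --use-db2-batch-size

# ceil(N / BATCH_SIZE) via integer arithmetic:
NUM_JOBS=$(( (N + BATCH_SIZE - 1) / BATCH_SIZE ))
echo "$NUM_JOBS"   # two full batches of 100 plus one batch of 50 -> 3 jobs
```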
<br />
== Restartability ==<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want, even as little as a few minutes per job! This flexibility lets docking jobs efficiently fill the gaps between longer-running jobs on the same cluster, so they tend to be treated preferentially by whichever system is in charge of scheduling.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g.:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU Parallel this is accomplished with "--timeout", e.g.:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g.:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds before the 30-minute hard limit.<br />
The GNU Parallel and SLURM platforms provide a hard-coded 30-second notice, whereas this notice period must be defined manually for SGE jobs.<br />
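The reason the s_rt/h_rt pair works is that SGE delivers the soft limit to the job as a signal (SIGUSR1), which a wrapper script can trap to flush progress before the hard limit kills the process. A minimal sketch of the idea, not SUBDOCK's actual handler (save_progress is a hypothetical function):<br />

```shell
#!/bin/bash
# Hypothetical cleanup hook: in a real docking job this would copy
# partial results out to EXPORT_DEST before the hard limit hits.
save_progress() {
    echo "soft limit reached - saving progress"
    exit 0
}

# SGE raises SIGUSR1 when the s_rt soft runtime limit expires.
trap save_progress USR1

# ... long-running docking work would go here ...
```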
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same parameters (particularly EXPORT_DEST, INPUT_SOURCE, USE_DB2, USE_DB2_TGZ, USE_DB2_BATCH_SIZE, and USE_DB2_TGZ_BATCH_SIZE) to restart your jobs! If you saved the superscript SUBDOCK spits out on successful submission, you can simply call that. <br />
<br />
You'll know there is no more work to be done when SUBDOCK prints "all N jobs complete!". SUBDOCK will also tell you what proportion of jobs have not yet completed on each submission.<br />
<br />
Be careful not to overlap your submissions; there are no guardrails in place to prevent this.<br />
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable, but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z; we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get a db2 database subset sample via ZINC-22. An example is provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org. For wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Multiple download types are supported; for example, if you are on Wynton you can download Wynton file paths, removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' One can be obtained from our download server if you have a license; lab members can pull https://github.com/docking-org/dock3.git directly. On the BKS cluster, curated executables have been prepared with labels at /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables can be found there as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select exactly one platform; mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
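This strict comparison is straightforward to emulate; a sketch of the kind of test presumably involved (any other value, including different casing, falls through to false):<br />

```shell
# Only the exact string "true" enables a platform flag.
is_enabled() {
    [ "$1" = "true" ]
}

is_enabled "true" && echo "slurm enabled"     # exact match -> true
is_enabled "True" || echo "sge disabled"      # wrong case -> false
is_enabled "yes"  || echo "parallel disabled" # any other value -> false
```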
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command-line arguments instead of environment variables, if desired; the two styles can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. After it executes, subdock will print out a convenient "superscript" you can copy & paste for any future re-submissions.<br />
<br />
== Error Messages in my OUTDOCK! ==<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
If these messages bother you, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]].<br />
<br />
This version disables the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
== Note on Backwards Compatibility With DOCK 3.7 ==<br />
<br />
As noted previously, SUBDOCK is compatible with DOCK 3.7, but with a caveat.<br />
<br />
DB2 files generated for DOCK 3.8 will work in 3.7 via SUBDOCK; however, they are known to produce spurious error messages. These can mostly be ignored, but may add some unwanted noise to your OUTDOCK file.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>Btinglehttp://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8&diff=15343SUBDOCK DOCK3.82023-04-21T20:29:08Z<p>Btingle: /* How to continue jobs */</p>
<hr />
<div>Important note: although DOCK 3.8 is in the title of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though some DOCK 3.8 features will go unused.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located at the root of the repository.<br />
<br />
subdock.bash can be called directly from any location; it is not sensitive to the current working directory.<br />
<br />
== What's New? ==<br />
<br />
For those of you who have used a subdock utility before, here's what is new in this release:<br />
<br />
1. All job platforms (e.g. SLURM, SGE) are supported by the same script<br />
<br />
2. GNU Parallel is now supported as a job platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
3. Subdock can now be run on both individual db2.gz files & db2.tgz packages. A batch size can be set for either type, allowing for more flexibility.<br />
<br />
4. Arguments can be provided via the environment, e.g. "export KEY=VALUE", or on the command line, e.g. "--key=value"<br />
<br />
5. Subdock now prints out a superscript to copy and paste on success, convenient for re-submission.<br />
<br />
6. Fully restartable on all job platforms! See the Restartability section below for an explanation of what this means, why it matters, and instructions on usage.<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
1. SLURM<br />
<br />
2. SGE (Sun Grid Engine)<br />
<br />
3. GNU Parallel (for local runs; ideal for testing)<br />
<br />
One of these platforms must be specified; SLURM is the default. These platforms can be set by the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true</nowiki><br />
arguments, respectively.<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments; db2.tgz is the default.<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
<br />
== Restartability ==<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs can run *without* losing our progress. Time limits can be as large or as small as we want, even as little as a few minutes per job! Short jobs can efficiently fill the gaps between longer-running jobs on the same cluster, so they tend to be treated preferentially by whichever system is in charge of scheduling.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds before the 30-minute hard limit. <br />
The GNU Parallel and SLURM platforms provide a hard-coded 30-second notice, whereas this notice period must be defined manually for SGE jobs.<br />
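If you change the hard limit, the soft limit must move with it. A hypothetical helper (not part of SUBDOCK) that derives s_rt from a given h_rt while keeping the 30-second notice period:<br />

```shell
# Hypothetical helper- derives an SGE soft limit (s_rt) 30 seconds
# short of a given HH:MM:SS hard limit (h_rt). Not part of SUBDOCK.
hard_to_soft() {
    IFS=: read -r h m s <<< "$1"
    local total=$(( 10#$h * 3600 + 10#$m * 60 + 10#$s - 30 ))
    printf '%02d:%02d:%02d\n' $(( total / 3600 )) $(( total % 3600 / 60 )) $(( total % 60 ))
}

S_RT=$(hard_to_soft "00:30:00")
echo "-l s_rt=$S_RT -l h_rt=00:30:00"   # -l s_rt=00:29:30 -l h_rt=00:30:00
```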
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same INPUT_SOURCE and EXPORT_DEST defined to restart your jobs! If you saved the superscript SUBDOCK spits out on successful submission, you can simply call that. <br />
<br />
You'll know there is no more work to be done when SUBDOCK prints "all N jobs complete!". On each submission, SUBDOCK will also tell you what proportion of jobs have not yet completed.<br />
<br />
Be careful not to overlap your submissions- there are no guardrails in place to prevent this.<br />
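The completion check can be approximated by hand. This sketch assumes (hypothetically) that each finished job leaves an OUTDOCK file under EXPORT_DEST; the actual layout SUBDOCK writes may differ, so treat it as an illustration rather than a drop-in monitor:<br />

```shell
# Mock layout for illustration- 3 inputs, of which 2 jobs have finished
workdir=$(mktemp -d)
printf '%s\n' a.db2.tgz b.db2.tgz c.db2.tgz > "$workdir/sdi.in"
mkdir -p "$workdir/output/1" "$workdir/output/2" "$workdir/output/3"
touch "$workdir/output/1/OUTDOCK" "$workdir/output/2/OUTDOCK"  # job 3 not done

total=$(wc -l < "$workdir/sdi.in")
finished=$(find "$workdir/output" -name OUTDOCK | wc -l)
echo "$finished/$total jobs complete"
rm -rf "$workdir"
```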
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable, but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z- we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get a sample db2 database subset from ZINC-22. Example provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org- for wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Multiple download types are supported; for example, if you are on Wynton you can download Wynton file paths instead, removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' This can be found via our download server if you have a license; otherwise, lab members can pull https://github.com/docking-org/dock3.git directly. On the BKS cluster, curated executables have been prepared with labels @ /nfs/soft/dock/versions/dock38/executables- DOCK 3.7 executables may be found here as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
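A sketch of that strict comparison (assumed semantics- only the exact lowercase string "true" enables a platform):<br />

```shell
# Assumed semantics: anything other than the exact string "true" is false
is_true() { [ "$1" = "true" ]; }

USE_SLURM=true
USE_SGE=True   # capital T- does NOT count as true

is_true "$USE_SLURM" && slurm=on || slurm=off
is_true "$USE_SGE"   && sge=on   || sge=off
echo "slurm=$slurm sge=$sge"   # slurm=on sge=off
```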
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command line arguments instead of environment exports, if desired; the two styles can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. After subdock executes, it will print out a convenient "superscript" to copy & paste for any future re-submissions.<br />
<br />
== Error Messages in my OUTDOCK! ==<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
If these messages bother you, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]].<br />
<br />
This version omits the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
== Note on Backwards Compatibility With DOCK 3.7 ==<br />
<br />
Previously it was said that SUBDOCK is compatible with DOCK 3.7- this is true, but with a caveat.<br />
<br />
DB2 files generated for DOCK 3.8 will work in 3.7 via SUBDOCK; however, they are known to produce spurious error messages. These can mostly be ignored, but may add some unwanted noise to your OUTDOCK file.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>
<hr />
<div>Important note- although DOCK 3.8 is in the title of this article, SUBDOCK is perfectly capable of running DOCK 3.7 workloads, though it will not take advantage of some DOCK 3.8 features.<br />
<br />
== Installing ==<br />
<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
'''IMPORTANT: subdock.bash expects to live in the same directory as rundock.bash!!!'''<br />
<br />
subdock.bash is located at the root of the repository.<br />
<br />
subdock.bash can be called directly from any location- it is not sensitive to the current working directory.<br />
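subdock.bash presumably locates rundock.bash relative to its own path; the standard bash idiom for doing that looks like the following. This is a generic sketch, not SUBDOCK's actual code:<br />
<br />
```shell
# Resolve the directory that contains the currently running script,
# then build the path to a sibling file relative to that directory.
# If the two scripts are separated, this kind of lookup breaks.
script_dir=$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")
rundock="$script_dir/rundock.bash"
echo "$rundock"
```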
<br />
== What's New? ==<br />
<br />
For those of you that have used a subdock utility before, here's what is new in this release:<br />
<br />
1. All job platforms (e.g. SLURM, SGE) are supported by the same script<br />
<br />
2. GNU Parallel is now supported as a jobs platform! Ideal for small-scale local testing. https://www.gnu.org/software/parallel/<br />
<br />
3. Subdock can now be run on both individual db2.gz files & db2.tgz packages. A batch size can be set for either type, allowing for more flexibility.<br />
<br />
4. Arguments can be provided as environment variables, e.g. "export KEY=VALUE", or on the command line, e.g. "--key=value"<br />
<br />
5. On success, Subdock now prints out a "superscript" to copy & paste, convenient for re-submission.<br />
<br />
6. Fully restartable on all job platforms! See the section below for an explanation of what this means, why it matters, and how to use it.<br />
<br />
== Supported Platforms ==<br />
<br />
There are three platforms currently supported:<br />
<br />
1. SLURM<br />
<br />
2. SGE (Sun Grid Engine)<br />
<br />
3. GNU Parallel (for local runs- ideal for testing)<br />
<br />
One of these platforms must be specified- SLURM is the default. The platform is selected with the<br />
<nowiki><br />
--use-slurm=true<br />
--use-sge=true<br />
--use-parallel=true</nowiki><br />
arguments, respectively.<br />
<br />
== Supported File Types ==<br />
<br />
DOCK can be run on individual db2.gz files or db2.tgz tar packages.<br />
<br />
The file type can be specified via the --use-db2=true or --use-db2-tgz=true arguments; db2.tgz is the default.<br />
<br />
Each job dispatched by SUBDOCK will consume BATCH_SIZE files, where BATCH_SIZE is equal to --use-db2-batch-size or --use-db2-tgz-batch-size depending on which file type is chosen.<br />
<br />
The number of jobs dispatched by SUBDOCK is equal to ceil(N / BATCH_SIZE), where N is the total number of input files.<br />
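The batch arithmetic is plain integer math; as a quick illustration (example numbers, not SUBDOCK output):<br />
<br />
```shell
# ceil(N / BATCH_SIZE) without floating point: add BATCH_SIZE-1 before dividing.
n_files=250      # total .db2.tgz input files (example value)
batch_size=100   # e.g. --use-db2-tgz-batch-size=100
echo $(( (n_files + batch_size - 1) / batch_size ))   # 3 jobs: 100 + 100 + 50
```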
<br />
== Restartability ==<br />
<br />
Restartability means that we can impose arbitrary time limits on how long our jobs may run *without* losing progress. Time limits can be as large or as small as we want, even as little as a few minutes per job! Short, restartable jobs can efficiently fill the gaps between longer-running jobs on the same cluster, so the scheduler will often treat them preferentially.<br />
<br />
=== How to use for your Job Platform ===<br />
<br />
On SLURM, runtime can be defined with the "--time" argument, e.g:<br />
<br />
<nowiki>subdock.bash --use-slurm=true --use-slurm-args="--time=00:30:00"</nowiki><br />
<br />
This will allow our job to run for 30 minutes before progress is saved & copied out.<br />
On GNU parallel this is accomplished with "--timeout", e.g:<br />
<br />
<nowiki>subdock.bash --use-parallel=true --use-parallel-args="--timeout 1800"</nowiki><br />
<br />
On SGE, the same can be achieved using the s_rt and h_rt parameters, e.g:<br />
<br />
<nowiki>subdock.bash --use-sge=true --use-sge-args="-l s_rt=00:29:30 -l h_rt=00:30:00"</nowiki><br />
<br />
This tells SGE to warn the job 30 seconds before the 30-minute hard limit. <br />
The GNU Parallel and SLURM platforms provide a hard-coded 30 seconds of notice, whereas for SGE jobs this notice period must be defined manually.<br />
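Since the SGE notice period must be set by hand, a tiny helper can derive s_rt from h_rt. The function below is a hypothetical convenience, not part of SUBDOCK:<br />
<br />
```shell
# Convert an HH:MM:SS hard limit to seconds, subtract a 30-second notice
# period, and print the result back in HH:MM:SS form for use as s_rt.
soft_limit() {
    local h m s total
    IFS=: read -r h m s <<< "$1"
    total=$(( 10#$h*3600 + 10#$m*60 + 10#$s - 30 ))   # 10# guards against octal parsing of "08"
    printf '%02d:%02d:%02d\n' $((total/3600)) $((total%3600/60)) $((total%60))
}
soft_limit 00:30:00   # prints 00:29:30
```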
<br />
=== How to continue jobs ===<br />
<br />
Run subdock.bash again with the same INPUT_SOURCE and EXPORT_DEST defined to restart your jobs! If you saved the "superscript" SUBDOCK prints on successful submission, you can simply run that. <br />
<br />
Be careful not to overlap your submissions- there are no guardrails in place to prevent this.<br />
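One guardrail you could add yourself is a crude lock around re-submission, so two invocations against the same EXPORT_DEST cannot run at once. This is a sketch, not a SUBDOCK feature- mkdir is atomic, so only one caller can create the lock directory:<br />
<br />
```shell
# Hypothetical wrapper: refuse to re-submit while another submission
# against the same EXPORT_DEST still holds the lock directory.
lockdir="$EXPORT_DEST/.submit.lock"
if mkdir "$lockdir" 2>/dev/null; then
    trap 'rmdir "$lockdir"' EXIT
    bash ~/SUBDOCK/subdock.bash      # same environment variables as the original run
else
    echo "another submission appears to be in progress; aborting" >&2
fi
```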
<br />
== Full Example - All Steps ==<br />
<br />
This example assumes you have access to a DOCK executable, but nothing else.<br />
<br />
1. Source subdock code from github<br />
<nowiki><br />
git clone https://github.com/docking-org/SUBDOCK.git</nowiki><br />
<br />
2. Fetch dockfiles from DUDE-Z- we will use DRD4 for this example.<br />
<nowiki><br />
# note- SUBDOCK automatically detects your DOCK version & corrects the INDOCK header accordingly<br />
wget -r --reject="index.html*" -nH --cut-dirs=2 -l1 --no-parent https://dudez.docking.org/DOCKING_GRIDS_AND_POSES/DRD4/dockfiles/</nowiki><br />
<br />
3a. Get db2 database subset sample via ZINC-22. Example provided below:<br />
<nowiki><br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-laa.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lab.db2.tgz<br />
wget http://files.docking.org/zinc22/zinc-22l/H17/H17P050/a/H17P050-N-lac.db2.tgz</nowiki><br />
<br />
You can select a db2 database subset via cartblanche22.docking.org- for wget-able files, choose the DOCK37 (*.db2.tgz) format with the URL download type. Multiple download types are supported; for example, if you are on Wynton you can download Wynton file paths instead, removing the need to download the files yourself.<br />
<br />
3b. If you downloaded the db2.tgz files yourself, create an sdi.in file from your database subset, which will serve as a list of files to evaluate. For example:<br />
<nowiki><br />
find $PWD -type f -name '*.db2.tgz' > sdi.in</nowiki><br />
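To sanity-check the listing step, you can try it against throwaway placeholder files first (illustration only- the .db2.tgz files here are empty):<br />
<br />
```shell
# Build a scratch directory with dummy inputs, generate sdi.in the same
# way as above, and confirm only the .db2.tgz files are listed.
demo=$(mktemp -d)
touch "$demo/a.db2.tgz" "$demo/b.db2.tgz" "$demo/notes.txt"
find "$demo" -type f -name '*.db2.tgz' > "$demo/sdi.in"
wc -l < "$demo/sdi.in"   # 2 - notes.txt is excluded by the -name filter
```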
<br />
4. Export the parameters we just prepared as environment variables. '''You need a DOCK executable!''' One can be found via our download server if you have a license; otherwise, lab members can pull https://github.com/docking-org/dock3.git directly. On the BKS cluster, some curated executables have been prepared with labels at /nfs/soft/dock/versions/dock38/executables. DOCK 3.7 executables may be found there as well!<br />
<br />
<nowiki><br />
export INPUT_SOURCE=$PWD/sdi.in<br />
export EXPORT_DEST=$PWD/output<br />
export DOCKFILES=$PWD/dockfiles<br />
export DOCKEXEC=/nfs/soft/dock/versions/dock38/executables/dock38_nogist</nowiki><br />
<br />
5. Choose a platform. You must select only one platform - mixing and matching is not supported.<br />
<nowiki><br />
export USE_SLURM=true|...<br />
export USE_SGE=true|...<br />
export USE_PARALLEL=true|...</nowiki><br />
<br />
Any value other than exactly "true" will be interpreted as false.<br />
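This strict comparison is easy to mimic- the function below sketches the behavior described above and is not SUBDOCK's actual parsing code:<br />
<br />
```shell
# Only the exact lowercase string "true" enables a platform.
is_enabled() { [ "$1" = "true" ] && echo enabled || echo disabled; }
is_enabled true    # enabled
is_enabled True    # disabled ("True" != "true")
is_enabled 1       # disabled
```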
<br />
6a. Run docking!<br />
<nowiki><br />
bash ~/SUBDOCK/subdock.bash</nowiki><br />
<br />
6b. You can also use command line arguments instead of environment export, if desired. These can be mixed and matched.<br />
<nowiki><br />
export DOCKEXEC=$PWD/DOCK/ucsfdock/docking/DOCK/dock64<br />
bash ~/SUBDOCK/subdock.bash --input-source=$PWD/sdi.in --export-dest=$PWD/output --dockfiles=$PWD/dockfiles --use-slurm=true</nowiki><br />
<br />
7. After subdock finishes submitting, it prints out a convenient "superscript" to copy & paste for any future re-submissions.<br />
<br />
== Error Messages in my OUTDOCK! ==<br />
<br />
If you're running DOCK 3.8 against recently built ligands, you may encounter error messages that look like this:<br />
<nowiki> 1 2 bonds with error<br />
Error. newlist is not big enough</nowiki><br />
<br />
If these messages bother you, use the dock38_nogist executable described in [[How_to_install_DOCK_3.8#Prebuilt_Executable]].<br />
<br />
This version omits the code related to the GIST scoring function, which is responsible for these errors.<br />
<br />
== Note on Backwards Compatibility With DOCK 3.7 ==<br />
<br />
Previously it was said that SUBDOCK is compatible with DOCK 3.7- this is true, but with a caveat.<br />
<br />
DB2 files generated for DOCK 3.8 will work in 3.7 via SUBDOCK; however, they are known to produce spurious error messages. These can mostly be ignored, but may add some unwanted noise to your OUTDOCK file.<br />
<br />
== SUBDOCK help splash - all argument descriptions & defaults ==<br />
<nowiki><br />
[user@machine SUBDOCK]$ ./subdock.bash --help<br />
SUBDOCK! Run docking workloads via job controller of your choice<br />
=================required arguments=================<br />
expected env arg: EXPORT_DEST, --export-dest<br />
arg description: nfs output destination for OUTDOCK and test.mol2.gz files<br />
<br />
expected env arg: INPUT_SOURCE, --input-source<br />
arg description: nfs directory containing one or more .db2.tgz files OR a file containing a list of db2.tgz files<br />
<br />
expected env arg: DOCKFILES, --dockfiles<br />
arg description: nfs directory containing dock related files and INDOCK configuration for docking run<br />
<br />
expected env arg: DOCKEXEC, --dockexec<br />
arg description: nfs path to dock executable<br />
<br />
=================job controller settings=================<br />
optional env arg missing: USE_SLURM, --use-slurm<br />
arg description: use slurm<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SLURM_ARGS, --use-slurm-args<br />
arg description: addtl arguments for SLURM sbatch command<br />
defaulting to <br />
<br />
optional env arg missing: USE_SGE, --use-sge<br />
arg description: use sge<br />
defaulting to false<br />
<br />
optional env arg missing: USE_SGE_ARGS, --use-sge-args<br />
arg description: addtl arguments for SGE qsub command<br />
defaulting to <br />
<br />
optional env arg missing: USE_PARALLEL, --use-parallel<br />
arg description: use GNU parallel<br />
defaulting to false<br />
<br />
optional env arg missing: USE_PARALLEL_ARGS, --use-parallel-args<br />
arg description: addtl arguments for GNU parallel command<br />
defaulting to <br />
<br />
=================input settings=================<br />
optional env arg missing: USE_DB2_TGZ, --use-db2-tgz<br />
arg description: dock db2.tgz tar files<br />
defaulting to true<br />
<br />
optional env arg missing: USE_DB2_TGZ_BATCH_SIZE, --use-db2-tgz-batch-size<br />
arg description: how many db2.tgz to evaluate per batch<br />
defaulting to 1<br />
<br />
optional env arg missing: USE_DB2, --use-db2<br />
arg description: dock db2.gz individual files<br />
defaulting to false<br />
<br />
optional env arg missing: USE_DB2_BATCH_SIZE, --use-db2-batch-size<br />
arg description: how many db2.gz to evaluate per batch<br />
defaulting to 100<br />
<br />
=================addtl job configuration=================<br />
optional env arg missing: MAX_PARALLEL, --max-parallel<br />
arg description: max jobs allowed to run in parallel<br />
defaulting to -1<br />
<br />
optional env arg missing: SHRTCACHE, --shrtcache<br />
arg description: temporary local storage for job files<br />
defaulting to /scratch<br />
<br />
optional env arg missing: LONGCACHE, --longcache<br />
arg description: longer term storage for files shared between jobs<br />
defaulting to /scratch<br />
<br />
=================miscellaneous=================<br />
optional env arg missing: SUBMIT_WAIT_TIME, --submit-wait-time<br />
arg description: how many seconds to wait before submitting<br />
defaulting to 5<br />
<br />
optional env arg missing: USE_CACHED_SUBMIT_STATS, --use-cached-submit-stats<br />
arg description: only check completion for jobs submitted in the latest iteration. Faster re-submission, but will ignore jobs that have been manually reset<br />
defaulting to false<br />
</nowiki><br />
<br />
[[Category:DOCK_3.8]]</div>Btinglehttp://wiki.docking.org/index.php?title=SUBDOCK_DOCK3.8&diff=15338SUBDOCK DOCK3.82023-04-21T20:16:12Z<p>Btingle: /* Restartability */</p>