

AUTHOR: Irwin D. Kuntz (modified by Renee DesJarlais and Brian Shoichet)

USAGE: sphgen


   rec.ms #molecular surface file
   R #sphere outside of surface (R) or inside surface (L)
   X #specifies subset of surface points to be used (X=all points)
   0.0 #prevents generation of large spheres with close surface contacts (default=0.0)
   4.0 #maximum sphere radius in Angstroms (default=4.0)
   1.4 #minimum sphere radius in Angstroms (default=radius of probe)
   rec.sph #clustered spheres file

(1) The input file names and parameters are read from a file called INSPH, which must not contain blank lines or the comments (denoted by #) shown above. (2) The molecular surface file must include surface normals. SPHGEN expects the Fortran format

A3, I5, X, A4, X, 2F8.3, F9.3, X, A3, 7X, 3F7.3
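As a minimal sketch of rule (1), the Python helper below writes an INSPH file containing the seven values in the order listed above, with no comments and no blank lines. The function name `write_insph` and its keyword arguments are hypothetical and are not part of SPHGEN; the defaults simply repeat the example values.

```python
from pathlib import Path


def write_insph(path, surface_file="rec.ms", side="R", subset="X",
                close_contact=0.0, max_radius=4.0, min_radius=1.4,
                output_file="rec.sph"):
    """Write the seven INSPH lines in the documented order (hypothetical helper)."""
    if side not in ("R", "L"):
        raise ValueError("side must be 'R' (outside surface) or 'L' (inside surface)")
    lines = [
        surface_file,               # molecular surface file
        side,                       # R or L
        subset,                     # X = all surface points
        str(float(close_contact)),  # close-contact parameter
        str(float(max_radius)),     # maximum sphere radius (Angstroms)
        str(float(min_radius)),     # minimum sphere radius (Angstroms)
        output_file,                # clustered spheres file
    ]
    # One value per line, no comments, no blank lines.
    Path(path).write_text("\n".join(lines) + "\n")
    return lines
```

Calling `write_insph("INSPH")` in the working directory would produce a file with the example settings shown in the usage block.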



Sphgen generates sets of overlapping spheres to describe the shape of a molecule or molecular surface. For receptors, a negative image of the surface invaginations is created; for a ligand, the program creates a positive image of the entire molecule. Spheres are constructed using the molecular surface described by Richards (1977), calculated with the program dms (www.cgl.ucsf.edu). Each sphere touches the molecular surface at two points, and its center lies along the surface normal at one of those points. For the receptor, each sphere center is outside the surface and lies in the direction of a surface normal vector. For a ligand, each sphere center is inside the surface and lies in the direction of a reversed surface normal vector.
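The two-point contact condition determines the sphere radius. If p_i is a surface point with unit normal n_i, p_j is the second contact point, and d = p_j - p_i, then a center at p_i + r n_i is equidistant from both points when r = |d|^2 / (2 d·n_i). The Python sketch below works this out; it follows from the geometry described above and is not taken from the SPHGEN source, and the function name is hypothetical. For a ligand, pass the reversed normal.

```python
def sphere_through_two_points(p_i, n_i, p_j):
    """Return (center, radius) of the sphere whose center lies along the unit
    normal n_i at p_i and whose surface passes through both p_i and p_j.
    Returns None when no such sphere exists on that side of the surface."""
    d = [b - a for a, b in zip(p_i, p_j)]
    d_dot_n = sum(x * y for x, y in zip(d, n_i))
    if d_dot_n <= 0.0:
        # p_j is not on the side the normal points to.
        return None
    radius = sum(x * x for x in d) / (2.0 * d_dot_n)
    center = [a + radius * n for a, n in zip(p_i, n_i)]
    return center, radius
```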

Spheres are calculated over the entire surface, producing approximately one sphere per surface point. This very dense representation is then filtered to keep only the largest sphere associated with each receptor surface atom. The filtered set is then clustered on the basis of radial overlap between the spheres using a single-linkage algorithm. This creates a negative image of the receptor surface, where each invagination is characterized by a set of overlapping spheres. These sets, or clusters, are sorted by the number of constituent spheres and written out in order of descending size. The largest cluster is typically the ligand binding site of the receptor molecule. The program showsphere writes out sphere center coordinates in PDB format and may be helpful for visualizing the clusters (see showsphere).
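The filter-then-cluster procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not SPHGEN's implementation: spheres are represented as hypothetical `(atom, x, y, z, radius)` tuples, and "radial overlap" is assumed to mean that the distance between two centers is less than the sum of their radii. Single linkage is implemented with a small union-find structure.

```python
import math
from collections import defaultdict


def largest_sphere_per_atom(spheres):
    """Keep only the largest sphere for each atom number."""
    best = {}
    for sphere in spheres:
        atom, radius = sphere[0], sphere[4]
        if atom not in best or radius > best[atom][4]:
            best[atom] = sphere
    return list(best.values())


def single_linkage_clusters(spheres):
    """Group spheres into connected components of overlapping spheres,
    returned largest cluster first."""
    parent = list(range(len(spheres)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(spheres)):
        for j in range(i + 1, len(spheres)):
            a, b = spheres[i], spheres[j]
            # Assumed overlap test: centers closer than the sum of the radii.
            if math.dist(a[1:4], b[1:4]) < a[4] + b[4]:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, sphere in enumerate(spheres):
        groups[find(i)].append(sphere)
    return sorted(groups.values(), key=len, reverse=True)
```

The pairwise loop is quadratic in the number of spheres, which is acceptable for a sketch because the filter step leaves at most one sphere per surface atom.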

Critical Points

The process of labeling site points for critical matching must currently be done by hand (see Critical Points for use in DOCK). The user should load the site points and the receptor coordinates into a graphics program to determine the spheres closest to the target area. Once a sphere or group of spheres has been determined to be critical, the sphere(s) should be labeled by changing the second-to-last column of the final sphere file to the critical cluster number (see Output).
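Choosing the critical spheres remains a manual step, but the edit itself can be scripted. The Python sketch below assumes the sphere line is written strictly in the fixed-width layout given under Output, (I5, 3F10.5, F8.3, I5, I2, I3), so that the second-to-last field occupies the two characters after the first 48. The function name is hypothetical.

```python
def set_critical_cluster(line, cluster_number):
    """Return the sphere line with its second-to-last field (I2) replaced
    by cluster_number. Assumes the documented fixed-width layout."""
    if not 0 <= cluster_number <= 99:
        raise ValueError("an I2 field holds at most two digits")
    body = line.rstrip("\n")
    if len(body) < 53:
        raise ValueError("line is shorter than the 53-column sphere record")
    # Fields before the critical cluster: 5 + 3*10 + 8 + 5 = 48 columns.
    return body[:48] + f"{cluster_number:2d}" + body[50:]
```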

Chemical Matching

The process of labeling site points for chemical matching must also be done by hand (see Chemical Matching for use in DOCK). The user should load the site points and the receptor coordinates into a graphics program and study the local environment of each point. Labeled site points may be input as either an SPH-format or a SYBYL MOL2-format coordinate file. To store labeled site points in a MOL2 file, select an atom type for each label of interest. Then edit the chem.defn file to include the selected atom types (see chem.defn). Site point definitions can be distinguished from ligand atom definitions by explicitly requiring that no bonded atoms be attached (i.e., followed by [*]). Using the convention in the example chem.defn file, site points should be labeled as follows: hydrophobic, "C.3"; donor, "N.4"; acceptor, "O.2"; polar, "F".

Example of chemical labels in SPH format

               DOCK 3.5 receptor_spheres
               color hydrophobic 1
               color acceptor 2
               color donor 3
               cluster 1 number of spheres in cluster 49
               7 2.34500 36.49000 16.93500 1.500 0 0 1
               8 -0.05200 42.29900 14.18800 1.500 0 0 1
               9 -0.67000 41.20600 11.59800 1.500 0 0 1
               17 -6.00000 34.00000 17.00000 1.500 0 0 3
               18 -5.00000 29.00000 22.00000 1.500 0 1 3
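To illustrate how the last column refers back to the color table, the Python sketch below reads the `color` header lines and resolves a sphere's color index to its name. The function names are hypothetical, and the code splits on whitespace, which matches the example as displayed here; in a file where fixed-width fields run together, a column-based parser would be needed instead.

```python
def read_color_table(header_lines):
    """Map color index -> color name from 'color <name> <index>' lines."""
    colors = {0: "unlabeled"}  # index 0 means the sphere carries no label
    for line in header_lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] == "color":
            colors[int(fields[2])] = fields[1]
    return colors


def sphere_color_name(sphere_line, colors):
    """Look up the name for the color index in the last column of a sphere line."""
    return colors[int(sphere_line.split()[-1])]
```

With the header above, the sphere on atom 18 (last column 3) resolves to "donor" and the sphere on atom 7 (last column 1) resolves to "hydrophobic".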

Caveats on Chemical Matching

It can take a significant amount of effort to chemically label a large site and to verify that the docking results are what were expected. If you use this chemical matching, plan to spend some time in preparation and validation BEFORE running an entire database of molecules.

It must be pointed out that the ultimate arbiter of which ligand orientations are saved is the scoring function. If the scoring function is unable to discriminate against what the user considers bad chemical interactions, then any improvement from chemical matching will probably be obscured. In addition, if score optimization is used, the orientation will be perturbed from the original chemically matched position to a new score-preferred position.

Output

Some informative messages are written to a file called OUTSPH, including the parameters and files used in the calculation. The spheres themselves are written to the clustered spheres file. They are arranged in clusters, with the cluster containing the largest number of spheres appearing first. The sphere cluster file consists of a header followed by a series of sphere clusters. The header is the line DOCK 3.5 receptor_spheres followed by a color table (see Chemical Matching). The color table contains color names, each on a separate line. Because sphgen itself assigns no colors, the color table is absent from the files it writes.

The sphere clusters themselves follow, each of which starts with the line

cluster n number of spheres in cluster i

where n is the cluster number for that sphere cluster, and i is the number of spheres in that cluster. Next, all spheres in that cluster are listed in the format below:

FORMAT: (I5, 3F10.5, F8.3, I5, I2, I3)

       I: integer field; F: floating-point field

       63 5.58405 50.91005 59.97029 1.411 92 0 0
       64 9.00378 52.46159 62.30926 1.400 321 0 0
       66 11.43685 56.49715 61.79008 1.984 493 0 0

       I5: columns 0-4 (the first 5 columns) hold an integer.
       F10.5: columns 5-14 (10 columns wide, with 5 digits after the decimal point) hold a floating-point value.
       The values in the sphere file correspond to:
  • The number of the atom with which surface point i (used to generate the sphere) is associated.
  • The x, y, and z coordinates of the sphere center.
  • The sphere radius.
  • The number of the atom with which surface point j (second point used to generate the sphere) is associated.
  • The critical cluster to which this sphere belongs.
  • The sphere color. The color is simply an index into the color table that was specified in the header. Therefore, 1 corresponds to the first color in the header, 2 for the second, etc. 0 corresponds to unlabeled.
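The record layout above can be read by slicing each line at the field widths given in the FORMAT statement. The Python sketch below is one way to do this; the `Sphere` type, its field names, and both function names are hypothetical and chosen only to mirror the list above.

```python
from typing import NamedTuple


class Sphere(NamedTuple):
    atom_i: int            # atom associated with surface point i
    x: float               # sphere center coordinates
    y: float
    z: float
    radius: float
    atom_j: int            # atom associated with surface point j
    critical_cluster: int
    color: int             # index into the color table, 0 = unlabeled


# Field widths from FORMAT (I5, 3F10.5, F8.3, I5, I2, I3).
_WIDTHS = (5, 10, 10, 10, 8, 5, 2, 3)


def parse_sphere_line(line):
    """Slice one fixed-width sphere record into a Sphere."""
    fields, start = [], 0
    for width in _WIDTHS:
        fields.append(line[start:start + width])
        start += width
    return Sphere(int(fields[0]), float(fields[1]), float(fields[2]),
                  float(fields[3]), float(fields[4]), int(fields[5]),
                  int(fields[6]), int(fields[7]))


def format_sphere_line(s):
    """Write a Sphere back out in the same fixed-width layout."""
    return (f"{s.atom_i:5d}{s.x:10.5f}{s.y:10.5f}{s.z:10.5f}"
            f"{s.radius:8.3f}{s.atom_j:5d}{s.critical_cluster:2d}{s.color:3d}")
```

Slicing by column rather than splitting on whitespace keeps the parser correct even when adjacent fields fill their full width and leave no space between them.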

The clusters are listed in numerical order from the largest cluster found to the smallest. At the end of the clusters is cluster number 0. This is not an actual sphere cluster, but a list of all of the spheres generated whose radii were larger than the minimum radius, before the filtering heuristics (i.e., allowing only one sphere per atom and using a maximum radius cutoff) and clustering were performed. Cluster 0 may be useful as a starting point for users who want to explore a wider range of possible clusters than is provided by the standard SPHGEN clustering routine.

The program creates three temporary files: temp1.ms, temp2.sph, and temp3.atc. These are used internally by SPHGEN and are deleted upon completion of the computation. For more information on sphere generation and selection, go to the Sphere Generation and Selection demo.
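For users who want to work with individual clusters, including cluster 0, the Python sketch below splits the lines of a sphere file at each `cluster n number of spheres in cluster i` header. The function name is hypothetical; the sphere records are kept as raw text so that any column-based parsing can be applied afterwards.

```python
def read_clusters(lines):
    """Map cluster number -> list of raw sphere lines, in file order."""
    clusters = {}
    current = None
    for line in lines:
        fields = line.split()
        if fields[:1] == ["cluster"]:
            # Header line: 'cluster n number of spheres in cluster i'.
            current = int(fields[1])
            clusters[current] = []
        elif current is not None and fields:
            clusters[current].append(line.rstrip("\n"))
        # Lines before the first cluster header (file header, color table)
        # are skipped.
    return clusters
```

For example, `read_clusters(open("rec.sph"))[0]` would return the unfiltered spheres of cluster 0, assuming the output file is named `rec.sph` as in the usage example.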