Difference between revisions of "LogAUC"


Revision as of 00:15, 13 January 2010

What is LogAUC?

LogAUC is a metric to evaluate virtual screening performance that has some nice characteristics. It is intuitive to use

Motivation

When we look at virtual screening performance, we plot an ROC curve (or enrichment curve) with a base 10 semilog x-axis, because this focuses the graph on "early enrichment", the region where molecules are most likely to be selected for further testing. Had we instead plotted the curve with the usual linear x-axis, the area under the curve (AUC) would be a well-regarded metric for summarizing the overall performance of a virtual screening campaign as a single number<sup>1</sup>. While AUC can be formulated in alternative ways<sup>2,3</sup>, it can be constructed mechanically by integrating under the curve, and interpreted as the ratio of the area under the curve to the area under the best possible ROC curve. It just happens that in a linear ROC plot, the best possible curve covers the entire unit square, with an area of 1. By analogy, in our typical semilog plots, we can form the same ratio, the area under the log curve over the area under the perfect log curve, and define it as the LogAUC. The lone nuisance is that the area under the log curve can be infinite, and always is for the perfect curve, because a log axis extends without bound as x approaches 0. However, if we are practical and limit our focus to a region of log space that we can actually measure, say above a certain threshold <math>\lambda</math>, then the perfect log area is finite.
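
As a quick check of that last claim, write the x-axis in base 10 log units (the symbol <math>u = \log_{10} x</math> is introduced here only for illustration). The perfect curve has y = 1 everywhere, so the area it encloses between <math>\lambda</math> and 1 is:

<math>\int_{\log_{10}\lambda}^{0} 1 \, du = -\log_{10}\lambda = \log_{10}\frac{1}{\lambda}</math>

For <math>\lambda = 0.001</math> this is 3, that is, three decades of the log axis, and it grows without bound as <math>\lambda</math> approaches 0.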

Definition

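Formally, we define <math>LogAUC_\lambda</math> as the log area computed from <math>\lambda</math> to 1.0 on the x-axis, divided by the perfect log area over the same range. We typically refer to <math>LogAUC_{0.001}</math> as simply LogAUC, where the area is integrated from 0.1 percent (0.001) to 100 percent (1.0) of decoys found. To integrate the area under the curve, we use the trapezoidal rule, where <math>x_i</math> is the fraction of decoys found and <math>y_i</math> is the fraction of actives (known ligands) found at the i-th point of the curve: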

<math>LogAUC_\lambda=\frac{\displaystyle \sum_{i\,:\,x_i\ge\lambda} \left(\log_{10} x_{i+1} - \log_{10} x_i\right)\left(\frac{y_{i+1}+y_i}{2}\right)}{\log_{10}\frac{1}{\lambda}}</math>
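
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce any published numbers: the function name log_auc and its parameters are hypothetical, and it assumes the curve is given as two equal-length lists sorted by ascending x, with every x greater than 0 and the last x equal to 1.0. Because the sum only starts at points with x at or above <math>\lambda</math>, a curve without a point exactly at <math>\lambda</math> loses the partial segment that crosses it, so the sketch assumes the caller supplies (or interpolates) a point there.

```python
import math


def log_auc(x, y, lam=0.001):
    """Trapezoidal LogAUC on a base 10 log x-axis (illustrative sketch).

    x   -- fractions of decoys found, ascending, each in (0, 1]
    y   -- fractions of actives found at the same curve points
    lam -- lower integration bound (lambda); assumed to be one of the x values
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    area = 0.0
    for i in range(len(x) - 1):
        if x[i] < lam:
            continue  # the sum only includes segments that start at x_i >= lambda
        width = math.log10(x[i + 1]) - math.log10(x[i])
        area += width * (y[i + 1] + y[i]) / 2.0
    # Normalize by the perfect log area, log10(1 / lambda).
    return area / math.log10(1.0 / lam)


# Perfect curve: every active is found before any decoy (y = 1 everywhere).
perfect = log_auc([0.001, 0.01, 0.1, 1.0], [1.0, 1.0, 1.0, 1.0])

# Random curve: y = x, sampled on a fine log-spaced grid from 0.001 to 1.
grid = [10 ** (-3 + 3 * k / 3000) for k in range(3001)]
random_like = log_auc(grid, grid)

print(perfect, random_like)
```

With these inputs the perfect curve should come out at 1.0 (up to floating-point rounding), while the random curve should land near <math>(1-\lambda)/\ln(1/\lambda)</math>, about 0.145 for <math>\lambda = 0.001</math>, which is the exact value of the integral for y = x.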

References

1. Nicholls, A., What do we know and when do we know it? J Comput Aided Mol Des 2008, 22, (3-4), 239-55.

2.